xen-devel.lists.xenproject.org archive mirror
* arm: xl vcpu-pin leads to oom-killer slashing processes
@ 2018-12-05 10:26 Andrii Anisov
  2018-12-05 10:28 ` Andrii Anisov
  2018-12-05 10:49 ` Julien Grall
  0 siblings, 2 replies; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 10:26 UTC (permalink / raw)
  To: xen-devel

Hello,

On the current
     6d8ffac (xenbits/master) xen/arm: gic: Remove duplicated comment in do_sgi
and
     7073942 (xenbits/staging, xenbits/smoke, xenbits/coverity-tested/smoke) pci: apply workaround for Intel errata HSE43 and BDF2/BDX2

`xl vcpu-pin` leads to the oom-killer going mad and killing all processes:


Poky (Yocto Project Reference Distro) 2.4.2 salvator-x hvc0

salvator-x login: root
Last login: Tue Sep 18 14:40:04 UTC 2018 on tty2
root@salvator-x:~# xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   -b-       4.9  all / all
Domain-0                             0     1    1   r--       4.5  all / all
root@salvator-x:~# xl vcpu-pin Domain-0 all 0-1
root@salvator-x:~# xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   r--       5.0  0-1 / all
Domain-0                             0     1    1   r--       6.1  0-1 / all
root@salvator-x:~# [   38.041306] systemd invoked oom-killer: gfp_mask=0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null),  order=0, oom_score_adj=0
[   38.052531] systemd cpuset=/ mems_allowed=0
[   38.056791] CPU: 1 PID: 1 Comm: systemd Tainted: G           O    4.14.35-yocto-standard #1
[   38.065112] Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
[   38.073579] Call trace:
[   38.076095] [<ffff000008089a50>] dump_backtrace+0x0/0x3c8
[   38.081530] [<ffff000008089e2c>] show_stack+0x14/0x20
[   38.086631] [<ffff000008ac8640>] dump_stack+0x9c/0xbc
[   38.091729] [<ffff00000819b048>] dump_header+0x90/0x1e8
[   38.096996] [<ffff00000819a53c>] oom_kill_process+0x26c/0x568
[   38.102784] [<ffff00000819ac20>] out_of_memory+0x198/0x4c0
[   38.108317] [<ffff0000081a07d4>] __alloc_pages_nodemask+0xb5c/0xbf0
[   38.114622] [<ffff0000081f2a24>] alloc_pages_current+0x7c/0xe8
[   38.120494] [<ffff0000081fc354>] new_slab+0x404/0x548
[   38.125590] [<ffff0000081fe6f8>] ___slab_alloc+0x480/0x5f0
[   38.131121] [<ffff0000081fe88c>] __slab_alloc.isra.23+0x24/0x38
[   38.137082] [<ffff0000081ff0bc>] kmem_cache_alloc+0x19c/0x1e0
[   38.142875] [<ffff0000083278ec>] nfs_readhdr_alloc+0x1c/0x30
[   38.148574] [<ffff00000832667c>] nfs_generic_pg_pgios+0x1c/0xc8
[   38.154534] [<ffff000008326084>] nfs_pageio_doio+0x34/0x70
[   38.160065] [<ffff0000083275d0>] nfs_pageio_complete+0x50/0xc8
[   38.165939] [<ffff0000083286f8>] nfs_readpages+0xd8/0x1b8
[   38.171385] [<ffff0000081a60a0>] __do_page_cache_readahead+0x180/0x268
[   38.177955] [<ffff000008197b90>] filemap_fault+0x2c0/0x600
[   38.183481] [<ffff0000081c8ba8>] __do_fault+0x20/0x78
[   38.188577] [<ffff0000081ce8a4>] __handle_mm_fault+0xafc/0x1050
[   38.194538] [<ffff0000081cef24>] handle_mm_fault+0x12c/0x1d8
[   38.200243] [<ffff00000809d120>] do_page_fault+0x1a8/0x3d0
[   38.205770] [<ffff00000809d384>] do_translation_fault+0x3c/0x48
[   38.211732] [<ffff000008081310>] do_mem_abort+0x40/0xa0
[   38.217002] [<ffff0000080813f8>] do_el0_ia_bp_hardening+0x38/0x98
[   38.223137] Exception stack(0xffff000008013ec0 to 0xffff000008014000)
[   38.229618] 3ec0: 0000ffffc49a4860 0000ffffc49a4bb8 0000ffffc49a4bf7 0000000000000000
[   38.237484] 3ee0: 0000000000000000 0000ffffc49a49a0 000000000000001f 0000ffffaa8cf2a8
[   38.245342] 3f00: 0000ffffc49a4a60 ffffff80ffffffe8 0000ffffc49a4a80 0000ffffc49a4a80
[   38.253204] 3f20: 0000000000000018 000000005ba10e62 003ae0c145b56d95 000018ebb6886232
[   38.261066] 3f40: 0000ffffaa99aae0 0000ffffaa63ad10 0000ffffc49a47fe 0000ffffc49a4860
[   38.268928] 3f60: 0000ffffc49a4bb8 0000ffffc49a4bf7 0000ffffc49a4bb8 0000ffffaa8d9bc8
[   38.276791] 3f80: 0000000000000001 0000ffffc49a4bb8 0000ffffaa914f00 0000ffffc49a4bb8
[   38.284653] 3fa0: 00000000000f4240 0000ffffc49a47d0 0000ffffaa5cfad0 0000ffffc49a47d0
[   38.292516] 3fc0: 0000ffffaa5cdf50 0000000080000000 0000aaaafb9d1370 00000000ffffffff
[   38.300378] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   38.308241] [<ffff0000080834d4>] el0_ia+0x18/0x1c
[   38.313028] Mem-Info:
[   38.315331] active_anon:3647 inactive_anon:5174 isolated_anon:0
[   38.315331]  active_file:4 inactive_file:3 isolated_file:0
[   38.315331]  unevictable:3 dirty:0 writeback:0 unstable:0
[   38.315331]  slab_reclaimable:1527 slab_unreclaimable:4757
[   38.315331]  mapped:3350 shmem:5255 pagetables:204 bounce:0
[   38.315331]  free:102395 free_pcp:171 free_cma:90122
[   38.348452] Node 0 active_anon:14588kB inactive_anon:20696kB active_file:0kB inactive_file:4kB unevictable:12kB isolated(anon):0kB isolated(file):0kB mapped:13344kB dirty:0kB writeback:0kB shmem:21020kB shmems
[   38.375385] Node 0 DMA free:388912kB min:24368kB low:30460kB high:36552kB active_anon:72kB inactive_anon:0kB active_file:0kB inactive_file:196kB unevictable:0kB writepending:0kB present:1703936kB managed:4267B
[   38.403038] lowmem_reserve[]: 0 1342 1342
[   38.407091] Node 0 Normal free:20800kB min:20684kB low:25852kB high:31020kB active_anon:14516kB inactive_anon:20696kB active_file:0kB inactive_file:0kB unevictable:12kB writepending:0kB present:1441792kB manaB
[   38.435343] lowmem_reserve[]: 0 0 0
[   38.438885] Node 0 DMA: 38*4kB (MC) 36*8kB (UMC) 12*16kB (UMC) 4*32kB (UE) 14*64kB (UMEC) 9*128kB (UMEC) 3*256kB (ME) 5*512kB (UMEC) 6*1024kB (UME) 4*2048kB (MEC) 90*4096kB (MC) = 389112kB
[   38.455651] Node 0 Normal: 804*4kB (UMH) 413*8kB (UMH) 263*16kB (UMH) 139*32kB (UMH) 72*64kB (UMH) 9*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20936kB
[   38.470339] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   38.478803] 5266 total pagecache pages
[   38.482604] 0 pages in swap cache
[   38.485974] Swap cache stats: add 0, delete 0, find 0/0
[   38.491244] Free swap  = 0kB
[   38.494181] Total swap = 0kB
[   38.497119] 786432 pages RAM
[   38.500047] 0 pages HighMem/MovableOnly
[   38.503945] 649566 pages reserved
[   38.507319] 98304 pages cma reserved
[   38.510943] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[   38.519552] [ 1884]     0  1884     3375      405       9       4        0             0 systemd-journal
[   38.529011] [ 2475]     0  2475     3262      255       8       3        0         -1000 systemd-udevd
[   38.538336] [ 3098]   997  3098     1523       76       7       3        0             0 systemd-network
[   38.547844] [ 3099]   999  3099     1237       89       6       3        0             0 avahi-daemon
[   38.557085] [ 3101]     0  3101      763       18       5       3        0             0 syslogd
[   38.565897] [ 3102]   998  3102     1115       87       6       3        0          -900 dbus-daemon
[   38.575057] [ 3103]   999  3103     1204       68       6       3        0             0 avahi-daemon
[   38.584303] [ 3104]     0  3104     1502      119       7       4        0             0 systemd-logind
[   38.593719] [ 3106]     0  3106      763       18       5       3        0             0 klogd
[   38.602360] [ 3125]     0  3125      977       80       6       3        0             0 xenstored
[   38.611344] [ 3138]     0  3138     1757      123       7       3        0             0 weston-launch
[   38.620692] [ 3141]   996  3141     1670       76       7       3        0             0 systemd-resolve
[   38.630200] [ 3159]     0  3159     1802      157       7       3        0             0 systemd
[   38.639001] [ 3161]     0  3161      579       32       5       3        0             0 agetty
[   38.647718] [ 3165]     0  3165     2382      344       8       3        0             0 (sd-pam)
[   38.656618] [ 3194]     0  3194    17196       48       6       3        0             0 xenconsoled
[   38.665777] [ 3203]     0  3203    17196       57       7       3        0             0 xenconsoled
[   38.674936] [ 3206]     0  3206    36895     3598      42       3        0             0 weston
[   38.683661] [ 3229]     0  3229      680       43       5       3        0             0 xenwatchdogd
[   38.692906] [ 3230]     0  3230     1403      125       7       3        0             0 login
[   38.701563] [ 3232]     0  3232     2612      431       9       3        0             0 weston-keyboard
[   38.711075] [ 3234]     0  3234     5359     3150      14       3        0             0 weston-desktop-
[   38.720556] [ 3237]     0  3237      913       99       5       3        0             0 sh
[   38.728934] Out of memory: Kill process 3206 (weston) score 25 or sacrifice child
[   38.736484] Killed process 3234 (weston-desktop-) total-vm:21436kB, anon-rss:1232kB, file-rss:8kB, shmem-rss:11360kB
[   45.856773] systemd invoked oom-killer: gfp_mask=0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null),  order=0, oom_score_adj=0
[   45.867983] systemd cpuset=/ mems_allowed=0


-- 
Sincerely,
Andrii Anisov.

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 10:26 arm: xl vcpu-pin leads to oom-killer slashing processes Andrii Anisov
@ 2018-12-05 10:28 ` Andrii Anisov
  2018-12-05 10:49 ` Julien Grall
  1 sibling, 0 replies; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 10:28 UTC (permalink / raw)
  To: xen-devel

It happens with both the credit and credit2 schedulers, and with both the old and the new vGIC.

On 05.12.18 12:26, Andrii Anisov wrote:
> Hello,
> 
> On the current
>      6d8ffac (xenbits/master) xen/arm: gic: Remove duplicated comment in do_sgi
> and
>      7073942 (xenbits/staging, xenbits/smoke, xenbits/coverity-tested/smoke) pci: apply workaround for Intel errata HSE43 and BDF2/BDX2
> 
> `xl vcpu-pin` leads to the oom-killer going mad and killing all processes:
> 

-- 
Sincerely,
Andrii Anisov.

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 10:26 arm: xl vcpu-pin leads to oom-killer slashing processes Andrii Anisov
  2018-12-05 10:28 ` Andrii Anisov
@ 2018-12-05 10:49 ` Julien Grall
  2018-12-05 10:59   ` Andrii Anisov
  1 sibling, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 10:49 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel



On 05/12/2018 10:26, Andrii Anisov wrote:
> Hello,
> 
> On the current
>      6d8ffac (xenbits/master) xen/arm: gic: Remove duplicated comment in do_sgi
> and
>      7073942 (xenbits/staging, xenbits/smoke, xenbits/coverity-tested/smoke) 
> pci: apply workaround for Intel errata HSE43 and BDF2/BDX2
> 
> `xl vcpu-pin` leads to the oom-killer going mad and killing all processes:

I am not sure I understand the relation between the two. What is the
latest Xen commit where the oom-killer does not trigger?

How much memory do you have in Dom0? Do you have any memory hungry process running?

> [oom-killer log snipped]

-- 
Julien Grall

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 10:49 ` Julien Grall
@ 2018-12-05 10:59   ` Andrii Anisov
  2018-12-05 11:45     ` Julien Grall
  0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 10:59 UTC (permalink / raw)
  To: Julien Grall, xen-devel

Hello Julien,

On 05.12.18 12:49, Julien Grall wrote:
> I am not sure I understand the relation between the two.
I'm confused as well. I just reported my observations.

> What is the latest Xen commit where the oom-killer does not trigger?
I didn't bisect it nor dig into it. I'm trying to measure IRQ latency as Stefano did.

> How much memory do you have in Dom0? Do you have any memory hungry process running?
Dom0 has 3GB of RAM. But I'm pretty sure it's not about the memory. Until I decided to pin vCPUs, I did all my routine without any issues.

-- 
Sincerely,
Andrii Anisov.

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 10:59   ` Andrii Anisov
@ 2018-12-05 11:45     ` Julien Grall
  2018-12-05 11:59       ` Andrii Anisov
  0 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 11:45 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel



On 05/12/2018 10:59, Andrii Anisov wrote:
> Hello Julien,

Hi,

> On 05.12.18 12:49, Julien Grall wrote:
>> I am not sure I understand the relation between the two.
> I'm confused as well. I just reported my observations.
> 
>> What is the latest Xen commit where the oom-killer does not trigger?
> I didn't bisect it nor dig into it. I'm trying to measure IRQ latency as
> Stefano did.
> 
>> How much memory do you have in Dom0? Do you have any memory hungry process 
>> running?
> Dom0 has 3GB of RAM. But I'm pretty sure it's not about the memory. Until I
> decided to pin vCPUs, I did all my routine without any issues.

Well, at least the kernel thinks it does not have any more memory (see the call
trace).

What do you mean by all your routine? How much work did you do on the platform
before triggering the oom-killer?

Looking at your log, you don't seem to have swap. With all your routine, how 
often are you close to the maximum memory?

Cheers,

-- 
Julien Grall

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 11:45     ` Julien Grall
@ 2018-12-05 11:59       ` Andrii Anisov
  2018-12-05 12:15         ` Julien Grall
  0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 11:59 UTC (permalink / raw)
  To: Julien Grall, xen-devel


On 05.12.18 13:45, Julien Grall wrote:
> Well, at least the kernel thinks it does not have any more memory (see the call trace).
Yes, it thinks so. But it is not linked to domain .

> What do you mean by all your routine?
I mean all the things I'm playing with now: running the tbm bare-metal app in different use cases, bringing it up with the new vGIC.

> How much work did you do on the platform before triggering the oom-killer?
Without CPU pinning I do all I need without the oom-killer being triggered.
After CPU pinning it takes 5-10 seconds until the oom-killer starts to kill everything.

-- 
Sincerely,
Andrii Anisov.

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 11:59       ` Andrii Anisov
@ 2018-12-05 12:15         ` Julien Grall
  2018-12-05 12:40           ` Andrii Anisov
  0 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 12:15 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel



On 05/12/2018 11:59, Andrii Anisov wrote:
> 
> On 05.12.18 13:45, Julien Grall wrote:
>> Well, at least the kernel thinks it does not have any more memory (see the call
>> trace).
> Yes, it thinks so. But it is not linked to domain .

What do you mean? A memory corruption by Xen is extremely unlikely. So it looks
to me like this is related to your domain (kernel or userspace), possibly
because memory has not been freed correctly.

> 
>> What do you mean by all your routine?
> I mean all the things I'm playing with now: running the tbm bare-metal app in
> different use cases, bringing it up with the new vGIC.
> 
>> How much work did you do on the platform before triggering the oom-killer?
> Without CPU pinning I do all I need without the oom-killer being triggered.
> After CPU pinning it takes 5-10 seconds until the oom-killer starts to kill everything.

Below is a list of questions to answer:
	- Can you give the steps to reproduce it from boot?
	- How much memory left do you have before calling xl vcpu-pin?
	- When exactly do you pin the vCPUs? (i.e how long after boot)
	- What are the other programs running? How much memory are they using?

Cheers,
-- 
Julien Grall

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 12:15         ` Julien Grall
@ 2018-12-05 12:40           ` Andrii Anisov
  2018-12-05 13:13             ` Julien Grall
  0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 12:40 UTC (permalink / raw)
  To: Julien Grall, xen-devel



On 05.12.18 14:15, Julien Grall wrote:
>> Yes, it thinks so. But it is not linked to domain .
> 
> What do you mean?
It should be read as "But it is not linked to domain memory size".

> A memory corruption by Xen is extremely unlikely. 
I believe that.

> So it looks to me like this is related to your domain (kernel or userspace), possibly because memory has not been freed correctly.
It might be. But it happens only with, and right after, vCPU pinning.

> Below is a list of questions to answer:
>      - Can you give the steps to reproduce it from boot?
The step is single and trivial: just pin the vCPUs from Dom0.

>      - How much memory left do you have before calling xl vcpu-pin?
Meminfo says
     root@salvator-x:~# cat /proc/meminfo
     MemTotal:        2995828 kB
     MemFree:         2810360 kB
     MemAvailable:    2758420 kB

Top says:
     Mem: 185592K used, 2810236K free, 21020K shrd, 0K buff, 53000K cached
     CPU:   0% usr   8% sys   0% nic  91% idle   0% io   0% irq   0% sirq

>      - When exactly do you pin the vCPUs? (i.e how long after boot)
Right after login.

>      - What are the other programs running? How much memory are they using?
Weston, systemd daemons, Xen daemons, other Yocto daemons. Actually, nothing special.

-- 
Sincerely,
Andrii Anisov.

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 12:40           ` Andrii Anisov
@ 2018-12-05 13:13             ` Julien Grall
  2018-12-05 14:46               ` Andrii Anisov
  0 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2018-12-05 13:13 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel



On 05/12/2018 12:40, Andrii Anisov wrote:
> 
> 
> On 05.12.18 14:15, Julien Grall wrote:
>>> Yes, it thinks so. But it is not linked to domain .
>>
>> What do you mean?
> It should be read as "But it is not linked to domain memory size".

So if you increase the memory of dom0, will you still see the error?

> 
>> A memory corruption by Xen is extremely unlikely. 
> I believe that.

I need at least some sort of proof that Xen might corrupt the kernel. I don't
believe we manage to corrupt the kernel memory subsystem with a good enough
value, reliably. So maybe we should start looking at a more plausible cause.

> 
>> So it looks to me like this is related to your domain (kernel or userspace),
>> possibly because memory has not been freed correctly.
> It might be. But it happens only with, and right after, vCPU pinning.

It does not mean this is because of Xen. It might just be because of Xen drivers
that do not free memory.

> 
>> Below is a list of questions to answer:
>>      - Can you give the steps to reproduce it from boot?
> The step is single and trivial: just pin the vCPUs from Dom0.

I tried and can't reproduce it. But I am using 4.20-rc4 and not 4.14.35. If you
think the bug was introduced in a recent Xen, then the first step is to downgrade
Xen. If it does not happen on the downgraded version, then you can bisect it.

> 
>>      - How much memory left do you have before calling xl vcpu-pin?
> Meminfo says
>      root@salvator-x:~# cat /proc/meminfo
>      MemTotal:        2995828 kB
>      MemFree:         2810360 kB
>      MemAvailable:    2758420 kB

How about after vCPU pinning? Do you see the free memory going down?

> 
> Top says:
>      Mem: 185592K used, 2810236K free, 21020K shrd, 0K buff, 53000K cached
>      CPU:   0% usr   8% sys   0% nic  91% idle   0% io   0% irq   0% sirq
> 
>>      - When exactly do you pin the vCPUs? (i.e how long after boot)
> Right after login.

So that's reliably happening? Are you sure there is nothing else on the system
using memory? For instance, you seem to have NFS in place.

> 
>>      - What are the other programs running? How much memory are they using?
> Weston, systemd daemons, Xen daemons, other Yocto daemons. Actually, nothing 
> special.

I would try to remove unnecessary programs, so you can narrow down the issue.

Cheers,

-- 
Julien Grall

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 13:13             ` Julien Grall
@ 2018-12-05 14:46               ` Andrii Anisov
  2018-12-13 14:13                 ` Andrii Anisov
  0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-05 14:46 UTC (permalink / raw)
  To: Julien Grall, xen-devel



On 05.12.18 15:13, Julien Grall wrote:
> I need at least some sort of proof that Xen might corrupt the kernel. I don't believe we manage to corrupt the kernel memory subsystem with a good enough value, reliably. So maybe we should start looking at a more plausible cause.
I think I would be able to look deeper into it by the end of the week.

> It does not mean this is because of Xen. It might just be because of Xen drivers that do not free memory.
Totally agree. I have not dug into it yet; I just sent a notification, so other interested parties can check/verify the issue on different setups.

> I tried and can't reproduce it.
Great. So it might be a problem with my setup.

> But I am using 4.20-rc4 and not 4.14.35. If you think the bug was introduced in a recent Xen, then the first step is to downgrade Xen. If it does not happen on the downgraded version, then you can bisect it.
It definitely does not reproduce with the Xen 4.10 release, but that is pretty old.

> How about after vCPU pinning? Do you see the free memory going down?
I've checked, and seen a strange thing: MemTotal has shrunk.
Meminfo before vCPU pinning:

MemTotal:        2995828 kB
MemFree:         2810360 kB
MemAvailable:    2758420 kB
Buffers:               0 kB
Cached:            53092 kB
SwapCached:            0 kB
Active:            26716 kB
Inactive:          40980 kB
Active(anon):      14976 kB
Inactive(anon):    20648 kB
Active(file):      11740 kB
Inactive(file):    20332 kB
Unevictable:          12 kB
Mlocked:              12 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         14644 kB
Mapped:            31632 kB
Shmem:             21020 kB
Slab:              29204 kB
SReclaimable:       9924 kB
SUnreclaim:        19280 kB
KernelStack:        2848 kB
PageTables:          828 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1497912 kB
Committed_AS:      58556 kB
VmallocTotal:   135290290112 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:         393216 kB
CmaFree:          360488 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB


Meminfo just before the oom-killer triggered:

root@salvator-x:~# cat /proc/meminfo
MemTotal:         549000 kB
MemFree:          412108 kB
MemAvailable:     347472 kB
Buffers:               0 kB
Cached:            19356 kB
SwapCached:            0 kB
Active:            15920 kB
Inactive:          17512 kB
Active(anon):      14516 kB
Inactive(anon):     9104 kB
Active(file):       1404 kB
Inactive(file):     8408 kB
Unevictable:          12 kB
Mlocked:              12 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         14160 kB
Mapped:             9016 kB
Shmem:              9496 kB
Slab:              26412 kB
SReclaimable:       6792 kB
SUnreclaim:        19620 kB
KernelStack:        2880 kB
PageTables:          776 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      274500 kB
Committed_AS:      49264 kB
VmallocTotal:   135290290112 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:         393216 kB
CmaFree:          360488 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB


> So that's reliably happening?
It is 100% reproducible on my setup.

> Are you sure there is nothing else on the system using memory? For instance, you seem to have NFS in place.
Yes, the Dom0 root filesystem is on NFS.

-- 
Sincerely,
Andrii Anisov.

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-05 14:46               ` Andrii Anisov
@ 2018-12-13 14:13                 ` Andrii Anisov
  2018-12-13 16:37                   ` Juergen Gross
  0 siblings, 1 reply; 13+ messages in thread
From: Andrii Anisov @ 2018-12-13 14:13 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, Stefano Stabellini

Hello All,

OK, I've discovered the mechanism of the issue.
It is because of `d->max_pages = ~0U;` in `construct_dom0()`.
When I do vcpu-pin, libxl updates the memory nodes in xenstore for Dom0. The kernel's xenstore watch then sees those changes and tries to set a new balloon target, but the target becomes extremely high, and the balloon sucks up all the pages.

In my kernel (4.14), in the `watch_target()` function, there is this code:

         target_diff = xen_pv_domain() ? 0
                 : static_max - balloon_stats.target_pages;

Here `xen_pv_domain()` evaluates to zero, so `target_diff` is huge. Then, a few lines below:

	balloon_set_new_target(new_target - target_diff);

`balloon_set_new_target()` receives a value that has wrapped around 64 bits, which kills the system.
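
To make the arithmetic concrete, here is a minimal standalone sketch with
made-up numbers (the names only mimic the balloon code, and the static-max
value is assumed to reflect Dom0's `~0U` max_pages; this is not the actual
kernel source):

    /* Hypothetical illustration of the balloon target underflow. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Values in 4K pages; 786432 pages is roughly the 3GB Dom0 above. */
        uint64_t static_max   = 0xFFFFFFFFULL;  /* huge, from max_pages = ~0U */
        uint64_t target_pages = 768 * 1024;     /* current balloon target     */
        uint64_t new_target   = 768 * 1024;     /* target re-written by libxl */

        /* xen_pv_domain() is false here, so the full difference is applied. */
        uint64_t target_diff = static_max - target_pages;

        /* Unsigned subtraction wraps around to an enormous value. */
        uint64_t wrapped_target = new_target - target_diff;

        printf("target_diff    = %llu pages\n",
               (unsigned long long)target_diff);
        printf("wrapped target = %llu pages\n",
               (unsigned long long)wrapped_target);
        return 0;
    }

With these numbers the subtraction wraps to a value close to 2^64 pages, which
is the nonsensical target the balloon driver then tries to act on.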

Now I'm looking for an appropriate kernel patch to fix that. Any suggestions?

-- 
Sincerely,
Andrii Anisov.

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-13 14:13                 ` Andrii Anisov
@ 2018-12-13 16:37                   ` Juergen Gross
  2018-12-14 10:36                     ` Andrii Anisov
  0 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2018-12-13 16:37 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Julien Grall, Stefano Stabellini

On 12/13/18 3:13 PM, Andrii Anisov wrote:
> Hello All,
> 
> OK, I've discovered the mechanism of the issue.
> It is because of `d->max_pages = ~0U;` in `construct_dom0()`.
> When I do vcpu-pin, libxl updates the memory nodes in xenstore for Dom0. The
> kernel's xenstore watch then sees those changes and tries to set a new balloon
> target, but the target becomes extremely high, and the balloon sucks up all the
> pages.
> 
> In my kernel (4.14), in the `watch_target()` function, there is this code:
> 
>          target_diff = xen_pv_domain() ? 0
>                  : static_max - balloon_stats.target_pages;
> 
> Here `xen_pv_domain()` evaluates to zero, so `target_diff` is huge. Then, a few
> lines below:
> 
>      balloon_set_new_target(new_target - target_diff);
> 
> `balloon_set_new_target()` receives a value that has wrapped around 64 bits,
> which kills the system.
> 
> Now I'm looking for an appropriate kernel patch to fix that. Any suggestions?
> 

You should use Linux kernel commit 3596924a233e45aa918.


Juergen

* Re: arm: xl vcpu-pin leads to oom-killer slashing processes
  2018-12-13 16:37                   ` Juergen Gross
@ 2018-12-14 10:36                     ` Andrii Anisov
  0 siblings, 0 replies; 13+ messages in thread
From: Andrii Anisov @ 2018-12-14 10:36 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: Julien Grall, Stefano Stabellini

Hello Juergen,

On 13.12.18 18:37, Juergen Gross wrote:
> You should use linux kernel commit 3596924a233e45aa918.
That is exactly what is needed.
Thank you!

-- 
Sincerely,
Andrii Anisov.
