* linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-04-30 13:02 UTC
  To: Konrad Rzeszutek Wilk, Roger Pau Monné, Juergen Gross; +Cc: xen-devel

osstest service owner writes ("[linux-4.19 test] 135420: regressions - FAIL"):
> flight 135420 linux-4.19 real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/135420/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:

>  test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail REGR. vs. 129313

This seems to be a kernel bug.

The guest creation failed.  The toolstack reports

 2019-04-30 04:11:17.521+0000: libxl:
 libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvda
 spec.backend=qdisk
 ...
 2019-04-30 04:11:27.600+0000: libxl:
 libxl_device.c:1418:libxl__wait_for_backend: Backend
 /local/domain/0/backend/qdisk/0/51712 not ready

 2019-04-30 04:11:27.600+0000: libxl:
 libxl_bootloader.c:417:bootloader_disk_attached_cb: Domain 5:failed
 to attach local disk for bootloader execution

Looking at the code in libxl, I see that it polls the specified
xenstore path hoping for a ready state to turn up.  It waits 10
seconds and then gives up.  (Unfortunately it doesn't print the state
it found.)
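
For illustration, a minimal sketch of the polling behaviour described
above (hypothetical: the real libxl__wait_for_backend() is more
involved, and xs_read_state() here is an assumed helper that reads
the backend's xenstore "state" node):

    #include <unistd.h>

    extern int xs_read_state(const char *be_path);   /* assumed helper */

    static int wait_for_backend(const char *be_path, int wanted_state)
    {
        for (int tries = 0; tries < 10; tries++) {   /* ~10 seconds */
            if (xs_read_state(be_path) == wanted_state)
                return 0;
            sleep(1);
        }
        /* Gives up here without logging the state actually found. */
        return -1;
    }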

The backend is qemu.  qemu does not seem to have reported anything
untoward.  However, looking at the kernel log (full log below):

 Apr 30 04:11:17 chardonnay1 kernel: [ 1393.403311] xenwatch: page
 allocation failure: order:5,
 mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)

I conjecture the following sequence of events:

 - libxl wants to run the guest's bootloader
 - libxl started qemu with instructions to become a qdisk
    backend for dom0, which libxl is intending to attach
    to in dom0
 - libxl gave the dom0 kernel instructions to create a vbd
    frontend for its own use, attached to the former
 - qemu started up and started following these instructions
 - the vbd state machine involves dom0 setting up xenstore
    watches, (i) for its own backend and (ii) maybe for qemu
    (qemu will want a watch and use libxenstore, which may
    use the socket or the kernel xenstore device - I haven't
    checked which)
 - qemu triggers a watch event by writing as the backend
    to its xenstore area
 - blkfront gets the watch event about this (confusingly
    this is "blkback_changed" which is part of blkfront)
 - blkfront tries to read the state node
 - unfortunately, there is a memory allocation failure,
    meaning that blkfront cannot read the state node
 - the watch event is thereby lost; everything hangs
 - libxl times out and libvirt asks it to tear down the
    busted domain

ISTM that there are *two* bugs here:

 1. Whatever caused the memory allocation failure

 2. That a memory allocation failure can cause permanent loss of a
     xenstore watch event
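
As a sketch of why bug 2 can happen (a simplified model, not the
actual drivers/xen/xenbus code; the types and helpers here are
hypothetical): the xenwatch thread consumes each event before running
the driver's callback, so a callback that fails internally has no way
to retry.

    for (;;) {
        /* The event is dequeued -- and thus gone -- before the handler runs. */
        struct watch_event *ev = dequeue_watch_event();   /* assumed helper */
        ev->watch->callback(ev->watch, ev->path, ev->token);
        /* If the callback hit -ENOMEM internally (as blkback_changed ->
         * talk_to_blkback did here), that state change is simply lost:
         * nothing re-queues the event or retries the handler. */
        free_watch_event(ev);                             /* assumed helper */
    }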

IDK yet what the failure probability is.  In this test it happened on
the first repetition of the `repeatedly start and stop guest' test,
but that followed a number of other tests including save/restore and
repeated migration.


Other failures in this flight which need not concern you as Linux Xen
core and blkfront maintainers:

>  build-armhf-pvops             6 kernel-build             fail REGR. vs. 129313

This is a genuine build failure due to the new compiler, which I have
mailed the ARM folks about.  I.e. it is a bug in the Linux 4.19
stable branch but nothing to do with Xen.

>  test-amd64-amd64-xl-qcow2    17 guest-localmigrate/x10   fail REGR. vs. 129313

This is a known regression with the stretch upgrade and is nothing to
do with linux-4.19 (or Xen, I think).

Ian.

Apr 30 04:11:17 chardonnay1 kernel: [ 1393.403311] xenwatch: page allocation failure: order:5, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.404374] xenwatch cpuset=/ mems_allowed=0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.405000] CPU: 1 PID: 42 Comm: xenwatch Not tainted 4.19.37 #1
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.405552] Hardware name: GIGABYTE GS-R12P4S/GA-7PCSL, BIOS R12 05/20/2014
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.406100] Call Trace:
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.406685]  dump_stack+0x72/0x97
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.407225]  warn_alloc+0xf3/0x180
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.407807]  __alloc_pages_slowpath+0xd31/0xdb0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.408349]  ? get_page_from_freelist+0x39d/0xfb0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.408893]  ? xs_talkv+0x216/0x2c0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.409478]  ? xs_single+0x48/0x70
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.410015]  __alloc_pages_nodemask+0x1f8/0x240
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.410603]  kmalloc_order+0x13/0x70
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.411143]  kmalloc_order_trace+0x18/0xa0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.411727]  talk_to_blkback+0xbb/0xdb0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.412276]  ? xenbus_gather+0xd3/0x150
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.412816]  blkback_changed+0x11a/0xc20
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.413398]  ? xenbus_read_driver_state+0x34/0x60
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.413941]  xenwatch_thread+0x81/0x170
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.414523]  ? wait_woken+0x80/0x80
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.415061]  kthread+0xf3/0x130
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.415598]  ? test_reply.isra.3+0x40/0x40
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.416179]  ? kthread_destroy_worker+0x40/0x40
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.416721]  ret_from_fork+0x35/0x40
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417337] Mem-Info:
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904] active_anon:5562 inactive_anon:7138 isolated_anon:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  active_file:13106 inactive_file:57519 isolated_file:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  unevictable:0 dirty:104 writeback:0 unstable:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  slab_reclaimable:3860 slab_unreclaimable:7214
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  mapped:7033 shmem:311 pagetables:953 bounce:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  free:4639 free_pcp:61 free_cma:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.420594] Node 0 active_anon:22248kB inactive_anon:28552kB active_file:52424kB inactive_file:230076kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:28132kB dirty:416kB writeback:0kB shmem:1244kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.422432] DMA free:1652kB min:100kB low:124kB high:148kB active_anon:0kB inactive_anon:8kB active_file:668kB inactive_file:13068kB unevictable:0kB writepending:0kB present:15928kB managed:15844kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.424314] lowmem_reserve[]: 0 373 373 373
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.424886] DMA32 free:16904kB min:2416kB low:3020kB high:3624kB active_anon:22248kB inactive_anon:28544kB active_file:51756kB inactive_file:217008kB unevictable:0kB writepending:416kB present:508356kB managed:405648kB mlocked:0kB kernel_stack:3200kB pagetables:3812kB bounce:0kB free_pcp:244kB local_pcp:0kB free_cma:0kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.427165] lowmem_reserve[]: 0 0 0 0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.427726] DMA: 23*4kB (ME) 15*8kB (ME) 12*16kB (ME) 9*32kB (ME) 5*64kB (ME) 5*128kB (UME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1652kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.428772] DMA32: 1524*4kB (UMH) 260*8kB (UMH) 189*16kB (UME) 96*32kB (UMEH) 28*64kB (MH) 3*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16960kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.430184] 70983 total pagecache pages
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.430699] 36 pages in swap cache
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.431234] Swap cache stats: add 51, delete 15, find 0/0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.431751] Free swap  = 1949428kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.432249] Total swap = 1949692kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.432800] 131071 pages RAM
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.433293] 0 pages HighMem/MovableOnly
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.433806] 25698 pages reserved
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.434358] vbd vbd-51712: 12 allocating ring_info structure

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Jan Beulich @ 2019-04-30 14:26 UTC
  To: Ian Jackson
  Cc: Juergen Gross, xen-devel, Konrad Rzeszutek Wilk, Roger Pau Monne

>>> On 30.04.19 at 15:02, <ian.jackson@citrix.com> wrote:
> ISTM that there are *two* bugs here:
> 
>  1. Whatever caused the memory allocation failure

An order-5 allocation is liable to fail at any time (afaict). I find it
surprising that struct blkfront_ring_info instances (even arrays
of them when using multiple rings) get allocated using kcalloc()
rather than kvcalloc(), considering the size of the structure
(0x140E0 according to the disassembly of the 5.0.1 driver I
had to hand).
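
For concreteness, a sketch of the change being suggested (hedged: the
function name and context are from memory of the blkfront source and
are not verified against the 4.19 tree):

    /* drivers/block/xen-blkfront.c, negotiate_mq() -- sketch only.
     * kvcalloc() transparently falls back to vmalloc() when a
     * physically contiguous allocation this large is unavailable. */
    -    info->rinfo = kcalloc(info->nr_rings,
    -                          sizeof(struct blkfront_ring_info),
    -                          GFP_KERNEL);
    +    info->rinfo = kvcalloc(info->nr_rings,
    +                           sizeof(struct blkfront_ring_info),
    +                           GFP_KERNEL);

The matching kfree() of info->rinfo would then need to become kvfree().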

>  2. That a memory allocation failure can cause permanent loss of a
>      xenstore watch event

Well, isn't it sort of expected that an allocation failure will lead
to further problems?

Jan

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Juergen Gross @ 2019-04-30 14:28 UTC
  To: Ian Jackson, Konrad Rzeszutek Wilk, Roger Pau Monné; +Cc: xen-devel

On 30/04/2019 15:02, Ian Jackson wrote:
> osstest service owner writes ("[linux-4.19 test] 135420: regressions - FAIL"):
>> flight 135420 linux-4.19 real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/135420/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
> 
>>  test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail REGR. vs. 129313
> 
> This seems to be a kernel bug.
> 
> The guest creation failed.  The toolstack reports
> 
>  2019-04-30 04:11:17.521+0000: libxl:
>  libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvda
>  spec.backend=qdisk
>  ...
>  2019-04-30 04:11:27.600+0000: libxl:
>  libxl_device.c:1418:libxl__wait_for_backend: Backend
>  /local/domain/0/backend/qdisk/0/51712 not ready
> 
>  2019-04-30 04:11:27.600+0000: libxl:
>  libxl_bootloader.c:417:bootloader_disk_attached_cb: Domain 5:failed
>  to attach local disk for bootloader execution
> 
> Looking at the code in libxl, it is polling the specified xenstore
> path hoping for a ready state to turn up.  It waits 10 seconds and
> then gives up.  (Unfortunately it doesn't print the state it found.)
> 
> The backend is qemu.  qemu does not seem to have reported anything
> untoward.  However, looking at the kernel log (full log below):
> 
>  Apr 30 04:11:17 chardonnay1 kernel: [ 1393.403311] xenwatch: page
>  allocation failure: order:5,
>  mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)

Doing an order-5 allocation for the ring is not necessary here; a
virtually contiguous area via vmalloc() should work, too.
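
A sketch of that alternative (assuming the array only needs to be
virtually contiguous, as the trace's "allocating ring_info structure"
message suggests):

    /* vzalloc() returns a zeroed, virtually contiguous area, so it
     * avoids demanding an order-5 physically contiguous block;
     * kvcalloc() (see Jan's mail) wraps this same fallback behind
     * the kmalloc-style API. */
    info->rinfo = vzalloc(array_size(info->nr_rings, sizeof(*info->rinfo)));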


Juergen

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-04-30 14:33 UTC
  To: Jan Beulich
  Cc: Juergen Gross, xen-devel, Konrad Rzeszutek Wilk, Roger Pau Monne

Jan Beulich writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On 30.04.19 at 15:02, <ian.jackson@citrix.com> wrote:
> > ISTM that there are *two* bugs here:
> > 
> >  1. Whatever caused the memory allocation failure
> 
> An order-5 allocation is set to fail at any time (afaict). I find it
> surprising that struct blkfront_ring_info instances (even arrays
> of them when using multiple rings) get allocated using kcalloc()
> rather than kvcalloc(), considering the size of the structure
> (0x140E0 according to the disassembly of the 5.0.1 driver I
> had to hand).

I will leave answering this to the blkfront/linux folks...

> >  2. That a memory allocation failure can cause permanent loss of a
> >      xenstore watch event
> 
> Well, isn't it sort of expected that an allocation failure will lead
> to further problems?

I would have hoped that it would result in something other than a
hang.  At worst, blkfront ought to go into a state where it *knows*
that it is utterly broken and reports this properly.

Ian.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Roger Pau Monné @ 2019-04-30 15:55 UTC
  To: Ian Jackson; +Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Tue, Apr 30, 2019 at 03:33:00PM +0100, Ian Jackson wrote:
> Jan Beulich writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> > On 30.04.19 at 15:02, <ian.jackson@citrix.com> wrote:
> > > ISTM that there are *two* bugs here:
> > > 
> > >  1. Whatever caused the memory allocation failure
> > 
> > An order-5 allocation is set to fail at any time (afaict). I find it
> > surprising that struct blkfront_ring_info instances (even arrays
> > of them when using multiple rings) get allocated using kcalloc()
> > rather than kvcalloc(), considering the size of the structure
> > (0x140E0 according to the disassembly of the 5.0.1 driver I
> > had to hand).
> 
> I will leave answering this to the blkfront/linux folks...

I think those allocations used to be small enough that kcalloc was
likely fine. Now, with multiple rings and multiple pages per ring,
those have grown to a point where kcalloc is not fine anymore. I will
prepare a patch to switch to kvcalloc.

> > >  2. That a memory allocation failure can cause permanent loss of a
> > >      xenstore watch event
> > 
> > Well, isn't it sort of expected that an allocation failure will lead
> > to further problems?
> 
> I would have hoped that it would result in something other than a
> hang.  At worst, blkfront ought to go into a state where it *knows*
> that it is utterly broken and reports this properly.

I haven't yet checked all the possible error paths, but the ones I've
looked at use xenbus_dev_fatal, which switches the device state to
closing and writes the error message into xenstore. However, the
closing state is not an error state, and can be used as part of the
normal flow of a PV device (for example, in order to force a
reconnection).

I don't think there's a documented way of reporting an unrecoverable
PV frontend error, but I might be mistaken.
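
For reference, the xenbus state machine (paraphrased from Xen's
public io/xenbus.h) indeed has no dedicated error state;
Closing/Closed are the nearest states a frontend can advertise:

    enum xenbus_state {
        XenbusStateUnknown       = 0,
        XenbusStateInitialising  = 1,
        XenbusStateInitWait      = 2,  /* early init done; waiting on peer */
        XenbusStateInitialised   = 3,  /* waiting for connection from peer */
        XenbusStateConnected     = 4,
        XenbusStateClosing       = 5,  /* closing, clean shutdown or not   */
        XenbusStateClosed        = 6,
        XenbusStateReconfiguring = 7,
        XenbusStateReconfigured  = 8,
        /* No error/failed state: errors go into an "error" node instead. */
    };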

Roger.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-05-01  9:47 UTC
  To: Roger Pau Monne
  Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On Tue, Apr 30, 2019 at 03:33:00PM +0100, Ian Jackson wrote:
> > I will leave answering this to the blkfront/linux folks...
> 
> I think those allocations used to be small enough that kcalloc was
> likely fine. Now with multiple rings, and multiple pages per ring
> those have grown to a point where kcalloc is not fine anymore. I will
> prepare a patch to switch to kvcalloc.

Thanks.

FYI this same issue was reported by osstest in
  Subject: [linux-linus test] 135426: regressions - FAIL
i.e. on linux master.

ISTM that this patch you propose will have to go to stable branches
too?

> > I would have hoped that it would result in something other than a
> > hang.  At worst, blkfront ought to go into a state where it *knows*
> > that it is utterly broken and reports this properly.
> 
> I haven't yet checked all the possible error paths, but the ones I've
> looked at use xenbus_dev_fatal which switches the device state to
> closing and writes the error message into xenstore.

What if you can't write to xenstore?  Can we at least have a copy in
the kernel log ?  There might be other errors besides this memory
exhaustion, surely.

Error handling when the usual error reporting path is busted is
difficult indeed, but it is very helpful to have a fallback.

Ian.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Roger Pau Monné @ 2019-05-02  9:45 UTC
  To: Ian Jackson; +Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Wed, May 01, 2019 at 10:47:49AM +0100, Ian Jackson wrote:
> Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> > On Tue, Apr 30, 2019 at 03:33:00PM +0100, Ian Jackson wrote:
> > > I will leave answering this to the blkfront/linux folks...
> > 
> > I think those allocations used to be small enough that kcalloc was
> > likely fine. Now with multiple rings, and multiple pages per ring
> > those have grown to a point where kcalloc is not fine anymore. I will
> > prepare a patch to switch to kvcalloc.
> 
> Thanks.
> 
> FYI this same issue was reported by osstest in
>   Subject: [linux-linus test] 135426: regressions - FAIL
> ie on linux master.
> 
> ISTM that this patch you propose will have to go to stable branches
> too ?

I agree.

> > > I would have hoped that it would result in something other than a
> > > hang.  At worst, blkfront ought to go into a state where it *knows*
> > > that it is utterly broken and reports this properly.
> > 
> > I haven't yet checked all the possible error paths, but the ones I've
> > looked at use xenbus_dev_fatal which switches the device state to
> > closing and writes the error message into xenstore.
> 
> What if you can't write to xenstore ?  Can we at least have a copy in
> the kernel log ?  There might be other errors besides this memory
> exhaustion, surely.

There's also a call to dev_err, which should print on the console the
same error that's written to xenstore. That, however, requires the
allocation of a page of memory in order to format the string to be
printed (see xenbus_va_dev_error).
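
In outline, the pattern is roughly this (a paraphrase of
xenbus_va_dev_error() in drivers/xen/xenbus/xenbus_client.c, not the
verbatim code):

    static void xenbus_va_dev_error(struct xenbus_device *dev, int err,
                                    const char *fmt, va_list ap)
    {
        /* The report itself needs memory: if this allocation fails,
         * neither the console nor xenstore gets the message. */
        char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
        if (!buf)
            return;

        vsnprintf(buf, PAGE_SIZE, fmt, ap);
        dev_err(&dev->dev, "%s\n", buf);                      /* console */
        xenbus_write(XBT_NIL, dev->nodename, "error", buf);   /* xenstore */
        kfree(buf);
    }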

Roger.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-05-02 10:42 UTC
  To: Roger Pau Monne
  Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On Wed, May 01, 2019 at 10:47:49AM +0100, Ian Jackson wrote:
> > What if you can't write to xenstore ?  Can we at least have a copy in
> > the kernel log ?  There might be other errors besides this memory
> > exhaustion, surely.
> 
> There's a call to dev_err also, which should print the same error
> that's written to xenstore on the console. That however requires the
> memory allocation of page in order to format the string to be printed
> (see xenbus_va_dev_error).

Can we assume that memory exhaustion will always result in some
message from the memory allocator?  If so, then this new log message
would be a useful addition for cases *other* than a complete lack of
any available free page.  E.g., foolishly trying a large kcalloc
allocation, or some error not related to lack of memory at all.

Ian.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Roger Pau Monné @ 2019-05-02 11:04 UTC
  To: Ian Jackson; +Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Thu, May 02, 2019 at 11:42:04AM +0100, Ian Jackson wrote:
> Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> > On Wed, May 01, 2019 at 10:47:49AM +0100, Ian Jackson wrote:
> > > What if you can't write to xenstore ?  Can we at least have a copy in
> > > the kernel log ?  There might be other errors besides this memory
> > > exhaustion, surely.
> > 
> > There's a call to dev_err also, which should print the same error
> > that's written to xenstore on the console. That however requires the
> > memory allocation of page in order to format the string to be printed
> > (see xenbus_va_dev_error).
> 
> Can we assume that memory exhaustion will always result in some
> message from the memory allocator ?  If so then this new log message

I'm not sure I understand which new log message you are referring
to. The dev_err call is already present in xenbus_va_dev_error, so
everything that's attempted to be written to xenstore should also be
printed on the console.

> would be a useful addition for cases *other* than a complete lack of
> any available free page.  Eg, foolishly trying a large kcalloc
> allocation, or some error not related to lack of memory at all.

If there's no real memory shortage, a failure to attach a frontend
should result in a message being written to both xenstore and the
console with the current Linux code, AFAICT, provided
xenbus_dev_{fatal/error} or xenbus_switch_fatal is used.
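
For illustration, the reporting pattern described above as used in a
typical frontend error path (a sketch; setup_blkring() stands in for
whatever step failed):

    err = setup_blkring(dev, rinfo);    /* hypothetical failing step */
    if (err) {
        /* Logs via dev_err(), writes the device's "error" node in
         * xenstore, and switches the device towards
         * XenbusStateClosing. */
        xenbus_dev_fatal(dev, err, "setting up block ring");
        return err;
    }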

Thanks, Roger.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-05-02 12:24 UTC
  To: Roger Pau Monne
  Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On Thu, May 02, 2019 at 11:42:04AM +0100, Ian Jackson wrote:
> > Can we assume that memory exhaustion will always result in some
> > message from the memory allocator ?  If so then this new log message
> 
> I'm not sure I understand to what new log message you are referring
> to. The dev_err call is already present in xenbus_va_dev_error, so
> everything that's attempted to write to xenstore should also be
> printed on the console.

Oh, I misunderstood.  I thought you were talking about a hypothetical
new dev_err call.

Does that mean that you think in this case it tried to write a message
to the console and that too failed due to lack of memory?

In which case it probably did the best it could.

Thanks,
Ian.
