* linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-04-30 13:02 UTC
  To: Konrad Rzeszutek Wilk, Roger Pau Monné, Juergen Gross; +Cc: xen-devel

osstest service owner writes ("[linux-4.19 test] 135420: regressions - FAIL"):
> flight 135420 linux-4.19 real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/135420/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:

>  test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail REGR. vs. 129313

This seems to be a kernel bug.

The guest creation failed.  The toolstack reports

 2019-04-30 04:11:17.521+0000: libxl:
 libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvda
 spec.backend=qdisk
 ...
 2019-04-30 04:11:27.600+0000: libxl:
 libxl_device.c:1418:libxl__wait_for_backend: Backend
 /local/domain/0/backend/qdisk/0/51712 not ready

 2019-04-30 04:11:27.600+0000: libxl:
 libxl_bootloader.c:417:bootloader_disk_attached_cb: Domain 5:failed
 to attach local disk for bootloader execution

Looking at the code in libxl, I see that it polls the specified
xenstore path hoping for a ready state to turn up.  It waits 10
seconds and then gives up.  (Unfortunately it doesn't print the state
it found.)
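
For illustration, a minimal sketch of the polling behaviour described
above (hypothetical: the real libxl__wait_for_backend() is more
involved, and xs_read_state() here is an assumed helper that reads
the backend's xenstore "state" node):

    #include <unistd.h>

    extern int xs_read_state(const char *be_path);   /* assumed helper */

    static int wait_for_backend(const char *be_path, int wanted_state)
    {
        for (int tries = 0; tries < 10; tries++) {   /* ~10 seconds */
            if (xs_read_state(be_path) == wanted_state)
                return 0;
            sleep(1);
        }
        /* Gives up here without logging the state actually found. */
        return -1;
    }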

The backend is qemu.  qemu does not seem to have reported anything
untoward.  However, looking at the kernel log (full log below):

 Apr 30 04:11:17 chardonnay1 kernel: [ 1393.403311] xenwatch: page
 allocation failure: order:5,
 mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)

I conjecture the following sequence of events:

 - libxl wants to run the guest's bootloader
 - libxl started qemu with instructions to become a qdisk
    backend for dom0, which libxl is intending to attach
    to in dom0
 - libxl gave the dom0 kernel instructions to create a vbd
    frontend for its own use, attached to the former
 - qemu started up and started following these instructions
 - the vbd state machine involves dom0 setting up xenstore
    watches, (i) for its own backend and (ii) maybe for qemu
    (qemu will want a watch and use libxenstore, which may
    use the socket or the kernel xenstore device - I haven't
    checked which)
 - qemu triggers a watch event by writing as the backend
    to its xenstore area
 - blkfront gets the watch event about this (confusingly
    this is "blkback_changed" which is part of blkfront)
 - blkfront tries to read the state node
 - unfortunately, there is a memory allocation failure,
    meaning that blkfront cannot read the state node
 - the watch event is thereby lost; everything hangs
 - libxl times out and libvirt asks it to tear down the
    busted domain

ISTM that there are *two* bugs here:

 1. Whatever caused the memory allocation failure

 2. That a memory allocation failure can cause permanent loss of a
     xenstore watch event
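
As a sketch of why bug 2 can happen (a simplified model, not the
actual drivers/xen/xenbus code; the types and helpers here are
hypothetical): the xenwatch thread consumes each event before running
the driver's callback, so a callback that fails internally has no way
to retry.

    for (;;) {
        /* The event is dequeued -- and thus gone -- before the handler runs. */
        struct watch_event *ev = dequeue_watch_event();   /* assumed helper */
        ev->watch->callback(ev->watch, ev->path, ev->token);
        /* If the callback hit -ENOMEM internally (as blkback_changed ->
         * talk_to_blkback did here), that state change is simply lost:
         * nothing re-queues the event or retries the handler. */
        free_watch_event(ev);                             /* assumed helper */
    }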

IDK yet what the failure probability is.  In this test it happened on
the first repetition of the `repeatedly start and stop guest' test,
but that followed a number of other tests including save/restore and
repeated migration.


Other failures in this flight which need not concern you as Linux Xen
core and blkfront maintainers:

>  build-armhf-pvops             6 kernel-build             fail REGR. vs. 129313

This is a genuine build failure due to the new compiler, which I have
mailed the ARM folks about.  I.e. it is a bug in the Linux 4.19
stable branch but nothing to do with Xen.

>  test-amd64-amd64-xl-qcow2    17 guest-localmigrate/x10   fail REGR. vs. 129313

This is a known regression with the stretch upgrade and is nothing to
do with linux-4.19 (or Xen, I think).

Ian.

Apr 30 04:11:17 chardonnay1 kernel: [ 1393.403311] xenwatch: page allocation failure: order:5, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.404374] xenwatch cpuset=/ mems_allowed=0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.405000] CPU: 1 PID: 42 Comm: xenwatch Not tainted 4.19.37 #1
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.405552] Hardware name: GIGABYTE GS-R12P4S/GA-7PCSL, BIOS R12 05/20/2014
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.406100] Call Trace:
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.406685]  dump_stack+0x72/0x97
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.407225]  warn_alloc+0xf3/0x180
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.407807]  __alloc_pages_slowpath+0xd31/0xdb0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.408349]  ? get_page_from_freelist+0x39d/0xfb0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.408893]  ? xs_talkv+0x216/0x2c0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.409478]  ? xs_single+0x48/0x70
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.410015]  __alloc_pages_nodemask+0x1f8/0x240
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.410603]  kmalloc_order+0x13/0x70
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.411143]  kmalloc_order_trace+0x18/0xa0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.411727]  talk_to_blkback+0xbb/0xdb0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.412276]  ? xenbus_gather+0xd3/0x150
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.412816]  blkback_changed+0x11a/0xc20
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.413398]  ? xenbus_read_driver_state+0x34/0x60
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.413941]  xenwatch_thread+0x81/0x170
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.414523]  ? wait_woken+0x80/0x80
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.415061]  kthread+0xf3/0x130
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.415598]  ? test_reply.isra.3+0x40/0x40
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.416179]  ? kthread_destroy_worker+0x40/0x40
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.416721]  ret_from_fork+0x35/0x40
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417337] Mem-Info:
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904] active_anon:5562 inactive_anon:7138 isolated_anon:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  active_file:13106 inactive_file:57519 isolated_file:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  unevictable:0 dirty:104 writeback:0 unstable:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  slab_reclaimable:3860 slab_unreclaimable:7214
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  mapped:7033 shmem:311 pagetables:953 bounce:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.417904]  free:4639 free_pcp:61 free_cma:0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.420594] Node 0 active_anon:22248kB inactive_anon:28552kB active_file:52424kB inactive_file:230076kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:28132kB dirty:416kB writeback:0kB shmem:1244kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.422432] DMA free:1652kB min:100kB low:124kB high:148kB active_anon:0kB inactive_anon:8kB active_file:668kB inactive_file:13068kB unevictable:0kB writepending:0kB present:15928kB managed:15844kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.424314] lowmem_reserve[]: 0 373 373 373
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.424886] DMA32 free:16904kB min:2416kB low:3020kB high:3624kB active_anon:22248kB inactive_anon:28544kB active_file:51756kB inactive_file:217008kB unevictable:0kB writepending:416kB present:508356kB managed:405648kB mlocked:0kB kernel_stack:3200kB pagetables:3812kB bounce:0kB free_pcp:244kB local_pcp:0kB free_cma:0kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.427165] lowmem_reserve[]: 0 0 0 0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.427726] DMA: 23*4kB (ME) 15*8kB (ME) 12*16kB (ME) 9*32kB (ME) 5*64kB (ME) 5*128kB (UME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1652kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.428772] DMA32: 1524*4kB (UMH) 260*8kB (UMH) 189*16kB (UME) 96*32kB (UMEH) 28*64kB (MH) 3*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16960kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.430184] 70983 total pagecache pages
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.430699] 36 pages in swap cache
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.431234] Swap cache stats: add 51, delete 15, find 0/0
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.431751] Free swap  = 1949428kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.432249] Total swap = 1949692kB
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.432800] 131071 pages RAM
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.433293] 0 pages HighMem/MovableOnly
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.433806] 25698 pages reserved
Apr 30 04:11:17 chardonnay1 kernel: [ 1393.434358] vbd vbd-51712: 12 allocating ring_info structure

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Jan Beulich @ 2019-04-30 14:26 UTC
  To: Ian Jackson
  Cc: Juergen Gross, xen-devel, Konrad Rzeszutek Wilk, Roger Pau Monne

>>> On 30.04.19 at 15:02, <ian.jackson@citrix.com> wrote:
> ISTM that there are *two* bugs here:
> 
>  1. Whatever caused the memory allocation failure

An order-5 allocation is liable to fail at any time (afaict). I find it
surprising that struct blkfront_ring_info instances (even arrays
of them when using multiple rings) get allocated using kcalloc()
rather than kvcalloc(), considering the size of the structure
(0x140E0 according to the disassembly of the 5.0.1 driver I
had to hand).
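
For concreteness, a sketch of the change being suggested (hedged: the
function name and context are from memory of the blkfront source and
are not verified against the 4.19 tree):

    /* drivers/block/xen-blkfront.c, negotiate_mq() -- sketch only.
     * kvcalloc() transparently falls back to vmalloc() when a
     * physically contiguous allocation this large is unavailable. */
    -    info->rinfo = kcalloc(info->nr_rings,
    -                          sizeof(struct blkfront_ring_info),
    -                          GFP_KERNEL);
    +    info->rinfo = kvcalloc(info->nr_rings,
    +                           sizeof(struct blkfront_ring_info),
    +                           GFP_KERNEL);

The matching kfree() of info->rinfo would then need to become kvfree().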

>  2. That a memory allocation failure can cause permanent loss of a
>      xenstore watch event

Well, isn't it sort of expected that an allocation failure will lead
to further problems?

Jan

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Juergen Gross @ 2019-04-30 14:28 UTC
  To: Ian Jackson, Konrad Rzeszutek Wilk, Roger Pau Monné; +Cc: xen-devel

On 30/04/2019 15:02, Ian Jackson wrote:
> osstest service owner writes ("[linux-4.19 test] 135420: regressions - FAIL"):
>> flight 135420 linux-4.19 real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/135420/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
> 
>>  test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail REGR. vs. 129313
> 
> This seems to be a kernel bug.
> 
> The guest creation failed.  The toolstack reports
> 
>  2019-04-30 04:11:17.521+0000: libxl:
>  libxl_device.c:397:libxl__device_disk_set_backend: Disk vdev=xvda
>  spec.backend=qdisk
>  ...
>  2019-04-30 04:11:27.600+0000: libxl:
>  libxl_device.c:1418:libxl__wait_for_backend: Backend
>  /local/domain/0/backend/qdisk/0/51712 not ready
> 
>  2019-04-30 04:11:27.600+0000: libxl:
>  libxl_bootloader.c:417:bootloader_disk_attached_cb: Domain 5:failed
>  to attach local disk for bootloader execution
> 
> Looking at the code in libxl, it is polling the specified xenstore
> path hoping for a ready state to turn up.  It waits 10 seconds and
> then gives up.  (Unfortunately it doesn't print the state it found.)
> 
> The backend is qemu.  qemu does not seem to have reported anything
> untoward.  However, looking at the kernel log (full log below):
> 
>  Apr 30 04:11:17 chardonnay1 kernel: [ 1393.403311] xenwatch: page
>  allocation failure: order:5,
>  mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)

Doing an order-5 allocation for the ring is not necessary here; a
virtually contiguous area via vmalloc() should work, too.
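
A sketch of that alternative (assuming the array only needs to be
virtually contiguous, as the trace's "allocating ring_info structure"
message suggests):

    /* vzalloc() returns a zeroed, virtually contiguous area, so it
     * avoids demanding an order-5 physically contiguous block;
     * kvcalloc() (see Jan's mail) wraps this same fallback behind
     * the kmalloc-style API. */
    info->rinfo = vzalloc(array_size(info->nr_rings, sizeof(*info->rinfo)));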


Juergen

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-04-30 14:33 UTC
  To: Jan Beulich
  Cc: Juergen Gross, xen-devel, Konrad Rzeszutek Wilk, Roger Pau Monne

Jan Beulich writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On 30.04.19 at 15:02, <ian.jackson@citrix.com> wrote:
> > ISTM that there are *two* bugs here:
> > 
> >  1. Whatever caused the memory allocation failure
> 
> An order-5 allocation is set to fail at any time (afaict). I find it
> surprising that struct blkfront_ring_info instances (even arrays
> of them when using multiple rings) get allocated using kcalloc()
> rather than kvcalloc(), considering the size of the structure
> (0x140E0 according to the disassembly of the 5.0.1 driver I
> had to hand).

I will leave answering this to the blkfront/linux folks...

> >  2. That a memory allocation failure can cause permanent loss of a
> >      xenstore watch event
> 
> Well, isn't it sort of expected that an allocation failure will lead
> to further problems?

I would have hoped that it would result in something other than a
hang.  At worst, blkfront ought to go into a state where it *knows*
that it is utterly broken and reports this properly.

Ian.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Roger Pau Monné @ 2019-04-30 15:55 UTC
  To: Ian Jackson; +Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Tue, Apr 30, 2019 at 03:33:00PM +0100, Ian Jackson wrote:
> Jan Beulich writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> > On 30.04.19 at 15:02, <ian.jackson@citrix.com> wrote:
> > > ISTM that there are *two* bugs here:
> > > 
> > >  1. Whatever caused the memory allocation failure
> > 
> > An order-5 allocation is set to fail at any time (afaict). I find it
> > surprising that struct blkfront_ring_info instances (even arrays
> > of them when using multiple rings) get allocated using kcalloc()
> > rather than kvcalloc(), considering the size of the structure
> > (0x140E0 according to the disassembly of the 5.0.1 driver I
> > had to hand).
> 
> I will leave answering this to the blkfront/linux folks...

I think those allocations used to be small enough that kcalloc was
likely fine. Now, with multiple rings and multiple pages per ring,
those have grown to a point where kcalloc is not fine anymore. I will
prepare a patch to switch to kvcalloc.

> > >  2. That a memory allocation failure can cause permanent loss of a
> > >      xenstore watch event
> > 
> > Well, isn't it sort of expected that an allocation failure will lead
> > to further problems?
> 
> I would have hoped that it would result in something other than a
> hang.  At worst, blkfront ought to go into a state where it *knows*
> that it is utterly broken and reports this properly.

I haven't yet checked all the possible error paths, but the ones I've
looked at use xenbus_dev_fatal, which switches the device state to
closing and writes the error message into xenstore. However, the
closing state is not an error state, and can be used as part of the
normal flow of a PV device (for example, in order to force a
reconnection).

I don't think there's a documented way of reporting an unrecoverable
PV frontend error, but I might be mistaken.
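
For reference, the xenbus state machine (paraphrased from Xen's
public io/xenbus.h) indeed has no dedicated error state;
Closing/Closed are the nearest states a frontend can advertise:

    enum xenbus_state {
        XenbusStateUnknown       = 0,
        XenbusStateInitialising  = 1,
        XenbusStateInitWait      = 2,  /* early init done; waiting on peer */
        XenbusStateInitialised   = 3,  /* waiting for connection from peer */
        XenbusStateConnected     = 4,
        XenbusStateClosing       = 5,  /* closing, clean shutdown or not   */
        XenbusStateClosed        = 6,
        XenbusStateReconfiguring = 7,
        XenbusStateReconfigured  = 8,
        /* No error/failed state: errors go into an "error" node instead. */
    };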

Roger.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-05-01  9:47 UTC
  To: Roger Pau Monne
  Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On Tue, Apr 30, 2019 at 03:33:00PM +0100, Ian Jackson wrote:
> > I will leave answering this to the blkfront/linux folks...
> 
> I think those allocations used to be small enough that kcalloc was
> likely fine. Now with multiple rings, and multiple pages per ring
> those have grown to a point where kcalloc is not fine anymore. I will
> prepare a patch to switch to kvcalloc.

Thanks.

FYI this same issue was reported by osstest in
  Subject: [linux-linus test] 135426: regressions - FAIL
i.e. on linux master.

ISTM that this patch you propose will have to go to stable branches
too?

> > I would have hoped that it would result in something other than a
> > hang.  At worst, blkfront ought to go into a state where it *knows*
> > that it is utterly broken and reports this properly.
> 
> I haven't yet checked all the possible error paths, but the ones I've
> looked at use xenbus_dev_fatal which switches the device state to
> closing and writes the error message into xenstore.

What if you can't write to xenstore?  Can we at least have a copy in
the kernel log ?  There might be other errors besides this memory
exhaustion, surely.

Error handling when the usual error reporting path is busted is
difficult indeed, but it is very helpful to have a fallback.

Ian.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Roger Pau Monné @ 2019-05-02  9:45 UTC
  To: Ian Jackson; +Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Wed, May 01, 2019 at 10:47:49AM +0100, Ian Jackson wrote:
> Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> > On Tue, Apr 30, 2019 at 03:33:00PM +0100, Ian Jackson wrote:
> > > I will leave answering this to the blkfront/linux folks...
> > 
> > I think those allocations used to be small enough that kcalloc was
> > likely fine. Now with multiple rings, and multiple pages per ring
> > those have grown to a point where kcalloc is not fine anymore. I will
> > prepare a patch to switch to kvcalloc.
> 
> Thanks.
> 
> FYI this same issue was reported by osstest in
>   Subject: [linux-linus test] 135426: regressions - FAIL
> ie on linux master.
> 
> ISTM that this patch you propose will have to go to stable branches
> too ?

I agree.

> > > I would have hoped that it would result in something other than a
> > > hang.  At worst, blkfront ought to go into a state where it *knows*
> > > that it is utterly broken and reports this properly.
> > 
> > I haven't yet checked all the possible error paths, but the ones I've
> > looked at use xenbus_dev_fatal which switches the device state to
> > closing and writes the error message into xenstore.
> 
> What if you can't write to xenstore ?  Can we at least have a copy in
> the kernel log ?  There might be other errors besides this memory
> exhaustion, surely.

There's also a call to dev_err, which should print on the console the
same error that's written to xenstore. That, however, requires the
allocation of a page of memory in order to format the string to be
printed (see xenbus_va_dev_error).
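
In outline, the pattern is roughly this (a paraphrase of
xenbus_va_dev_error() in drivers/xen/xenbus/xenbus_client.c, not the
verbatim code):

    static void xenbus_va_dev_error(struct xenbus_device *dev, int err,
                                    const char *fmt, va_list ap)
    {
        /* The report itself needs memory: if this allocation fails,
         * neither the console nor xenstore gets the message. */
        char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
        if (!buf)
            return;

        vsnprintf(buf, PAGE_SIZE, fmt, ap);
        dev_err(&dev->dev, "%s\n", buf);                      /* console */
        xenbus_write(XBT_NIL, dev->nodename, "error", buf);   /* xenstore */
        kfree(buf);
    }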

Roger.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-05-02 10:42 UTC
  To: Roger Pau Monne
  Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On Wed, May 01, 2019 at 10:47:49AM +0100, Ian Jackson wrote:
> > What if you can't write to xenstore ?  Can we at least have a copy in
> > the kernel log ?  There might be other errors besides this memory
> > exhaustion, surely.
> 
> There's a call to dev_err also, which should print the same error
> that's written to xenstore on the console. That however requires the
> memory allocation of page in order to format the string to be printed
> (see xenbus_va_dev_error).

Can we assume that memory exhaustion will always result in some
message from the memory allocator?  If so, then this new log message
would be a useful addition for cases *other* than a complete lack of
any available free page.  E.g., foolishly trying a large kcalloc
allocation, or some error not related to lack of memory at all.

Ian.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Roger Pau Monné @ 2019-05-02 11:04 UTC
  To: Ian Jackson; +Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Thu, May 02, 2019 at 11:42:04AM +0100, Ian Jackson wrote:
> Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> > On Wed, May 01, 2019 at 10:47:49AM +0100, Ian Jackson wrote:
> > > What if you can't write to xenstore ?  Can we at least have a copy in
> > > the kernel log ?  There might be other errors besides this memory
> > > exhaustion, surely.
> > 
> > There's a call to dev_err also, which should print the same error
> > that's written to xenstore on the console. That however requires the
> > memory allocation of page in order to format the string to be printed
> > (see xenbus_va_dev_error).
> 
> Can we assume that memory exhaustion will always result in some
> message from the memory allocator ?  If so then this new log message

I'm not sure I understand which new log message you are referring
to. The dev_err call is already present in xenbus_va_dev_error, so
everything that's attempted to be written to xenstore should also be
printed on the console.

> would be a useful addition for cases *other* than a complete lack of
> any available free page.  Eg, foolishly trying a large kcalloc
> allocation, or some error not related to lack of memory at all.

If there's no real memory shortage, a failure to attach a frontend
should result in a message being written to both xenstore and the
console with the current Linux code, AFAICT, provided
xenbus_dev_{fatal/error} or xenbus_switch_fatal is used.
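
For illustration, the reporting pattern described above as used in a
typical frontend error path (a sketch; setup_blkring() stands in for
whatever step failed):

    err = setup_blkring(dev, rinfo);    /* hypothetical failing step */
    if (err) {
        /* Logs via dev_err(), writes the device's "error" node in
         * xenstore, and switches the device towards
         * XenbusStateClosing. */
        xenbus_dev_fatal(dev, err, "setting up block ring");
        return err;
    }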

Thanks, Roger.

* Re: linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL
From: Ian Jackson @ 2019-05-02 12:24 UTC
  To: Roger Pau Monne
  Cc: Juergen Gross, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

Roger Pau Monne writes ("Re: [Xen-devel] linux 4.19 xenstore memory allocation failure Re: [linux-4.19 test] 135420: regressions - FAIL"):
> On Thu, May 02, 2019 at 11:42:04AM +0100, Ian Jackson wrote:
> > Can we assume that memory exhaustion will always result in some
> > message from the memory allocator ?  If so then this new log message
> 
> I'm not sure I understand to what new log message you are referring
> to. The dev_err call is already present in xenbus_va_dev_error, so
> everything that's attempted to write to xenstore should also be
> printed on the console.

Oh, I misunderstood.  I thought you were talking about a hypothetical
new dev_err call.

Does that mean that you think in this case it tried to write a message
to the console and that too failed due to lack of memory?

In which case it probably did the best it could.

Thanks,
Ian.
