* Linux kernel tmem regression v4.1 -> v4.4
@ 2017-09-28 8:42 James Dingwall
2017-09-28 13:31 ` Juergen Gross
0 siblings, 1 reply; 4+ messages in thread
From: James Dingwall @ 2017-09-28 8:42 UTC (permalink / raw)
To: xen-devel; +Cc: jgross
Hi,
I am trying to migrate my domU instances from v4.1.44 to v4.4.88, and it
seems that setting e820_host = 1 in the domU configuration is what
triggers the following stack trace. Please note I have #define MC_DEBUG
1 in arch/x86/xen/multicall.c, so the failed hypervisor call is logged.
I'm unsure on which side of the kernel/xen boundary this really falls.
Sep 25 22:02:50 [kernel] 1 multicall(s) failed: cpu 0
Sep 25 22:02:50 [kernel] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 4.4.88 #157
Sep 25 22:02:50 [kernel] Workqueue: events balloon_process
Sep 25 22:02:50 [kernel] 0000000000000000 ffff88001e31fa78 ffffffff812f9a28 ffff88001f80a220
Sep 25 22:02:50 [kernel] ffff88001f80a238 ffff88001e31fab0 ffffffff81004d79 0000000000115bb7
Sep 25 22:02:50 [kernel] ffff88001f80a270 ffff88001f80b330 ffff880195bb7000 0000000000000000
Sep 25 22:02:50 [kernel] Call Trace:
Sep 25 22:02:50 [kernel] [<ffffffff812f9a28>] dump_stack+0x61/0x7e
Sep 25 22:02:50 [kernel] [<ffffffff81004d79>] xen_mc_flush+0xfd/0x1a0
Sep 25 22:02:50 [kernel] [<ffffffff81006be5>] xen_alloc_pte+0x176/0x18e
Sep 25 22:02:50 [kernel] [<ffffffff8154521b>] phys_pmd_init+0x23c/0x2af
Sep 25 22:02:50 [kernel] [<ffffffff8154549b>] phys_pud_init+0x20d/0x2d4
Sep 25 22:02:50 [kernel] [<ffffffff81546022>] kernel_physical_mapping_init+0x15e/0x233
Sep 25 22:02:50 [kernel] [<ffffffff81542694>] init_memory_mapping+0x1c7/0x264
Sep 25 22:02:50 [kernel] [<ffffffff810411be>] arch_add_memory+0x50/0xda
Sep 25 22:02:50 [kernel] [<ffffffff81543191>] add_memory_resource+0x9c/0x12d
Sep 25 22:02:50 [kernel] [<ffffffff8137462f>] reserve_additional_memory+0x125/0x16b
Sep 25 22:02:50 [kernel] [<ffffffff8137482d>] balloon_process+0x1b8/0x2c5
Sep 25 22:02:50 [kernel] [<ffffffff8107df27>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Sep 25 22:02:50 [kernel] [<ffffffff81060c18>] process_one_work+0x19d/0x2a9
Sep 25 22:02:50 [kernel] [<ffffffff8106162a>] worker_thread+0x27d/0x36e
Sep 25 22:02:50 [kernel] [<ffffffff810613ad>] ? rescuer_thread+0x2a2/0x2a2
Sep 25 22:02:50 [kernel] [<ffffffff8106575b>] kthread+0xda/0xe2
Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel] [<ffffffff8154c57f>] ret_from_fork+0x3f/0x70
Sep 25 22:02:50 [kernel] [<ffffffff81065681>] ? kthread_worker_fn+0x13f/0x13f
Sep 25 22:02:50 [kernel] call 1/2: op=14 arg=[ffff880115bb7000] result=0  xen_alloc_pte+0x81/0x18e
Sep 25 22:02:50 [kernel] call 2/2: op=26 arg=[ffff88001f80b330] result=-1  xen_alloc_pte+0xd7/0x18e
Sep 25 22:02:50 [kernel] ------------[ cut here ]------------
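For anyone decoding the MC_DEBUG lines above: the op= values are x86 hypercall numbers. A quick sketch below maps the two that appear in this trace; the names are from my reading of xen/include/public/xen.h, so please double-check against your headers:

```python
# Hedged sketch: decode the op= numbers in the MC_DEBUG output above.
# Only the two hypercall numbers seen in this trace are listed; the names
# are assumed from xen/include/public/xen.h and should be verified there.

HYPERCALL_NAMES = {
    14: "__HYPERVISOR_update_va_mapping",
    26: "__HYPERVISOR_mmuext_op",
}

for entry in ("call 1/2: op=14 arg=[ffff880115bb7000] result=0",
              "call 2/2: op=26 arg=[ffff88001f80b330] result=-1"):
    # Extract the number following "op=" and look up its hypercall name.
    op = int(entry.split("op=")[1].split()[0])
    print(entry, "->", HYPERCALL_NAMES.get(op, "unknown"))
```

If that mapping is right, the failing call (result=-1) is the mmuext_op issued from xen_alloc_pte, presumably while pinning the new PTE page.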
The Xen version is 4.8.1-r3 from Gentoo; dom0 is 4.1.44. I have seen the
same trace logged in an Ubuntu 16.04 guest with a 4.4 kernel. I don't
have a specific test case which triggers this, but it will usually appear
within 24 hours, depending on how much work the domU has been doing
(and so, probably, on how much ballooning it has done). Setting
e820_host = 0 in the config seems to prevent it happening.
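For reference, this is the xl domain configuration option in question (it only affects PV guests); a minimal fragment of the domU config:

```
# xl domU config fragment: e820_host = 1 passes the host e820 memory map
# through to the PV guest; 0 (the default) gives the flat pseudo-physical
# layout. Disabling it is the workaround described above.
e820_host = 0
```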
In the kernel, git log v4.1.44..v4.4.89 -- :/arch/x86/xen/mmu.c shows
some commits which seem to relate to the failed hypervisor operation and
to working around the e820 map. I have not done a bisect to isolate
this more definitively. I suspect this could be a more general balloon
issue which tmem merely exposes more easily, since the rate of
ballooning up/down is much higher than with occasional manual changes.
This is the guest /proc/iomem with e820_host = 0:
KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
  01000000-015509ad : Kernel code
  015509ae-01807ebf : Kernel data
  01914000-019c1fff : Kernel bss
fee00000-fee00fff : Local APIC
And with e820_host = 1:
KERNEL: 4.4.89 #157 SMP Wed Sep 27 19:30:28 BST 2017
TMEM MODULE PARAMS:
/sys/module/tmem/parameters/cleancache: Y
/sys/module/tmem/parameters/frontswap: Y
/sys/module/tmem/parameters/selfballooning: Y
/sys/module/tmem/parameters/selfshrinking: Y
KERNEL COMMAND LINE: root=/dev/ram0 init=/linuxrc ramdisk=8192
real_root=/dev/systemvg/rootlv udev doscsi dolvm tmem
/proc/iomem:
00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-1fffffff : System RAM
  01000000-015509ad : Kernel code
  015509ae-01807ebf : Kernel data
  01914000-019c1fff : Kernel bss
20000000-d7feffff : Unusable memory
d7ff0000-d7ffdfff : ACPI Tables
d7ffe000-d7ffffff : ACPI Non-volatile Storage
fee00000-fee00fff : Local APIC
100000000-11fffffff : System RAM
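As a sanity check that the two layouts differ only in placement, not in total size, here is a quick sketch summing the top-level System RAM ranges from the two listings above (ranges copied verbatim; the nested Kernel code/data/bss entries are excluded by matching the label):

```python
# Sum the "System RAM" ranges from the two /proc/iomem listings above
# (e820_host = 0 vs e820_host = 1). /proc/iomem end addresses are
# inclusive, hence the +1 when computing the size of each range.

def ram_bytes(iomem_text):
    total = 0
    for line in iomem_text.splitlines():
        rng, _, label = line.partition(" : ")
        if label.strip() != "System RAM":
            continue  # skip reserved/ROM/APIC and nested kernel entries
        start, end = (int(x, 16) for x in rng.split("-"))
        total += end - start + 1
    return total

E820_HOST_0 = """\
00001000-0009ffff : System RAM
00100000-3fffffff : System RAM
"""

E820_HOST_1 = """\
00001000-0009ffff : System RAM
00100000-1fffffff : System RAM
100000000-11fffffff : System RAM
"""

a, b = ram_bytes(E820_HOST_0), ram_bytes(E820_HOST_1)
print(hex(a), hex(b), a == b)  # -> 0x3ff9f000 0x3ff9f000 True
```

Both maps come to 0x3ff9f000 bytes (just under 1 GiB); with e820_host = 1 the difference is purely that a 512 MiB chunk sits above 4 GiB, mirroring the host's e820 hole.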
If any other information about the environment would be useful, please let me know.
Thanks,
James
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: Linux kernel tmem regression v4.1 -> v4.4
2017-09-28 8:42 Linux kernel tmem regression v4.1 -> v4.4 James Dingwall
@ 2017-09-28 13:31 ` Juergen Gross
2017-09-29 4:38 ` Juergen Gross
0 siblings, 1 reply; 4+ messages in thread
From: Juergen Gross @ 2017-09-28 13:31 UTC (permalink / raw)
To: James Dingwall, xen-devel
On 28/09/17 10:42, James Dingwall wrote:
> [original report quoted in full above; trimmed]
Cc-ing Konrad, who should be much more familiar with tmem than I am.
Juergen
* Re: Linux kernel tmem regression v4.1 -> v4.4
2017-09-28 13:31 ` Juergen Gross
@ 2017-09-29 4:38 ` Juergen Gross
2017-09-29 8:50 ` Wei Liu
0 siblings, 1 reply; 4+ messages in thread
From: Juergen Gross @ 2017-09-29 4:38 UTC (permalink / raw)
To: James Dingwall, xen-devel
On 28/09/17 15:31, Juergen Gross wrote:
> On 28/09/17 10:42, James Dingwall wrote:
>> [original report quoted in full above; trimmed]
>
> Cc-ing Konrad, who should be much more familiar with tmem than I am.
Strange, in my sent folder Konrad was still on Cc:
Trying again.
Juergen
* Re: Linux kernel tmem regression v4.1 -> v4.4
2017-09-29 4:38 ` Juergen Gross
@ 2017-09-29 8:50 ` Wei Liu
0 siblings, 0 replies; 4+ messages in thread
From: Wei Liu @ 2017-09-29 8:50 UTC (permalink / raw)
To: Juergen Gross; +Cc: Wei Liu, James Dingwall, xen-devel
On Fri, Sep 29, 2017 at 06:38:18AM +0200, Juergen Gross wrote:
> >
> > Cc-ing Konrad, who should be much more familiar with tmem than I am.
>
> Strange, in my sent folder Konrad was still on Cc:
Konrad has a special setting to remove himself from CC, I think.