* virnet: page allocation failure: order:0
@ 2016-08-15 13:26 ` Philipp Hahn
  0 siblings, 0 replies; 3+ messages in thread
From: Philipp Hahn @ 2016-08-15 13:26 UTC (permalink / raw)
  To: linux-kernel, Rusty Russell, qemu-devel

Hello,

This Sunday one of our virtual servers, running linux-4.1.16 inside
OpenStack under qemu, "crashed" while doing a backup via rsync to a
slow NFS server.
Crash here means that the server became unresponsive to network traffic:
- it was no longer able to contact the two LDAP servers
- no ssh login was possible
- the backup got stuck
- crond was still running and kept adding process after process,
leading to ~1.5k processes after one day.

What I don't know is whether the network problem was the cause or the
consequence. That is why I want to understand why the following
order=0 allocation failed:

> swapper/0: page allocation failure: order:0, mode:0x120

order:0 means a single 4 KiB page; mode 0x120 decodes, using our copy of
gfp.h, to __GFP_HIGH | __GFP_COLD:

src/extern/linux/include/linux/gfp.h:
  18 #define ___GFP_HIGH   0x20u
  21 #define ___GFP_COLD   0x100u
  72 #define __GFP_HIGH    ((__force gfp_t)___GFP_HIGH)   /* Should access emergency pools? */
  75 #define __GFP_COLD    ((__force gfp_t)___GFP_COLD)   /* Cache-cold page required */
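
For completeness, a tiny user-space check of that decode. The
___GFP_WAIT value is what I read in the same 4.1 gfp.h (it is not quoted
above, so treat it as my assumption); its absence from 0x120 should mean
the allocation is not allowed to sleep or reclaim:

#include <stdio.h>

#define ___GFP_WAIT  0x10u   /* from the same gfp.h, as far as I can see */
#define ___GFP_HIGH  0x20u
#define ___GFP_COLD  0x100u

int main(void)
{
	unsigned int mode = 0x120u;

	printf("__GFP_HIGH: %s\n", (mode & ___GFP_HIGH) ? "set" : "clear");
	printf("__GFP_COLD: %s\n", (mode & ___GFP_COLD) ? "set" : "clear");
	printf("__GFP_WAIT: %s\n", (mode & ___GFP_WAIT) ? "set" : "clear");
	/* HIGH and COLD are set, WAIT is not: the allocation may dip into
	 * the emergency reserves but must not sleep (softirq context). */
	return 0;
}

If I read gfp.h correctly, GFP_ATOMIC is just __GFP_HIGH on 4.1, so
0x120 looks like GFP_ATOMIC | __GFP_COLD coming from the receive path.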



> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-ucs190-amd64 #1 Debian 4.1.6-1.190.201604142226

The kernel is 4.1.16 with some patches from Debian and some others.

> Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
>  0000000000000000 0000000000000000 ffffffff81597807 0000000000000120
>  ffffffff8116b6e9 0000000000018200 ffff88021fffbb00 0000000000000000
>  ffff88021fffbb08 0000000000000000 ffff88021fffab50 ffffffff818164e0
> Call Trace:
>  <IRQ>  [<ffffffff81597807>] ? dump_stack+0x40/0x50
>  [<ffffffff8116b6e9>] ? warn_alloc_failed+0xf9/0x150
>  [<ffffffff8116ee7a>] ? __alloc_pages_nodemask+0x65a/0x9d0
>  [<ffffffff811b3024>] ? alloc_pages_current+0xa4/0x120
>  [<ffffffff81481d77>] ? skb_page_frag_refill+0xb7/0xe0
>  [<ffffffffa005614b>] ? try_fill_recv+0x31b/0x610 [virtio_net]
>  [<ffffffffa0056b10>] ? virtnet_receive+0x580/0x890 [virtio_net]

A network packet was received, but refilling the virtio-net receive
buffers (skb_page_frag_refill) failed because no page could be
allocated.

>  [<ffffffffa0056e46>] ? virtnet_poll+0x26/0x90 [virtio_net]
>  [<ffffffff81499629>] ? net_rx_action+0x159/0x330
>  [<ffffffff8107aace>] ? __do_softirq+0xde/0x260
>  [<ffffffff8107ae95>] ? irq_exit+0x95/0xa0
>  [<ffffffff815a0b74>] ? do_IRQ+0x64/0x110
>  [<ffffffff8159e9ee>] ? common_interrupt+0x6e/0x6e
>  <EOI>  [<ffffffff81020a70>] ? mwait_idle+0x150/0x150
>  [<ffffffff8105e192>] ? native_safe_halt+0x2/0x10
>  [<ffffffff81020a8c>] ? default_idle+0x1c/0xb0
>  [<ffffffff810b67f4>] ? cpu_startup_entry+0x314/0x3e0
>  [<ffffffff8191b093>] ? start_kernel+0x48d/0x498
>  [<ffffffff8191a95d>] ? set_init_arg+0x50/0x50
>  [<ffffffff8191a117>] ? early_idt_handler_array+0x117/0x120
>  [<ffffffff8191a117>] ? early_idt_handler_array+0x117/0x120
>  [<ffffffff8191a720>] ? x86_64_start_kernel+0x14a/0x159
> Mem-Info:
> active_anon:34861 inactive_anon:3120 isolated_anon:0
>  active_file:941577 inactive_file:946489 isolated_file:0
>  unevictable:974 dirty:2953 writeback:282206 unstable:1277
>  slab_reclaimable:55077 slab_unreclaimable:31365
>  mapped:3979 shmem:3986 pagetables:3163 bounce:0
>  free:9437 free_pcp:740 free_cma:0

Looks like enough memory is free
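(9437 free pages x 4 KiB = ~37 MiB in total, plus free_pcp:740 x 4 KiB
= ~2.9 MiB on the per-CPU lists.)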

> Node 0 DMA free:15908kB min:20kB low:24kB high:28kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB
> mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
> free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 3492 7971 7971
> Node 0 DMA32 free:19532kB min:4968kB low:6208kB high:7452kB
> active_anon:62328kB inactive_anon:5336kB active_file:1659704kB
> inactive_file:1671236kB unevictable:2056kB isolated(anon):0kB
> isolated(file):0kB present:3653624kB managed:3578476kB mlocked:2056kB
> dirty:10472kB writeback:479692kB mapped:11892kB shmem:7132kB
> slab_reclaimable:79832kB slab_unreclaimable:46784kB kernel_stack:2000kB
> pagetables:4660kB unstable:2596kB bounce:0kB free_pcp:1080kB
> local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> lowmem_reserve[]: 0 0 4478 4478
> Node 0 Normal free:2308kB min:6372kB low:7964kB high:9556kB
> active_anon:77116kB inactive_anon:7144kB active_file:2106604kB
> inactive_file:2114720kB unevictable:1840kB isolated(anon):0kB
> isolated(file):0kB present:4718592kB managed:4586204kB mlocked:1840kB
> dirty:1340kB writeback:649132kB mapped:4024kB shmem:8812kB
> slab_reclaimable:140476kB slab_unreclaimable:78676kB kernel_stack:1888kB
> pagetables:7992kB unstable:2512kB bounce:0kB free_pcp:1880kB
> local_pcp:164kB free_cma:0kB writeback_tmp:0kB pages_scanned:768
> all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0

The zone dumps also show pages being available in every zone, although
Node 0 Normal (free:2308kB) is below its min:6372kB watermark.

> Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U)
> 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (EM) = 15908kB
> Node 0 DMA32: 4632*4kB (UEM) 81*8kB (UMR) 4*16kB (R) 1*32kB (R) 0*64kB
> 1*128kB (R) 1*256kB (R) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19656kB
> Node 0 Normal: 315*4kB (UM) 1*8kB (R) 1*16kB (R) 1*32kB (R) 1*64kB (R)
> 1*128kB (R) 2*256kB (R) 1*512kB (R) 0*1024kB 0*2048kB 0*4096kB = 2532kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
> hugepages_size=2048kB

Here too.
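(For example, Node 0 Normal adds up to 315*4 + 1*8 + 1*16 + 1*32 + 1*64
+ 1*128 + 2*256 + 1*512 = 2532 kB, and an order-0 request could be
served from a block of any of those sizes.)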

> 1892811 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap  = 0kB
> Total swap = 0kB

Okay, there is no swap to help with defragmentation, but that should be
no problem for an order=0 allocation.

> 2097052 pages RAM
> 0 pages HighMem/MovableOnly
> 51905 pages reserved
> 0 pages hwpoisoned

Any explanation?

Thank you in advance

Philipp


* Re: RFH: virnet: page allocation failure: order:0
  2016-08-15 13:26 ` [Qemu-devel] " Philipp Hahn
@ 2016-09-21  6:44 ` Philipp Hahn
  -1 siblings, 0 replies; 3+ messages in thread
From: Philipp Hahn @ 2016-09-21  6:44 UTC (permalink / raw)
  To: linux-kernel, linux-mm

Hello,

On 15.08.2016 at 15:26, Philipp Hahn wrote:
> this Sunday one of our virtual servers running linux-4.1.16 inside
> OpenStack using qemu "crashed" while doing a backup using rsync to a
> slow NFS server.

This happened again last weekend, with the same stack trace:

>> Call Trace:
>>  <IRQ>  [<ffffffff81597807>] ? dump_stack+0x40/0x50
>>  [<ffffffff8116b6e9>] ? warn_alloc_failed+0xf9/0x150
>>  [<ffffffff8116ee7a>] ? __alloc_pages_nodemask+0x65a/0x9d0
>>  [<ffffffff811b3024>] ? alloc_pages_current+0xa4/0x120
>>  [<ffffffff81481d77>] ? skb_page_frag_refill+0xb7/0xe0
>>  [<ffffffffa002d14b>] ? try_fill_recv+0x31b/0x610 [virtio_net]
>>  [<ffffffffa002db10>] ? virtnet_receive+0x580/0x890 [virtio_net]
>>  [<ffffffffa002de46>] ? virtnet_poll+0x26/0x90 [virtio_net]
>>  [<ffffffff81499629>] ? net_rx_action+0x159/0x330
>>  [<ffffffff8107aace>] ? __do_softirq+0xde/0x260
>>  [<ffffffff8107ae95>] ? irq_exit+0x95/0xa0
>>  [<ffffffff815a0b74>] ? do_IRQ+0x64/0x110
>>  [<ffffffff8159e9ee>] ? common_interrupt+0x6e/0x6e
>>  <EOI>  [<ffffffff81020a70>] ? mwait_idle+0x150/0x150
>>  [<ffffffff8105e192>] ? native_safe_halt+0x2/0x10
>>  [<ffffffff81020a8c>] ? default_idle+0x1c/0xb0
>>  [<ffffffff810b67f4>] ? cpu_startup_entry+0x314/0x3e0
>>  [<ffffffff8159db87>] ? _raw_spin_unlock_irqrestore+0x17/0x50
>>  [<ffffffff8104de55>] ? start_secondary+0x185/0x1b0


> What I don't know is whether the network problem was the cause or the
> consequence. That is why I want to understand why the following
> order=0 allocation failed:

What I still don't understand is how an "order=0" allocation can fail.

>> Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (EM) = 15908kB
>> Node 0 DMA32: 4701*4kB (UM) 0*8kB 9*16kB (R) 0*32kB 1*64kB (R) 0*128kB 0*256kB 1*512kB (R) 0*1024kB 0*2048kB 0*4096kB = 19524kB
>> Node 0 Normal: 352*4kB (UM) 5*8kB (UM) 2*16kB (R) 0*32kB 1*64kB (R) 1*128kB (R) 0*256kB 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 2696kB
>> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Every zone has plenty of free pages, so why does allocating 4 KiB not
work?

I have looked at the source code, the disassembly, and similar reports,
but now I'm lost. Can someone please give me a hint where to look next?
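
In case it helps, this is how far I got: a simplified user-space
paraphrase of what I believe __zone_watermark_ok() in mm/page_alloc.c
does, fed with the numbers from above (free from this weekend's buddy
lists, min and lowmem_reserve from the zone dump in my first mail). The
halving of the watermark for __GFP_HIGH and the lowmem_reserve term are
my reading of the 4.1 source, so please treat the details as my
assumptions, not as the real code:

#include <stdbool.h>
#include <stdio.h>

struct zone_snapshot {
	const char *name;
	long free_pages;     /* "free:" converted from kB to 4 KiB pages */
	long min_pages;      /* "min:" watermark in pages */
	long lowmem_reserve; /* reserve[] entry for a Normal-zone request */
};

/* alloc_high stands for __GFP_HIGH: the watermark is lowered, not ignored */
static bool watermark_ok(const struct zone_snapshot *z, int order,
			 bool alloc_high)
{
	long min = z->min_pages;

	if (alloc_high)
		min -= min / 2;

	return z->free_pages - (1L << order) + 1 > min + z->lowmem_reserve;
}

int main(void)
{
	const struct zone_snapshot zones[] = {
		{ "DMA",    15908 / 4,   20 / 4, 7971 },
		{ "DMA32",  19524 / 4, 4968 / 4, 4478 },
		{ "Normal",  2696 / 4, 6372 / 4,    0 },
	};

	for (unsigned int i = 0; i < sizeof(zones) / sizeof(zones[0]); i++)
		printf("%-6s %s\n", zones[i].name,
		       watermark_ok(&zones[i], 0, true) ?
		       "watermark ok" : "below watermark");
	return 0;
}

(If that paraphrase is right, all three zones come out below even the
reduced watermark once lowmem_reserve is added, but I am not sure I am
reading the code correctly.)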

Thank you in advance.

Philipp Hahn
