* mm allocation failure and hang when running xfstests generic/269 on xfs
@ 2017-03-01  4:46 ` Xiong Zhou
  0 siblings, 0 replies; 79+ messages in thread
From: Xiong Zhou @ 2017-03-01  4:46 UTC (permalink / raw)
  To: linux-xfs, linux-mm; +Cc: linux-kernel, linux-fsdevel

Hi,

It's reproducible, though not every time. Ext4 works fine.

Based on test logs, it's bad on Linus tree commit:
  e5d56ef Merge tag 'watchdog-for-linus-v4.11'

It's good on commit:
  f8e6859 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

I'm trying to narrow it down a bit.

Thanks,
Xiong

---

fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
fsstress cpuset=/ mems_allowed=0-1
CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
Call Trace:
 dump_stack+0x63/0x87
 warn_alloc+0x114/0x1c0
 ? alloc_pages_current+0x88/0x120
 __vmalloc_node_range+0x250/0x2a0
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 ? free_hot_cold_page+0x21f/0x280
 vzalloc+0x54/0x60
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 kmem_zalloc_greedy+0x2b/0x40 [xfs]
 xfs_bulkstat+0x11b/0x730 [xfs]
 ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
 ? selinux_capable+0x20/0x30
 ? security_capable+0x48/0x60
 xfs_ioc_bulkstat+0xe4/0x190 [xfs]
 xfs_file_ioctl+0x9dd/0xad0 [xfs]
 ? do_filp_open+0xa5/0x100
 do_vfs_ioctl+0xa7/0x5e0
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x67/0x180
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f825023f577
RSP: 002b:00007ffffea76e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000003f7e RCX: 00007f825023f577
RDX: 00007ffffea76e70 RSI: ffffffffc0205865 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000008 R09: 0000000000000036
R10: 0000000000000069 R11: 0000000000000246 R12: 0000000000000036
R13: 00007f824c002d00 R14: 0000000000001209 R15: 0000000000000000
Mem-Info:
active_anon:23126 inactive_anon:1719 isolated_anon:0
 active_file:153709 inactive_file:356889 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:43829 slab_unreclaimable:45414
 mapped:14638 shmem:2470 pagetables:1599 bounce:0
 free:7463113 free_pcp:23729 free_cma:0
Node 0 active_anon:36372kB inactive_anon:140kB active_file:449540kB inactive_file:355288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:21792kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 10240kB anon_thp: 796kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Node 1 active_anon:56132kB inactive_anon:6736kB active_file:165296kB inactive_file:1072268kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:36760kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 12288kB anon_thp: 9084kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Node 0 DMA free:15884kB min:40kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 1821 15896 15896 15896
Node 0 DMA32 free:1860532kB min:4968kB low:6772kB high:8576kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1948156kB managed:1865328kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:2132kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 14074 14074 14074
Node 0 Normal free:13111616kB min:39660kB low:54072kB high:68484kB active_anon:36372kB inactive_anon:140kB active_file:449540kB inactive_file:355296kB unevictable:0kB writepending:0kB present:14680064kB managed:14412260kB mlocked:0kB slab_reclaimable:77256kB slab_unreclaimable:86096kB kernel_stack:8184kB pagetables:3132kB bounce:0kB free_pcp:45712kB local_pcp:212kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 1 Normal free:14864424kB min:45432kB low:61940kB high:78448kB active_anon:56132kB inactive_anon:6736kB active_file:165296kB inactive_file:1072264kB unevictable:0kB writepending:0kB present:16777216kB managed:16508964kB mlocked:0kB slab_reclaimable:98060kB slab_unreclaimable:95540kB kernel_stack:7880kB pagetables:3264kB bounce:0kB free_pcp:46848kB local_pcp:640kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
Node 0 DMA32: 13*4kB (UM) 10*8kB (UM) 13*16kB (UM) 11*32kB (M) 18*64kB (UM) 9*128kB (UM) 16*256kB (UM) 12*512kB (UM) 12*1024kB (UM) 10*2048kB (UM) 443*4096kB (M) = 1860532kB
Node 0 Normal: 2765*4kB (UME) 737*8kB (UME) 145*16kB (UME) 251*32kB (UME) 626*64kB (UM) 255*128kB (UME) 46*256kB (UME) 23*512kB (UE) 15*1024kB (U) 8*2048kB (U) 3163*4096kB (M) = 13110956kB
Node 1 Normal: 3422*4kB (UME) 1039*8kB (UME) 158*16kB (UME) 852*32kB (UME) 1125*64kB (UME) 617*128kB (UE) 329*256kB (UME) 117*512kB (UME) 18*1024kB (UM) 6*2048kB (U) 3537*4096kB (M) = 14865168kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
513078 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 10485756kB
Total swap = 10485756kB
8355357 pages RAM
0 pages HighMem/MovableOnly
154743 pages reserved
0 pages cma reserved
0 pages hwpoisoned

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-01  4:46 ` Xiong Zhou
@ 2017-03-02  0:37   ` Christoph Hellwig
  -1 siblings, 0 replies; 79+ messages in thread
From: Christoph Hellwig @ 2017-03-02  0:37 UTC (permalink / raw)
  To: Xiong Zhou; +Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote:
> Hi,
> 
> It's reproduciable, not everytime though. Ext4 works fine.

On ext4, fsstress won't run bulkstat because it doesn't exist there.  Either
way this smells like an MM issue to me, as there were no XFS changes
in that area recently.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02  0:37   ` Christoph Hellwig
@ 2017-03-02  5:19     ` Xiong Zhou
  -1 siblings, 0 replies; 79+ messages in thread
From: Xiong Zhou @ 2017-03-02  5:19 UTC (permalink / raw)
  To: Christoph Hellwig, mhocko
  Cc: Xiong Zhou, linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote:
> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote:
> > Hi,
> > 
> > It's reproduciable, not everytime though. Ext4 works fine.
> 
> On ext4 fsstress won't run bulkstat because it doesn't exist.  Either
> way this smells like a MM issue to me as there were not XFS changes
> in that area recently.

Yap.

First bad commit:

commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
Author: Michal Hocko <mhocko@suse.com>
Date:   Fri Feb 24 14:58:53 2017 -0800

    vmalloc: back off when the current task is killed

Reverting this commit on top of
  e5d56ef Merge tag 'watchdog-for-linus-v4.11'
survives the tests.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02  5:19     ` Xiong Zhou
  (?)
  (?)
@ 2017-03-02  6:41       ` Bob Liu
  -1 siblings, 0 replies; 79+ messages in thread
From: Bob Liu @ 2017-03-02  6:41 UTC (permalink / raw)
  To: Xiong Zhou, Christoph Hellwig, mhocko
  Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On 2017/3/2 13:19, Xiong Zhou wrote:
> On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote:
>> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote:
>>> Hi,
>>>
>>> It's reproduciable, not everytime though. Ext4 works fine.
>>
>> On ext4 fsstress won't run bulkstat because it doesn't exist.  Either
>> way this smells like a MM issue to me as there were not XFS changes
>> in that area recently.
> 
> Yap.
> 
> First bad commit:
> 

It looks like this is not a bug.
From the commit below, the allocation failure message was printed because the current process had received a SIGKILL signal.
You may want to confirm whether that's the case.
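
The heart of that commit is a bail-out along these lines in the page
allocation loop of __vmalloc_area_node() (paraphrased from the commit
description, not copied verbatim from the tree):

	/* stop populating area->pages[] once the task has a pending SIGKILL */
	if (fatal_signal_pending(current)) {
		area->nr_pages = i;	/* remember how far we got */
		goto fail;		/* warn_alloc() and return NULL */
	}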

Regards,
Bob

> commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Fri Feb 24 14:58:53 2017 -0800
> 
>     vmalloc: back off when the current task is killed
> 
> Reverting this commit on top of
>   e5d56ef Merge tag 'watchdog-for-linus-v4.11'
> survives the tests.
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02  5:19     ` Xiong Zhou
@ 2017-03-02  6:47       ` Anshuman Khandual
  -1 siblings, 0 replies; 79+ messages in thread
From: Anshuman Khandual @ 2017-03-02  6:47 UTC (permalink / raw)
  To: Xiong Zhou, Christoph Hellwig, mhocko
  Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On 03/02/2017 10:49 AM, Xiong Zhou wrote:
> On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote:
>> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote:
>>> Hi,
>>>
>>> It's reproduciable, not everytime though. Ext4 works fine.
>> On ext4 fsstress won't run bulkstat because it doesn't exist.  Either
>> way this smells like a MM issue to me as there were not XFS changes
>> in that area recently.
> Yap.
> 
> First bad commit:
> 
> commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Fri Feb 24 14:58:53 2017 -0800
> 
>     vmalloc: back off when the current task is killed
> 
> Reverting this commit on top of
>   e5d56ef Merge tag 'watchdog-for-linus-v4.11'
> survives the tests.

Does the fsstress test hang, or the whole system? I am not familiar with this
code, but if it's the test that is hanging and it is hitting the
new check introduced by the above commit, that means the requester is
currently being killed by the OOM killer for some other memory allocation
request. Isn't this kind of memory allocation failure expected, then?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02  6:47       ` Anshuman Khandual
@ 2017-03-02  8:42         ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02  8:42 UTC (permalink / raw)
  To: Xiong Zhou, Anshuman Khandual
  Cc: Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On Thu 02-03-17 12:17:47, Anshuman Khandual wrote:
> On 03/02/2017 10:49 AM, Xiong Zhou wrote:
> > On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote:
> >> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote:
> >>> Hi,
> >>>
> >>> It's reproduciable, not everytime though. Ext4 works fine.
> >> On ext4 fsstress won't run bulkstat because it doesn't exist.  Either
> >> way this smells like a MM issue to me as there were not XFS changes
> >> in that area recently.
> > Yap.
> > 
> > First bad commit:
> > 
> > commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
> > Author: Michal Hocko <mhocko@suse.com>
> > Date:   Fri Feb 24 14:58:53 2017 -0800
> > 
> >     vmalloc: back off when the current task is killed
> > 
> > Reverting this commit on top of
> >   e5d56ef Merge tag 'watchdog-for-linus-v4.11'
> > survives the tests.
> 
> Does fsstress test or the system hang ? I am not familiar with this
> code but If it's the test which is getting hung and its hitting this
> new check introduced by the above commit that means the requester is
> currently being killed by OOM killer for some other memory allocation
> request.

Well, not exactly. It is sufficient for the task to be _killed_ by SIGKILL.
For that it just needs to do a group_exit while one thread is still
in the kernel (see zap_process). While I could change this check to
do an OOM-specific check, I believe the more generic
fatal_signal_pending is the right thing to do here. I am still not sure
what the actual problem is here, though. Could you be more specific,
please?
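
To make the distinction concrete, the two variants would look roughly
like this (TIF_MEMDIE being the per-task OOM-victim marker at this
point in time; this is only a sketch):

	/* OOM-specific: back off only when this task was picked by the OOM killer */
	if (test_thread_flag(TIF_MEMDIE))
		goto fail;

	/* generic: back off on any pending SIGKILL, no matter who sent it */
	if (fatal_signal_pending(current))
		goto fail;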

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02  8:42         ` Michal Hocko
@ 2017-03-02  9:23           ` Xiong Zhou
  -1 siblings, 0 replies; 79+ messages in thread
From: Xiong Zhou @ 2017-03-02  9:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Xiong Zhou, Anshuman Khandual, Christoph Hellwig, linux-xfs,
	linux-mm, linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 09:42:23AM +0100, Michal Hocko wrote:
> On Thu 02-03-17 12:17:47, Anshuman Khandual wrote:
> > On 03/02/2017 10:49 AM, Xiong Zhou wrote:
> > > On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote:
> > >> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote:
> > >>> Hi,
> > >>>
> > >>> It's reproduciable, not everytime though. Ext4 works fine.
> > >> On ext4 fsstress won't run bulkstat because it doesn't exist.  Either
> > >> way this smells like a MM issue to me as there were not XFS changes
> > >> in that area recently.
> > > Yap.
> > > 
> > > First bad commit:
> > > 
> > > commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
> > > Author: Michal Hocko <mhocko@suse.com>
> > > Date:   Fri Feb 24 14:58:53 2017 -0800
> > > 
> > >     vmalloc: back off when the current task is killed
> > > 
> > > Reverting this commit on top of
> > >   e5d56ef Merge tag 'watchdog-for-linus-v4.11'
> > > survives the tests.
> > 
> > Does fsstress test or the system hang ? I am not familiar with this
> > code but If it's the test which is getting hung and its hitting this
> > new check introduced by the above commit that means the requester is
> > currently being killed by OOM killer for some other memory allocation
> > request.
> 
> Well, not exactly. It is sufficient for it to be _killed_ by SIGKILL.
> And for that it just needs to do a group_exit when one thread was still
> in the kernel (see zap_process). While I can change this check to
> actually do the oom specific check I believe a more generic
> fatal_signal_pending is the right thing to do here. I am still not sure
> what is the actual problem here, though. Could you be more specific
> please?

It's blocking the test and the system shutdown; fsstress won't exit.

For anyone interested, here is a simple, ugly reproducer:

cat > fst.sh <<EOFF
#! /bin/bash -x

[ \$# -ne 3 ] && { echo "./single FSTYP TEST XFSTESTS_DIR"; exit 1; }

FST=\$1
BLKSZ=4096

fallocate -l 10G /home/test.img
fallocate -l 15G /home/scratch.img

MNT1=/loopmnt
MNT2=/loopsch

mkdir -p \$MNT1
mkdir -p \$MNT2

DEV1=\$(losetup --find --show /home/test.img)
DEV2=\$(losetup --find --show /home/scratch.img)

cleanup()
{
	umount -d \$MNT1
	umount -d \$MNT2
	umount \$DEV1 \$DEV2
	losetup -D || losetup -a | awk -F: '{print \$1}' | xargs losetup -d
	rm -f /home/{test,scratch}.img
}

trap cleanup 0 1 2

if [[ \$FST =~ ext ]] ; then
	mkfs.\${FST} -Fq -b \${BLKSZ} \$DEV1
elif [[ \$FST =~ xfs ]] ; then
	mkfs.\${FST} -fq -b size=\${BLKSZ} \$DEV1
fi
if test \$? -ne 0 ; then
	echo "mkfs \$DEV1 failed" 
	exit 1
fi

if [[ \$FST =~ ext ]] ; then
	mkfs.\${FST} -Fq -b \${BLKSZ} \$DEV2
elif [[ \$FST =~ xfs ]] ; then
	mkfs.\${FST} -fq -b size=\${BLKSZ} \$DEV2
fi
if test \$? -ne 0 ; then
	echo "mkfs \$DEV2 failed"
	exit 1
fi

mount \$DEV1 \$MNT1
if test \$? -ne 0 ; then
	echo "mount \$DEV1 failed" 
	exit 1
fi
mount \$DEV2 \$MNT2
if test \$? -ne 0 ; then
	echo "mount \$DEV2 failed" 
	exit 1
fi

pushd \$3 || exit 1

cat > local.config <<EOF
TEST_DEV=\$DEV1
TEST_DIR=\$MNT1
SCRATCH_MNT=\$MNT2
SCRATCH_DEV=\$DEV2
EOF

i=0
while [ \$i -lt 50 ] ; do
	./check \$2
	echo \$i
	((i=\$i+1))
done

popd
EOFF

git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
cd xfstests-dev
make -j2 && make install || exit 1
cd -
sh -x ./fst.sh xfs generic/269 xfstests-dev


> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02  5:19     ` Xiong Zhou
@ 2017-03-02 10:04       ` Tetsuo Handa
  -1 siblings, 0 replies; 79+ messages in thread
From: Tetsuo Handa @ 2017-03-02 10:04 UTC (permalink / raw)
  To: Xiong Zhou, Christoph Hellwig, mhocko
  Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On 2017/03/02 14:19, Xiong Zhou wrote:
> On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote:
>> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote:
>>> Hi,
>>>
>>> It's reproduciable, not everytime though. Ext4 works fine.
>>
>> On ext4 fsstress won't run bulkstat because it doesn't exist.  Either
>> way this smells like a MM issue to me as there were not XFS changes
>> in that area recently.
> 
> Yap.
> 
> First bad commit:
> 
> commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Fri Feb 24 14:58:53 2017 -0800
> 
>     vmalloc: back off when the current task is killed
> 
> Reverting this commit on top of
>   e5d56ef Merge tag 'watchdog-for-linus-v4.11'
> survives the tests.
> 

Looks like kmem_zalloc_greedy() is broken:
it keeps looping until vzalloc() succeeds.
If somebody (not limited to the OOM killer) sends SIGKILL and
vmalloc() backs off, kmem_zalloc_greedy() will loop forever.

----------------------------------------
void *
kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
{
        void            *ptr;
        size_t          kmsize = maxsize;

        while (!(ptr = vzalloc(kmsize))) {
                if ((kmsize >>= 1) <= minsize)
                        kmsize = minsize;
        }
        if (ptr)
                *size = kmsize;
        return ptr;
}
----------------------------------------
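
For context, the caller that ends up spinning in this loop (per the
stack trace above) is the bulkstat buffer setup, roughly as follows;
the argument sizes are recalled from the 4.10 sources and may not be
exact:

----------------------------------------
	/* xfs_bulkstat(): ask for up to four pages, settle for at least one */
	irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4);
	if (!irbuf)
		return -ENOMEM;
----------------------------------------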

So, commit 5d17a73a2ebeb8d1 ("vmalloc: back off when the current task is
killed") effectively implemented a __GFP_KILLABLE flag and applied it
automatically. As a result, callers that are not prepared to fail upon
SIGKILL are confused. ;-)

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 10:04       ` Tetsuo Handa
  (?)
@ 2017-03-02 10:35         ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 10:35 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel,
	linux-fsdevel

On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
[...]
> So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> killed") implemented __GFP_KILLABLE flag and automatically applied that
> flag. As a result, those who are not ready to fail upon SIGKILL are
> confused. ;-)

You are right! The function is documented as being allowed to fail, but the
code doesn't really allow that. This seems like a bug to me. What do you
think about the following?
---
From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 2 Mar 2017 11:31:11 +0100
Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail

Even though kmem_zalloc_greedy is documented it might fail the current
code doesn't really implement this properly and loops on the smallest
allowed size for ever. This is a problem because vzalloc might fail
permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
task is killed") such a failure is much more probable than it used to
be. Fix this by bailing out if the minimum size request failed.

This has been noticed by a hung generic/269 xfstest by Xiong Zhou.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/xfs/kmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 339c696bbc01..ee95f5c6db45 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
 	size_t		kmsize = maxsize;
 
 	while (!(ptr = vzalloc(kmsize))) {
+		if (kmsize == minsize)
+			break;
 		if ((kmsize >>= 1) <= minsize)
 			kmsize = minsize;
 	}
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 10:35         ` Michal Hocko
@ 2017-03-02 10:53           ` Tetsuo Handa
  -1 siblings, 0 replies; 79+ messages in thread
From: Tetsuo Handa @ 2017-03-02 10:53 UTC (permalink / raw)
  To: mhocko; +Cc: xzhou, hch, linux-xfs, linux-mm, linux-kernel, linux-fsdevel

Michal Hocko wrote:
> On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
> [...]
> > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> > killed") implemented __GFP_KILLABLE flag and automatically applied that
> > flag. As a result, those who are not ready to fail upon SIGKILL are
> > confused. ;-)
> 
> You are right! The function is documented it might fail but the code
> doesn't really allow that. This seems like a bug to me. What do you
> think about the following?

Yes, I think this patch is correct. Maybe

-May fail and may return vmalloced memory.
+May fail.

together, for it is always vmalloc()ed memory.

> ---
> From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 2 Mar 2017 11:31:11 +0100
> Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail
> 
> Even though kmem_zalloc_greedy is documented it might fail the current
> code doesn't really implement this properly and loops on the smallest
> allowed size for ever. This is a problem because vzalloc might fail
> permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
> task is killed") such a failure is much more probable than it used to
> be. Fix this by bailing out if the minimum size request failed.
> 
> This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> 
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/xfs/kmem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> index 339c696bbc01..ee95f5c6db45 100644
> --- a/fs/xfs/kmem.c
> +++ b/fs/xfs/kmem.c
> @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
>  	size_t		kmsize = maxsize;
>  
>  	while (!(ptr = vzalloc(kmsize))) {
> +		if (kmsize == minsize)
> +			break;
>  		if ((kmsize >>= 1) <= minsize)
>  			kmsize = minsize;
>  	}
> -- 
> 2.11.0
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 10:35         ` Michal Hocko
@ 2017-03-02 12:24           ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 12:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote:
> On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
> [...]
> > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> > killed") implemented __GFP_KILLABLE flag and automatically applied that
> > flag. As a result, those who are not ready to fail upon SIGKILL are
> > confused. ;-)
> 
> You are right! The function is documented it might fail but the code
> doesn't really allow that. This seems like a bug to me. What do you
> think about the following?
> ---
> From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 2 Mar 2017 11:31:11 +0100
> Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail
> 
> Even though kmem_zalloc_greedy is documented it might fail the current
> code doesn't really implement this properly and loops on the smallest
> allowed size for ever. This is a problem because vzalloc might fail
> permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
> task is killed") such a failure is much more probable than it used to
> be. Fix this by bailing out if the minimum size request failed.
> 
> This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> 
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/xfs/kmem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> index 339c696bbc01..ee95f5c6db45 100644
> --- a/fs/xfs/kmem.c
> +++ b/fs/xfs/kmem.c
> @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
>  	size_t		kmsize = maxsize;
>  
>  	while (!(ptr = vzalloc(kmsize))) {
> +		if (kmsize == minsize)
> +			break;
>  		if ((kmsize >>= 1) <= minsize)
>  			kmsize = minsize;
>  	}

It might be more consistent with the rest of the kmem code to accept a
flags argument and do something like this based on KM_MAYFAIL. The one
current caller looks like it would pass it, but I suppose we'd still
need a mechanism to break out should a new caller not pass that flag.
Would a fatal_signal_pending() check in the loop as well allow us to
break out in the scenario that is reproduced here?
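
Something along those lines, say (an untested sketch against the
kmem_zalloc_greedy() quoted above, combining both ideas):

void *
kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize,
		   xfs_km_flags_t flags)
{
	void		*ptr;
	size_t		kmsize = maxsize;

	while (!(ptr = vzalloc(kmsize))) {
		/* callers passing KM_MAYFAIL accept failure at the minimum size */
		if (kmsize == minsize && (flags & KM_MAYFAIL))
			break;
		/* and any caller stops retrying once a SIGKILL is pending */
		if (fatal_signal_pending(current))
			break;
		if ((kmsize >>= 1) <= minsize)
			kmsize = minsize;
	}
	if (ptr)
		*size = kmsize;
	return ptr;
}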

Brian

> -- 
> 2.11.0
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 12:24           ` Brian Foster
@ 2017-03-02 12:49             ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 12:49 UTC (permalink / raw)
  To: Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu 02-03-17 07:24:27, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote:
> > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
> > [...]
> > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> > > killed") implemented __GFP_KILLABLE flag and automatically applied that
> > > flag. As a result, those who are not ready to fail upon SIGKILL are
> > > confused. ;-)
> > 
> > You are right! The function is documented it might fail but the code
> > doesn't really allow that. This seems like a bug to me. What do you
> > think about the following?
> > ---
> > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 2 Mar 2017 11:31:11 +0100
> > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail
> > 
> > Even though kmem_zalloc_greedy is documented it might fail the current
> > code doesn't really implement this properly and loops on the smallest
> > allowed size for ever. This is a problem because vzalloc might fail
> > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
> > task is killed") such a failure is much more probable than it used to
> > be. Fix this by bailing out if the minimum size request failed.
> > 
> > This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> > 
> > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  fs/xfs/kmem.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index 339c696bbc01..ee95f5c6db45 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> >  	size_t		kmsize = maxsize;
> >  
> >  	while (!(ptr = vzalloc(kmsize))) {
> > +		if (kmsize == minsize)
> > +			break;
> >  		if ((kmsize >>= 1) <= minsize)
> >  			kmsize = minsize;
> >  	}
> 
> More consistent with the rest of the kmem code might be to accept a
> flags argument and do something like this based on KM_MAYFAIL.

Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for
the same reason it doesn't support GFP_NOFS. So I am not sure this is a
good idea.

> The one
> current caller looks like it would pass it, but I suppose we'd still
> need a mechanism to break out should a new caller not pass that flag.
> Would a fatal_signal_pending() check in the loop as well allow us to
> break out in the scenario that is reproduced here?

Yes, that check would work as well; I just thought bailing out when the
minsize request fails to be more logical. There might be other reasons
to fail the request, and looping here seems just wrong. But whatever you
or other xfs people prefer.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 12:49             ` Michal Hocko
@ 2017-03-02 13:00               ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 13:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote:
> On Thu 02-03-17 07:24:27, Brian Foster wrote:
> > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote:
> > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
> > > [...]
> > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> > > > killed") implemented __GFP_KILLABLE flag and automatically applied that
> > > > flag. As a result, those who are not ready to fail upon SIGKILL are
> > > > confused. ;-)
> > > 
> > > You are right! The function is documented it might fail but the code
> > > doesn't really allow that. This seems like a bug to me. What do you
> > > think about the following?
> > > ---
> > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Thu, 2 Mar 2017 11:31:11 +0100
> > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail
> > > 
> > > Even though kmem_zalloc_greedy is documented it might fail the current
> > > code doesn't really implement this properly and loops on the smallest
> > > allowed size for ever. This is a problem because vzalloc might fail
> > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
> > > task is killed") such a failure is much more probable than it used to
> > > be. Fix this by bailing out if the minimum size request failed.
> > > 
> > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> > > 
> > > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > ---
> > >  fs/xfs/kmem.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > > index 339c696bbc01..ee95f5c6db45 100644
> > > --- a/fs/xfs/kmem.c
> > > +++ b/fs/xfs/kmem.c
> > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> > >  	size_t		kmsize = maxsize;
> > >  
> > >  	while (!(ptr = vzalloc(kmsize))) {
> > > +		if (kmsize == minsize)
> > > +			break;
> > >  		if ((kmsize >>= 1) <= minsize)
> > >  			kmsize = minsize;
> > >  	}
> > 
> > More consistent with the rest of the kmem code might be to accept a
> > flags argument and do something like this based on KM_MAYFAIL.
> 
> Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for
> the same reason it doesn't support GFP_NOFS. So I am not sure this is a
> good idea.
> 

Not sure I follow..? I'm just suggesting to control the loop behavior
based on the KM_ flag, not to do or change anything wrt the GFP_ flags.

> > The one
> > current caller looks like it would pass it, but I suppose we'd still
> > need a mechanism to break out should a new caller not pass that flag.
> > Would a fatal_signal_pending() check in the loop as well allow us to
> > break out in the scenario that is reproduced here?
> 
> Yes that check would work as well I just thought the break out when the
> minsize request fails to be more logical. There might be other reasons
> to fail the request and looping here seems just wrong. But whatever you
> or other xfs people prefer.

There may be higher-level reasons why this code should or should not
loop, but that just seems like a separate issue to me. My thinking is
more that this appears to be how every kmem_*() function operates today
and it seems a bit out of place to change the behavior of one to fix a
bug.

Maybe I'm missing something though.. are we subject to the same general
problem in any of the other kmem_*() functions that can currently loop
indefinitely?

Brian

> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 13:00               ` Brian Foster
@ 2017-03-02 13:07                 ` Tetsuo Handa
  -1 siblings, 0 replies; 79+ messages in thread
From: Tetsuo Handa @ 2017-03-02 13:07 UTC (permalink / raw)
  To: bfoster, mhocko
  Cc: xzhou, hch, linux-xfs, linux-mm, linux-kernel, linux-fsdevel

Brian Foster wrote:
> On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote:
> > On Thu 02-03-17 07:24:27, Brian Foster wrote:
> > > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote:
> > > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
> > > > [...]
> > > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> > > > > killed") implemented __GFP_KILLABLE flag and automatically applied that
> > > > > flag. As a result, those who are not ready to fail upon SIGKILL are
> > > > > confused. ;-)
> > > > 
> > > > You are right! The function is documented it might fail but the code
> > > > doesn't really allow that. This seems like a bug to me. What do you
> > > > think about the following?
> > > > ---
> > > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > Date: Thu, 2 Mar 2017 11:31:11 +0100
> > > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail
> > > > 
> > > > Even though kmem_zalloc_greedy is documented it might fail the current
> > > > code doesn't really implement this properly and loops on the smallest
> > > > allowed size for ever. This is a problem because vzalloc might fail
> > > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
> > > > task is killed") such a failure is much more probable than it used to
> > > > be. Fix this by bailing out if the minimum size request failed.
> > > > 
> > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> > > > 
> > > > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  fs/xfs/kmem.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > > > index 339c696bbc01..ee95f5c6db45 100644
> > > > --- a/fs/xfs/kmem.c
> > > > +++ b/fs/xfs/kmem.c
> > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> > > >  	size_t		kmsize = maxsize;
> > > >  
> > > >  	while (!(ptr = vzalloc(kmsize))) {
> > > > +		if (kmsize == minsize)
> > > > +			break;
> > > >  		if ((kmsize >>= 1) <= minsize)
> > > >  			kmsize = minsize;
> > > >  	}
> > > 
> > > More consistent with the rest of the kmem code might be to accept a
> > > flags argument and do something like this based on KM_MAYFAIL.
> > 
> > Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for
> > the same reason it doesn't support GFP_NOFS. So I am not sure this is a
> > good idea.
> > 
> 
> Not sure I follow..? I'm just suggesting to control the loop behavior
> based on the KM_ flag, not to do or change anything wrt to GFP_ flags.

vmalloc() cannot support __GFP_NOFAIL. Therefore, kmem_zalloc_greedy()
cannot support !KM_MAYFAIL. We will change kmem_zalloc_greedy() not to
use vmalloc() when we need to support !KM_MAYFAIL. That won't be now.

> 
> > > The one
> > > current caller looks like it would pass it, but I suppose we'd still
> > > need a mechanism to break out should a new caller not pass that flag.
> > > Would a fatal_signal_pending() check in the loop as well allow us to
> > > break out in the scenario that is reproduced here?
> > 
> > Yes that check would work as well I just thought the break out when the
> > minsize request fails to be more logical. There might be other reasons
> > to fail the request and looping here seems just wrong. But whatever you
> > or other xfs people prefer.
> 
> There may be higher level reasons for why this code should or should not
> loop, that just seems like a separate issue to me. My thinking is more
> that this appears to be how every kmem_*() function operates today and
> it seems a bit out of place to change behavior of one to fix a bug.
> 
> Maybe I'm missing something though.. are we subject to the same general
> problem in any of the other kmem_*() functions that can currently loop
> indefinitely?
> 

The kmem code looks to me like a source of problems. For example,
kmem_alloc() from RESCUER workqueue threads got stuck doing GFP_NOFS
allocations. Due to __GFP_NOWARN, warn_alloc() cannot warn about
allocation stalls. Due to order <= PAGE_ALLOC_COSTLY_ORDER, the
"%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)"
message cannot be printed because __alloc_pages_nodemask() does not
return. And due to GFP_NOFS without __GFP_HIGH or __GFP_NOFAIL, memory
cannot be allocated to the RESCUER threads which are trying to allocate
memory in order to reclaim memory; the result is a silent lockup.
(I must stop here, for this is a different thread.)

----------------------------------------
[ 1095.633625] MemAlloc: xfs-data/sda1(451) flags=0x4228060 switches=45509 seq=1 gfp=0x1604240(GFP_NOFS|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) order=0 delay=652073
[ 1095.633626] xfs-data/sda1   R  running task    12696   451      2 0x00000000
[ 1095.633663] Workqueue: xfs-data/sda1 xfs_end_io [xfs]
[ 1095.633665] Call Trace:
[ 1095.633668]  __schedule+0x336/0xe00
[ 1095.633671]  schedule+0x3d/0x90
[ 1095.633672]  schedule_timeout+0x20d/0x510
[ 1095.633675]  ? lock_timer_base+0xa0/0xa0
[ 1095.633678]  schedule_timeout_uninterruptible+0x2a/0x30
[ 1095.633680]  __alloc_pages_slowpath+0x2b5/0xd95
[ 1095.633687]  __alloc_pages_nodemask+0x3e4/0x460
[ 1095.633699]  alloc_pages_current+0x97/0x1b0
[ 1095.633702]  new_slab+0x4cb/0x6b0
[ 1095.633706]  ___slab_alloc+0x3a3/0x620
[ 1095.633728]  ? kmem_alloc+0x96/0x120 [xfs]
[ 1095.633730]  ? ___slab_alloc+0x5c6/0x620
[ 1095.633732]  ? cpuacct_charge+0x38/0x1e0
[ 1095.633767]  ? kmem_alloc+0x96/0x120 [xfs]
[ 1095.633770]  __slab_alloc+0x46/0x7d
[ 1095.633773]  __kmalloc+0x301/0x3b0
[ 1095.633802]  kmem_alloc+0x96/0x120 [xfs]
[ 1095.633804]  ? kfree+0x1fa/0x330
[ 1095.633842]  xfs_log_commit_cil+0x489/0x710 [xfs]
[ 1095.633864]  __xfs_trans_commit+0x83/0x260 [xfs]
[ 1095.633883]  xfs_trans_commit+0x10/0x20 [xfs]
[ 1095.633901]  __xfs_setfilesize+0xdb/0x240 [xfs]
[ 1095.633936]  xfs_setfilesize_ioend+0x89/0xb0 [xfs]
[ 1095.633954]  ? xfs_setfilesize_ioend+0x5/0xb0 [xfs]
[ 1095.633971]  xfs_end_io+0x81/0x110 [xfs]
[ 1095.633973]  process_one_work+0x22b/0x760
[ 1095.633975]  ? process_one_work+0x194/0x760
[ 1095.633997]  rescuer_thread+0x1f2/0x3d0
[ 1095.634002]  kthread+0x10f/0x150
[ 1095.634003]  ? worker_thread+0x4b0/0x4b0
[ 1095.634004]  ? kthread_create_on_node+0x70/0x70
[ 1095.634007]  ret_from_fork+0x31/0x40
[ 1095.634013] MemAlloc: xfs-eofblocks/s(456) flags=0x4228860 switches=15435 seq=1 gfp=0x1400240(GFP_NOFS|__GFP_NOWARN) order=0 delay=293074
[ 1095.634014] xfs-eofblocks/s R  running task    12032   456      2 0x00000000
[ 1095.634037] Workqueue: xfs-eofblocks/sda1 xfs_eofblocks_worker [xfs]
[ 1095.634038] Call Trace:
[ 1095.634040]  ? _raw_spin_lock+0x3d/0x80
[ 1095.634042]  ? vmpressure+0xd0/0x120
[ 1095.634044]  ? vmpressure+0xd0/0x120
[ 1095.634047]  ? vmpressure_prio+0x21/0x30
[ 1095.634049]  ? do_try_to_free_pages+0x70/0x300
[ 1095.634052]  ? try_to_free_pages+0x131/0x3f0
[ 1095.634058]  ? __alloc_pages_slowpath+0x3ec/0xd95
[ 1095.634065]  ? __alloc_pages_nodemask+0x3e4/0x460
[ 1095.634069]  ? alloc_pages_current+0x97/0x1b0
[ 1095.634111]  ? xfs_buf_allocate_memory+0x160/0x2a3 [xfs]
[ 1095.634133]  ? xfs_buf_get_map+0x2be/0x480 [xfs]
[ 1095.634169]  ? xfs_buf_read_map+0x2c/0x400 [xfs]
[ 1095.634204]  ? xfs_trans_read_buf_map+0x186/0x830 [xfs]
[ 1095.634222]  ? xfs_btree_read_buf_block.constprop.34+0x78/0xc0 [xfs]
[ 1095.634239]  ? xfs_btree_lookup_get_block+0x8a/0x180 [xfs]
[ 1095.634257]  ? xfs_btree_lookup+0xd0/0x3f0 [xfs]
[ 1095.634296]  ? kmem_zone_alloc+0x96/0x120 [xfs]
[ 1095.634299]  ? _raw_spin_unlock+0x27/0x40
[ 1095.634315]  ? xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
[ 1095.634348]  ? xfs_bmap_del_extent+0x1b2/0x1610 [xfs]
[ 1095.634380]  ? kmem_zone_alloc+0x96/0x120 [xfs]
[ 1095.634400]  ? __xfs_bunmapi+0x4db/0xda0 [xfs]
[ 1095.634421]  ? xfs_bunmapi+0x2b/0x40 [xfs]
[ 1095.634459]  ? xfs_itruncate_extents+0x1df/0x780 [xfs]
[ 1095.634502]  ? xfs_rename+0xc70/0x1080 [xfs]
[ 1095.634525]  ? xfs_free_eofblocks+0x1c4/0x230 [xfs]
[ 1095.634546]  ? xfs_inode_free_eofblocks+0x18d/0x280 [xfs]
[ 1095.634565]  ? xfs_inode_ag_walk.isra.13+0x2b5/0x620 [xfs]
[ 1095.634582]  ? xfs_inode_ag_walk.isra.13+0x91/0x620 [xfs]
[ 1095.634618]  ? xfs_inode_clear_eofblocks_tag+0x1a0/0x1a0 [xfs]
[ 1095.634630]  ? radix_tree_next_chunk+0x10b/0x2d0
[ 1095.634635]  ? radix_tree_gang_lookup_tag+0xd7/0x150
[ 1095.634672]  ? xfs_perag_get_tag+0x11d/0x370 [xfs]
[ 1095.634690]  ? xfs_perag_get_tag+0x5/0x370 [xfs]
[ 1095.634709]  ? xfs_inode_ag_iterator_tag+0x71/0xa0 [xfs]
[ 1095.634726]  ? xfs_inode_clear_eofblocks_tag+0x1a0/0x1a0 [xfs]
[ 1095.634744]  ? __xfs_icache_free_eofblocks+0x3b/0x40 [xfs]
[ 1095.634759]  ? xfs_eofblocks_worker+0x27/0x40 [xfs]
[ 1095.634762]  ? process_one_work+0x22b/0x760
[ 1095.634763]  ? process_one_work+0x194/0x760
[ 1095.634784]  ? rescuer_thread+0x1f2/0x3d0
[ 1095.634788]  ? kthread+0x10f/0x150
[ 1095.634789]  ? worker_thread+0x4b0/0x4b0
[ 1095.634790]  ? kthread_create_on_node+0x70/0x70
[ 1095.634793]  ? ret_from_fork+0x31/0x40
----------------------------------------
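
For reference, the retry loop that the first trace above is sitting in
looks roughly like the following. This is paraphrased from memory of
fs/xfs/kmem.c of that era, so treat it as a sketch rather than the
exact code:

void *
kmem_alloc(size_t size, xfs_km_flags_t flags)
{
        int     retries = 0;
        gfp_t   lflags = kmem_flags_convert(flags);  /* sets __GFP_NOWARN */
        void    *ptr;

        do {
                ptr = kmalloc(size, lflags);
                if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
                        return ptr;
                if (!(++retries % 100))
                        xfs_err(NULL,
        "%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)",
                                current->comm, current->pid,
                                (unsigned int)size, __func__, lflags);
                congestion_wait(BLK_RW_ASYNC, HZ/50);
        } while (1);
}

The deadlock message is only reached when kmalloc() actually returns
NULL; for order <= PAGE_ALLOC_COSTLY_ORDER requests the page allocator
keeps retrying internally instead of failing, and warn_alloc() is
silenced by __GFP_NOWARN, which is why neither warning shows up above.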

> Brian
> 
> > -- 
> > Michal Hocko
> > SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 13:00               ` Brian Foster
@ 2017-03-02 13:27                 ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 13:27 UTC (permalink / raw)
  To: Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu 02-03-17 08:00:09, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote:
> > On Thu 02-03-17 07:24:27, Brian Foster wrote:
> > > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote:
> > > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
> > > > [...]
> > > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> > > > > killed") implemented __GFP_KILLABLE flag and automatically applied that
> > > > > flag. As a result, those who are not ready to fail upon SIGKILL are
> > > > > confused. ;-)
> > > > 
> > > > You are right! The function is documented it might fail but the code
> > > > doesn't really allow that. This seems like a bug to me. What do you
> > > > think about the following?
> > > > ---
> > > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > Date: Thu, 2 Mar 2017 11:31:11 +0100
> > > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail
> > > > 
> > > > Even though kmem_zalloc_greedy is documented it might fail the current
> > > > code doesn't really implement this properly and loops on the smallest
> > > > allowed size for ever. This is a problem because vzalloc might fail
> > > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
> > > > task is killed") such a failure is much more probable than it used to
> > > > be. Fix this by bailing out if the minimum size request failed.
> > > > 
> > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> > > > 
> > > > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  fs/xfs/kmem.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > > > index 339c696bbc01..ee95f5c6db45 100644
> > > > --- a/fs/xfs/kmem.c
> > > > +++ b/fs/xfs/kmem.c
> > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> > > >  	size_t		kmsize = maxsize;
> > > >  
> > > >  	while (!(ptr = vzalloc(kmsize))) {
> > > > +		if (kmsize == minsize)
> > > > +			break;
> > > >  		if ((kmsize >>= 1) <= minsize)
> > > >  			kmsize = minsize;
> > > >  	}
> > > 
> > > More consistent with the rest of the kmem code might be to accept a
> > > flags argument and do something like this based on KM_MAYFAIL.
> > 
> > Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for
> > the same reason it doesn't support GFP_NOFS. So I am not sure this is a
> > good idea.
> > 
> 
> Not sure I follow..? I'm just suggesting to control the loop behavior
> based on the KM_ flag, not to do or change anything wrt to GFP_ flags.

As Tetsuo already pointed out, vmalloc cannot really support never-fail
semantic with the current implementation so the semantic would have
to be implemented in kmem_zalloc_greedy and the only way to do that
would be to loop there and this is rather nasty as you can see from the
reported issue because the vmalloc failure might be permanent so there
won't be any way to make forward progress. Breaking out of the loop
on fatal_signal_pending would break the non-failing semantic.
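
(To make that concrete, a purely illustrative sketch, not a proposal,
assuming the hypothetical flags argument from the earlier suggestion:
a KM_MAYFAIL-aware loop would have to look something like the one
below, and the !KM_MAYFAIL case has no safe exit once vzalloc() starts
failing permanently.)

        while (!(ptr = vzalloc(kmsize))) {
                if (kmsize == minsize) {
                        if (flags & KM_MAYFAIL)
                                break;  /* caller is prepared for NULL */
                        /*
                         * !KM_MAYFAIL: nothing left to do but keep
                         * spinning.  If the failure is permanent (task
                         * killed, vmalloc space exhausted) this never
                         * terminates, which is exactly the hang that
                         * generic/269 reproduced.
                         */
                }
                if ((kmsize >>= 1) <= minsize)
                        kmsize = minsize;
        }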

Besides that, there doesn't really seem to be any demand for this
semantic in the first place, so why make this more complicated than
necessary?

I see your argument about being in sync with other kmem helpers, but
those are a bit different because the regular page/slab allocators
allow a never-fail semantic (even though this is mostly ignored by
those helpers, which implement their own retries, but that is a
different topic).

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 13:27                 ` Michal Hocko
@ 2017-03-02 13:41                   ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 13:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
> On Thu 02-03-17 08:00:09, Brian Foster wrote:
> > On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote:
> > > On Thu 02-03-17 07:24:27, Brian Foster wrote:
> > > > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote:
> > > > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote:
> > > > > [...]
> > > > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is
> > > > > > killed") implemented __GFP_KILLABLE flag and automatically applied that
> > > > > > flag. As a result, those who are not ready to fail upon SIGKILL are
> > > > > > confused. ;-)
> > > > > 
> > > > > You are right! The function is documented it might fail but the code
> > > > > doesn't really allow that. This seems like a bug to me. What do you
> > > > > think about the following?
> > > > > ---
> > > > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > Date: Thu, 2 Mar 2017 11:31:11 +0100
> > > > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail
> > > > > 
> > > > > Even though kmem_zalloc_greedy is documented it might fail the current
> > > > > code doesn't really implement this properly and loops on the smallest
> > > > > allowed size for ever. This is a problem because vzalloc might fail
> > > > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current
> > > > > task is killed") such a failure is much more probable than it used to
> > > > > be. Fix this by bailing out if the minimum size request failed.
> > > > > 
> > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> > > > > 
> > > > > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > > ---
> > > > >  fs/xfs/kmem.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > > > > index 339c696bbc01..ee95f5c6db45 100644
> > > > > --- a/fs/xfs/kmem.c
> > > > > +++ b/fs/xfs/kmem.c
> > > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> > > > >  	size_t		kmsize = maxsize;
> > > > >  
> > > > >  	while (!(ptr = vzalloc(kmsize))) {
> > > > > +		if (kmsize == minsize)
> > > > > +			break;
> > > > >  		if ((kmsize >>= 1) <= minsize)
> > > > >  			kmsize = minsize;
> > > > >  	}
> > > > 
> > > > More consistent with the rest of the kmem code might be to accept a
> > > > flags argument and do something like this based on KM_MAYFAIL.
> > > 
> > > Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for
> > > the same reason it doesn't support GFP_NOFS. So I am not sure this is a
> > > good idea.
> > > 
> > 
> > Not sure I follow..? I'm just suggesting to control the loop behavior
> > based on the KM_ flag, not to do or change anything wrt to GFP_ flags.
> 
> As Tetsuo already pointed out, vmalloc cannot really support never-fail
> semantic with the current implementation so the semantic would have
> to be implemented in kmem_zalloc_greedy and the only way to do that
> would be to loop there and this is rather nasty as you can see from the
> reported issue because the vmalloc failure might be permanent so there
> won't be any way to make forward progress. Breaking out of the loop
> on fatal_signal_pending would break the non-failing semantic.
> 

Sure..

> Besides that, there doesn't really seem to be any demand for this
> semantic in the first place, so why make this more complicated than
> necessary?
> 

That may very well be the case. I'm not necessarily against this...

> I see your argument about being in sync with other kmem helpers, but
> those are a bit different because the regular page/slab allocators
> allow a never-fail semantic (even though this is mostly ignored by
> those helpers, which implement their own retries, but that is a
> different topic).
> 

... but what I'm trying to understand here is whether this failure
scenario is specific to vmalloc() or whether the other kmem_*()
functions are susceptible to the same problem. For example, suppose we
replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
KM_SLEEP) call. Could we hit the same problem if the process is killed?

Brian

> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 13:41                   ` Brian Foster
@ 2017-03-02 13:50                     ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 13:50 UTC (permalink / raw)
  To: Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu 02-03-17 08:41:58, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
[...]
> > I see your argument about being in sync with other kmem helpers, but
> > those are a bit different because the regular page/slab allocators
> > allow a never-fail semantic (even though this is mostly ignored by
> > those helpers, which implement their own retries, but that is a
> > different topic).
> > 
> 
> ... but what I'm trying to understand here is whether this failure
> scenario is specific to vmalloc() or whether the other kmem_*()
> functions are susceptible to the same problem. For example, suppose we
> replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
> KM_SLEEP) call. Could we hit the same problem if the process is killed?

Well, kmem_zalloc uses kmalloc, which can also fail when we are out of
memory, but in that case we can expect the OOM killer to release some
memory, which would allow us to make forward progress on the next
retry. So essentially retrying around kmalloc is much safer in this
regard. A failing vmalloc might be permanent, because there is no
vmalloc space to allocate from or, much more likely, due to the already
mentioned patch. So vmalloc is different, really.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 13:50                     ` Michal Hocko
@ 2017-03-02 14:23                       ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 14:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote:
> On Thu 02-03-17 08:41:58, Brian Foster wrote:
> > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
> [...]
> > > I see your argument about being in sync with other kmem helpers but
> > > those are bit different because regular page/slab allocators allow never
> > > fail semantic (even though this is mostly ignored by those helpers which
> > > implement their own retries but that is a different topic).
> > > 
> > 
> > ... but what I'm trying to understand here is whether this failure
> > scenario is specific to vmalloc() or whether the other kmem_*()
> > functions are susceptible to the same problem. For example, suppose we
> > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
> > KM_SLEEP) call. Could we hit the same problem if the process is killed?
> 
> Well, kmem_zalloc uses kmalloc which can also fail when we are out of
> memory but in that case we can expect the OOM killer releasing some
> memory which would allow us to make a forward progress on the next
> retry. So essentially retrying around kmalloc is much more safe in this
> regard. Failing vmalloc might be permanent because there is no vmalloc
> space to allocate from or much more likely due to already mentioned
> patch. So vmalloc is different, really.

Right.. that's why I'm asking. So it's technically possible but highly
unlikely due to the different failure characteristics. That seems
reasonable to me, then.

To be clear, do we understand what causes the vzalloc() failure to be
effectively permanent in this specific reproducer? I know you mention
above that we could be out of vmalloc space, but that doesn't clarify
whether there are other potential failure paths, or what this has to do
with the fact that the process was killed. Does the pending signal cause
the subsequent failures, or are you saying that there is some other root
cause of the failure, that this process would effectively be spinning
here anyway, and that we're just noticing it because it's trying to exit?

Brian

> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 14:23                       ` Brian Foster
@ 2017-03-02 14:34                         ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 14:34 UTC (permalink / raw)
  To: Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu 02-03-17 09:23:15, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote:
> > On Thu 02-03-17 08:41:58, Brian Foster wrote:
> > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
> > [...]
> > > > I see your argument about being in sync with other kmem helpers but
> > > > those are bit different because regular page/slab allocators allow never
> > > > fail semantic (even though this is mostly ignored by those helpers which
> > > > implement their own retries but that is a different topic).
> > > > 
> > > 
> > > ... but what I'm trying to understand here is whether this failure
> > > scenario is specific to vmalloc() or whether the other kmem_*()
> > > functions are susceptible to the same problem. For example, suppose we
> > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
> > > KM_SLEEP) call. Could we hit the same problem if the process is killed?
> > 
> > Well, kmem_zalloc uses kmalloc which can also fail when we are out of
> > memory but in that case we can expect the OOM killer releasing some
> > memory which would allow us to make a forward progress on the next
> > retry. So essentially retrying around kmalloc is much more safe in this
> > regard. Failing vmalloc might be permanent because there is no vmalloc
> > space to allocate from or much more likely due to already mentioned
> > patch. So vmalloc is different, really.
> 
> Right.. that's why I'm asking. So it's technically possible but highly
> unlikely due to the different failure characteristics. That seems
> reasonable to me, then. 
> 
> To be clear, do we understand what causes the vzalloc() failure to be
> effectively permanent in this specific reproducer? I know you mention
> above that we could be out of vmalloc space, but that doesn't clarify
> whether there are other potential failure paths or then what this has to
> do with the fact that the process was killed. Does the pending signal
> cause the subsequent failures or are you saying that there is some other
> root cause of the failure, this process would effectively be spinning
> here anyways, and we're just noticing it because it's trying to exit?

In this particular case it is fatal_signal_pending that causes the
permanent failure. This check was added to prevent complete depletion of
the memory reserves on OOM, because a killed task has a free ticket to
those reserves and vmalloc requests can be really large. In this case
there was no OOM killer involved, but fsstress had SIGKILL pending for
another reason - most probably as a result of the group_exit when all
threads are killed (see zap_process). I could have turned
fatal_signal_pending into tsk_is_oom_victim, which would be less likely
to hit, but in principle fatal_signal_pending should be better because
we do want to bail out as soon as possible when the process is exiting.
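
To make this concrete, the effect of that commit is roughly the
following (a paraphrased sketch, not the literal mm/vmalloc.c hunk - the
function name here is made up):

static bool populate_area_pages(struct page **pages, unsigned int nr,
				gfp_t gfp_mask)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		/*
		 * Back off once the allocating task has a fatal signal
		 * pending so that a killed task cannot deplete the memory
		 * reserves with a huge vmalloc request. The side effect is
		 * that every vmalloc from such a task fails, no matter how
		 * small the request is.
		 */
		if (fatal_signal_pending(current))
			return false;

		pages[i] = alloc_page(gfp_mask);
		if (!pages[i])
			return false;
	}
	return true;
}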

What I really wanted to say is that there are other possible permanent
failure paths in vmalloc AFAICS. They are much less probable but they
still exist.

Does that make more sense now?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 14:34                         ` Michal Hocko
@ 2017-03-02 14:51                           ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 14:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 03:34:41PM +0100, Michal Hocko wrote:
> On Thu 02-03-17 09:23:15, Brian Foster wrote:
> > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote:
> > > On Thu 02-03-17 08:41:58, Brian Foster wrote:
> > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
> > > [...]
> > > > > I see your argument about being in sync with other kmem helpers but
> > > > > those are bit different because regular page/slab allocators allow never
> > > > > fail semantic (even though this is mostly ignored by those helpers which
> > > > > implement their own retries but that is a different topic).
> > > > > 
> > > > 
> > > > ... but what I'm trying to understand here is whether this failure
> > > > scenario is specific to vmalloc() or whether the other kmem_*()
> > > > functions are susceptible to the same problem. For example, suppose we
> > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
> > > > KM_SLEEP) call. Could we hit the same problem if the process is killed?
> > > 
> > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of
> > > memory but in that case we can expect the OOM killer releasing some
> > > memory which would allow us to make a forward progress on the next
> > > retry. So essentially retrying around kmalloc is much more safe in this
> > > regard. Failing vmalloc might be permanent because there is no vmalloc
> > > space to allocate from or much more likely due to already mentioned
> > > patch. So vmalloc is different, really.
> > 
> > Right.. that's why I'm asking. So it's technically possible but highly
> > unlikely due to the different failure characteristics. That seems
> > reasonable to me, then. 
> > 
> > To be clear, do we understand what causes the vzalloc() failure to be
> > effectively permanent in this specific reproducer? I know you mention
> > above that we could be out of vmalloc space, but that doesn't clarify
> > whether there are other potential failure paths or then what this has to
> > do with the fact that the process was killed. Does the pending signal
> > cause the subsequent failures or are you saying that there is some other
> > root cause of the failure, this process would effectively be spinning
> > here anyways, and we're just noticing it because it's trying to exit?
> 
> In this particular case it is fatal_signal_pending that causes the
> permanent failure. This check has been added to prevent from complete
> memory reserves depletion on OOM when a killed task has a free ticket to
> reserves and vmalloc requests can be really large. In this case there
> was no OOM killer going on but fsstress has SIGKILL pending for other
> reason. Most probably as a result of the group_exit when all threads
> are killed (see zap_process). I could have turn fatal_signal_pending
> into tsk_is_oom_victim which would be less likely to hit but in
> principle fatal_signal_pending should be better because we do want to
> bail out when the process is existing as soon as possible.
> 
> What I really wanted to say is that there are other possible permanent
> failure paths in vmalloc AFAICS. They are much less probable but they
> still exist.
> 
> Does that make more sense now?

Yes, thanks. That explains why this crops up now where it hasn't in the
past. Please include that background in the commit log description.

Also, that kind of makes me think that a fatal_signal_pending() check is
still appropriate in the loop, even if we want to drop the infinite
retry loop in kmem_zalloc_greedy() as well. There's no sense in burning
through however many retries are left when we already know they will
fail, and the explicit check is also clearer for the next person who
goes to change this code in the future.
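
Something like the following, for example (completely untested, just to
illustrate the kind of check I mean):

	while (!(ptr = vzalloc(kmsize))) {
		/* the task is being killed, no retry can succeed */
		if (fatal_signal_pending(current))
			break;
		if ((kmsize >>= 1) <= minsize)
			kmsize = minsize;
	}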

Otherwise, I'm fine with breaking the infinite retry loop at the same
time. It looks like Christoph added this function originally, so this
should probably require his ack as well.

Brian

> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 14:51                           ` Brian Foster
@ 2017-03-02 15:14                             ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 15:14 UTC (permalink / raw)
  To: Brian Foster, Christoph Hellwig
  Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, linux-kernel,
	linux-fsdevel

On Thu 02-03-17 09:51:31, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 03:34:41PM +0100, Michal Hocko wrote:
> > On Thu 02-03-17 09:23:15, Brian Foster wrote:
> > > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote:
> > > > On Thu 02-03-17 08:41:58, Brian Foster wrote:
> > > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
> > > > [...]
> > > > > > I see your argument about being in sync with other kmem helpers but
> > > > > > those are bit different because regular page/slab allocators allow never
> > > > > > fail semantic (even though this is mostly ignored by those helpers which
> > > > > > implement their own retries but that is a different topic).
> > > > > > 
> > > > > 
> > > > > ... but what I'm trying to understand here is whether this failure
> > > > > scenario is specific to vmalloc() or whether the other kmem_*()
> > > > > functions are susceptible to the same problem. For example, suppose we
> > > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
> > > > > KM_SLEEP) call. Could we hit the same problem if the process is killed?
> > > > 
> > > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of
> > > > memory but in that case we can expect the OOM killer releasing some
> > > > memory which would allow us to make a forward progress on the next
> > > > retry. So essentially retrying around kmalloc is much more safe in this
> > > > regard. Failing vmalloc might be permanent because there is no vmalloc
> > > > space to allocate from or much more likely due to already mentioned
> > > > patch. So vmalloc is different, really.
> > > 
> > > Right.. that's why I'm asking. So it's technically possible but highly
> > > unlikely due to the different failure characteristics. That seems
> > > reasonable to me, then. 
> > > 
> > > To be clear, do we understand what causes the vzalloc() failure to be
> > > effectively permanent in this specific reproducer? I know you mention
> > > above that we could be out of vmalloc space, but that doesn't clarify
> > > whether there are other potential failure paths or then what this has to
> > > do with the fact that the process was killed. Does the pending signal
> > > cause the subsequent failures or are you saying that there is some other
> > > root cause of the failure, this process would effectively be spinning
> > > here anyways, and we're just noticing it because it's trying to exit?
> > 
> > In this particular case it is fatal_signal_pending that causes the
> > permanent failure. This check has been added to prevent from complete
> > memory reserves depletion on OOM when a killed task has a free ticket to
> > reserves and vmalloc requests can be really large. In this case there
> > was no OOM killer going on but fsstress has SIGKILL pending for other
> > reason. Most probably as a result of the group_exit when all threads
> > are killed (see zap_process). I could have turn fatal_signal_pending
> > into tsk_is_oom_victim which would be less likely to hit but in
> > principle fatal_signal_pending should be better because we do want to
> > bail out when the process is existing as soon as possible.
> > 
> > What I really wanted to say is that there are other possible permanent
> > failure paths in vmalloc AFAICS. They are much less probable but they
> > still exist.
> > 
> > Does that make more sense now?
> 
> Yes, thanks. That explains why this crops up now where it hasn't in the
> past. Please include that background in the commit log description.

OK, does this sound better? I am open to any suggestions to improve it,
of course:

: xfs: allow kmem_zalloc_greedy to fail
: 
: Even though kmem_zalloc_greedy is documented it might fail the current
: code doesn't really implement this properly and loops on the smallest
: allowed size for ever. This is a problem because vzalloc might fail
: permanently - we might run out of vmalloc space or since 5d17a73a2ebe
: ("vmalloc: back off when the current task is killed") when the current
: task is killed. The later one makes the failure scenario much more
: probable than it used to be. Fix this by bailing out if the minimum size
: request failed.
: 
: This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
: 
: fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
: fsstress cpuset=/ mems_allowed=0-1
: CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
: Call Trace:
:  dump_stack+0x63/0x87
:  warn_alloc+0x114/0x1c0
:  ? alloc_pages_current+0x88/0x120
:  __vmalloc_node_range+0x250/0x2a0
:  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
:  ? free_hot_cold_page+0x21f/0x280
:  vzalloc+0x54/0x60
:  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
:  kmem_zalloc_greedy+0x2b/0x40 [xfs]
:  xfs_bulkstat+0x11b/0x730 [xfs]
:  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
:  ? selinux_capable+0x20/0x30
:  ? security_capable+0x48/0x60
:  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
:  xfs_file_ioctl+0x9dd/0xad0 [xfs]
:  ? do_filp_open+0xa5/0x100
:  do_vfs_ioctl+0xa7/0x5e0
:  SyS_ioctl+0x79/0x90
:  do_syscall_64+0x67/0x180
:  entry_SYSCALL64_slow_path+0x25/0x25
: 
: fsstress keeps looping inside kmem_zalloc_greedy without any way out
: because vmalloc keeps failing due to fatal_signal_pending.
: 
: Reported-by: Xiong Zhou <xzhou@redhat.com>
: Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
: Signed-off-by: Michal Hocko <mhocko@suse.com>

> Also, that kind of makes me think that a fatal_signal_pending() check is
> still appropriate in the loop, even if we want to drop the infinite
> retry loop in kmem_zalloc_greedy() as well. There's no sense in doing
> however many retries are left before we return and that's also more
> explicit for the next person who goes to change this code in the future.

I am not objecting to adding fatal_signal_pending as well; I just thought
that, from the logic POV, breaking after reaching the minimum size is the
right thing to do. We can optimize further by checking
fatal_signal_pending and reducing retries when we know they don't make
much sense, but that should be done on top as an optimization IMHO.

> Otherwise, I'm fine with breaking the infinite retry loop at the same
> time. It looks like Christoph added this function originally so this
> should probably require his ack as well..

What do you think Christoph?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 15:14                             ` Michal Hocko
@ 2017-03-02 15:30                               ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 15:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 04:14:11PM +0100, Michal Hocko wrote:
> On Thu 02-03-17 09:51:31, Brian Foster wrote:
> > On Thu, Mar 02, 2017 at 03:34:41PM +0100, Michal Hocko wrote:
> > > On Thu 02-03-17 09:23:15, Brian Foster wrote:
> > > > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote:
> > > > > On Thu 02-03-17 08:41:58, Brian Foster wrote:
> > > > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote:
> > > > > [...]
> > > > > > > I see your argument about being in sync with other kmem helpers but
> > > > > > > those are bit different because regular page/slab allocators allow never
> > > > > > > fail semantic (even though this is mostly ignored by those helpers which
> > > > > > > implement their own retries but that is a different topic).
> > > > > > > 
> > > > > > 
> > > > > > ... but what I'm trying to understand here is whether this failure
> > > > > > scenario is specific to vmalloc() or whether the other kmem_*()
> > > > > > functions are susceptible to the same problem. For example, suppose we
> > > > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE,
> > > > > > KM_SLEEP) call. Could we hit the same problem if the process is killed?
> > > > > 
> > > > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of
> > > > > memory but in that case we can expect the OOM killer releasing some
> > > > > memory which would allow us to make a forward progress on the next
> > > > > retry. So essentially retrying around kmalloc is much more safe in this
> > > > > regard. Failing vmalloc might be permanent because there is no vmalloc
> > > > > space to allocate from or much more likely due to already mentioned
> > > > > patch. So vmalloc is different, really.
> > > > 
> > > > Right.. that's why I'm asking. So it's technically possible but highly
> > > > unlikely due to the different failure characteristics. That seems
> > > > reasonable to me, then. 
> > > > 
> > > > To be clear, do we understand what causes the vzalloc() failure to be
> > > > effectively permanent in this specific reproducer? I know you mention
> > > > above that we could be out of vmalloc space, but that doesn't clarify
> > > > whether there are other potential failure paths or then what this has to
> > > > do with the fact that the process was killed. Does the pending signal
> > > > cause the subsequent failures or are you saying that there is some other
> > > > root cause of the failure, this process would effectively be spinning
> > > > here anyways, and we're just noticing it because it's trying to exit?
> > > 
> > > In this particular case it is fatal_signal_pending that causes the
> > > permanent failure. This check has been added to prevent from complete
> > > memory reserves depletion on OOM when a killed task has a free ticket to
> > > reserves and vmalloc requests can be really large. In this case there
> > > was no OOM killer going on but fsstress has SIGKILL pending for other
> > > reason. Most probably as a result of the group_exit when all threads
> > > are killed (see zap_process). I could have turn fatal_signal_pending
> > > into tsk_is_oom_victim which would be less likely to hit but in
> > > principle fatal_signal_pending should be better because we do want to
> > > bail out when the process is existing as soon as possible.
> > > 
> > > What I really wanted to say is that there are other possible permanent
> > > failure paths in vmalloc AFAICS. They are much less probable but they
> > > still exist.
> > > 
> > > Does that make more sense now?
> > 
> > Yes, thanks. That explains why this crops up now where it hasn't in the
> > past. Please include that background in the commit log description.
> 
> OK, does this sound better. I am open to any suggestions to improve this
> of course
> 

Yeah..

> : xfs: allow kmem_zalloc_greedy to fail
> : 
> : Even though kmem_zalloc_greedy is documented it might fail the current
> : code doesn't really implement this properly and loops on the smallest
> : allowed size for ever. This is a problem because vzalloc might fail
> : permanently - we might run out of vmalloc space or since 5d17a73a2ebe
> : ("vmalloc: back off when the current task is killed") when the current
> : task is killed. The later one makes the failure scenario much more
> : probable than it used to be. Fix this by bailing out if the minimum size
                               ^
	" because it makes vmalloc() failures permanent for tasks with fatal
	  signals pending."

> : request failed.
> : 
> : This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> : 
> : fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> : fsstress cpuset=/ mems_allowed=0-1
> : CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> : Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> : Call Trace:
> :  dump_stack+0x63/0x87
> :  warn_alloc+0x114/0x1c0
> :  ? alloc_pages_current+0x88/0x120
> :  __vmalloc_node_range+0x250/0x2a0
> :  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> :  ? free_hot_cold_page+0x21f/0x280
> :  vzalloc+0x54/0x60
> :  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> :  kmem_zalloc_greedy+0x2b/0x40 [xfs]
> :  xfs_bulkstat+0x11b/0x730 [xfs]
> :  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
> :  ? selinux_capable+0x20/0x30
> :  ? security_capable+0x48/0x60
> :  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
> :  xfs_file_ioctl+0x9dd/0xad0 [xfs]
> :  ? do_filp_open+0xa5/0x100
> :  do_vfs_ioctl+0xa7/0x5e0
> :  SyS_ioctl+0x79/0x90
> :  do_syscall_64+0x67/0x180
> :  entry_SYSCALL64_slow_path+0x25/0x25
> : 
> : fsstress keeps looping inside kmem_zalloc_greedy without any way out
> : because vmalloc keeps failing due to fatal_signal_pending.
> : 
> : Reported-by: Xiong Zhou <xzhou@redhat.com>
> : Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> : Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> > Also, that kind of makes me think that a fatal_signal_pending() check is
> > still appropriate in the loop, even if we want to drop the infinite
> > retry loop in kmem_zalloc_greedy() as well. There's no sense in doing
> > however many retries are left before we return and that's also more
> > explicit for the next person who goes to change this code in the future.
> 
> I am not objecting to adding fatal_signal_pending as well I just thought
> that from the logic POV breaking after reaching the minimum size is just
> the right thing to do. We can optimize further by checking
> fatal_signal_pending and reducing retries when we know it doesn't make
> much sense but that should be done on top as an optimization IMHO.
> 

I don't think of it as an optimization to avoid invoking calls (a
non-deterministic number of times) that we know are going to fail, but
ultimately I don't care too much how it's framed or whether it's done in
separate patches or whatnot, as long as they are posted at the same
time. ;)

Brian

> > Otherwise, I'm fine with breaking the infinite retry loop at the same
> > time. It looks like Christoph added this function originally so this
> > should probably require his ack as well..
> 
> What do you think Christoph?
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-02 15:30                               ` Brian Foster
  (?)
  (?)
@ 2017-03-02 15:45                                 ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 15:45 UTC (permalink / raw)
  To: Christoph Hellwig, Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML,
	linux-fsdevel, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Even though kmem_zalloc_greedy is documented as being allowed to fail,
the current code doesn't really implement this properly and loops on the
smallest allowed size forever. This is a problem because vzalloc might
fail permanently - either because we run out of vmalloc space or, since
5d17a73a2ebe ("vmalloc: back off when the current task is killed"),
because the current task has been killed. The latter makes the failure
scenario much more probable than it used to be because it makes
vmalloc() failures permanent for tasks with fatal signals pending. Fix
this by bailing out if the minimum size request fails.

This was noticed by Xiong Zhou as a hang in the generic/269 xfstest.

fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
fsstress cpuset=/ mems_allowed=0-1
CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
Call Trace:
 dump_stack+0x63/0x87
 warn_alloc+0x114/0x1c0
 ? alloc_pages_current+0x88/0x120
 __vmalloc_node_range+0x250/0x2a0
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 ? free_hot_cold_page+0x21f/0x280
 vzalloc+0x54/0x60
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 kmem_zalloc_greedy+0x2b/0x40 [xfs]
 xfs_bulkstat+0x11b/0x730 [xfs]
 ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
 ? selinux_capable+0x20/0x30
 ? security_capable+0x48/0x60
 xfs_ioc_bulkstat+0xe4/0x190 [xfs]
 xfs_file_ioctl+0x9dd/0xad0 [xfs]
 ? do_filp_open+0xa5/0x100
 do_vfs_ioctl+0xa7/0x5e0
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x67/0x180
 entry_SYSCALL64_slow_path+0x25/0x25

fsstress keeps looping inside kmem_zalloc_greedy without any way out
because vmalloc keeps failing due to fatal_signal_pending.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
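For reference, kmem_zalloc_greedy() with this change applied reads
roughly as follows (the lines outside the hunk below are reconstructed
here, so double check them against the tree):

void *
kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
{
	void		*ptr;
	size_t		kmsize = maxsize;

	while (!(ptr = vzalloc(kmsize))) {
		/* even the minimum size has failed, give up */
		if (kmsize == minsize)
			break;
		if ((kmsize >>= 1) <= minsize)
			kmsize = minsize;
	}
	if (ptr)
		*size = kmsize;
	return ptr;
}

The new check sits before the shift, so the minimum size is still tried
exactly once before we give up and the caller sees a NULL return.
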
 fs/xfs/kmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 339c696bbc01..ee95f5c6db45 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
 	size_t		kmsize = maxsize;
 
 	while (!(ptr = vzalloc(kmsize))) {
+		if (kmsize == minsize)
+			break;
 		if ((kmsize >>= 1) <= minsize)
 			kmsize = minsize;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
@ 2017-03-02 15:45                                 ` Michal Hocko
  0 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 15:45 UTC (permalink / raw)
  To: Christoph Hellwig, Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML,
	linux-fsdevel, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Even though kmem_zalloc_greedy is documented it might fail the current
code doesn't really implement this properly and loops on the smallest
allowed size for ever. This is a problem because vzalloc might fail
permanently - we might run out of vmalloc space or since 5d17a73a2ebe
("vmalloc: back off when the current task is killed") when the current
task is killed. The later one makes the failure scenario much more
probable than it used to be because it makes vmalloc() failures
permanent for tasks with fatal signals pending.. Fix this by bailing out
if the minimum size request failed.

This has been noticed by a hung generic/269 xfstest by Xiong Zhou.

fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
fsstress cpuset=/ mems_allowed=0-1
CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
Call Trace:
 dump_stack+0x63/0x87
 warn_alloc+0x114/0x1c0
 ? alloc_pages_current+0x88/0x120
 __vmalloc_node_range+0x250/0x2a0
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 ? free_hot_cold_page+0x21f/0x280
 vzalloc+0x54/0x60
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 kmem_zalloc_greedy+0x2b/0x40 [xfs]
 xfs_bulkstat+0x11b/0x730 [xfs]
 ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
 ? selinux_capable+0x20/0x30
 ? security_capable+0x48/0x60
 xfs_ioc_bulkstat+0xe4/0x190 [xfs]
 xfs_file_ioctl+0x9dd/0xad0 [xfs]
 ? do_filp_open+0xa5/0x100
 do_vfs_ioctl+0xa7/0x5e0
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x67/0x180
 entry_SYSCALL64_slow_path+0x25/0x25

fsstress keeps looping inside kmem_zalloc_greedy without any way out
because vmalloc keeps failing due to fatal_signal_pending.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/xfs/kmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 339c696bbc01..ee95f5c6db45 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
 	size_t		kmsize = maxsize;
 
 	while (!(ptr = vzalloc(kmsize))) {
+		if (kmsize == minsize)
+			break;
 		if ((kmsize >>= 1) <= minsize)
 			kmsize = minsize;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
@ 2017-03-02 15:45                                 ` Michal Hocko
  0 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 15:45 UTC (permalink / raw)
  To: Christoph Hellwig, Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML,
	linux-fsdevel, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Even though kmem_zalloc_greedy is documented it might fail the current
code doesn't really implement this properly and loops on the smallest
allowed size for ever. This is a problem because vzalloc might fail
permanently - we might run out of vmalloc space or since 5d17a73a2ebe
("vmalloc: back off when the current task is killed") when the current
task is killed. The later one makes the failure scenario much more
probable than it used to be because it makes vmalloc() failures
permanent for tasks with fatal signals pending.. Fix this by bailing out
if the minimum size request failed.

This has been noticed by a hung generic/269 xfstest by Xiong Zhou.

fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
fsstress cpuset=/ mems_allowed=0-1
CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
Call Trace:
 dump_stack+0x63/0x87
 warn_alloc+0x114/0x1c0
 ? alloc_pages_current+0x88/0x120
 __vmalloc_node_range+0x250/0x2a0
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 ? free_hot_cold_page+0x21f/0x280
 vzalloc+0x54/0x60
 ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
 kmem_zalloc_greedy+0x2b/0x40 [xfs]
 xfs_bulkstat+0x11b/0x730 [xfs]
 ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
 ? selinux_capable+0x20/0x30
 ? security_capable+0x48/0x60
 xfs_ioc_bulkstat+0xe4/0x190 [xfs]
 xfs_file_ioctl+0x9dd/0xad0 [xfs]
 ? do_filp_open+0xa5/0x100
 do_vfs_ioctl+0xa7/0x5e0
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x67/0x180
 entry_SYSCALL64_slow_path+0x25/0x25

fsstress keeps looping inside kmem_zalloc_greedy without any way out
because vmalloc keeps failing due to fatal_signal_pending.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/xfs/kmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 339c696bbc01..ee95f5c6db45 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
 	size_t		kmsize = maxsize;
 
 	while (!(ptr = vzalloc(kmsize))) {
+		if (kmsize == minsize)
+			break;
 		if ((kmsize >>= 1) <= minsize)
 			kmsize = minsize;
 	}
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed
  2017-03-02 15:45                                 ` Michal Hocko
@ 2017-03-02 15:45                                   ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 15:45 UTC (permalink / raw)
  To: Christoph Hellwig, Brian Foster
  Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML,
	linux-fsdevel, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

It doesn't really make much sense to retry a vmalloc request if the
current task has been killed. We should rather bail out and let the task
RIP as soon as possible, because the current implementation of vmalloc
will keep failing anyway.

Suggested-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/xfs/kmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index ee95f5c6db45..01c52567a4ff 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -34,7 +34,7 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
 	size_t		kmsize = maxsize;
 
 	while (!(ptr = vzalloc(kmsize))) {
-		if (kmsize == minsize)
+		if (kmsize == minsize || fatal_signal_pending(current))
 			break;
 		if ((kmsize >>= 1) <= minsize)
 			kmsize = minsize;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 79+ messages in thread
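With these two changes kmem_zalloc_greedy can return NULL, so its caller has
to check the result instead of assuming success. A minimal, hypothetical
caller-side sketch (not the actual xfs_bulkstat() code; the one-to-four-page
range mirrors what is discussed later in this thread):

	/*
	 * Hypothetical helper only: allocate the record buffer and propagate
	 * -ENOMEM instead of relying on the old never-fail behaviour.
	 */
	static int
	xfs_alloc_irec_buffer(char **bufp, size_t *sizep)
	{
		char	*buf;

		buf = kmem_zalloc_greedy(sizep, PAGE_SIZE, PAGE_SIZE * 4);
		if (!buf)
			return -ENOMEM;

		*bufp = buf;
		return 0;
	}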

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 14:51                           ` Brian Foster
@ 2017-03-02 15:47                             ` Christoph Hellwig
  -1 siblings, 0 replies; 79+ messages in thread
From: Christoph Hellwig @ 2017-03-02 15:47 UTC (permalink / raw)
  To: Brian Foster
  Cc: Michal Hocko, Tetsuo Handa, Xiong Zhou, Christoph Hellwig,
	linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 09:51:31AM -0500, Brian Foster wrote:
> Otherwise, I'm fine with breaking the infinite retry loop at the same
> time. It looks like Christoph added this function originally so this
> should probably require his ack as well..

I just moved the code around, but I'll take a look as well.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 15:30                               ` Brian Foster
@ 2017-03-02 15:47                                 ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 15:47 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm,
	linux-kernel, linux-fsdevel

On Thu 02-03-17 10:30:02, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 04:14:11PM +0100, Michal Hocko wrote:
[...]
> > I am not objecting to adding fatal_signal_pending as well I just thought
> > that from the logic POV breaking after reaching the minimum size is just
> > the right thing to do. We can optimize further by checking
> > fatal_signal_pending and reducing retries when we know it doesn't make
> > much sense but that should be done on top as an optimization IMHO.
> > 
> 
> I don't think of it as an optimization to not invoke calls (a
> non-deterministic number of times) that we know are going to fail, but

the point is that vmalloc failure modes are an implementation detail
which might change in the future. The fix should really be independent
of the current implementation, which is why I think the
fatal_signal_pending check is just an optimization.

> ultimately I don't care too much how it's framed or if it's done in
> separate patches or whatnot. As long as they are posted at the same
> time. ;)

Done

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-02 15:45                                 ` Michal Hocko
@ 2017-03-02 15:49                                   ` Christoph Hellwig
  -1 siblings, 0 replies; 79+ messages in thread
From: Christoph Hellwig @ 2017-03-02 15:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou,
	linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed
  2017-03-02 15:45                                   ` Michal Hocko
@ 2017-03-02 15:49                                     ` Christoph Hellwig
  -1 siblings, 0 replies; 79+ messages in thread
From: Christoph Hellwig @ 2017-03-02 15:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou,
	linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-02 15:45                                 ` Michal Hocko
@ 2017-03-02 15:59                                   ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 15:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm,
	LKML, linux-fsdevel, Michal Hocko

On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Even though kmem_zalloc_greedy is documented it might fail the current
> code doesn't really implement this properly and loops on the smallest
> allowed size for ever. This is a problem because vzalloc might fail
> permanently - we might run out of vmalloc space or since 5d17a73a2ebe
> ("vmalloc: back off when the current task is killed") when the current
> task is killed. The later one makes the failure scenario much more
> probable than it used to be because it makes vmalloc() failures
> permanent for tasks with fatal signals pending.. Fix this by bailing out
> if the minimum size request failed.
> 
> This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> 
> fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> fsstress cpuset=/ mems_allowed=0-1
> CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> Call Trace:
>  dump_stack+0x63/0x87
>  warn_alloc+0x114/0x1c0
>  ? alloc_pages_current+0x88/0x120
>  __vmalloc_node_range+0x250/0x2a0
>  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  ? free_hot_cold_page+0x21f/0x280
>  vzalloc+0x54/0x60
>  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  xfs_bulkstat+0x11b/0x730 [xfs]
>  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
>  ? selinux_capable+0x20/0x30
>  ? security_capable+0x48/0x60
>  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
>  xfs_file_ioctl+0x9dd/0xad0 [xfs]
>  ? do_filp_open+0xa5/0x100
>  do_vfs_ioctl+0xa7/0x5e0
>  SyS_ioctl+0x79/0x90
>  do_syscall_64+0x67/0x180
>  entry_SYSCALL64_slow_path+0x25/0x25
> 
> fsstress keeps looping inside kmem_zalloc_greedy without any way out
> because vmalloc keeps failing due to fatal_signal_pending.
> 
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/kmem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> index 339c696bbc01..ee95f5c6db45 100644
> --- a/fs/xfs/kmem.c
> +++ b/fs/xfs/kmem.c
> @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
>  	size_t		kmsize = maxsize;
>  
>  	while (!(ptr = vzalloc(kmsize))) {
> +		if (kmsize == minsize)
> +			break;
>  		if ((kmsize >>= 1) <= minsize)
>  			kmsize = minsize;
>  	}
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed
  2017-03-02 15:45                                   ` Michal Hocko
@ 2017-03-02 15:59                                     ` Brian Foster
  -1 siblings, 0 replies; 79+ messages in thread
From: Brian Foster @ 2017-03-02 15:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm,
	LKML, linux-fsdevel, Michal Hocko

On Thu, Mar 02, 2017 at 04:45:41PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> It doesn't really make much sense to retry vmalloc request if the
> current task is killed. We should rather bail out as soon as possible
> and let it RIP as soon as possible. The current implementation of
> vmalloc will fail anyway.
> 
> Suggested-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/kmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> index ee95f5c6db45..01c52567a4ff 100644
> --- a/fs/xfs/kmem.c
> +++ b/fs/xfs/kmem.c
> @@ -34,7 +34,7 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
>  	size_t		kmsize = maxsize;
>  
>  	while (!(ptr = vzalloc(kmsize))) {
> -		if (kmsize == minsize)
> +		if (kmsize == minsize || fatal_signal_pending(current))
>  			break;
>  		if ((kmsize >>= 1) <= minsize)
>  			kmsize = minsize;
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-02 15:45                                 ` Michal Hocko
@ 2017-03-02 16:16                                   ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-02 16:16 UTC (permalink / raw)
  To: Christoph Hellwig, Brian Foster, Darrick J. Wong
  Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel

I've just realized that Darrick was not on the CC list. Let's add him.
I believe this patch should go in in the current cycle because
5d17a73a2ebe was merged in this merge window and it can be abused...

The other patch [1] is not that urgent.

[1] http://lkml.kernel.org/r/20170302154541.16155-2-mhocko@kernel.org

On Thu 02-03-17 16:45:40, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Even though kmem_zalloc_greedy is documented it might fail the current
> code doesn't really implement this properly and loops on the smallest
> allowed size for ever. This is a problem because vzalloc might fail
> permanently - we might run out of vmalloc space or since 5d17a73a2ebe
> ("vmalloc: back off when the current task is killed") when the current
> task is killed. The later one makes the failure scenario much more
> probable than it used to be because it makes vmalloc() failures
> permanent for tasks with fatal signals pending.. Fix this by bailing out
> if the minimum size request failed.
> 
> This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> 
> fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> fsstress cpuset=/ mems_allowed=0-1
> CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> Call Trace:
>  dump_stack+0x63/0x87
>  warn_alloc+0x114/0x1c0
>  ? alloc_pages_current+0x88/0x120
>  __vmalloc_node_range+0x250/0x2a0
>  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  ? free_hot_cold_page+0x21f/0x280
>  vzalloc+0x54/0x60
>  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  xfs_bulkstat+0x11b/0x730 [xfs]
>  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
>  ? selinux_capable+0x20/0x30
>  ? security_capable+0x48/0x60
>  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
>  xfs_file_ioctl+0x9dd/0xad0 [xfs]
>  ? do_filp_open+0xa5/0x100
>  do_vfs_ioctl+0xa7/0x5e0
>  SyS_ioctl+0x79/0x90
>  do_syscall_64+0x67/0x180
>  entry_SYSCALL64_slow_path+0x25/0x25
> 
> fsstress keeps looping inside kmem_zalloc_greedy without any way out
> because vmalloc keeps failing due to fatal_signal_pending.
> 
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/xfs/kmem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> index 339c696bbc01..ee95f5c6db45 100644
> --- a/fs/xfs/kmem.c
> +++ b/fs/xfs/kmem.c
> @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
>  	size_t		kmsize = maxsize;
>  
>  	while (!(ptr = vzalloc(kmsize))) {
> +		if (kmsize == minsize)
> +			break;
>  		if ((kmsize >>= 1) <= minsize)
>  			kmsize = minsize;
>  	}
> -- 
> 2.11.0
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-02 16:16                                   ` Michal Hocko
@ 2017-03-02 16:44                                     ` Darrick J. Wong
  -1 siblings, 0 replies; 79+ messages in thread
From: Darrick J. Wong @ 2017-03-02 16:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou,
	linux-xfs, linux-mm, LKML, linux-fsdevel

On Thu, Mar 02, 2017 at 05:16:06PM +0100, Michal Hocko wrote:
> I've just realized that Darrick was not on the CC list. Let's add him.
> I believe this patch should go in in the current cycle because
> 5d17a73a2ebe was merged in this merge window and it can be abused...
> 
> The other patch [1] is not that urgent.
> 
> [1] http://lkml.kernel.org/r/20170302154541.16155-2-mhocko@kernel.org

Both patches look ok to me.  I'll take both patches for rc2.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

(Annoyingly I missed the whole thread yesterday due to vger slowness, in
case anyone was wondering why I didn't reply.)

--D

> 
> On Thu 02-03-17 16:45:40, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Even though kmem_zalloc_greedy is documented it might fail the current
> > code doesn't really implement this properly and loops on the smallest
> > allowed size for ever. This is a problem because vzalloc might fail
> > permanently - we might run out of vmalloc space or since 5d17a73a2ebe
> > ("vmalloc: back off when the current task is killed") when the current
> > task is killed. The later one makes the failure scenario much more
> > probable than it used to be because it makes vmalloc() failures
> > permanent for tasks with fatal signals pending.. Fix this by bailing out
> > if the minimum size request failed.
> > 
> > This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> > 
> > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> > fsstress cpuset=/ mems_allowed=0-1
> > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> > Call Trace:
> >  dump_stack+0x63/0x87
> >  warn_alloc+0x114/0x1c0
> >  ? alloc_pages_current+0x88/0x120
> >  __vmalloc_node_range+0x250/0x2a0
> >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  ? free_hot_cold_page+0x21f/0x280
> >  vzalloc+0x54/0x60
> >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  xfs_bulkstat+0x11b/0x730 [xfs]
> >  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
> >  ? selinux_capable+0x20/0x30
> >  ? security_capable+0x48/0x60
> >  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
> >  xfs_file_ioctl+0x9dd/0xad0 [xfs]
> >  ? do_filp_open+0xa5/0x100
> >  do_vfs_ioctl+0xa7/0x5e0
> >  SyS_ioctl+0x79/0x90
> >  do_syscall_64+0x67/0x180
> >  entry_SYSCALL64_slow_path+0x25/0x25
> > 
> > fsstress keeps looping inside kmem_zalloc_greedy without any way out
> > because vmalloc keeps failing due to fatal_signal_pending.
> > 
> > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  fs/xfs/kmem.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index 339c696bbc01..ee95f5c6db45 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> >  	size_t		kmsize = maxsize;
> >  
> >  	while (!(ptr = vzalloc(kmsize))) {
> > +		if (kmsize == minsize)
> > +			break;
> >  		if ((kmsize >>= 1) <= minsize)
> >  			kmsize = minsize;
> >  	}
> > -- 
> > 2.11.0
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-02 15:45                                 ` Michal Hocko
@ 2017-03-03 22:54                                   ` Dave Chinner
  -1 siblings, 0 replies; 79+ messages in thread
From: Dave Chinner @ 2017-03-03 22:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou,
	linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko

On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Even though kmem_zalloc_greedy is documented it might fail the current
> code doesn't really implement this properly and loops on the smallest
> allowed size for ever. This is a problem because vzalloc might fail
> permanently - we might run out of vmalloc space or since 5d17a73a2ebe
> ("vmalloc: back off when the current task is killed") when the current
> task is killed. The later one makes the failure scenario much more
> probable than it used to be because it makes vmalloc() failures
> permanent for tasks with fatal signals pending.. Fix this by bailing out
> if the minimum size request failed.
> 
> This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> 
> fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> fsstress cpuset=/ mems_allowed=0-1
> CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> Call Trace:
>  dump_stack+0x63/0x87
>  warn_alloc+0x114/0x1c0
>  ? alloc_pages_current+0x88/0x120
>  __vmalloc_node_range+0x250/0x2a0
>  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  ? free_hot_cold_page+0x21f/0x280
>  vzalloc+0x54/0x60
>  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  kmem_zalloc_greedy+0x2b/0x40 [xfs]
>  xfs_bulkstat+0x11b/0x730 [xfs]
>  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
>  ? selinux_capable+0x20/0x30
>  ? security_capable+0x48/0x60
>  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
>  xfs_file_ioctl+0x9dd/0xad0 [xfs]
>  ? do_filp_open+0xa5/0x100
>  do_vfs_ioctl+0xa7/0x5e0
>  SyS_ioctl+0x79/0x90
>  do_syscall_64+0x67/0x180
>  entry_SYSCALL64_slow_path+0x25/0x25
> 
> fsstress keeps looping inside kmem_zalloc_greedy without any way out
> because vmalloc keeps failing due to fatal_signal_pending.
> 
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/xfs/kmem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> index 339c696bbc01..ee95f5c6db45 100644
> --- a/fs/xfs/kmem.c
> +++ b/fs/xfs/kmem.c
> @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
>  	size_t		kmsize = maxsize;
>  
>  	while (!(ptr = vzalloc(kmsize))) {
> +		if (kmsize == minsize)
> +			break;
>  		if ((kmsize >>= 1) <= minsize)
>  			kmsize = minsize;
>  	}

Seems wrong to me - this function used to have lots of callers and
over time we've slowly removed them or replaced them with something
else. I'd suggest removing it completely, replacing the call sites
with kmem_zalloc_large().

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 79+ messages in thread
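A rough sketch of the direction suggested here, with the greedy helper
removed and the call site switched to a plain fixed-size allocation that is
simply allowed to fail. The four-page size and the KM_MAYFAIL flag are
assumptions for illustration, not a tested replacement:

	/* Hypothetical call-site change; irbuf and irbsize are the caller's locals. */
	irbsize = PAGE_SIZE * 4;
	irbuf = kmem_zalloc_large(irbsize, KM_MAYFAIL);
	if (!irbuf)
		return -ENOMEM;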

* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-03 22:54                                   ` Dave Chinner
@ 2017-03-03 23:19                                     ` Darrick J. Wong
  -1 siblings, 0 replies; 79+ messages in thread
From: Darrick J. Wong @ 2017-03-03 23:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Christoph Hellwig, Brian Foster, Tetsuo Handa,
	Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel,
	Michal Hocko

On Sat, Mar 04, 2017 at 09:54:44AM +1100, Dave Chinner wrote:
> On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Even though kmem_zalloc_greedy is documented it might fail the current
> > code doesn't really implement this properly and loops on the smallest
> > allowed size for ever. This is a problem because vzalloc might fail
> > permanently - we might run out of vmalloc space or since 5d17a73a2ebe
> > ("vmalloc: back off when the current task is killed") when the current
> > task is killed. The later one makes the failure scenario much more
> > probable than it used to be because it makes vmalloc() failures
> > permanent for tasks with fatal signals pending.. Fix this by bailing out
> > if the minimum size request failed.
> > 
> > This has been noticed by a hung generic/269 xfstest by Xiong Zhou.
> > 
> > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> > fsstress cpuset=/ mems_allowed=0-1
> > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> > Call Trace:
> >  dump_stack+0x63/0x87
> >  warn_alloc+0x114/0x1c0
> >  ? alloc_pages_current+0x88/0x120
> >  __vmalloc_node_range+0x250/0x2a0
> >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  ? free_hot_cold_page+0x21f/0x280
> >  vzalloc+0x54/0x60
> >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  xfs_bulkstat+0x11b/0x730 [xfs]
> >  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
> >  ? selinux_capable+0x20/0x30
> >  ? security_capable+0x48/0x60
> >  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
> >  xfs_file_ioctl+0x9dd/0xad0 [xfs]
> >  ? do_filp_open+0xa5/0x100
> >  do_vfs_ioctl+0xa7/0x5e0
> >  SyS_ioctl+0x79/0x90
> >  do_syscall_64+0x67/0x180
> >  entry_SYSCALL64_slow_path+0x25/0x25
> > 
> > fsstress keeps looping inside kmem_zalloc_greedy without any way out
> > because vmalloc keeps failing due to fatal_signal_pending.
> > 
> > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  fs/xfs/kmem.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index 339c696bbc01..ee95f5c6db45 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> >  	size_t		kmsize = maxsize;
> >  
> >  	while (!(ptr = vzalloc(kmsize))) {
> > +		if (kmsize == minsize)
> > +			break;
> >  		if ((kmsize >>= 1) <= minsize)
> >  			kmsize = minsize;
> >  	}
> 
> Seems wrong to me - this function used to have lots of callers and
> over time we've slowly removed them or replaced them with something
> else. I'd suggest removing it completely, replacing the call sites
> with kmem_zalloc_large().

Heh.  I thought the reason why _greedy still exists (for its sole user
bulkstat) is that bulkstat has the flexibility to deal with receiving
0, 1, or 4 pages.  So yeah, we could just kill it.

But thinking even more stingily about memory, are there applications
that care about being able to bulkstat 16384 inodes at once?  How badly
does bulkstat need to be able to bulk-process more than a page's worth
of inobt records, anyway?
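
(For scale: assuming the usual 64 inodes per inobt record and a 16-byte
incore record, a single 4k page of irbuf covers 256 records, i.e. about
16384 inodes, and the full four-page buffer covers roughly 64k, which is
presumably where those numbers come from.)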

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 79+ messages in thread


* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-03 23:19                                     ` Darrick J. Wong
@ 2017-03-04  4:48                                       ` Dave Chinner
  -1 siblings, 0 replies; 79+ messages in thread
From: Dave Chinner @ 2017-03-04  4:48 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Michal Hocko, Christoph Hellwig, Brian Foster, Tetsuo Handa,
	Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel,
	Michal Hocko

On Fri, Mar 03, 2017 at 03:19:12PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 04, 2017 at 09:54:44AM +1100, Dave Chinner wrote:
> > On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > Even though kmem_zalloc_greedy is documented as being allowed to fail,
> > > the current code doesn't really implement this properly and loops on
> > > the smallest allowed size for ever. This is a problem because vzalloc
> > > might fail permanently - we might run out of vmalloc space, or, since
> > > 5d17a73a2ebe ("vmalloc: back off when the current task is killed"),
> > > when the current task is killed. The latter makes the failure scenario
> > > much more probable than it used to be because it makes vmalloc() failures
> > > permanent for tasks with fatal signals pending. Fix this by bailing out
> > > if the minimum size request failed.
> > > 
> > > This was noticed via a hung generic/269 xfstest reported by Xiong Zhou.
> > > 
> > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> > > fsstress cpuset=/ mems_allowed=0-1
> > > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> > > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> > > Call Trace:
> > >  dump_stack+0x63/0x87
> > >  warn_alloc+0x114/0x1c0
> > >  ? alloc_pages_current+0x88/0x120
> > >  __vmalloc_node_range+0x250/0x2a0
> > >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> > >  ? free_hot_cold_page+0x21f/0x280
> > >  vzalloc+0x54/0x60
> > >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> > >  kmem_zalloc_greedy+0x2b/0x40 [xfs]
> > >  xfs_bulkstat+0x11b/0x730 [xfs]
> > >  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
> > >  ? selinux_capable+0x20/0x30
> > >  ? security_capable+0x48/0x60
> > >  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
> > >  xfs_file_ioctl+0x9dd/0xad0 [xfs]
> > >  ? do_filp_open+0xa5/0x100
> > >  do_vfs_ioctl+0xa7/0x5e0
> > >  SyS_ioctl+0x79/0x90
> > >  do_syscall_64+0x67/0x180
> > >  entry_SYSCALL64_slow_path+0x25/0x25
> > > 
> > > fsstress keeps looping inside kmem_zalloc_greedy without any way out
> > > because vmalloc keeps failing due to fatal_signal_pending.
> > > 
> > > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > ---
> > >  fs/xfs/kmem.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > > index 339c696bbc01..ee95f5c6db45 100644
> > > --- a/fs/xfs/kmem.c
> > > +++ b/fs/xfs/kmem.c
> > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> > >  	size_t		kmsize = maxsize;
> > >  
> > >  	while (!(ptr = vzalloc(kmsize))) {
> > > +		if (kmsize == minsize)
> > > +			break;
> > >  		if ((kmsize >>= 1) <= minsize)
> > >  			kmsize = minsize;
> > >  	}
> > 
> > Seems wrong to me - this function used to have lots of callers and
> > over time we've slowly removed them or replaced them with something
> > else. I'd suggest removing it completely, replacing the call sites
> > with kmem_zalloc_large().
> 
> Heh.  I thought the reason why _greedy still exists (for its sole user
> bulkstat) is that bulkstat had the flexibility to deal with receiving
> 0, 1, or 4 pages.  So yeah, we could just kill it.

irbuf is sized to minimise AGI locking, but if memory is low
it just uses what it can get. Keep in mind the number of inodes we
need to process is determined by the userspace buffer size, which
can easily be sized to hold tens of thousands of struct
xfs_bulkstat.

> But thinking even more stingily about memory, are there applications
> that care about being able to bulkstat 16384 inodes at once?

IIRC, xfsdump can bulkstat up to 64k inodes per call....
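
A minimal userspace caller of that sort might look roughly like the
sketch below; the <xfs/xfs.h> header path and the exact xfs_fsop_bulkreq
field names are assumed from the xfsprogs headers, so treat it as
illustrative only:

/*
 * Issue XFS_IOC_FSBULKSTAT in xfsdump-sized batches.  The kernel walks
 * the inobt via its (possibly smaller) irbuf and fills ubuffer with
 * struct xfs_bstat records until ocount comes back as 0.
 */
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <xfs/xfs.h>		/* XFS_IOC_FSBULKSTAT et al (assumed path) */

#define NR_INODES	65536	/* xfsdump-sized batch */

static int bulkstat_all(int fd)
{
	struct xfs_bstat *buf;
	__u64 lastino = 0;
	__s32 ocount = 0;
	struct xfs_fsop_bulkreq req;

	buf = calloc(NR_INODES, sizeof(*buf));
	if (!buf)
		return -1;

	req.lastip = &lastino;
	req.icount = NR_INODES;
	req.ubuffer = buf;
	req.ocount = &ocount;

	while (ioctl(fd, XFS_IOC_FSBULKSTAT, &req) == 0 && ocount > 0) {
		/* process buf[0 .. ocount - 1] here */
	}

	free(buf);
	return 0;
}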

> How badly
> does bulkstat need to be able to bulk-process more than a page's worth
> of inobt records, anyway?

Benchmark it on a busy system doing lots of other AGI work (e.g. a
busy NFS server workload with a working set of tens of millions of
inodes so it doesn't fit in cache) and find out. That's generally
how I answer those sorts of questions...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 79+ messages in thread


* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
  2017-03-03 22:54                                   ` Dave Chinner
@ 2017-03-06 13:21                                     ` Michal Hocko
  -1 siblings, 0 replies; 79+ messages in thread
From: Michal Hocko @ 2017-03-06 13:21 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou,
	linux-xfs, linux-mm, LKML, linux-fsdevel

On Sat 04-03-17 09:54:44, Dave Chinner wrote:
> On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Even though kmem_zalloc_greedy is documented as being allowed to fail,
> > the current code doesn't really implement this properly and loops on
> > the smallest allowed size for ever. This is a problem because vzalloc
> > might fail permanently - we might run out of vmalloc space, or, since
> > 5d17a73a2ebe ("vmalloc: back off when the current task is killed"),
> > when the current task is killed. The latter makes the failure scenario
> > much more probable than it used to be because it makes vmalloc() failures
> > permanent for tasks with fatal signals pending. Fix this by bailing out
> > if the minimum size request failed.
> > 
> > This was noticed via a hung generic/269 xfstest reported by Xiong Zhou.
> > 
> > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> > fsstress cpuset=/ mems_allowed=0-1
> > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> > Call Trace:
> >  dump_stack+0x63/0x87
> >  warn_alloc+0x114/0x1c0
> >  ? alloc_pages_current+0x88/0x120
> >  __vmalloc_node_range+0x250/0x2a0
> >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  ? free_hot_cold_page+0x21f/0x280
> >  vzalloc+0x54/0x60
> >  ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  kmem_zalloc_greedy+0x2b/0x40 [xfs]
> >  xfs_bulkstat+0x11b/0x730 [xfs]
> >  ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
> >  ? selinux_capable+0x20/0x30
> >  ? security_capable+0x48/0x60
> >  xfs_ioc_bulkstat+0xe4/0x190 [xfs]
> >  xfs_file_ioctl+0x9dd/0xad0 [xfs]
> >  ? do_filp_open+0xa5/0x100
> >  do_vfs_ioctl+0xa7/0x5e0
> >  SyS_ioctl+0x79/0x90
> >  do_syscall_64+0x67/0x180
> >  entry_SYSCALL64_slow_path+0x25/0x25
> > 
> > fsstress keeps looping inside kmem_zalloc_greedy without any way out
> > because vmalloc keeps failing due to fatal_signal_pending.
> > 
> > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  fs/xfs/kmem.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index 339c696bbc01..ee95f5c6db45 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> >  	size_t		kmsize = maxsize;
> >  
> >  	while (!(ptr = vzalloc(kmsize))) {
> > +		if (kmsize == minsize)
> > +			break;
> >  		if ((kmsize >>= 1) <= minsize)
> >  			kmsize = minsize;
> >  	}
> 
> Seems wrong to me - this function used to have lots of callers and
> over time we've slowly removed them or replaced them with something
> else. I'd suggest removing it completely, replacing the call sites
> with kmem_zalloc_large().

I do not really care how this gets fixed. Dropping kmem_zalloc_greedy
sounds like the way to go. I am not familiar enough with xfs_bulkstat
to make an educated guess about which allocation size to use, so I
guess I will have to defer this to you guys if you prefer that route.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 79+ messages in thread


end of thread, other threads:[~2017-03-06 13:24 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-01  4:46 mm allocation failure and hang when running xfstests generic/269 on xfs Xiong Zhou
2017-03-01  4:46 ` Xiong Zhou
2017-03-02  0:37 ` Christoph Hellwig
2017-03-02  0:37   ` Christoph Hellwig
2017-03-02  5:19   ` Xiong Zhou
2017-03-02  5:19     ` Xiong Zhou
2017-03-02  6:41     ` Bob Liu
2017-03-02  6:41       ` Bob Liu
2017-03-02  6:41       ` Bob Liu
2017-03-02  6:41       ` Bob Liu
2017-03-02  6:47     ` Anshuman Khandual
2017-03-02  6:47       ` Anshuman Khandual
2017-03-02  8:42       ` Michal Hocko
2017-03-02  8:42         ` Michal Hocko
2017-03-02  9:23         ` Xiong Zhou
2017-03-02  9:23           ` Xiong Zhou
2017-03-02 10:04     ` Tetsuo Handa
2017-03-02 10:04       ` Tetsuo Handa
2017-03-02 10:35       ` Michal Hocko
2017-03-02 10:35         ` Michal Hocko
2017-03-02 10:35         ` Michal Hocko
2017-03-02 10:53         ` mm allocation failure and hang when running xfstests generic/269on xfs Tetsuo Handa
2017-03-02 10:53           ` Tetsuo Handa
2017-03-02 12:24         ` mm allocation failure and hang when running xfstests generic/269 on xfs Brian Foster
2017-03-02 12:24           ` Brian Foster
2017-03-02 12:49           ` Michal Hocko
2017-03-02 12:49             ` Michal Hocko
2017-03-02 13:00             ` Brian Foster
2017-03-02 13:00               ` Brian Foster
2017-03-02 13:07               ` Tetsuo Handa
2017-03-02 13:07                 ` Tetsuo Handa
2017-03-02 13:27               ` Michal Hocko
2017-03-02 13:27                 ` Michal Hocko
2017-03-02 13:41                 ` Brian Foster
2017-03-02 13:41                   ` Brian Foster
2017-03-02 13:50                   ` Michal Hocko
2017-03-02 13:50                     ` Michal Hocko
2017-03-02 14:23                     ` Brian Foster
2017-03-02 14:23                       ` Brian Foster
2017-03-02 14:34                       ` Michal Hocko
2017-03-02 14:34                         ` Michal Hocko
2017-03-02 14:51                         ` Brian Foster
2017-03-02 14:51                           ` Brian Foster
2017-03-02 15:14                           ` Michal Hocko
2017-03-02 15:14                             ` Michal Hocko
2017-03-02 15:30                             ` Brian Foster
2017-03-02 15:30                               ` Brian Foster
2017-03-02 15:45                               ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko
2017-03-02 15:45                                 ` Michal Hocko
2017-03-02 15:45                                 ` Michal Hocko
2017-03-02 15:45                                 ` Michal Hocko
2017-03-02 15:45                                 ` [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed Michal Hocko
2017-03-02 15:45                                   ` Michal Hocko
2017-03-02 15:45                                   ` Michal Hocko
2017-03-02 15:45                                   ` Michal Hocko
2017-03-02 15:49                                   ` Christoph Hellwig
2017-03-02 15:49                                     ` Christoph Hellwig
2017-03-02 15:59                                   ` Brian Foster
2017-03-02 15:59                                     ` Brian Foster
2017-03-02 15:49                                 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Christoph Hellwig
2017-03-02 15:49                                   ` Christoph Hellwig
2017-03-02 15:59                                 ` Brian Foster
2017-03-02 15:59                                   ` Brian Foster
2017-03-02 16:16                                 ` Michal Hocko
2017-03-02 16:16                                   ` Michal Hocko
2017-03-02 16:44                                   ` Darrick J. Wong
2017-03-02 16:44                                     ` Darrick J. Wong
2017-03-03 22:54                                 ` Dave Chinner
2017-03-03 22:54                                   ` Dave Chinner
2017-03-03 23:19                                   ` Darrick J. Wong
2017-03-03 23:19                                     ` Darrick J. Wong
2017-03-04  4:48                                     ` Dave Chinner
2017-03-04  4:48                                       ` Dave Chinner
2017-03-06 13:21                                   ` Michal Hocko
2017-03-06 13:21                                     ` Michal Hocko
2017-03-02 15:47                               ` mm allocation failure and hang when running xfstests generic/269 on xfs Michal Hocko
2017-03-02 15:47                                 ` Michal Hocko
2017-03-02 15:47                           ` Christoph Hellwig
2017-03-02 15:47                             ` Christoph Hellwig
