* mm allocation failure and hang when running xfstests generic/269 on xfs @ 2017-03-01 4:46 Xiong Zhou 2017-03-02 0:37 ` Christoph Hellwig 0 siblings, 1 reply; 36+ messages in thread From: Xiong Zhou @ 2017-03-01 4:46 UTC (permalink / raw) To: linux-xfs, linux-mm; +Cc: linux-kernel, linux-fsdevel Hi, It's reproduciable, not everytime though. Ext4 works fine. Based on test logs, it's bad on Linus tree commit: e5d56ef Merge tag 'watchdog-for-linus-v4.11' It's good on commit: f8e6859 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc Trying to narrow down a little bit. Thanks, Xiong --- fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) fsstress cpuset=/ mems_allowed=0-1 CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 Call Trace: dump_stack+0x63/0x87 warn_alloc+0x114/0x1c0 ? alloc_pages_current+0x88/0x120 __vmalloc_node_range+0x250/0x2a0 ? kmem_zalloc_greedy+0x2b/0x40 [xfs] ? free_hot_cold_page+0x21f/0x280 vzalloc+0x54/0x60 ? kmem_zalloc_greedy+0x2b/0x40 [xfs] kmem_zalloc_greedy+0x2b/0x40 [xfs] xfs_bulkstat+0x11b/0x730 [xfs] ? xfs_bulkstat_one_int+0x340/0x340 [xfs] ? selinux_capable+0x20/0x30 ? security_capable+0x48/0x60 xfs_ioc_bulkstat+0xe4/0x190 [xfs] xfs_file_ioctl+0x9dd/0xad0 [xfs] ? do_filp_open+0xa5/0x100 do_vfs_ioctl+0xa7/0x5e0 SyS_ioctl+0x79/0x90 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f825023f577 RSP: 002b:00007ffffea76e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000003f7e RCX: 00007f825023f577 RDX: 00007ffffea76e70 RSI: ffffffffc0205865 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000008 R09: 0000000000000036 R10: 0000000000000069 R11: 0000000000000246 R12: 0000000000000036 R13: 00007f824c002d00 R14: 0000000000001209 R15: 0000000000000000 Mem-Info: active_anon:23126 inactive_anon:1719 isolated_anon:0 active_file:153709 inactive_file:356889 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 slab_reclaimable:43829 slab_unreclaimable:45414 mapped:14638 shmem:2470 pagetables:1599 bounce:0 free:7463113 free_pcp:23729 free_cma:0 Node 0 active_anon:36372kB inactive_anon:140kB active_file:449540kB inactive_file:355288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:21792kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 10240kB anon_thp: 796kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no Node 1 active_anon:56132kB inactive_anon:6736kB active_file:165296kB inactive_file:1072268kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:36760kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 12288kB anon_thp: 9084kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? 
no Node 0 DMA free:15884kB min:40kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 1821 15896 15896 15896 Node 0 DMA32 free:1860532kB min:4968kB low:6772kB high:8576kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1948156kB managed:1865328kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:2132kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 0 14074 14074 14074 Node 0 Normal free:13111616kB min:39660kB low:54072kB high:68484kB active_anon:36372kB inactive_anon:140kB active_file:449540kB inactive_file:355296kB unevictable:0kB writepending:0kB present:14680064kB managed:14412260kB mlocked:0kB slab_reclaimable:77256kB slab_unreclaimable:86096kB kernel_stack:8184kB pagetables:3132kB bounce:0kB free_pcp:45712kB local_pcp:212kB free_cma:0kB lowmem_reserve[]: 0 0 0 0 0 Node 1 Normal free:14864424kB min:45432kB low:61940kB high:78448kB active_anon:56132kB inactive_anon:6736kB active_file:165296kB inactive_file:1072264kB unevictable:0kB writepending:0kB present:16777216kB managed:16508964kB mlocked:0kB slab_reclaimable:98060kB slab_unreclaimable:95540kB kernel_stack:7880kB pagetables:3264kB bounce:0kB free_pcp:46848kB local_pcp:640kB free_cma:0kB lowmem_reserve[]: 0 0 0 0 0 Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB Node 0 DMA32: 13*4kB (UM) 10*8kB (UM) 13*16kB (UM) 11*32kB (M) 18*64kB (UM) 9*128kB (UM) 16*256kB (UM) 12*512kB (UM) 12*1024kB (UM) 10*2048kB (UM) 443*4096kB (M) = 1860532kB Node 0 Normal: 2765*4kB (UME) 737*8kB (UME) 145*16kB (UME) 251*32kB (UME) 626*64kB (UM) 255*128kB (UME) 46*256kB (UME) 23*512kB (UE) 15*1024kB (U) 8*2048kB (U) 3163*4096kB (M) = 13110956kB Node 1 Normal: 3422*4kB (UME) 1039*8kB (UME) 158*16kB (UME) 852*32kB (UME) 1125*64kB (UME) 617*128kB (UE) 329*256kB (UME) 117*512kB (UME) 18*1024kB (UM) 6*2048kB (U) 3537*4096kB (M) = 14865168kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 513078 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 10485756kB Total swap = 10485756kB 8355357 pages RAM 0 pages HighMem/MovableOnly 154743 pages reserved 0 pages cma reserved 0 pages hwpoisoned -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-01 4:46 mm allocation failure and hang when running xfstests generic/269 on xfs Xiong Zhou @ 2017-03-02 0:37 ` Christoph Hellwig 2017-03-02 5:19 ` Xiong Zhou 0 siblings, 1 reply; 36+ messages in thread From: Christoph Hellwig @ 2017-03-02 0:37 UTC (permalink / raw) To: Xiong Zhou; +Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote: > Hi, > > It's reproduciable, not everytime though. Ext4 works fine. On ext4 fsstress won't run bulkstat because it doesn't exist. Either way this smells like a MM issue to me as there were not XFS changes in that area recently. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 0:37 ` Christoph Hellwig @ 2017-03-02 5:19 ` Xiong Zhou 2017-03-02 6:41 ` Bob Liu ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Xiong Zhou @ 2017-03-02 5:19 UTC (permalink / raw) To: Christoph Hellwig, mhocko Cc: Xiong Zhou, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote: > On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote: > > Hi, > > > > It's reproduciable, not everytime though. Ext4 works fine. > > On ext4 fsstress won't run bulkstat because it doesn't exist. Either > way this smells like a MM issue to me as there were not XFS changes > in that area recently. Yap. First bad commit: commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb Author: Michal Hocko <mhocko@suse.com> Date: Fri Feb 24 14:58:53 2017 -0800 vmalloc: back off when the current task is killed Reverting this commit on top of e5d56ef Merge tag 'watchdog-for-linus-v4.11' survives the tests. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 5:19 ` Xiong Zhou @ 2017-03-02 6:41 ` Bob Liu 2017-03-02 6:47 ` Anshuman Khandual 2017-03-02 10:04 ` Tetsuo Handa 2 siblings, 0 replies; 36+ messages in thread From: Bob Liu @ 2017-03-02 6:41 UTC (permalink / raw) To: Xiong Zhou, Christoph Hellwig, mhocko Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel On 2017/3/2 13:19, Xiong Zhou wrote: > On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote: >> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote: >>> Hi, >>> >>> It's reproduciable, not everytime though. Ext4 works fine. >> >> On ext4 fsstress won't run bulkstat because it doesn't exist. Either >> way this smells like a MM issue to me as there were not XFS changes >> in that area recently. > > Yap. > > First bad commit: > This does not look like a bug. From the commit below, the allocation failure message was printed because the current process had received a SIGKILL signal. You may need to confirm whether that's the case. Regards, Bob > commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb > Author: Michal Hocko <mhocko@suse.com> > Date: Fri Feb 24 14:58:53 2017 -0800 > > vmalloc: back off when the current task is killed > > Reverting this commit on top of > e5d56ef Merge tag 'watchdog-for-linus-v4.11' > survives the tests. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 5:19 ` Xiong Zhou 2017-03-02 6:41 ` Bob Liu @ 2017-03-02 6:47 ` Anshuman Khandual 2017-03-02 8:42 ` Michal Hocko 2017-03-02 10:04 ` Tetsuo Handa 2 siblings, 1 reply; 36+ messages in thread From: Anshuman Khandual @ 2017-03-02 6:47 UTC (permalink / raw) To: Xiong Zhou, Christoph Hellwig, mhocko Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel On 03/02/2017 10:49 AM, Xiong Zhou wrote: > On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote: >> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote: >>> Hi, >>> >>> It's reproduciable, not everytime though. Ext4 works fine. >> On ext4 fsstress won't run bulkstat because it doesn't exist. Either >> way this smells like a MM issue to me as there were not XFS changes >> in that area recently. > Yap. > > First bad commit: > > commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb > Author: Michal Hocko <mhocko@suse.com> > Date: Fri Feb 24 14:58:53 2017 -0800 > > vmalloc: back off when the current task is killed > > Reverting this commit on top of > e5d56ef Merge tag 'watchdog-for-linus-v4.11' > survives the tests. Does the fsstress test hang, or the whole system? I am not familiar with this code, but if it's the test that is getting hung and it's hitting this new check introduced by the above commit, that means the requester is currently being killed by the OOM killer for some other memory allocation request. Isn't this kind of memory alloc failure then expected? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 6:47 ` Anshuman Khandual @ 2017-03-02 8:42 ` Michal Hocko 2017-03-02 9:23 ` Xiong Zhou 0 siblings, 1 reply; 36+ messages in thread From: Michal Hocko @ 2017-03-02 8:42 UTC (permalink / raw) To: Xiong Zhou, Anshuman Khandual Cc: Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu 02-03-17 12:17:47, Anshuman Khandual wrote: > On 03/02/2017 10:49 AM, Xiong Zhou wrote: > > On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote: > >> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote: > >>> Hi, > >>> > >>> It's reproduciable, not everytime though. Ext4 works fine. > >> On ext4 fsstress won't run bulkstat because it doesn't exist. Either > >> way this smells like a MM issue to me as there were not XFS changes > >> in that area recently. > > Yap. > > > > First bad commit: > > > > commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb > > Author: Michal Hocko <mhocko@suse.com> > > Date: Fri Feb 24 14:58:53 2017 -0800 > > > > vmalloc: back off when the current task is killed > > > > Reverting this commit on top of > > e5d56ef Merge tag 'watchdog-for-linus-v4.11' > > survives the tests. > > Does fsstress test or the system hang ? I am not familiar with this > code but If it's the test which is getting hung and its hitting this > new check introduced by the above commit that means the requester is > currently being killed by OOM killer for some other memory allocation > request. Well, not exactly. It is sufficient for it to be _killed_ by SIGKILL. And for that it just needs to do a group_exit when one thread was still in the kernel (see zap_process). While I can change this check to actually do the oom specific check I believe a more generic fatal_signal_pending is the right thing to do here. I am still not sure what is the actual problem here, though. Could you be more specific please? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 8:42 ` Michal Hocko @ 2017-03-02 9:23 ` Xiong Zhou 0 siblings, 0 replies; 36+ messages in thread From: Xiong Zhou @ 2017-03-02 9:23 UTC (permalink / raw) To: Michal Hocko Cc: Xiong Zhou, Anshuman Khandual, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu, Mar 02, 2017 at 09:42:23AM +0100, Michal Hocko wrote: > On Thu 02-03-17 12:17:47, Anshuman Khandual wrote: > > On 03/02/2017 10:49 AM, Xiong Zhou wrote: > > > On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote: > > >> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote: > > >>> Hi, > > >>> > > >>> It's reproduciable, not everytime though. Ext4 works fine. > > >> On ext4 fsstress won't run bulkstat because it doesn't exist. Either > > >> way this smells like a MM issue to me as there were not XFS changes > > >> in that area recently. > > > Yap. > > > > > > First bad commit: > > > > > > commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb > > > Author: Michal Hocko <mhocko@suse.com> > > > Date: Fri Feb 24 14:58:53 2017 -0800 > > > > > > vmalloc: back off when the current task is killed > > > > > > Reverting this commit on top of > > > e5d56ef Merge tag 'watchdog-for-linus-v4.11' > > > survives the tests. > > > > Does fsstress test or the system hang ? I am not familiar with this > > code but If it's the test which is getting hung and its hitting this > > new check introduced by the above commit that means the requester is > > currently being killed by OOM killer for some other memory allocation > > request. > > Well, not exactly. It is sufficient for it to be _killed_ by SIGKILL. > And for that it just needs to do a group_exit when one thread was still > in the kernel (see zap_process). While I can change this check to > actually do the oom specific check I believe a more generic > fatal_signal_pending is the right thing to do here. I am still not sure > what is the actual problem here, though. Could you be more specific > please? It's blocking the test and system-shutdown. fsstress wont exit. For anyone interested, a simple ugly reproducer: cat > fst.sh <<EOFF #! /bin/bash -x [ \$# -ne 3 ] && { echo "./single FSTYP TEST XFSTESTS_DIR"; exit 1; } FST=\$1 BLKSZ=4096 fallocate -l 10G /home/test.img fallocate -l 15G /home/scratch.img MNT1=/loopmnt MNT2=/loopsch mkdir -p \$MNT1 mkdir -p \$MNT2 DEV1=\$(losetup --find --show /home/test.img) DEV2=\$(losetup --find --show /home/scratch.img) cleanup() { umount -d \$MNT1 umount -d \$MNT2 umount \$DEV1 \$DEV2 losetup -D || losetup -a | awk -F: '{print \$1}' | xargs losetup -d rm -f /home/{test,scratch}.img } trap cleanup 0 1 2 if [[ \$FST =~ ext ]] ; then mkfs.\${FST} -Fq -b \${BLKSZ} \$DEV1 elif [[ \$FST =~ xfs ]] ; then mkfs.\${FST} -fq -b size=\${BLKSZ} \$DEV1 fi if test \$? -ne 0 ; then echo "mkfs \$DEV1 failed" exit 1 fi if [[ \$FST =~ ext ]] ; then mkfs.\${FST} -Fq -b \${BLKSZ} \$DEV2 elif [[ \$FST =~ xfs ]] ; then mkfs.\${FST} -fq -b size=\${BLKSZ} \$DEV2 fi if test \$? -ne 0 ; then echo "mkfs \$DEV2 failed" exit 1 fi mount \$DEV1 \$MNT1 if test \$? -ne 0 ; then echo "mount \$DEV1 failed" exit 1 fi mount \$DEV2 \$MNT2 if test \$? 
-ne 0 ; then echo "mount \$DEV2 failed" exit 1 fi pushd \$3 || exit 1 cat > local.config <<EOF TEST_DEV=\$DEV1 TEST_DIR=\$MNT1 SCRATCH_MNT=\$MNT2 SCRATCH_DEV=\$DEV2 EOF i=0 while [ \$i -lt 50 ] ; do ./check \$2 echo \$i ((i=\$i+1)) done popd EOFF git clone git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git cd xfstests-dev make -j2 && make install || exit 1 cd - sh -x ./fst.sh xfs generic/269 xfstests-dev > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 5:19 ` Xiong Zhou 2017-03-02 6:41 ` Bob Liu 2017-03-02 6:47 ` Anshuman Khandual @ 2017-03-02 10:04 ` Tetsuo Handa 2017-03-02 10:35 ` Michal Hocko 2 siblings, 1 reply; 36+ messages in thread From: Tetsuo Handa @ 2017-03-02 10:04 UTC (permalink / raw) To: Xiong Zhou, Christoph Hellwig, mhocko Cc: linux-xfs, linux-mm, linux-kernel, linux-fsdevel On 2017/03/02 14:19, Xiong Zhou wrote: > On Wed, Mar 01, 2017 at 04:37:31PM -0800, Christoph Hellwig wrote: >> On Wed, Mar 01, 2017 at 12:46:34PM +0800, Xiong Zhou wrote: >>> Hi, >>> >>> It's reproduciable, not everytime though. Ext4 works fine. >> >> On ext4 fsstress won't run bulkstat because it doesn't exist. Either >> way this smells like a MM issue to me as there were not XFS changes >> in that area recently. > > Yap. > > First bad commit: > > commit 5d17a73a2ebeb8d1c6924b91e53ab2650fe86ffb > Author: Michal Hocko <mhocko@suse.com> > Date: Fri Feb 24 14:58:53 2017 -0800 > > vmalloc: back off when the current task is killed > > Reverting this commit on top of > e5d56ef Merge tag 'watchdog-for-linus-v4.11' > survives the tests. > Looks like kmem_zalloc_greedy() is broken. It loops forever until vzalloc() succeeds. If somebody (not limited to the OOM killer) sends SIGKILL and vmalloc() backs off, kmem_zalloc_greedy() will loop forever. ---------------------------------------- void * kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) { void *ptr; size_t kmsize = maxsize; while (!(ptr = vzalloc(kmsize))) { if ((kmsize >>= 1) <= minsize) kmsize = minsize; } if (ptr) *size = kmsize; return ptr; } ---------------------------------------- So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is killed") implemented __GFP_KILLABLE flag and automatically applied that flag. As a result, those who are not ready to fail upon SIGKILL are confused. ;-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
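For context on the vmalloc() side of this interaction: the back-off Tetsuo refers to lives in the page-population loop of __vmalloc_area_node() in mm/vmalloc.c. Below is a paraphrased sketch of what commit 5d17a73a2ebe adds there; the surrounding context lines and the error label are approximations, not the exact upstream diff.

----------------------------------------
	/* inside __vmalloc_area_node()'s page allocation loop (paraphrased) */
	for (i = 0; i < area->nr_pages; i++) {
		struct page *page;

		/* a task with a fatal signal pending gets no more pages */
		if (fatal_signal_pending(current)) {
			area->nr_pages = i;
			goto fail;	/* unwind and return NULL */
		}

		/* ... otherwise allocate the next page as before ... */
	}
----------------------------------------

Once fsstress has SIGKILL pending, every vzalloc() retry issued by kmem_zalloc_greedy() takes this early exit, so the retry loop above can never succeed.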
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 10:04 ` Tetsuo Handa @ 2017-03-02 10:35 ` Michal Hocko 2017-03-02 10:53 ` mm allocation failure and hang when running xfstests generic/269on xfs Tetsuo Handa 2017-03-02 12:24 ` mm allocation failure and hang when running xfstests generic/269 on xfs Brian Foster 0 siblings, 2 replies; 36+ messages in thread From: Michal Hocko @ 2017-03-02 10:35 UTC (permalink / raw) To: Tetsuo Handa Cc: Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: [...] > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > killed") implemented __GFP_KILLABLE flag and automatically applied that > flag. As a result, those who are not ready to fail upon SIGKILL are > confused. ;-) You are right! The function is documented it might fail but the code doesn't really allow that. This seems like a bug to me. What do you think about the following? --- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269on xfs 2017-03-02 10:35 ` Michal Hocko @ 2017-03-02 10:53 ` Tetsuo Handa 2017-03-02 12:24 ` mm allocation failure and hang when running xfstests generic/269 on xfs Brian Foster 1 sibling, 0 replies; 36+ messages in thread From: Tetsuo Handa @ 2017-03-02 10:53 UTC (permalink / raw) To: mhocko; +Cc: xzhou, hch, linux-xfs, linux-mm, linux-kernel, linux-fsdevel Michal Hocko wrote: > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: > [...] > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > > killed") implemented __GFP_KILLABLE flag and automatically applied that > > flag. As a result, those who are not ready to fail upon SIGKILL are > > confused. ;-) > > You are right! The function is documented it might fail but the code > doesn't really allow that. This seems like a bug to me. What do you > think about the following? Yes, I think this patch is correct. Maybe -May fail and may return vmalloced memory. +May fail. together, for it is always vmalloc()ed memory. > --- > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Thu, 2 Mar 2017 11:31:11 +0100 > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail > > Even though kmem_zalloc_greedy is documented it might fail the current > code doesn't really implement this properly and loops on the smallest > allowed size for ever. This is a problem because vzalloc might fail > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current > task is killed") such a failure is much more probable than it used to > be. Fix this by bailing out if the minimum size request failed. > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > Reported-by: Xiong Zhou <xzhou@redhat.com> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > fs/xfs/kmem.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > index 339c696bbc01..ee95f5c6db45 100644 > --- a/fs/xfs/kmem.c > +++ b/fs/xfs/kmem.c > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > size_t kmsize = maxsize; > > while (!(ptr = vzalloc(kmsize))) { > + if (kmsize == minsize) > + break; > if ((kmsize >>= 1) <= minsize) > kmsize = minsize; > } > -- > 2.11.0 > > -- > Michal Hocko > SUSE Labs > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 10:35 ` Michal Hocko 2017-03-02 10:53 ` mm allocation failure and hang when running xfstests generic/269on xfs Tetsuo Handa @ 2017-03-02 12:24 ` Brian Foster 2017-03-02 12:49 ` Michal Hocko 1 sibling, 1 reply; 36+ messages in thread From: Brian Foster @ 2017-03-02 12:24 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote: > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: > [...] > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > > killed") implemented __GFP_KILLABLE flag and automatically applied that > > flag. As a result, those who are not ready to fail upon SIGKILL are > > confused. ;-) > > You are right! The function is documented it might fail but the code > doesn't really allow that. This seems like a bug to me. What do you > think about the following? > --- > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Thu, 2 Mar 2017 11:31:11 +0100 > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail > > Even though kmem_zalloc_greedy is documented it might fail the current > code doesn't really implement this properly and loops on the smallest > allowed size for ever. This is a problem because vzalloc might fail > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current > task is killed") such a failure is much more probable than it used to > be. Fix this by bailing out if the minimum size request failed. > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > Reported-by: Xiong Zhou <xzhou@redhat.com> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > fs/xfs/kmem.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > index 339c696bbc01..ee95f5c6db45 100644 > --- a/fs/xfs/kmem.c > +++ b/fs/xfs/kmem.c > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > size_t kmsize = maxsize; > > while (!(ptr = vzalloc(kmsize))) { > + if (kmsize == minsize) > + break; > if ((kmsize >>= 1) <= minsize) > kmsize = minsize; > } More consistent with the rest of the kmem code might be to accept a flags argument and do something like this based on KM_MAYFAIL. The one current caller looks like it would pass it, but I suppose we'd still need a mechanism to break out should a new caller not pass that flag. Would a fatal_signal_pending() check in the loop as well allow us to break out in the scenario that is reproduced here? Brian > -- > 2.11.0 > > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
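For illustration only, Brian's suggestion might look roughly like the sketch below. It reuses the existing xfs_km_flags_t/KM_MAYFAIL definitions from fs/xfs/kmem.h and adds the fatal_signal_pending() escape hatch he mentions; it is a sketch of the idea, not a patch posted in this thread, and the existing caller would need to grow the new flags argument.

----------------------------------------
void *
kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize,
		   xfs_km_flags_t flags)
{
	void	*ptr;
	size_t	kmsize = maxsize;

	while (!(ptr = vzalloc(kmsize))) {
		/*
		 * Once the smallest size has failed, stop retrying if the
		 * caller can tolerate failure or the task has been killed;
		 * otherwise keep trying at minsize as before.
		 */
		if (kmsize == minsize &&
		    ((flags & KM_MAYFAIL) || fatal_signal_pending(current)))
			break;
		if ((kmsize >>= 1) <= minsize)
			kmsize = minsize;
	}
	if (ptr)
		*size = kmsize;
	return ptr;
}
----------------------------------------

The one current caller, xfs_bulkstat() (see the trace at the top of the thread), could then pass KM_MAYFAIL and fail the ioctl with -ENOMEM instead of spinning.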
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 12:24 ` mm allocation failure and hang when running xfstests generic/269 on xfs Brian Foster @ 2017-03-02 12:49 ` Michal Hocko 2017-03-02 13:00 ` Brian Foster 0 siblings, 1 reply; 36+ messages in thread From: Michal Hocko @ 2017-03-02 12:49 UTC (permalink / raw) To: Brian Foster Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu 02-03-17 07:24:27, Brian Foster wrote: > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote: > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: > > [...] > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > > > killed") implemented __GFP_KILLABLE flag and automatically applied that > > > flag. As a result, those who are not ready to fail upon SIGKILL are > > > confused. ;-) > > > > You are right! The function is documented it might fail but the code > > doesn't really allow that. This seems like a bug to me. What do you > > think about the following? > > --- > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001 > > From: Michal Hocko <mhocko@suse.com> > > Date: Thu, 2 Mar 2017 11:31:11 +0100 > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > code doesn't really implement this properly and loops on the smallest > > allowed size for ever. This is a problem because vzalloc might fail > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current > > task is killed") such a failure is much more probable than it used to > > be. Fix this by bailing out if the minimum size request failed. > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > fs/xfs/kmem.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > index 339c696bbc01..ee95f5c6db45 100644 > > --- a/fs/xfs/kmem.c > > +++ b/fs/xfs/kmem.c > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > size_t kmsize = maxsize; > > > > while (!(ptr = vzalloc(kmsize))) { > > + if (kmsize == minsize) > > + break; > > if ((kmsize >>= 1) <= minsize) > > kmsize = minsize; > > } > > More consistent with the rest of the kmem code might be to accept a > flags argument and do something like this based on KM_MAYFAIL. Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for the same reason it doesn't support GFP_NOFS. So I am not sure this is a good idea. > The one > current caller looks like it would pass it, but I suppose we'd still > need a mechanism to break out should a new caller not pass that flag. > Would a fatal_signal_pending() check in the loop as well allow us to > break out in the scenario that is reproduced here? Yes that check would work as well I just thought the break out when the minsize request fails to be more logical. There might be other reasons to fail the request and looping here seems just wrong. But whatever you or other xfs people prefer. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 12:49 ` Michal Hocko @ 2017-03-02 13:00 ` Brian Foster 2017-03-02 13:07 ` Tetsuo Handa 2017-03-02 13:27 ` Michal Hocko 0 siblings, 2 replies; 36+ messages in thread From: Brian Foster @ 2017-03-02 13:00 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote: > On Thu 02-03-17 07:24:27, Brian Foster wrote: > > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote: > > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: > > > [...] > > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > > > > killed") implemented __GFP_KILLABLE flag and automatically applied that > > > > flag. As a result, those who are not ready to fail upon SIGKILL are > > > > confused. ;-) > > > > > > You are right! The function is documented it might fail but the code > > > doesn't really allow that. This seems like a bug to me. What do you > > > think about the following? > > > --- > > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001 > > > From: Michal Hocko <mhocko@suse.com> > > > Date: Thu, 2 Mar 2017 11:31:11 +0100 > > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail > > > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > > code doesn't really implement this properly and loops on the smallest > > > allowed size for ever. This is a problem because vzalloc might fail > > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current > > > task is killed") such a failure is much more probable than it used to > > > be. Fix this by bailing out if the minimum size request failed. > > > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > --- > > > fs/xfs/kmem.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > > index 339c696bbc01..ee95f5c6db45 100644 > > > --- a/fs/xfs/kmem.c > > > +++ b/fs/xfs/kmem.c > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > > size_t kmsize = maxsize; > > > > > > while (!(ptr = vzalloc(kmsize))) { > > > + if (kmsize == minsize) > > > + break; > > > if ((kmsize >>= 1) <= minsize) > > > kmsize = minsize; > > > } > > > > More consistent with the rest of the kmem code might be to accept a > > flags argument and do something like this based on KM_MAYFAIL. > > Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for > the same reason it doesn't support GFP_NOFS. So I am not sure this is a > good idea. > Not sure I follow..? I'm just suggesting to control the loop behavior based on the KM_ flag, not to do or change anything wrt to GFP_ flags. > > The one > > current caller looks like it would pass it, but I suppose we'd still > > need a mechanism to break out should a new caller not pass that flag. > > Would a fatal_signal_pending() check in the loop as well allow us to > > break out in the scenario that is reproduced here? > > Yes that check would work as well I just thought the break out when the > minsize request fails to be more logical. There might be other reasons > to fail the request and looping here seems just wrong. 
But whatever you > or other xfs people prefer. There may be higher level reasons for why this code should or should not loop, that just seems like a separate issue to me. My thinking is more that this appears to be how every kmem_*() function operates today and it seems a bit out of place to change behavior of one to fix a bug. Maybe I'm missing something though.. are we subject to the same general problem in any of the other kmem_*() functions that can currently loop indefinitely? Brian > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 13:00 ` Brian Foster @ 2017-03-02 13:07 ` Tetsuo Handa 2017-03-02 13:27 ` Michal Hocko 1 sibling, 0 replies; 36+ messages in thread From: Tetsuo Handa @ 2017-03-02 13:07 UTC (permalink / raw) To: bfoster, mhocko Cc: xzhou, hch, linux-xfs, linux-mm, linux-kernel, linux-fsdevel Brian Foster wrote: > On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote: > > On Thu 02-03-17 07:24:27, Brian Foster wrote: > > > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote: > > > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: > > > > [...] > > > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > > > > > killed") implemented __GFP_KILLABLE flag and automatically applied that > > > > > flag. As a result, those who are not ready to fail upon SIGKILL are > > > > > confused. ;-) > > > > > > > > You are right! The function is documented it might fail but the code > > > > doesn't really allow that. This seems like a bug to me. What do you > > > > think about the following? > > > > --- > > > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001 > > > > From: Michal Hocko <mhocko@suse.com> > > > > Date: Thu, 2 Mar 2017 11:31:11 +0100 > > > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail > > > > > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > > > code doesn't really implement this properly and loops on the smallest > > > > allowed size for ever. This is a problem because vzalloc might fail > > > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current > > > > task is killed") such a failure is much more probable than it used to > > > > be. Fix this by bailing out if the minimum size request failed. > > > > > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > --- > > > > fs/xfs/kmem.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > > > index 339c696bbc01..ee95f5c6db45 100644 > > > > --- a/fs/xfs/kmem.c > > > > +++ b/fs/xfs/kmem.c > > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > > > size_t kmsize = maxsize; > > > > > > > > while (!(ptr = vzalloc(kmsize))) { > > > > + if (kmsize == minsize) > > > > + break; > > > > if ((kmsize >>= 1) <= minsize) > > > > kmsize = minsize; > > > > } > > > > > > More consistent with the rest of the kmem code might be to accept a > > > flags argument and do something like this based on KM_MAYFAIL. > > > > Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for > > the same reason it doesn't support GFP_NOFS. So I am not sure this is a > > good idea. > > > > Not sure I follow..? I'm just suggesting to control the loop behavior > based on the KM_ flag, not to do or change anything wrt to GFP_ flags. vmalloc() cannot support __GFP_NOFAIL. Therefore, kmem_zalloc_greedy() cannot support !KM_MAYFAIL. We will change kmem_zalloc_greedy() not to use vmalloc() when we need to support !KM_MAYFAIL. That won't be now. > > > > The one > > > current caller looks like it would pass it, but I suppose we'd still > > > need a mechanism to break out should a new caller not pass that flag. 
> > > Would a fatal_signal_pending() check in the loop as well allow us to > > > break out in the scenario that is reproduced here? > > > > Yes that check would work as well I just thought the break out when the > > minsize request fails to be more logical. There might be other reasons > > to fail the request and looping here seems just wrong. But whatever you > > or other xfs people prefer. > > There may be higher level reasons for why this code should or should not > loop, that just seems like a separate issue to me. My thinking is more > that this appears to be how every kmem_*() function operates today and > it seems a bit out of place to change behavior of one to fix a bug. > > Maybe I'm missing something though.. are we subject to the same general > problem in any of the other kmem_*() functions that can currently loop > indefinitely? > The kmem code looks to me a source of problems. For example, kmem_alloc() by RESCUER workqueue threads got stuck doing GFP_NOFS allocation. Due to __GFP_NOWARN, warn_alloc() cannot warn allocation stalls. Due to order <= PAGE_ALLOC_COSTLY_ORDER, the "%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)" message cannot be printed because __alloc_pages_nodemask() does not return. And due to GFP_NOFS without __GFP_HIGH nor __GFP_NOFAIL, memory cannot be allocated to RESCUER thread which are trying to allocate memory for reclaiming memory; the result is silent lockup. (I must stop here, for this is a different thread.) ---------------------------------------- [ 1095.633625] MemAlloc: xfs-data/sda1(451) flags=0x4228060 switches=45509 seq=1 gfp=0x1604240(GFP_NOFS|__GFP_NOWARN|__GFP_COMP|__GFP_NOTRACK) order=0 delay=652073 [ 1095.633626] xfs-data/sda1 R running task 12696 451 2 0x00000000 [ 1095.633663] Workqueue: xfs-data/sda1 xfs_end_io [xfs] [ 1095.633665] Call Trace: [ 1095.633668] __schedule+0x336/0xe00 [ 1095.633671] schedule+0x3d/0x90 [ 1095.633672] schedule_timeout+0x20d/0x510 [ 1095.633675] ? lock_timer_base+0xa0/0xa0 [ 1095.633678] schedule_timeout_uninterruptible+0x2a/0x30 [ 1095.633680] __alloc_pages_slowpath+0x2b5/0xd95 [ 1095.633687] __alloc_pages_nodemask+0x3e4/0x460 [ 1095.633699] alloc_pages_current+0x97/0x1b0 [ 1095.633702] new_slab+0x4cb/0x6b0 [ 1095.633706] ___slab_alloc+0x3a3/0x620 [ 1095.633728] ? kmem_alloc+0x96/0x120 [xfs] [ 1095.633730] ? ___slab_alloc+0x5c6/0x620 [ 1095.633732] ? cpuacct_charge+0x38/0x1e0 [ 1095.633767] ? kmem_alloc+0x96/0x120 [xfs] [ 1095.633770] __slab_alloc+0x46/0x7d [ 1095.633773] __kmalloc+0x301/0x3b0 [ 1095.633802] kmem_alloc+0x96/0x120 [xfs] [ 1095.633804] ? kfree+0x1fa/0x330 [ 1095.633842] xfs_log_commit_cil+0x489/0x710 [xfs] [ 1095.633864] __xfs_trans_commit+0x83/0x260 [xfs] [ 1095.633883] xfs_trans_commit+0x10/0x20 [xfs] [ 1095.633901] __xfs_setfilesize+0xdb/0x240 [xfs] [ 1095.633936] xfs_setfilesize_ioend+0x89/0xb0 [xfs] [ 1095.633954] ? xfs_setfilesize_ioend+0x5/0xb0 [xfs] [ 1095.633971] xfs_end_io+0x81/0x110 [xfs] [ 1095.633973] process_one_work+0x22b/0x760 [ 1095.633975] ? process_one_work+0x194/0x760 [ 1095.633997] rescuer_thread+0x1f2/0x3d0 [ 1095.634002] kthread+0x10f/0x150 [ 1095.634003] ? worker_thread+0x4b0/0x4b0 [ 1095.634004] ? 
kthread_create_on_node+0x70/0x70 [ 1095.634007] ret_from_fork+0x31/0x40 [ 1095.634013] MemAlloc: xfs-eofblocks/s(456) flags=0x4228860 switches=15435 seq=1 gfp=0x1400240(GFP_NOFS|__GFP_NOWARN) order=0 delay=293074 [ 1095.634014] xfs-eofblocks/s R running task 12032 456 2 0x00000000 [ 1095.634037] Workqueue: xfs-eofblocks/sda1 xfs_eofblocks_worker [xfs] [ 1095.634038] Call Trace: [ 1095.634040] ? _raw_spin_lock+0x3d/0x80 [ 1095.634042] ? vmpressure+0xd0/0x120 [ 1095.634044] ? vmpressure+0xd0/0x120 [ 1095.634047] ? vmpressure_prio+0x21/0x30 [ 1095.634049] ? do_try_to_free_pages+0x70/0x300 [ 1095.634052] ? try_to_free_pages+0x131/0x3f0 [ 1095.634058] ? __alloc_pages_slowpath+0x3ec/0xd95 [ 1095.634065] ? __alloc_pages_nodemask+0x3e4/0x460 [ 1095.634069] ? alloc_pages_current+0x97/0x1b0 [ 1095.634111] ? xfs_buf_allocate_memory+0x160/0x2a3 [xfs] [ 1095.634133] ? xfs_buf_get_map+0x2be/0x480 [xfs] [ 1095.634169] ? xfs_buf_read_map+0x2c/0x400 [xfs] [ 1095.634204] ? xfs_trans_read_buf_map+0x186/0x830 [xfs] [ 1095.634222] ? xfs_btree_read_buf_block.constprop.34+0x78/0xc0 [xfs] [ 1095.634239] ? xfs_btree_lookup_get_block+0x8a/0x180 [xfs] [ 1095.634257] ? xfs_btree_lookup+0xd0/0x3f0 [xfs] [ 1095.634296] ? kmem_zone_alloc+0x96/0x120 [xfs] [ 1095.634299] ? _raw_spin_unlock+0x27/0x40 [ 1095.634315] ? xfs_bmbt_lookup_eq+0x1f/0x30 [xfs] [ 1095.634348] ? xfs_bmap_del_extent+0x1b2/0x1610 [xfs] [ 1095.634380] ? kmem_zone_alloc+0x96/0x120 [xfs] [ 1095.634400] ? __xfs_bunmapi+0x4db/0xda0 [xfs] [ 1095.634421] ? xfs_bunmapi+0x2b/0x40 [xfs] [ 1095.634459] ? xfs_itruncate_extents+0x1df/0x780 [xfs] [ 1095.634502] ? xfs_rename+0xc70/0x1080 [xfs] [ 1095.634525] ? xfs_free_eofblocks+0x1c4/0x230 [xfs] [ 1095.634546] ? xfs_inode_free_eofblocks+0x18d/0x280 [xfs] [ 1095.634565] ? xfs_inode_ag_walk.isra.13+0x2b5/0x620 [xfs] [ 1095.634582] ? xfs_inode_ag_walk.isra.13+0x91/0x620 [xfs] [ 1095.634618] ? xfs_inode_clear_eofblocks_tag+0x1a0/0x1a0 [xfs] [ 1095.634630] ? radix_tree_next_chunk+0x10b/0x2d0 [ 1095.634635] ? radix_tree_gang_lookup_tag+0xd7/0x150 [ 1095.634672] ? xfs_perag_get_tag+0x11d/0x370 [xfs] [ 1095.634690] ? xfs_perag_get_tag+0x5/0x370 [xfs] [ 1095.634709] ? xfs_inode_ag_iterator_tag+0x71/0xa0 [xfs] [ 1095.634726] ? xfs_inode_clear_eofblocks_tag+0x1a0/0x1a0 [xfs] [ 1095.634744] ? __xfs_icache_free_eofblocks+0x3b/0x40 [xfs] [ 1095.634759] ? xfs_eofblocks_worker+0x27/0x40 [xfs] [ 1095.634762] ? process_one_work+0x22b/0x760 [ 1095.634763] ? process_one_work+0x194/0x760 [ 1095.634784] ? rescuer_thread+0x1f2/0x3d0 [ 1095.634788] ? kthread+0x10f/0x150 [ 1095.634789] ? worker_thread+0x4b0/0x4b0 [ 1095.634790] ? kthread_create_on_node+0x70/0x70 [ 1095.634793] ? ret_from_fork+0x31/0x40 ---------------------------------------- > Brian > > > -- > > Michal Hocko > > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 13:00 ` Brian Foster 2017-03-02 13:07 ` Tetsuo Handa @ 2017-03-02 13:27 ` Michal Hocko 2017-03-02 13:41 ` Brian Foster 1 sibling, 1 reply; 36+ messages in thread From: Michal Hocko @ 2017-03-02 13:27 UTC (permalink / raw) To: Brian Foster Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu 02-03-17 08:00:09, Brian Foster wrote: > On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote: > > On Thu 02-03-17 07:24:27, Brian Foster wrote: > > > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote: > > > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: > > > > [...] > > > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > > > > > killed") implemented __GFP_KILLABLE flag and automatically applied that > > > > > flag. As a result, those who are not ready to fail upon SIGKILL are > > > > > confused. ;-) > > > > > > > > You are right! The function is documented it might fail but the code > > > > doesn't really allow that. This seems like a bug to me. What do you > > > > think about the following? > > > > --- > > > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001 > > > > From: Michal Hocko <mhocko@suse.com> > > > > Date: Thu, 2 Mar 2017 11:31:11 +0100 > > > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail > > > > > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > > > code doesn't really implement this properly and loops on the smallest > > > > allowed size for ever. This is a problem because vzalloc might fail > > > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current > > > > task is killed") such a failure is much more probable than it used to > > > > be. Fix this by bailing out if the minimum size request failed. > > > > > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > --- > > > > fs/xfs/kmem.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > > > index 339c696bbc01..ee95f5c6db45 100644 > > > > --- a/fs/xfs/kmem.c > > > > +++ b/fs/xfs/kmem.c > > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > > > size_t kmsize = maxsize; > > > > > > > > while (!(ptr = vzalloc(kmsize))) { > > > > + if (kmsize == minsize) > > > > + break; > > > > if ((kmsize >>= 1) <= minsize) > > > > kmsize = minsize; > > > > } > > > > > > More consistent with the rest of the kmem code might be to accept a > > > flags argument and do something like this based on KM_MAYFAIL. > > > > Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for > > the same reason it doesn't support GFP_NOFS. So I am not sure this is a > > good idea. > > > > Not sure I follow..? I'm just suggesting to control the loop behavior > based on the KM_ flag, not to do or change anything wrt to GFP_ flags. 
As Tetsuo already pointed out, vmalloc cannot really support a never-fail semantic with the current implementation, so that semantic would have to be implemented in kmem_zalloc_greedy, and the only way to do that would be to loop there. That is rather nasty, as you can see from the reported issue, because the vmalloc failure might be permanent, so there won't be any way to make forward progress. Breaking out of the loop on fatal_signal_pending would break the non-failing semantic. Besides that, there doesn't really seem to be any demand for this semantic in the first place, so why make this more complicated than necessary? I see your argument about being in sync with the other kmem helpers, but those are a bit different because the regular page/slab allocators allow a never-fail semantic (even though this is mostly ignored by those helpers, which implement their own retries, but that is a different topic). -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 13:27 ` Michal Hocko @ 2017-03-02 13:41 ` Brian Foster 2017-03-02 13:50 ` Michal Hocko 0 siblings, 1 reply; 36+ messages in thread From: Brian Foster @ 2017-03-02 13:41 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote: > On Thu 02-03-17 08:00:09, Brian Foster wrote: > > On Thu, Mar 02, 2017 at 01:49:09PM +0100, Michal Hocko wrote: > > > On Thu 02-03-17 07:24:27, Brian Foster wrote: > > > > On Thu, Mar 02, 2017 at 11:35:20AM +0100, Michal Hocko wrote: > > > > > On Thu 02-03-17 19:04:48, Tetsuo Handa wrote: > > > > > [...] > > > > > > So, commit 5d17a73a2ebeb8d1("vmalloc: back off when the current task is > > > > > > killed") implemented __GFP_KILLABLE flag and automatically applied that > > > > > > flag. As a result, those who are not ready to fail upon SIGKILL are > > > > > > confused. ;-) > > > > > > > > > > You are right! The function is documented it might fail but the code > > > > > doesn't really allow that. This seems like a bug to me. What do you > > > > > think about the following? > > > > > --- > > > > > From d02cb0285d8ce3344fd64dc7e2912e9a04bef80d Mon Sep 17 00:00:00 2001 > > > > > From: Michal Hocko <mhocko@suse.com> > > > > > Date: Thu, 2 Mar 2017 11:31:11 +0100 > > > > > Subject: [PATCH] xfs: allow kmem_zalloc_greedy to fail > > > > > > > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > > > > code doesn't really implement this properly and loops on the smallest > > > > > allowed size for ever. This is a problem because vzalloc might fail > > > > > permanently. Since 5d17a73a2ebe ("vmalloc: back off when the current > > > > > task is killed") such a failure is much more probable than it used to > > > > > be. Fix this by bailing out if the minimum size request failed. > > > > > > > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > > > > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > > --- > > > > > fs/xfs/kmem.c | 2 ++ > > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > > > > index 339c696bbc01..ee95f5c6db45 100644 > > > > > --- a/fs/xfs/kmem.c > > > > > +++ b/fs/xfs/kmem.c > > > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > > > > size_t kmsize = maxsize; > > > > > > > > > > while (!(ptr = vzalloc(kmsize))) { > > > > > + if (kmsize == minsize) > > > > > + break; > > > > > if ((kmsize >>= 1) <= minsize) > > > > > kmsize = minsize; > > > > > } > > > > > > > > More consistent with the rest of the kmem code might be to accept a > > > > flags argument and do something like this based on KM_MAYFAIL. > > > > > > Well, vmalloc doesn't really support GFP_NOFAIL semantic right now for > > > the same reason it doesn't support GFP_NOFS. So I am not sure this is a > > > good idea. > > > > > > > Not sure I follow..? I'm just suggesting to control the loop behavior > > based on the KM_ flag, not to do or change anything wrt to GFP_ flags. 
> > As Tetsuo already pointed out, vmalloc cannot really support never-fail > semantic with the current implementation so the semantic would have > to be implemented in kmem_zalloc_greedy and the only way to do that > would be to loop there and this is rather nasty as you can see from the > reported issue because the vmalloc failure might be permanent so there > won't be any way to make a forward progress. Breaking out of the loop > on fatal_signal_pending pending would break the non-failing sementic. > Sure.. > Besides that, there doesn't really seem to be any demand for this > semantic in the first place so why to make this more complicated than > necessary? > That may very well be the case. I'm not necessarily against this... > I see your argument about being in sync with other kmem helpers but > those are bit different because regular page/slab allocators allow never > fail semantic (even though this is mostly ignored by those helpers which > implement their own retries but that is a different topic). > ... but what I'm trying to understand here is whether this failure scenario is specific to vmalloc() or whether the other kmem_*() functions are susceptible to the same problem. For example, suppose we replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE, KM_SLEEP) call. Could we hit the same problem if the process is killed? Brian > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 13:41 ` Brian Foster @ 2017-03-02 13:50 ` Michal Hocko 2017-03-02 14:23 ` Brian Foster 0 siblings, 1 reply; 36+ messages in thread From: Michal Hocko @ 2017-03-02 13:50 UTC (permalink / raw) To: Brian Foster Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu 02-03-17 08:41:58, Brian Foster wrote: > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote: [...] > > I see your argument about being in sync with other kmem helpers but > > those are bit different because regular page/slab allocators allow never > > fail semantic (even though this is mostly ignored by those helpers which > > implement their own retries but that is a different topic). > > > > ... but what I'm trying to understand here is whether this failure > scenario is specific to vmalloc() or whether the other kmem_*() > functions are susceptible to the same problem. For example, suppose we > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE, > KM_SLEEP) call. Could we hit the same problem if the process is killed? Well, kmem_zalloc uses kmalloc which can also fail when we are out of memory but in that case we can expect the OOM killer releasing some memory which would allow us to make a forward progress on the next retry. So essentially retrying around kmalloc is much more safe in this regard. Failing vmalloc might be permanent because there is no vmalloc space to allocate from or much more likely due to already mentioned patch. So vmalloc is different, really. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
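The "helpers which implement their own retries" remark refers to loops like the one in kmem_alloc(). Roughly paraphrased from fs/xfs/kmem.c of this era (the retry interval and exact call details are from memory, not from this thread), it looks like this:

----------------------------------------
void *
kmem_alloc(size_t size, xfs_km_flags_t flags)
{
	int	retries = 0;
	gfp_t	lflags = kmem_flags_convert(flags);
	void	*ptr;

	do {
		ptr = kmalloc(size, lflags);
		if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
			return ptr;
		if (!(++retries % 100))
			xfs_err(NULL,
	"%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)",
				current->comm, current->pid,
				(unsigned int)size, __func__, lflags);
		congestion_wait(BLK_RW_ASYNC, HZ/50);
	} while (1);
}
----------------------------------------

Retrying here is comparatively safe because a kmalloc() failure is usually transient: reclaim or the OOM killer can free memory before the next pass. The fatal_signal_pending() back-off makes a vzalloc() failure permanent for a killed task, which is why the same retry pattern is what hangs in kmem_zalloc_greedy().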
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 13:50 ` Michal Hocko @ 2017-03-02 14:23 ` Brian Foster 2017-03-02 14:34 ` Michal Hocko 0 siblings, 1 reply; 36+ messages in thread From: Brian Foster @ 2017-03-02 14:23 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote: > On Thu 02-03-17 08:41:58, Brian Foster wrote: > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote: > [...] > > > I see your argument about being in sync with other kmem helpers but > > > those are bit different because regular page/slab allocators allow never > > > fail semantic (even though this is mostly ignored by those helpers which > > > implement their own retries but that is a different topic). > > > > > > > ... but what I'm trying to understand here is whether this failure > > scenario is specific to vmalloc() or whether the other kmem_*() > > functions are susceptible to the same problem. For example, suppose we > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE, > > KM_SLEEP) call. Could we hit the same problem if the process is killed? > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of > memory but in that case we can expect the OOM killer releasing some > memory which would allow us to make a forward progress on the next > retry. So essentially retrying around kmalloc is much more safe in this > regard. Failing vmalloc might be permanent because there is no vmalloc > space to allocate from or much more likely due to already mentioned > patch. So vmalloc is different, really. Right.. that's why I'm asking. So it's technically possible but highly unlikely due to the different failure characteristics. That seems reasonable to me, then. To be clear, do we understand what causes the vzalloc() failure to be effectively permanent in this specific reproducer? I know you mention above that we could be out of vmalloc space, but that doesn't clarify whether there are other potential failure paths or then what this has to do with the fact that the process was killed. Does the pending signal cause the subsequent failures or are you saying that there is some other root cause of the failure, this process would effectively be spinning here anyways, and we're just noticing it because it's trying to exit? Brian > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 14:23 ` Brian Foster @ 2017-03-02 14:34 ` Michal Hocko 2017-03-02 14:51 ` Brian Foster 0 siblings, 1 reply; 36+ messages in thread From: Michal Hocko @ 2017-03-02 14:34 UTC (permalink / raw) To: Brian Foster Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu 02-03-17 09:23:15, Brian Foster wrote: > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote: > > On Thu 02-03-17 08:41:58, Brian Foster wrote: > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote: > > [...] > > > > I see your argument about being in sync with other kmem helpers but > > > > those are bit different because regular page/slab allocators allow never > > > > fail semantic (even though this is mostly ignored by those helpers which > > > > implement their own retries but that is a different topic). > > > > > > > > > > ... but what I'm trying to understand here is whether this failure > > > scenario is specific to vmalloc() or whether the other kmem_*() > > > functions are susceptible to the same problem. For example, suppose we > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE, > > > KM_SLEEP) call. Could we hit the same problem if the process is killed? > > > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of > > memory but in that case we can expect the OOM killer releasing some > > memory which would allow us to make a forward progress on the next > > retry. So essentially retrying around kmalloc is much more safe in this > > regard. Failing vmalloc might be permanent because there is no vmalloc > > space to allocate from or much more likely due to already mentioned > > patch. So vmalloc is different, really. > > Right.. that's why I'm asking. So it's technically possible but highly > unlikely due to the different failure characteristics. That seems > reasonable to me, then. > > To be clear, do we understand what causes the vzalloc() failure to be > effectively permanent in this specific reproducer? I know you mention > above that we could be out of vmalloc space, but that doesn't clarify > whether there are other potential failure paths or then what this has to > do with the fact that the process was killed. Does the pending signal > cause the subsequent failures or are you saying that there is some other > root cause of the failure, this process would effectively be spinning > here anyways, and we're just noticing it because it's trying to exit? In this particular case it is fatal_signal_pending that causes the permanent failure. This check has been added to prevent from complete memory reserves depletion on OOM when a killed task has a free ticket to reserves and vmalloc requests can be really large. In this case there was no OOM killer going on but fsstress has SIGKILL pending for other reason. Most probably as a result of the group_exit when all threads are killed (see zap_process). I could have turn fatal_signal_pending into tsk_is_oom_victim which would be less likely to hit but in principle fatal_signal_pending should be better because we do want to bail out when the process is existing as soon as possible. What I really wanted to say is that there are other possible permanent failure paths in vmalloc AFAICS. They are much less probable but they still exist. Does that make more sense now? 
-- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
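[Editorial note: for readers without the commit handy, 5d17a73a2ebe ("vmalloc: back off when the current task is killed") adds a bail-out to vmalloc's page-population loop, so a task with a fatal signal pending fails the allocation immediately instead of dipping into memory reserves on its way out. Roughly, as a paraphrased excerpt of __vmalloc_area_node() and not the exact upstream hunk:

	for (i = 0; i < area->nr_pages; i++) {
		struct page *page;

		/*
		 * A killed task gets no further pages: give back what was
		 * allocated so far and fail the whole request. This is why
		 * the kmem_zalloc_greedy() retry loop can never terminate
		 * once fsstress has SIGKILL pending.
		 */
		if (fatal_signal_pending(current)) {
			area->nr_pages = i;
			goto fail;
		}

		page = alloc_page(alloc_mask);
		if (unlikely(!page)) {
			area->nr_pages = i;
			goto fail;
		}
		area->pages[i] = page;
	}
]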
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 14:34 ` Michal Hocko @ 2017-03-02 14:51 ` Brian Foster 2017-03-02 15:14 ` Michal Hocko 2017-03-02 15:47 ` Christoph Hellwig 0 siblings, 2 replies; 36+ messages in thread From: Brian Foster @ 2017-03-02 14:51 UTC (permalink / raw) To: Michal Hocko Cc: Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu, Mar 02, 2017 at 03:34:41PM +0100, Michal Hocko wrote: > On Thu 02-03-17 09:23:15, Brian Foster wrote: > > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote: > > > On Thu 02-03-17 08:41:58, Brian Foster wrote: > > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote: > > > [...] > > > > > I see your argument about being in sync with other kmem helpers but > > > > > those are bit different because regular page/slab allocators allow never > > > > > fail semantic (even though this is mostly ignored by those helpers which > > > > > implement their own retries but that is a different topic). > > > > > > > > > > > > > ... but what I'm trying to understand here is whether this failure > > > > scenario is specific to vmalloc() or whether the other kmem_*() > > > > functions are susceptible to the same problem. For example, suppose we > > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE, > > > > KM_SLEEP) call. Could we hit the same problem if the process is killed? > > > > > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of > > > memory but in that case we can expect the OOM killer releasing some > > > memory which would allow us to make a forward progress on the next > > > retry. So essentially retrying around kmalloc is much more safe in this > > > regard. Failing vmalloc might be permanent because there is no vmalloc > > > space to allocate from or much more likely due to already mentioned > > > patch. So vmalloc is different, really. > > > > Right.. that's why I'm asking. So it's technically possible but highly > > unlikely due to the different failure characteristics. That seems > > reasonable to me, then. > > > > To be clear, do we understand what causes the vzalloc() failure to be > > effectively permanent in this specific reproducer? I know you mention > > above that we could be out of vmalloc space, but that doesn't clarify > > whether there are other potential failure paths or then what this has to > > do with the fact that the process was killed. Does the pending signal > > cause the subsequent failures or are you saying that there is some other > > root cause of the failure, this process would effectively be spinning > > here anyways, and we're just noticing it because it's trying to exit? > > In this particular case it is fatal_signal_pending that causes the > permanent failure. This check has been added to prevent from complete > memory reserves depletion on OOM when a killed task has a free ticket to > reserves and vmalloc requests can be really large. In this case there > was no OOM killer going on but fsstress has SIGKILL pending for other > reason. Most probably as a result of the group_exit when all threads > are killed (see zap_process). I could have turn fatal_signal_pending > into tsk_is_oom_victim which would be less likely to hit but in > principle fatal_signal_pending should be better because we do want to > bail out when the process is existing as soon as possible. 
> > What I really wanted to say is that there are other possible permanent > failure paths in vmalloc AFAICS. They are much less probable but they > still exist. > > Does that make more sense now? Yes, thanks. That explains why this crops up now where it hasn't in the past. Please include that background in the commit log description. Also, that kind of makes me think that a fatal_signal_pending() check is still appropriate in the loop, even if we want to drop the infinite retry loop in kmem_zalloc_greedy() as well. There's no sense in doing however many retries are left before we return and that's also more explicit for the next person who goes to change this code in the future. Otherwise, I'm fine with breaking the infinite retry loop at the same time. It looks like Christoph added this function originally so this should probably require his ack as well.. Brian > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 14:51 ` Brian Foster @ 2017-03-02 15:14 ` Michal Hocko 2017-03-02 15:30 ` Brian Foster 2017-03-02 15:47 ` Christoph Hellwig 1 sibling, 1 reply; 36+ messages in thread From: Michal Hocko @ 2017-03-02 15:14 UTC (permalink / raw) To: Brian Foster, Christoph Hellwig Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu 02-03-17 09:51:31, Brian Foster wrote: > On Thu, Mar 02, 2017 at 03:34:41PM +0100, Michal Hocko wrote: > > On Thu 02-03-17 09:23:15, Brian Foster wrote: > > > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote: > > > > On Thu 02-03-17 08:41:58, Brian Foster wrote: > > > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote: > > > > [...] > > > > > > I see your argument about being in sync with other kmem helpers but > > > > > > those are bit different because regular page/slab allocators allow never > > > > > > fail semantic (even though this is mostly ignored by those helpers which > > > > > > implement their own retries but that is a different topic). > > > > > > > > > > > > > > > > ... but what I'm trying to understand here is whether this failure > > > > > scenario is specific to vmalloc() or whether the other kmem_*() > > > > > functions are susceptible to the same problem. For example, suppose we > > > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE, > > > > > KM_SLEEP) call. Could we hit the same problem if the process is killed? > > > > > > > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of > > > > memory but in that case we can expect the OOM killer releasing some > > > > memory which would allow us to make a forward progress on the next > > > > retry. So essentially retrying around kmalloc is much more safe in this > > > > regard. Failing vmalloc might be permanent because there is no vmalloc > > > > space to allocate from or much more likely due to already mentioned > > > > patch. So vmalloc is different, really. > > > > > > Right.. that's why I'm asking. So it's technically possible but highly > > > unlikely due to the different failure characteristics. That seems > > > reasonable to me, then. > > > > > > To be clear, do we understand what causes the vzalloc() failure to be > > > effectively permanent in this specific reproducer? I know you mention > > > above that we could be out of vmalloc space, but that doesn't clarify > > > whether there are other potential failure paths or then what this has to > > > do with the fact that the process was killed. Does the pending signal > > > cause the subsequent failures or are you saying that there is some other > > > root cause of the failure, this process would effectively be spinning > > > here anyways, and we're just noticing it because it's trying to exit? > > > > In this particular case it is fatal_signal_pending that causes the > > permanent failure. This check has been added to prevent from complete > > memory reserves depletion on OOM when a killed task has a free ticket to > > reserves and vmalloc requests can be really large. In this case there > > was no OOM killer going on but fsstress has SIGKILL pending for other > > reason. Most probably as a result of the group_exit when all threads > > are killed (see zap_process). 
I could have turn fatal_signal_pending > > into tsk_is_oom_victim which would be less likely to hit but in > > principle fatal_signal_pending should be better because we do want to > > bail out when the process is existing as soon as possible. > > > > What I really wanted to say is that there are other possible permanent > > failure paths in vmalloc AFAICS. They are much less probable but they > > still exist. > > > > Does that make more sense now? > > Yes, thanks. That explains why this crops up now where it hasn't in the > past. Please include that background in the commit log description. OK, does this sound better. I am open to any suggestions to improve this of course : xfs: allow kmem_zalloc_greedy to fail : : Even though kmem_zalloc_greedy is documented it might fail the current : code doesn't really implement this properly and loops on the smallest : allowed size for ever. This is a problem because vzalloc might fail : permanently - we might run out of vmalloc space or since 5d17a73a2ebe : ("vmalloc: back off when the current task is killed") when the current : task is killed. The later one makes the failure scenario much more : probable than it used to be. Fix this by bailing out if the minimum size : request failed. : : This has been noticed by a hung generic/269 xfstest by Xiong Zhou. : : fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) : fsstress cpuset=/ mems_allowed=0-1 : CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 : Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 : Call Trace: : dump_stack+0x63/0x87 : warn_alloc+0x114/0x1c0 : ? alloc_pages_current+0x88/0x120 : __vmalloc_node_range+0x250/0x2a0 : ? kmem_zalloc_greedy+0x2b/0x40 [xfs] : ? free_hot_cold_page+0x21f/0x280 : vzalloc+0x54/0x60 : ? kmem_zalloc_greedy+0x2b/0x40 [xfs] : kmem_zalloc_greedy+0x2b/0x40 [xfs] : xfs_bulkstat+0x11b/0x730 [xfs] : ? xfs_bulkstat_one_int+0x340/0x340 [xfs] : ? selinux_capable+0x20/0x30 : ? security_capable+0x48/0x60 : xfs_ioc_bulkstat+0xe4/0x190 [xfs] : xfs_file_ioctl+0x9dd/0xad0 [xfs] : ? do_filp_open+0xa5/0x100 : do_vfs_ioctl+0xa7/0x5e0 : SyS_ioctl+0x79/0x90 : do_syscall_64+0x67/0x180 : entry_SYSCALL64_slow_path+0x25/0x25 : : fsstress keeps looping inside kmem_zalloc_greedy without any way out : because vmalloc keeps failing due to fatal_signal_pending. : : Reported-by: Xiong Zhou <xzhou@redhat.com> : Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> : Signed-off-by: Michal Hocko <mhocko@suse.com> > Also, that kind of makes me think that a fatal_signal_pending() check is > still appropriate in the loop, even if we want to drop the infinite > retry loop in kmem_zalloc_greedy() as well. There's no sense in doing > however many retries are left before we return and that's also more > explicit for the next person who goes to change this code in the future. I am not objecting to adding fatal_signal_pending as well I just thought that from the logic POV breaking after reaching the minimum size is just the right thing to do. We can optimize further by checking fatal_signal_pending and reducing retries when we know it doesn't make much sense but that should be done on top as an optimization IMHO. > Otherwise, I'm fine with breaking the infinite retry loop at the same > time. It looks like Christoph added this function originally so this > should probably require his ack as well.. What do you think Christoph? 
-- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs 2017-03-02 15:14 ` Michal Hocko @ 2017-03-02 15:30 ` Brian Foster 2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko 2017-03-02 15:47 ` mm allocation failure and hang when running xfstests generic/269 on xfs Michal Hocko 0 siblings, 2 replies; 36+ messages in thread From: Brian Foster @ 2017-03-02 15:30 UTC (permalink / raw) To: Michal Hocko Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, linux-kernel, linux-fsdevel On Thu, Mar 02, 2017 at 04:14:11PM +0100, Michal Hocko wrote: > On Thu 02-03-17 09:51:31, Brian Foster wrote: > > On Thu, Mar 02, 2017 at 03:34:41PM +0100, Michal Hocko wrote: > > > On Thu 02-03-17 09:23:15, Brian Foster wrote: > > > > On Thu, Mar 02, 2017 at 02:50:01PM +0100, Michal Hocko wrote: > > > > > On Thu 02-03-17 08:41:58, Brian Foster wrote: > > > > > > On Thu, Mar 02, 2017 at 02:27:55PM +0100, Michal Hocko wrote: > > > > > [...] > > > > > > > I see your argument about being in sync with other kmem helpers but > > > > > > > those are bit different because regular page/slab allocators allow never > > > > > > > fail semantic (even though this is mostly ignored by those helpers which > > > > > > > implement their own retries but that is a different topic). > > > > > > > > > > > > > > > > > > > ... but what I'm trying to understand here is whether this failure > > > > > > scenario is specific to vmalloc() or whether the other kmem_*() > > > > > > functions are susceptible to the same problem. For example, suppose we > > > > > > replaced this kmem_zalloc_greedy() call with a kmem_zalloc(PAGE_SIZE, > > > > > > KM_SLEEP) call. Could we hit the same problem if the process is killed? > > > > > > > > > > Well, kmem_zalloc uses kmalloc which can also fail when we are out of > > > > > memory but in that case we can expect the OOM killer releasing some > > > > > memory which would allow us to make a forward progress on the next > > > > > retry. So essentially retrying around kmalloc is much more safe in this > > > > > regard. Failing vmalloc might be permanent because there is no vmalloc > > > > > space to allocate from or much more likely due to already mentioned > > > > > patch. So vmalloc is different, really. > > > > > > > > Right.. that's why I'm asking. So it's technically possible but highly > > > > unlikely due to the different failure characteristics. That seems > > > > reasonable to me, then. > > > > > > > > To be clear, do we understand what causes the vzalloc() failure to be > > > > effectively permanent in this specific reproducer? I know you mention > > > > above that we could be out of vmalloc space, but that doesn't clarify > > > > whether there are other potential failure paths or then what this has to > > > > do with the fact that the process was killed. Does the pending signal > > > > cause the subsequent failures or are you saying that there is some other > > > > root cause of the failure, this process would effectively be spinning > > > > here anyways, and we're just noticing it because it's trying to exit? > > > > > > In this particular case it is fatal_signal_pending that causes the > > > permanent failure. This check has been added to prevent from complete > > > memory reserves depletion on OOM when a killed task has a free ticket to > > > reserves and vmalloc requests can be really large. In this case there > > > was no OOM killer going on but fsstress has SIGKILL pending for other > > > reason. 
Most probably as a result of the group_exit when all threads > > > are killed (see zap_process). I could have turn fatal_signal_pending > > > into tsk_is_oom_victim which would be less likely to hit but in > > > principle fatal_signal_pending should be better because we do want to > > > bail out when the process is existing as soon as possible. > > > > > > What I really wanted to say is that there are other possible permanent > > > failure paths in vmalloc AFAICS. They are much less probable but they > > > still exist. > > > > > > Does that make more sense now? > > > > Yes, thanks. That explains why this crops up now where it hasn't in the > > past. Please include that background in the commit log description. > > OK, does this sound better. I am open to any suggestions to improve this > of course > Yeah.. > : xfs: allow kmem_zalloc_greedy to fail > : > : Even though kmem_zalloc_greedy is documented it might fail the current > : code doesn't really implement this properly and loops on the smallest > : allowed size for ever. This is a problem because vzalloc might fail > : permanently - we might run out of vmalloc space or since 5d17a73a2ebe > : ("vmalloc: back off when the current task is killed") when the current > : task is killed. The later one makes the failure scenario much more > : probable than it used to be. Fix this by bailing out if the minimum size ^ " because it makes vmalloc() failures permanent for tasks with fatal signals pending." > : request failed. > : > : This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > : > : fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > : fsstress cpuset=/ mems_allowed=0-1 > : CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > : Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > : Call Trace: > : dump_stack+0x63/0x87 > : warn_alloc+0x114/0x1c0 > : ? alloc_pages_current+0x88/0x120 > : __vmalloc_node_range+0x250/0x2a0 > : ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > : ? free_hot_cold_page+0x21f/0x280 > : vzalloc+0x54/0x60 > : ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > : kmem_zalloc_greedy+0x2b/0x40 [xfs] > : xfs_bulkstat+0x11b/0x730 [xfs] > : ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > : ? selinux_capable+0x20/0x30 > : ? security_capable+0x48/0x60 > : xfs_ioc_bulkstat+0xe4/0x190 [xfs] > : xfs_file_ioctl+0x9dd/0xad0 [xfs] > : ? do_filp_open+0xa5/0x100 > : do_vfs_ioctl+0xa7/0x5e0 > : SyS_ioctl+0x79/0x90 > : do_syscall_64+0x67/0x180 > : entry_SYSCALL64_slow_path+0x25/0x25 > : > : fsstress keeps looping inside kmem_zalloc_greedy without any way out > : because vmalloc keeps failing due to fatal_signal_pending. > : > : Reported-by: Xiong Zhou <xzhou@redhat.com> > : Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > : Signed-off-by: Michal Hocko <mhocko@suse.com> > > > Also, that kind of makes me think that a fatal_signal_pending() check is > > still appropriate in the loop, even if we want to drop the infinite > > retry loop in kmem_zalloc_greedy() as well. There's no sense in doing > > however many retries are left before we return and that's also more > > explicit for the next person who goes to change this code in the future. > > I am not objecting to adding fatal_signal_pending as well I just thought > that from the logic POV breaking after reaching the minimum size is just > the right thing to do. 
We can optimize further by checking > fatal_signal_pending and reducing retries when we know it doesn't make > much sense but that should be done on top as an optimization IMHO. > I don't think of it as an optimization to not invoke calls (a non-deterministic number of times) that we know are going to fail, but ultimately I don't care too much how it's framed or if it's done in separate patches or whatnot. As long as they are posted at the same time. ;) Brian > > Otherwise, I'm fine with breaking the infinite retry loop at the same > > time. It looks like Christoph added this function originally so this > > should probably require his ack as well.. > > What do you think Christoph? > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-02 15:30 ` Brian Foster @ 2017-03-02 15:45 ` Michal Hocko 2017-03-02 15:45 ` [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed Michal Hocko ` (4 more replies) 2017-03-02 15:47 ` mm allocation failure and hang when running xfstests generic/269 on xfs Michal Hocko 1 sibling, 5 replies; 36+ messages in thread From: Michal Hocko @ 2017-03-02 15:45 UTC (permalink / raw) To: Christoph Hellwig, Brian Foster Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko From: Michal Hocko <mhocko@suse.com> Even though kmem_zalloc_greedy is documented it might fail the current code doesn't really implement this properly and loops on the smallest allowed size for ever. This is a problem because vzalloc might fail permanently - we might run out of vmalloc space or since 5d17a73a2ebe ("vmalloc: back off when the current task is killed") when the current task is killed. The later one makes the failure scenario much more probable than it used to be because it makes vmalloc() failures permanent for tasks with fatal signals pending.. Fix this by bailing out if the minimum size request failed. This has been noticed by a hung generic/269 xfstest by Xiong Zhou. fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) fsstress cpuset=/ mems_allowed=0-1 CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 Call Trace: dump_stack+0x63/0x87 warn_alloc+0x114/0x1c0 ? alloc_pages_current+0x88/0x120 __vmalloc_node_range+0x250/0x2a0 ? kmem_zalloc_greedy+0x2b/0x40 [xfs] ? free_hot_cold_page+0x21f/0x280 vzalloc+0x54/0x60 ? kmem_zalloc_greedy+0x2b/0x40 [xfs] kmem_zalloc_greedy+0x2b/0x40 [xfs] xfs_bulkstat+0x11b/0x730 [xfs] ? xfs_bulkstat_one_int+0x340/0x340 [xfs] ? selinux_capable+0x20/0x30 ? security_capable+0x48/0x60 xfs_ioc_bulkstat+0xe4/0x190 [xfs] xfs_file_ioctl+0x9dd/0xad0 [xfs] ? do_filp_open+0xa5/0x100 do_vfs_ioctl+0xa7/0x5e0 SyS_ioctl+0x79/0x90 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 fsstress keeps looping inside kmem_zalloc_greedy without any way out because vmalloc keeps failing due to fatal_signal_pending. Reported-by: Xiong Zhou <xzhou@redhat.com> Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Michal Hocko <mhocko@suse.com> --- fs/xfs/kmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c index 339c696bbc01..ee95f5c6db45 100644 --- a/fs/xfs/kmem.c +++ b/fs/xfs/kmem.c @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) size_t kmsize = maxsize; while (!(ptr = vzalloc(kmsize))) { + if (kmsize == minsize) + break; if ((kmsize >>= 1) <= minsize) kmsize = minsize; } -- 2.11.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 36+ messages in thread
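[Editorial note: the reason this change is safe for the one remaining user is that xfs_bulkstat() already treats a NULL return as ENOMEM; the greedy helper simply never produced one before. From fs/xfs/xfs_itable.c of the same era, paraphrased:

	/* ask for up to four pages of inobt record space, settle for one */
	irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4);
	if (!irbuf)
		return -ENOMEM;	/* with this patch the NULL case is now reachable */
]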
* [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed 2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko @ 2017-03-02 15:45 ` Michal Hocko 2017-03-02 15:49 ` Christoph Hellwig 2017-03-02 15:59 ` Brian Foster 2017-03-02 15:49 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Christoph Hellwig ` (3 subsequent siblings) 4 siblings, 2 replies; 36+ messages in thread From: Michal Hocko @ 2017-03-02 15:45 UTC (permalink / raw) To: Christoph Hellwig, Brian Foster Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko From: Michal Hocko <mhocko@suse.com> It doesn't really make much sense to retry vmalloc request if the current task is killed. We should rather bail out as soon as possible and let it RIP as soon as possible. The current implementation of vmalloc will fail anyway. Suggested-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> --- fs/xfs/kmem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c index ee95f5c6db45..01c52567a4ff 100644 --- a/fs/xfs/kmem.c +++ b/fs/xfs/kmem.c @@ -34,7 +34,7 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) size_t kmsize = maxsize; while (!(ptr = vzalloc(kmsize))) { - if (kmsize == minsize) + if (kmsize == minsize || fatal_signal_pending(current)) break; if ((kmsize >>= 1) <= minsize) kmsize = minsize; -- 2.11.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed 2017-03-02 15:45 ` [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed Michal Hocko @ 2017-03-02 15:49 ` Christoph Hellwig 2017-03-02 15:59 ` Brian Foster 1 sibling, 0 replies; 36+ messages in thread From: Christoph Hellwig @ 2017-03-02 15:49 UTC (permalink / raw) To: Michal Hocko Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko Looks fine, Reviewed-by: Christoph Hellwig <hch@lst.de> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed 2017-03-02 15:45 ` [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed Michal Hocko 2017-03-02 15:49 ` Christoph Hellwig @ 2017-03-02 15:59 ` Brian Foster 1 sibling, 0 replies; 36+ messages in thread From: Brian Foster @ 2017-03-02 15:59 UTC (permalink / raw) To: Michal Hocko Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko On Thu, Mar 02, 2017 at 04:45:41PM +0100, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > It doesn't really make much sense to retry vmalloc request if the > current task is killed. We should rather bail out as soon as possible > and let it RIP as soon as possible. The current implementation of > vmalloc will fail anyway. > > Suggested-by: Brian Foster <bfoster@redhat.com> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- Reviewed-by: Brian Foster <bfoster@redhat.com> > fs/xfs/kmem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > index ee95f5c6db45..01c52567a4ff 100644 > --- a/fs/xfs/kmem.c > +++ b/fs/xfs/kmem.c > @@ -34,7 +34,7 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > size_t kmsize = maxsize; > > while (!(ptr = vzalloc(kmsize))) { > - if (kmsize == minsize) > + if (kmsize == minsize || fatal_signal_pending(current)) > break; > if ((kmsize >>= 1) <= minsize) > kmsize = minsize; > -- > 2.11.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko 2017-03-02 15:45 ` [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed Michal Hocko @ 2017-03-02 15:49 ` Christoph Hellwig 2017-03-02 15:59 ` Brian Foster ` (2 subsequent siblings) 4 siblings, 0 replies; 36+ messages in thread From: Christoph Hellwig @ 2017-03-02 15:49 UTC (permalink / raw) To: Michal Hocko Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko Looks fine, Reviewed-by: Christoph Hellwig <hch@lst.de> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko 2017-03-02 15:45 ` [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed Michal Hocko 2017-03-02 15:49 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Christoph Hellwig @ 2017-03-02 15:59 ` Brian Foster 2017-03-02 16:16 ` Michal Hocko 2017-03-03 22:54 ` Dave Chinner 4 siblings, 0 replies; 36+ messages in thread From: Brian Foster @ 2017-03-02 15:59 UTC (permalink / raw) To: Michal Hocko Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > Even though kmem_zalloc_greedy is documented it might fail the current > code doesn't really implement this properly and loops on the smallest > allowed size for ever. This is a problem because vzalloc might fail > permanently - we might run out of vmalloc space or since 5d17a73a2ebe > ("vmalloc: back off when the current task is killed") when the current > task is killed. The later one makes the failure scenario much more > probable than it used to be because it makes vmalloc() failures > permanent for tasks with fatal signals pending.. Fix this by bailing out > if the minimum size request failed. > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > fsstress cpuset=/ mems_allowed=0-1 > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > Call Trace: > dump_stack+0x63/0x87 > warn_alloc+0x114/0x1c0 > ? alloc_pages_current+0x88/0x120 > __vmalloc_node_range+0x250/0x2a0 > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > ? free_hot_cold_page+0x21f/0x280 > vzalloc+0x54/0x60 > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > kmem_zalloc_greedy+0x2b/0x40 [xfs] > xfs_bulkstat+0x11b/0x730 [xfs] > ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > ? selinux_capable+0x20/0x30 > ? security_capable+0x48/0x60 > xfs_ioc_bulkstat+0xe4/0x190 [xfs] > xfs_file_ioctl+0x9dd/0xad0 [xfs] > ? do_filp_open+0xa5/0x100 > do_vfs_ioctl+0xa7/0x5e0 > SyS_ioctl+0x79/0x90 > do_syscall_64+0x67/0x180 > entry_SYSCALL64_slow_path+0x25/0x25 > > fsstress keeps looping inside kmem_zalloc_greedy without any way out > because vmalloc keeps failing due to fatal_signal_pending. > > Reported-by: Xiong Zhou <xzhou@redhat.com> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- Reviewed-by: Brian Foster <bfoster@redhat.com> > fs/xfs/kmem.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > index 339c696bbc01..ee95f5c6db45 100644 > --- a/fs/xfs/kmem.c > +++ b/fs/xfs/kmem.c > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > size_t kmsize = maxsize; > > while (!(ptr = vzalloc(kmsize))) { > + if (kmsize == minsize) > + break; > if ((kmsize >>= 1) <= minsize) > kmsize = minsize; > } > -- > 2.11.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko ` (2 preceding siblings ...) 2017-03-02 15:59 ` Brian Foster @ 2017-03-02 16:16 ` Michal Hocko 2017-03-02 16:44 ` Darrick J. Wong 2017-03-03 22:54 ` Dave Chinner 4 siblings, 1 reply; 36+ messages in thread From: Michal Hocko @ 2017-03-02 16:16 UTC (permalink / raw) To: Christoph Hellwig, Brian Foster, Darrick J. Wong Cc: Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel I've just realized that Darrick was not on the CC list. Let's add him. I believe this patch should go in in the current cycle because 5d17a73a2ebe was merged in this merge window and it can be abused... The other patch [1] is not that urgent. [1] http://lkml.kernel.org/r/20170302154541.16155-2-mhocko@kernel.org On Thu 02-03-17 16:45:40, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > Even though kmem_zalloc_greedy is documented it might fail the current > code doesn't really implement this properly and loops on the smallest > allowed size for ever. This is a problem because vzalloc might fail > permanently - we might run out of vmalloc space or since 5d17a73a2ebe > ("vmalloc: back off when the current task is killed") when the current > task is killed. The later one makes the failure scenario much more > probable than it used to be because it makes vmalloc() failures > permanent for tasks with fatal signals pending.. Fix this by bailing out > if the minimum size request failed. > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > fsstress cpuset=/ mems_allowed=0-1 > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > Call Trace: > dump_stack+0x63/0x87 > warn_alloc+0x114/0x1c0 > ? alloc_pages_current+0x88/0x120 > __vmalloc_node_range+0x250/0x2a0 > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > ? free_hot_cold_page+0x21f/0x280 > vzalloc+0x54/0x60 > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > kmem_zalloc_greedy+0x2b/0x40 [xfs] > xfs_bulkstat+0x11b/0x730 [xfs] > ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > ? selinux_capable+0x20/0x30 > ? security_capable+0x48/0x60 > xfs_ioc_bulkstat+0xe4/0x190 [xfs] > xfs_file_ioctl+0x9dd/0xad0 [xfs] > ? do_filp_open+0xa5/0x100 > do_vfs_ioctl+0xa7/0x5e0 > SyS_ioctl+0x79/0x90 > do_syscall_64+0x67/0x180 > entry_SYSCALL64_slow_path+0x25/0x25 > > fsstress keeps looping inside kmem_zalloc_greedy without any way out > because vmalloc keeps failing due to fatal_signal_pending. > > Reported-by: Xiong Zhou <xzhou@redhat.com> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > fs/xfs/kmem.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > index 339c696bbc01..ee95f5c6db45 100644 > --- a/fs/xfs/kmem.c > +++ b/fs/xfs/kmem.c > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > size_t kmsize = maxsize; > > while (!(ptr = vzalloc(kmsize))) { > + if (kmsize == minsize) > + break; > if ((kmsize >>= 1) <= minsize) > kmsize = minsize; > } > -- > 2.11.0 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-02 16:16 ` Michal Hocko @ 2017-03-02 16:44 ` Darrick J. Wong 0 siblings, 0 replies; 36+ messages in thread From: Darrick J. Wong @ 2017-03-02 16:44 UTC (permalink / raw) To: Michal Hocko Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel On Thu, Mar 02, 2017 at 05:16:06PM +0100, Michal Hocko wrote: > I've just realized that Darrick was not on the CC list. Let's add him. > I believe this patch should go in in the current cycle because > 5d17a73a2ebe was merged in this merge window and it can be abused... > > The other patch [1] is not that urgent. > > [1] http://lkml.kernel.org/r/20170302154541.16155-2-mhocko@kernel.org Both patches look ok to me. I'll take both patches for rc2. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> (Annoyingly I missed the whole thread yesterday due to vger slowness, in case anyone was wondering why I didn't reply.) --D > > On Thu 02-03-17 16:45:40, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > code doesn't really implement this properly and loops on the smallest > > allowed size for ever. This is a problem because vzalloc might fail > > permanently - we might run out of vmalloc space or since 5d17a73a2ebe > > ("vmalloc: back off when the current task is killed") when the current > > task is killed. The later one makes the failure scenario much more > > probable than it used to be because it makes vmalloc() failures > > permanent for tasks with fatal signals pending.. Fix this by bailing out > > if the minimum size request failed. > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > > fsstress cpuset=/ mems_allowed=0-1 > > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > > Call Trace: > > dump_stack+0x63/0x87 > > warn_alloc+0x114/0x1c0 > > ? alloc_pages_current+0x88/0x120 > > __vmalloc_node_range+0x250/0x2a0 > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > ? free_hot_cold_page+0x21f/0x280 > > vzalloc+0x54/0x60 > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > kmem_zalloc_greedy+0x2b/0x40 [xfs] > > xfs_bulkstat+0x11b/0x730 [xfs] > > ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > > ? selinux_capable+0x20/0x30 > > ? security_capable+0x48/0x60 > > xfs_ioc_bulkstat+0xe4/0x190 [xfs] > > xfs_file_ioctl+0x9dd/0xad0 [xfs] > > ? do_filp_open+0xa5/0x100 > > do_vfs_ioctl+0xa7/0x5e0 > > SyS_ioctl+0x79/0x90 > > do_syscall_64+0x67/0x180 > > entry_SYSCALL64_slow_path+0x25/0x25 > > > > fsstress keeps looping inside kmem_zalloc_greedy without any way out > > because vmalloc keeps failing due to fatal_signal_pending. 
> > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > fs/xfs/kmem.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > index 339c696bbc01..ee95f5c6db45 100644 > > --- a/fs/xfs/kmem.c > > +++ b/fs/xfs/kmem.c > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > size_t kmsize = maxsize; > > > > while (!(ptr = vzalloc(kmsize))) { > > + if (kmsize == minsize) > > + break; > > if ((kmsize >>= 1) <= minsize) > > kmsize = minsize; > > } > > -- > > 2.11.0 > > > > -- > Michal Hocko > SUSE Labs > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko ` (3 preceding siblings ...) 2017-03-02 16:16 ` Michal Hocko @ 2017-03-03 22:54 ` Dave Chinner 2017-03-03 23:19 ` Darrick J. Wong 2017-03-06 13:21 ` Michal Hocko 4 siblings, 2 replies; 36+ messages in thread From: Dave Chinner @ 2017-03-03 22:54 UTC (permalink / raw) To: Michal Hocko Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > Even though kmem_zalloc_greedy is documented it might fail the current > code doesn't really implement this properly and loops on the smallest > allowed size for ever. This is a problem because vzalloc might fail > permanently - we might run out of vmalloc space or since 5d17a73a2ebe > ("vmalloc: back off when the current task is killed") when the current > task is killed. The later one makes the failure scenario much more > probable than it used to be because it makes vmalloc() failures > permanent for tasks with fatal signals pending.. Fix this by bailing out > if the minimum size request failed. > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > fsstress cpuset=/ mems_allowed=0-1 > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > Call Trace: > dump_stack+0x63/0x87 > warn_alloc+0x114/0x1c0 > ? alloc_pages_current+0x88/0x120 > __vmalloc_node_range+0x250/0x2a0 > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > ? free_hot_cold_page+0x21f/0x280 > vzalloc+0x54/0x60 > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > kmem_zalloc_greedy+0x2b/0x40 [xfs] > xfs_bulkstat+0x11b/0x730 [xfs] > ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > ? selinux_capable+0x20/0x30 > ? security_capable+0x48/0x60 > xfs_ioc_bulkstat+0xe4/0x190 [xfs] > xfs_file_ioctl+0x9dd/0xad0 [xfs] > ? do_filp_open+0xa5/0x100 > do_vfs_ioctl+0xa7/0x5e0 > SyS_ioctl+0x79/0x90 > do_syscall_64+0x67/0x180 > entry_SYSCALL64_slow_path+0x25/0x25 > > fsstress keeps looping inside kmem_zalloc_greedy without any way out > because vmalloc keeps failing due to fatal_signal_pending. > > Reported-by: Xiong Zhou <xzhou@redhat.com> > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > fs/xfs/kmem.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > index 339c696bbc01..ee95f5c6db45 100644 > --- a/fs/xfs/kmem.c > +++ b/fs/xfs/kmem.c > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > size_t kmsize = maxsize; > > while (!(ptr = vzalloc(kmsize))) { > + if (kmsize == minsize) > + break; > if ((kmsize >>= 1) <= minsize) > kmsize = minsize; > } Seems wrong to me - this function used to have lots of callers and over time we've slowly removed them or replaced them with something else. I'd suggest removing it completely, replacing the call sites with kmem_zalloc_large(). Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
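[Editorial note: a rough sketch of what Dave's suggestion could look like at that single call site, assuming the existing kmem_zalloc_large(size, flags) helper in fs/xfs/kmem.c. The fixed four-page buffer (matching the old maxsize) and the KM_MAYFAIL flag are illustrative assumptions only; choosing the real size is exactly the benchmarking question raised later in the thread.

	/* hypothetical xfs_bulkstat() conversion: no greedy back-off,
	 * just one fallible allocation of a fixed-size record buffer */
	irbsize = PAGE_SIZE * 4;
	irbuf = kmem_zalloc_large(irbsize, KM_MAYFAIL);
	if (!irbuf)
		return -ENOMEM;
]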
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-03 22:54 ` Dave Chinner @ 2017-03-03 23:19 ` Darrick J. Wong 2017-03-04 4:48 ` Dave Chinner 2017-03-06 13:21 ` Michal Hocko 1 sibling, 1 reply; 36+ messages in thread From: Darrick J. Wong @ 2017-03-03 23:19 UTC (permalink / raw) To: Dave Chinner Cc: Michal Hocko, Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko On Sat, Mar 04, 2017 at 09:54:44AM +1100, Dave Chinner wrote: > On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > code doesn't really implement this properly and loops on the smallest > > allowed size for ever. This is a problem because vzalloc might fail > > permanently - we might run out of vmalloc space or since 5d17a73a2ebe > > ("vmalloc: back off when the current task is killed") when the current > > task is killed. The later one makes the failure scenario much more > > probable than it used to be because it makes vmalloc() failures > > permanent for tasks with fatal signals pending.. Fix this by bailing out > > if the minimum size request failed. > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > > fsstress cpuset=/ mems_allowed=0-1 > > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > > Call Trace: > > dump_stack+0x63/0x87 > > warn_alloc+0x114/0x1c0 > > ? alloc_pages_current+0x88/0x120 > > __vmalloc_node_range+0x250/0x2a0 > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > ? free_hot_cold_page+0x21f/0x280 > > vzalloc+0x54/0x60 > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > kmem_zalloc_greedy+0x2b/0x40 [xfs] > > xfs_bulkstat+0x11b/0x730 [xfs] > > ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > > ? selinux_capable+0x20/0x30 > > ? security_capable+0x48/0x60 > > xfs_ioc_bulkstat+0xe4/0x190 [xfs] > > xfs_file_ioctl+0x9dd/0xad0 [xfs] > > ? do_filp_open+0xa5/0x100 > > do_vfs_ioctl+0xa7/0x5e0 > > SyS_ioctl+0x79/0x90 > > do_syscall_64+0x67/0x180 > > entry_SYSCALL64_slow_path+0x25/0x25 > > > > fsstress keeps looping inside kmem_zalloc_greedy without any way out > > because vmalloc keeps failing due to fatal_signal_pending. > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > fs/xfs/kmem.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > index 339c696bbc01..ee95f5c6db45 100644 > > --- a/fs/xfs/kmem.c > > +++ b/fs/xfs/kmem.c > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > size_t kmsize = maxsize; > > > > while (!(ptr = vzalloc(kmsize))) { > > + if (kmsize == minsize) > > + break; > > if ((kmsize >>= 1) <= minsize) > > kmsize = minsize; > > } > > Seems wrong to me - this function used to have lots of callers and > over time we've slowly removed them or replaced them with something > else. I'd suggest removing it completely, replacing the call sites > with kmem_zalloc_large(). Heh. I thought the reason why _greedy still exists (for its sole user bulkstat) is that bulkstat had the flexibility to deal with receiving 0, 1, or 4 pages. 
So yeah, we could just kill it. But thinking even more stingily about memory, are there applications that care about being able to bulkstat 16384 inodes at once? How badly does bulkstat need to be able to bulk-process more than a page's worth of inobt records, anyway? --D > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-03 23:19 ` Darrick J. Wong @ 2017-03-04 4:48 ` Dave Chinner 0 siblings, 0 replies; 36+ messages in thread From: Dave Chinner @ 2017-03-04 4:48 UTC (permalink / raw) To: Darrick J. Wong Cc: Michal Hocko, Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel, Michal Hocko On Fri, Mar 03, 2017 at 03:19:12PM -0800, Darrick J. Wong wrote: > On Sat, Mar 04, 2017 at 09:54:44AM +1100, Dave Chinner wrote: > > On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote: > > > From: Michal Hocko <mhocko@suse.com> > > > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > > code doesn't really implement this properly and loops on the smallest > > > allowed size for ever. This is a problem because vzalloc might fail > > > permanently - we might run out of vmalloc space or since 5d17a73a2ebe > > > ("vmalloc: back off when the current task is killed") when the current > > > task is killed. The later one makes the failure scenario much more > > > probable than it used to be because it makes vmalloc() failures > > > permanent for tasks with fatal signals pending.. Fix this by bailing out > > > if the minimum size request failed. > > > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > > > fsstress cpuset=/ mems_allowed=0-1 > > > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > > > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > > > Call Trace: > > > dump_stack+0x63/0x87 > > > warn_alloc+0x114/0x1c0 > > > ? alloc_pages_current+0x88/0x120 > > > __vmalloc_node_range+0x250/0x2a0 > > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > > ? free_hot_cold_page+0x21f/0x280 > > > vzalloc+0x54/0x60 > > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > > kmem_zalloc_greedy+0x2b/0x40 [xfs] > > > xfs_bulkstat+0x11b/0x730 [xfs] > > > ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > > > ? selinux_capable+0x20/0x30 > > > ? security_capable+0x48/0x60 > > > xfs_ioc_bulkstat+0xe4/0x190 [xfs] > > > xfs_file_ioctl+0x9dd/0xad0 [xfs] > > > ? do_filp_open+0xa5/0x100 > > > do_vfs_ioctl+0xa7/0x5e0 > > > SyS_ioctl+0x79/0x90 > > > do_syscall_64+0x67/0x180 > > > entry_SYSCALL64_slow_path+0x25/0x25 > > > > > > fsstress keeps looping inside kmem_zalloc_greedy without any way out > > > because vmalloc keeps failing due to fatal_signal_pending. > > > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > --- > > > fs/xfs/kmem.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > > index 339c696bbc01..ee95f5c6db45 100644 > > > --- a/fs/xfs/kmem.c > > > +++ b/fs/xfs/kmem.c > > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > > size_t kmsize = maxsize; > > > > > > while (!(ptr = vzalloc(kmsize))) { > > > + if (kmsize == minsize) > > > + break; > > > if ((kmsize >>= 1) <= minsize) > > > kmsize = minsize; > > > } > > > > Seems wrong to me - this function used to have lots of callers and > > over time we've slowly removed them or replaced them with something > > else. I'd suggest removing it completely, replacing the call sites > > with kmem_zalloc_large(). > > Heh. 
I thought the reason why _greedy still exists (for its sole user > bulkstat) is that bulkstat had the flexibility to deal with receiving > 0, 1, or 4 pages. So yeah, we could just kill it. irbuf is sized to minimise AGI locking, but if memory is low it just uses what it can get. Keep in mind the number of inodes we need to process is determined by the userspace buffer size, which can easily be sized to hold tens of thousands of struct xfs_bulkstat. > But thinking even more stingily about memory, are there applications > that care about being able to bulkstat 16384 inodes at once? IIRC, xfsdump can bulkstat up to 64k inodes per call.... > How badly > does bulkstat need to be able to bulk-process more than a page's worth > of inobt records, anyway? Benchmark it on a busy system doing lots of other AGI work (e.g. a busy NFS server workload with a working set of tens of millions of inodes so it doesn't fit in cache) and find out. That's generally how I answer those sorts of questions... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail 2017-03-03 22:54 ` Dave Chinner 2017-03-03 23:19 ` Darrick J. Wong @ 2017-03-06 13:21 ` Michal Hocko 1 sibling, 0 replies; 36+ messages in thread From: Michal Hocko @ 2017-03-06 13:21 UTC (permalink / raw) To: Dave Chinner Cc: Christoph Hellwig, Brian Foster, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, LKML, linux-fsdevel On Sat 04-03-17 09:54:44, Dave Chinner wrote: > On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > Even though kmem_zalloc_greedy is documented it might fail the current > > code doesn't really implement this properly and loops on the smallest > > allowed size for ever. This is a problem because vzalloc might fail > > permanently - we might run out of vmalloc space or since 5d17a73a2ebe > > ("vmalloc: back off when the current task is killed") when the current > > task is killed. The later one makes the failure scenario much more > > probable than it used to be because it makes vmalloc() failures > > permanent for tasks with fatal signals pending.. Fix this by bailing out > > if the minimum size request failed. > > > > This has been noticed by a hung generic/269 xfstest by Xiong Zhou. > > > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null) > > fsstress cpuset=/ mems_allowed=0-1 > > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21 > > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016 > > Call Trace: > > dump_stack+0x63/0x87 > > warn_alloc+0x114/0x1c0 > > ? alloc_pages_current+0x88/0x120 > > __vmalloc_node_range+0x250/0x2a0 > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > ? free_hot_cold_page+0x21f/0x280 > > vzalloc+0x54/0x60 > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs] > > kmem_zalloc_greedy+0x2b/0x40 [xfs] > > xfs_bulkstat+0x11b/0x730 [xfs] > > ? xfs_bulkstat_one_int+0x340/0x340 [xfs] > > ? selinux_capable+0x20/0x30 > > ? security_capable+0x48/0x60 > > xfs_ioc_bulkstat+0xe4/0x190 [xfs] > > xfs_file_ioctl+0x9dd/0xad0 [xfs] > > ? do_filp_open+0xa5/0x100 > > do_vfs_ioctl+0xa7/0x5e0 > > SyS_ioctl+0x79/0x90 > > do_syscall_64+0x67/0x180 > > entry_SYSCALL64_slow_path+0x25/0x25 > > > > fsstress keeps looping inside kmem_zalloc_greedy without any way out > > because vmalloc keeps failing due to fatal_signal_pending. > > > > Reported-by: Xiong Zhou <xzhou@redhat.com> > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > fs/xfs/kmem.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c > > index 339c696bbc01..ee95f5c6db45 100644 > > --- a/fs/xfs/kmem.c > > +++ b/fs/xfs/kmem.c > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize) > > size_t kmsize = maxsize; > > > > while (!(ptr = vzalloc(kmsize))) { > > + if (kmsize == minsize) > > + break; > > if ((kmsize >>= 1) <= minsize) > > kmsize = minsize; > > } > > Seems wrong to me - this function used to have lots of callers and > over time we've slowly removed them or replaced them with something > else. I'd suggest removing it completely, replacing the call sites > with kmem_zalloc_large(). I do not really care how this gets fixed. Dropping kmem_zalloc_greedy sounds like a way to go. I am not familiar with xfs_bulkstat to do an edicated guess which allocation size to use. 
So I guess I will have to defer this to you guys if you prefer that route, though.

Thanks!
--
Michal Hocko
SUSE Labs
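For what it's worth, the route Dave suggests would presumably boil down to a call-site change in xfs_bulkstat() along these lines. This is only a sketch of the idea: the KM_SLEEP flag, the fixed four-page size and the surrounding error handling are assumptions for illustration, not the eventual upstream patch.

-	irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4);
+	irbsize = PAGE_SIZE * 4;
+	irbuf = kmem_zalloc_large(irbsize, KM_SLEEP);
 	if (!irbuf)
 		return -ENOMEM;

The behavioural difference is that the caller would then see either a full-size buffer or a clean allocation failure, rather than a buffer of unpredictable size or an unbounded retry loop.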
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 15:30 ` Brian Foster
  2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko
@ 2017-03-02 15:47 ` Michal Hocko
  1 sibling, 0 replies; 36+ messages in thread

From: Michal Hocko @ 2017-03-02 15:47 UTC (permalink / raw)
To: Brian Foster
Cc: Christoph Hellwig, Tetsuo Handa, Xiong Zhou, linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On Thu 02-03-17 10:30:02, Brian Foster wrote:
> On Thu, Mar 02, 2017 at 04:14:11PM +0100, Michal Hocko wrote:
[...]
> > I am not objecting to adding fatal_signal_pending as well, I just thought
> > that from the logic POV breaking after reaching the minimum size is just
> > the right thing to do. We can optimize further by checking
> > fatal_signal_pending and reducing retries when we know it doesn't make
> > much sense, but that should be done on top as an optimization IMHO.
> >
> I don't think of it as an optimization to not invoke calls (a
> non-deterministic number of times) that we know are going to fail, but

The point is that vmalloc failure modes are an implementation detail
which might change in the future. The fix should really be independent
of the current implementation; that is why I think the
fatal_signal_pending check is just an optimization.

> ultimately I don't care too much how it's framed or if it's done in
> separate patches or whatnot. As long as they are posted at the same
> time. ;)

Done
--
Michal Hocko
SUSE Labs
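To see why the signal check is framed as an optimization, it helps to look at what the retry loop would look like with both ideas applied. The following is a rough reconstruction of what patches 1/2 and 2/2 together imply, not a quote of either patch: the minsize bail-out alone already guarantees that the loop terminates, while fatal_signal_pending() merely avoids issuing further vzalloc() calls that are bound to fail once the task has been killed.

#include <linux/sched.h>	/* fatal_signal_pending(), current */
#include <linux/vmalloc.h>	/* vzalloc() */

void *
kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
{
	void	*ptr;
	size_t	kmsize = maxsize;

	while (!(ptr = vzalloc(kmsize))) {
		/* Correctness: stop once even the minimum request has failed. */
		if (kmsize == minsize)
			break;
		/* Optimization: a killed task's vmalloc attempts keep failing. */
		if (fatal_signal_pending(current))
			break;
		if ((kmsize >>= 1) <= minsize)
			kmsize = minsize;
	}
	if (ptr)
		*size = kmsize;
	return ptr;
}

Either way the caller now sees a NULL return and can fail the bulkstat request with -ENOMEM instead of hanging.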
* Re: mm allocation failure and hang when running xfstests generic/269 on xfs
  2017-03-02 14:51 ` Brian Foster
  2017-03-02 15:14 ` Michal Hocko
@ 2017-03-02 15:47 ` Christoph Hellwig
  1 sibling, 0 replies; 36+ messages in thread

From: Christoph Hellwig @ 2017-03-02 15:47 UTC (permalink / raw)
To: Brian Foster
Cc: Michal Hocko, Tetsuo Handa, Xiong Zhou, Christoph Hellwig, linux-xfs, linux-mm, linux-kernel, linux-fsdevel

On Thu, Mar 02, 2017 at 09:51:31AM -0500, Brian Foster wrote:
> Otherwise, I'm fine with breaking the infinite retry loop at the same
> time. It looks like Christoph added this function originally so this
> should probably require his ack as well..

I just moved the code around, but I'll take a look as well.
end of thread, other threads:[~2017-03-06 13:21 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-03-01 4:46 mm allocation failure and hang when running xfstests generic/269 on xfs Xiong Zhou 2017-03-02 0:37 ` Christoph Hellwig 2017-03-02 5:19 ` Xiong Zhou 2017-03-02 6:41 ` Bob Liu 2017-03-02 6:47 ` Anshuman Khandual 2017-03-02 8:42 ` Michal Hocko 2017-03-02 9:23 ` Xiong Zhou 2017-03-02 10:04 ` Tetsuo Handa 2017-03-02 10:35 ` Michal Hocko 2017-03-02 10:53 ` mm allocation failure and hang when running xfstests generic/269on xfs Tetsuo Handa 2017-03-02 12:24 ` mm allocation failure and hang when running xfstests generic/269 on xfs Brian Foster 2017-03-02 12:49 ` Michal Hocko 2017-03-02 13:00 ` Brian Foster 2017-03-02 13:07 ` Tetsuo Handa 2017-03-02 13:27 ` Michal Hocko 2017-03-02 13:41 ` Brian Foster 2017-03-02 13:50 ` Michal Hocko 2017-03-02 14:23 ` Brian Foster 2017-03-02 14:34 ` Michal Hocko 2017-03-02 14:51 ` Brian Foster 2017-03-02 15:14 ` Michal Hocko 2017-03-02 15:30 ` Brian Foster 2017-03-02 15:45 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Michal Hocko 2017-03-02 15:45 ` [PATCH 2/2] xfs: back off from kmem_zalloc_greedy if the task is killed Michal Hocko 2017-03-02 15:49 ` Christoph Hellwig 2017-03-02 15:59 ` Brian Foster 2017-03-02 15:49 ` [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail Christoph Hellwig 2017-03-02 15:59 ` Brian Foster 2017-03-02 16:16 ` Michal Hocko 2017-03-02 16:44 ` Darrick J. Wong 2017-03-03 22:54 ` Dave Chinner 2017-03-03 23:19 ` Darrick J. Wong 2017-03-04 4:48 ` Dave Chinner 2017-03-06 13:21 ` Michal Hocko 2017-03-02 15:47 ` mm allocation failure and hang when running xfstests generic/269 on xfs Michal Hocko 2017-03-02 15:47 ` Christoph Hellwig