All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 5/5] xfs: Convert kmem_alloc() users
Date: Fri, 22 Nov 2019 14:30:48 -0800	[thread overview]
Message-ID: <20191122223048.GK6219@magnolia> (raw)
In-Reply-To: <20191122155756.GE6219@magnolia>

On Fri, Nov 22, 2019 at 07:57:56AM -0800, Darrick J. Wong wrote:
> On Wed, Nov 20, 2019 at 11:44:25AM +0100, Carlos Maiolino wrote:
> > Use kmalloc() directly.

<snip all this>

> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 5423171e0b7d..7bb53fbf32f6 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -1962,7 +1962,7 @@ xlog_recover_buffer_pass1(
> >  		}
> >  	}
> >  
> > -	bcp = kmem_alloc(sizeof(struct xfs_buf_cancel), 0);
> > +	bcp = kmalloc(sizeof(struct xfs_buf_cancel), GFP_KERNEL | __GFP_NOFAIL);
> >  	bcp->bc_blkno = buf_f->blf_blkno;
> >  	bcp->bc_len = buf_f->blf_len;
> >  	bcp->bc_refcount = 1;
> > @@ -2932,7 +2932,8 @@ xlog_recover_inode_pass2(
> >  	if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) {
> >  		in_f = item->ri_buf[0].i_addr;
> >  	} else {
> > -		in_f = kmem_alloc(sizeof(struct xfs_inode_log_format), 0);
> > +		in_f = kmalloc(sizeof(struct xfs_inode_log_format),
> > +			       GFP_KERNEL | __GFP_NOFAIL);
> >  		need_free = 1;
> >  		error = xfs_inode_item_format_convert(&item->ri_buf[0], in_f);
> >  		if (error)
> > @@ -4271,7 +4272,7 @@ xlog_recover_add_to_trans(
> >  		return 0;
> >  	}
> >  
> > -	ptr = kmem_alloc(len, 0);
> > +	ptr = kmalloc(len, GFP_KERNEL | __GFP_NOFAIL);
> >  	memcpy(ptr, dp, len);
> >  	in_f = (struct xfs_inode_log_format *)ptr;
> 
> I noticed that kmalloc is generating warnings with generic/049 when 16k
> directories (-n size=16k) are enabled.  I /think/ this is because it's
> quite possible to write out an xlog_op_header with a length of more than
> a single page; log recovery will then try to allocate a huge memory
> buffer to recover the transaction; and so we try to do a huge NOFAIL
> allocation, which makes the mm unhappy.
> 
> The one thing I've noticed with this conversion series is that the flags
> translation isn't 100% 1-to-1.  Before, kmem_flags_convert didn't
> explicitly set __GFP_NOFAIL anywhere; we simply took the default
> behavior.  IIRC that means that small allocations actually /are/
> guaranteed to succeed, but multipage allocations certainly aren't.
> This seems to be one place where we could have asked for a lot of
> memory, failed to get it, and crashed.
> 
> Now that we explicitly set NOFAIL in all the places where we don't also
> check for a null return, I think we're just uncovering latent bugs
> lurking in the code base.  The kernel does actually fulfill the
> allocation request, but it's clearly not happy.

FWIW I ran with various dirsizes and options and it looks like this is
the only place where we screw this up... patches soon.

--D

> --D
> 
> Relevant snippet of dmesg; everything else was normal:
> 
>  XFS (sdd): Mounting V5 Filesystem
>  XFS (sdd): Starting recovery (logdev: internal)
>  ------------[ cut here ]------------
>  WARNING: CPU: 1 PID: 459342 at mm/page_alloc.c:3275 get_page_from_freelist+0x434/0x1660
>  Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac bfq ip_set nfnetlink ip6table_filter ip6_tables iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: scsi_debug]
>  CPU: 1 PID: 459342 Comm: mount Not tainted 5.4.0-rc3-djw #rc3
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
>  RIP: 0010:get_page_from_freelist+0x434/0x1660
>  Code: e6 00 00 00 00 48 89 84 24 a0 00 00 00 0f 84 08 fd ff ff f7 84 24 c0 00 00 00 00 80 00 00 74 0c 83 bc 24 84 00 00 00 01 76 02 <0f> 0b 49 8d 87 10 05 00 00 48 89 c7 48 89 84 24 88 00 00 00 e8 03
>  RSP: 0018:ffffc900035d3918 EFLAGS: 00010202
>  RAX: ffff88803fffb680 RBX: 0000000000002968 RCX: ffffea0000c8e108
>  RDX: ffff88803fffbba8 RSI: ffff88803fffb870 RDI: 0000000000000000
>  RBP: 0000000000000002 R08: 0000000000000201 R09: 000000000002ff81
>  R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
>  R13: 0000000000048cc0 R14: 0000000000000001 R15: ffff88803fffb680
>  FS:  00007fcfdf89a080(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 000055b202ec48c8 CR3: 000000003bfdc005 CR4: 00000000001606a0
>  Call Trace:
>   ? kvm_clock_read+0x14/0x30
>   __alloc_pages_nodemask+0x172/0x3a0
>   kmalloc_order+0x18/0x80
>   kmalloc_order_trace+0x1d/0x130
>   xlog_recover_add_to_trans+0x4b/0x340 [xfs]
>   xlog_recovery_process_trans+0xe9/0xf0 [xfs]
>   xlog_recover_process_data+0x9e/0x1f0 [xfs]
>   xlog_do_recovery_pass+0x3a9/0x7c0 [xfs]
>   xlog_do_log_recovery+0x72/0x150 [xfs]
>   xlog_do_recover+0x43/0x2a0 [xfs]
>   xlog_recover+0xdf/0x170 [xfs]
>   xfs_log_mount+0x2e3/0x300 [xfs]
>   xfs_mountfs+0x4e7/0x9f0 [xfs]
>   xfs_fc_fill_super+0x2f8/0x520 [xfs]
>   ? xfs_fs_destroy_inode+0x4f0/0x4f0 [xfs]
>   get_tree_bdev+0x198/0x270
>   vfs_get_tree+0x23/0xb0
>   do_mount+0x87e/0xa20
>   ksys_mount+0xb6/0xd0
>   __x64_sys_mount+0x21/0x30
>   do_syscall_64+0x50/0x180
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  RIP: 0033:0x7fcfdf15d3ca
>  Code: 48 8b 0d c1 8a 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8e 8a 2c 00 f7 d8 64 89 01 48
>  RSP: 002b:00007fff0af10a58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
>  RAX: ffffffffffffffda RBX: 000055b202ec1970 RCX: 00007fcfdf15d3ca
>  RDX: 000055b202ec1be0 RSI: 000055b202ec1c20 RDI: 000055b202ec1c00
>  RBP: 0000000000000000 R08: 000055b202ec1b80 R09: 0000000000000000
>  R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b202ec1c00
>  R13: 000055b202ec1be0 R14: 0000000000000000 R15: 00007fcfdf67e8a4
>  irq event stamp: 18398
>  hardirqs last  enabled at (18397): [<ffffffff8123738f>] __slab_alloc.isra.83+0x6f/0x80
>  hardirqs last disabled at (18398): [<ffffffff81001d8a>] trace_hardirqs_off_thunk+0x1a/0x20
>  softirqs last  enabled at (18158): [<ffffffff81a003af>] __do_softirq+0x3af/0x4a4
>  softirqs last disabled at (18149): [<ffffffff8106528c>] irq_exit+0xbc/0xe0
>  ---[ end trace 3669c914fa8ccac6 ]---
> 
> AFAICT this is because inode buffers are 32K on this system
> 
> >  
> > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > index a2664afa10c3..2993af4a9935 100644
> > --- a/fs/xfs/xfs_qm.c
> > +++ b/fs/xfs/xfs_qm.c
> > @@ -988,7 +988,8 @@ xfs_qm_reset_dqcounts_buf(
> >  	if (qip->i_d.di_nblocks == 0)
> >  		return 0;
> >  
> > -	map = kmem_alloc(XFS_DQITER_MAP_SIZE * sizeof(*map), 0);
> > +	map = kmalloc(XFS_DQITER_MAP_SIZE * sizeof(*map),
> > +		      GFP_KERNEL | __GFP_NOFAIL);
> >  
> >  	lblkno = 0;
> >  	maxlblkcnt = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> > diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> > index 7f03b4ab3452..dfd419d402ea 100644
> > --- a/fs/xfs/xfs_rtalloc.c
> > +++ b/fs/xfs/xfs_rtalloc.c
> > @@ -962,7 +962,7 @@ xfs_growfs_rt(
> >  	/*
> >  	 * Allocate a new (fake) mount/sb.
> >  	 */
> > -	nmp = kmem_alloc(sizeof(*nmp), 0);
> > +	nmp = kmalloc(sizeof(*nmp), GFP_KERNEL | __GFP_NOFAIL);
> >  	/*
> >  	 * Loop over the bitmap blocks.
> >  	 * We will do everything one bitmap block at a time.
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index cc1933dc652f..eee831681e9c 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1739,7 +1739,7 @@ static int xfs_init_fs_context(
> >  {
> >  	struct xfs_mount	*mp;
> >  
> > -	mp = kmem_alloc(sizeof(struct xfs_mount), KM_ZERO);
> > +	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
> >  	if (!mp)
> >  		return -ENOMEM;
> >  
> > -- 
> > 2.23.0
> > 

  reply	other threads:[~2019-11-22 22:30 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-20 10:44 [PATCH 0/5] Remove/convert more kmem_* wrappers Carlos Maiolino
2019-11-20 10:44 ` [PATCH V2 1/5] xfs: remove kmem_zone_zalloc() Carlos Maiolino
2019-11-20 10:44 ` [PATCH V2 2/5] xfs: Remove kmem_zone_alloc() wrapper Carlos Maiolino
2019-11-20 18:58   ` Darrick J. Wong
2019-11-20 10:44 ` [PATCH 3/5] xfs: remove kmem_zalloc() wrapper Carlos Maiolino
2019-11-20 19:00   ` Darrick J. Wong
2019-11-20 21:24   ` Dave Chinner
2019-11-20 21:41     ` Darrick J. Wong
2019-11-20 22:44       ` Carlos Maiolino
2019-11-20 23:08         ` Darrick J. Wong
2019-11-20 10:44 ` [PATCH 4/5] xfs: Remove kmem_realloc Carlos Maiolino
2019-11-20 19:00   ` Darrick J. Wong
2019-11-20 10:44 ` [PATCH 5/5] xfs: Convert kmem_alloc() users Carlos Maiolino
2019-11-20 19:00   ` Darrick J. Wong
2019-11-22 15:57   ` Darrick J. Wong
2019-11-22 22:30     ` Darrick J. Wong [this message]
2019-11-24 22:02       ` Darrick J. Wong
2019-11-25  9:28         ` Carlos Maiolino
2020-05-14  9:26 ` [PATCH 0/5] Remove/convert more kmem_* wrappers Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191122223048.GK6219@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=cmaiolino@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.