From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: add kmem_alloc_io()
Date: Wed, 21 Aug 2019 08:08:01 -0700
Message-ID: <20190821150801.GF1037350@magnolia>
In-Reply-To: <20190821133533.GB19646@bfoster>

On Wed, Aug 21, 2019 at 09:35:33AM -0400, Brian Foster wrote:
> On Wed, Aug 21, 2019 at 06:38:19PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Memory we use to submit for IO needs strict alignment to the
> > underlying driver constraints. Worst case, this is 512 bytes. Given
> > that all allocations for IO are always a power of 2 multiple of 512
> > bytes, the kernel heap provides natural alignment for objects of
> > these sizes and that suffices.
> > 
> > Until, of course, memory debugging of some kind is turned on (e.g.
> > red zones, poisoning, KASAN) and then the alignment of the heap
> > objects is thrown out the window. Then we get weird IO errors and
> > data corruption problems because drivers don't validate alignment
> > and do the wrong thing when passed unaligned memory buffers in bios.
> > 
> > TO fix this, introduce kmem_alloc_io(), which will guarantee at least
> 
> s/TO/To/
> 
> > 512 byte alignment of buffers for IO, even if memory debugging
> > options are turned on. It is assumed that the minimum allocation
> > size will be 512 bytes, and that sizes will be power of 2 multiples
> > of 512 bytes.
> > 
> > Use this everywhere we allocate buffers for IO.
> > 
> > This no longer fails with log recovery errors when KASAN is enabled
> > due to the brd driver not handling unaligned memory buffers:
> > 
> > # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/kmem.c            | 61 +++++++++++++++++++++++++++++-----------
> >  fs/xfs/kmem.h            |  1 +
> >  fs/xfs/xfs_buf.c         |  4 +--
> >  fs/xfs/xfs_log.c         |  2 +-
> >  fs/xfs/xfs_log_recover.c |  2 +-
> >  fs/xfs/xfs_trace.h       |  1 +
> >  6 files changed, 50 insertions(+), 21 deletions(-)
> > 
> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index edcf393c8fd9..ec693c0fdcff 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> ...
> > @@ -62,6 +56,39 @@ kmem_alloc_large(size_t size, xfs_km_flags_t flags)
> >  	return ptr;
> >  }
> >  
> > +/*
> > + * Same as kmem_alloc_large, except we guarantee a 512 byte aligned buffer is
> > + * returned. vmalloc always returns an aligned region.
> > + */
> > +void *
> > +kmem_alloc_io(size_t size, xfs_km_flags_t flags)
> > +{
> > +	void	*ptr;
> > +
> > +	trace_kmem_alloc_io(size, flags, _RET_IP_);
> > +
> > +	ptr = kmem_alloc(size, flags | KM_MAYFAIL);
> > +	if (ptr) {
> > +		if (!((long)ptr & 511))

Er... didn't you say "it needs to grab the alignment from
[blk_]queue_dma_alignment(), not use a hard coded value of 511"?

How is this different?  If this buffer is really for IO then shouldn't
we pass in the buftarg or something so that we find the real alignment?
Or check it down in the xfs_buf code that associates a page to a buffer?

Even if all that logic is hidden behind CONFIG_XFS_DEBUG?
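
To make the question concrete, here is a rough userspace sketch of the
kind of check I mean. It is not kernel code and not what the patch does.
The helper name io_buf_is_aligned() and the align_mask parameter are
made up for illustration; the assumption is that the caller gets the
mask from wherever the real constraint lives (in the kernel that would
presumably be queue_dma_alignment(), which as I understand it returns a
mask such as 511 rather than a byte count).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if ptr satisfies the alignment
 * described by align_mask, where align_mask is (alignment - 1), e.g.
 * 511 for 512 byte alignment.  The mask is supplied by the caller
 * instead of being hard coded.
 */
static bool
io_buf_is_aligned(const void *ptr, unsigned int align_mask)
{
	return ((uintptr_t)ptr & align_mask) == 0;
}
```

With something like this, the 511 in the hunk above turns into a value
that is looked up per device, and the same predicate could be reused
wherever a page gets attached to a buffer.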

> > +			return ptr;
> > +		kfree(ptr);
> > +	}
> > +	return __kmem_vmalloc(size, flags);

How about checking the vmalloc alignment too?  If we're going to be this
paranoid we might as well go all the way. :)
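
Roughly what I have in mind, again as a userspace sketch under loud
assumptions: malloc() stands in for the heap allocation, aligned_alloc()
stands in for the vmalloc fallback, and alloc_io_checked(),
ptr_is_aligned() and IO_ALIGN are names invented for the example. The
only point is that both return paths go through the same alignment test.

```c
#include <stdint.h>
#include <stdlib.h>

#define IO_ALIGN	512u	/* assumed worst-case alignment, in bytes */

/* Nonzero if ptr has no bits set under mask (mask = alignment - 1). */
static int
ptr_is_aligned(const void *ptr, uintptr_t mask)
{
	return ((uintptr_t)ptr & mask) == 0;
}

/*
 * Hypothetical stand-in for the allocator in the patch: try the
 * ordinary heap first, then fall back to an allocator that is expected
 * to return aligned memory, and verify the result in both cases.
 */
void *
alloc_io_checked(size_t size)
{
	void	*ptr;
	size_t	rounded;

	/* Reject empty requests and sizes that would overflow the round-up. */
	if (size == 0 || size > SIZE_MAX - (IO_ALIGN - 1))
		return NULL;

	ptr = malloc(size);
	if (ptr) {
		if (ptr_is_aligned(ptr, IO_ALIGN - 1))
			return ptr;
		free(ptr);
	}

	/* aligned_alloc() wants the size to be a multiple of the alignment. */
	rounded = (size + IO_ALIGN - 1) & ~(size_t)(IO_ALIGN - 1);
	ptr = aligned_alloc(IO_ALIGN, rounded);
	if (ptr && !ptr_is_aligned(ptr, IO_ALIGN - 1)) {
		/* The fallback broke its promise; fail rather than hand it out. */
		free(ptr);
		return NULL;
	}
	return ptr;
}
```

In the real thing the second check should never fire, so it could just
as well be an assert hidden behind a debug option.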

--D

> > +}
> 
> Even though it is unfortunate, this seems like a quite reasonable and
> isolated temporary solution to the problem to me. The one concern I have
> is if/how much this could affect performance under certain
> circumstances. I realize that these callsites are isolated in the common
> scenario. Less common scenarios like sub-page block sizes (whether due
> to explicit mkfs time format or default configurations on larger page
> size systems) can fall into this path much more frequently, however.
> 
> Since this implies some kind of vm debug option is enabled, performance
> itself isn't critical when this solution is active. But how bad is it in
> those cases where we might depend on this more heavily? Have you
> confirmed that the end configuration is still "usable," at least?
> 
> I ask because the repeated alloc/free behavior can easily be avoided via
> something like an mp flag (which may require a tweak to the
> kmem_alloc_io() interface) to skip further kmem_alloc() calls from this
> path once we see one unaligned allocation. That assumes this behavior is
> tied to functionality that isn't dynamically configured at runtime, of
> course.
> 
> Brian
> 
> > +
> > +void *
> > +kmem_alloc_large(size_t size, xfs_km_flags_t flags)
> > +{
> > +	void	*ptr;
> > +
> > +	trace_kmem_alloc_large(size, flags, _RET_IP_);
> > +
> > +	ptr = kmem_alloc(size, flags | KM_MAYFAIL);
> > +	if (ptr)
> > +		return ptr;
> > +	return __kmem_vmalloc(size, flags);
> > +}
> > +
> >  void *
> >  kmem_realloc(const void *old, size_t newsize, xfs_km_flags_t flags)
> >  {
> > diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> > index 267655acd426..423a1fa0fcd6 100644
> > --- a/fs/xfs/kmem.h
> > +++ b/fs/xfs/kmem.h
> > @@ -59,6 +59,7 @@ kmem_flags_convert(xfs_km_flags_t flags)
> >  }
> >  
> >  extern void *kmem_alloc(size_t, xfs_km_flags_t);
> > +extern void *kmem_alloc_io(size_t, xfs_km_flags_t);
> >  extern void *kmem_alloc_large(size_t size, xfs_km_flags_t);
> >  extern void *kmem_realloc(const void *, size_t, xfs_km_flags_t);
> >  static inline void  kmem_free(const void *ptr)
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index ca0849043f54..7bd1f31febfc 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -353,7 +353,7 @@ xfs_buf_allocate_memory(
> >  	 */
> >  	size = BBTOB(bp->b_length);
> >  	if (size < PAGE_SIZE) {
> > -		bp->b_addr = kmem_alloc(size, KM_NOFS);
> > +		bp->b_addr = kmem_alloc_io(size, KM_NOFS);
> >  		if (!bp->b_addr) {
> >  			/* low memory - use alloc_page loop instead */
> >  			goto use_alloc_page;
> > @@ -368,7 +368,7 @@ xfs_buf_allocate_memory(
> >  		}
> >  		bp->b_offset = offset_in_page(bp->b_addr);
> >  		bp->b_pages = bp->b_page_array;
> > -		bp->b_pages[0] = virt_to_page(bp->b_addr);
> > +		bp->b_pages[0] = kmem_to_page(bp->b_addr);
> >  		bp->b_page_count = 1;
> >  		bp->b_flags |= _XBF_KMEM;
> >  		return 0;
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index 7fc3c1ad36bc..1830d185d7fc 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -1415,7 +1415,7 @@ xlog_alloc_log(
> >  		iclog->ic_prev = prev_iclog;
> >  		prev_iclog = iclog;
> >  
> > -		iclog->ic_data = kmem_alloc_large(log->l_iclog_size,
> > +		iclog->ic_data = kmem_alloc_io(log->l_iclog_size,
> >  				KM_MAYFAIL);
> >  		if (!iclog->ic_data)
> >  			goto out_free_iclog;
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 13d1d3e95b88..b4a6a008986b 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -125,7 +125,7 @@ xlog_alloc_buffer(
> >  	if (nbblks > 1 && log->l_sectBBsize > 1)
> >  		nbblks += log->l_sectBBsize;
> >  	nbblks = round_up(nbblks, log->l_sectBBsize);
> > -	return kmem_alloc_large(BBTOB(nbblks), KM_MAYFAIL);
> > +	return kmem_alloc_io(BBTOB(nbblks), KM_MAYFAIL);
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 8bb8b4704a00..eaae275ed430 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3604,6 +3604,7 @@ DEFINE_EVENT(xfs_kmem_class, name, \
> >  	TP_PROTO(ssize_t size, int flags, unsigned long caller_ip), \
> >  	TP_ARGS(size, flags, caller_ip))
> >  DEFINE_KMEM_EVENT(kmem_alloc);
> > +DEFINE_KMEM_EVENT(kmem_alloc_io);
> >  DEFINE_KMEM_EVENT(kmem_alloc_large);
> >  DEFINE_KMEM_EVENT(kmem_realloc);
> >  DEFINE_KMEM_EVENT(kmem_zone_alloc);
> > -- 
> > 2.23.0.rc1
> > 
