From: Dave Chinner <david@fromorbit.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-xfs@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Dave Chinner <dchinner@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Aaron Lu <aaron.lu@intel.com>, Christopher Lameter <cl@linux.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc
Date: Tue, 26 Feb 2019 07:11:22 +1100
Message-ID: <20190225201122.GF23020@dastard>
In-Reply-To: <20190225084623.GA8397@ming.t460p>

On Mon, Feb 25, 2019 at 04:46:25PM +0800, Ming Lei wrote:
> On Mon, Feb 25, 2019 at 03:36:48PM +1100, Dave Chinner wrote:
> > On Mon, Feb 25, 2019 at 12:09:04PM +0800, Ming Lei wrote:
> > > XFS uses kmalloc() to allocate sector sized IO buffer.
> > ....
> > > Use page_frag_alloc() to allocate the sector sized buffer, then the
> > > above issue can be fixed because offset_in_page of allocated buffer
> > > is always sector aligned.
> > 
> > Didn't we already reject this approach because page frags cannot be
> 
> I remembered there is this kind of issue mentioned, but just not found
> the details, so post out the patch for restarting the discussion.

As previously discussed, the only solution that fits all the use
cases we have to support is slab caches that do not break object
alignment when slab debug options are turned on.
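
To illustrate the alignment point, here is a minimal sketch of how
per-object debug padding can shift objects off sector boundaries. It is
a toy model, not the real slab layout: the function names, the 4096
byte slab size and the 64 byte padding value are all assumptions made
up for the example.

```python
SECTOR = 512  # sector size in bytes


def object_offsets(obj_size, per_object_padding, slab_bytes=4096):
    """Offsets of consecutive objects packed into one slab.

    per_object_padding stands in for whatever extra bytes a debug
    option adds to each object (hypothetical value, not a real layout).
    """
    stride = obj_size + per_object_padding
    count = slab_bytes // stride
    return [i * stride for i in range(count)]


def all_sector_aligned(offsets):
    """True if every object starts on a sector boundary."""
    return all(off % SECTOR == 0 for off in offsets)


plain = object_offsets(512, 0)    # no debug padding
debug = object_offsets(512, 64)   # 64 bytes of padding is an assumption

print("no padding, aligned:", all_sector_aligned(plain))
print("with padding, aligned:", all_sector_aligned(debug))
```

In this model the unpadded 512 byte objects all start on sector
boundaries, while any padding that is not a multiple of the sector size
moves every object after the first off them.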

> > reused and that pages allocated to the frag pool are pinned in
> > memory until all fragments allocated on the page have been freed?
> 
> Yes, that is one problem. But if one page is consumed, sooner or later,
> all fragments will be freed, then the page becomes available again.

Ah, no, your assumption about how metadata caching in XFS works is
flawed. Some metadata ends up being cached for the life of the
filesystem because it is so frequently referenced that it never gets
reclaimed: AG headers, btree root blocks, etc. The XFS metadata
cache hangs on to such metadata even under extreme memory pressure,
because if we reclaim it then any filesystem operation will need to
reallocate that memory to clean dirty pages, and that is the very
last thing we want to do under extreme memory pressure conditions.

If the allocator cannot reuse holes in pages (i.e. it does not work
as a proper slab cache) then we are going to blow out the amount of
memory that the XFS metadata cache uses very badly on filesystems
where block size != page size.
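
A rough sketch of the effect, assuming 64k pages and 4k blocks. Both
pool classes are toy models written for this example (FragPool,
SlabPool and run() are hypothetical names, not kernel interfaces): one
never reuses a freed hole, the other hands freed slots out again. The
workload keeps every 16th buffer forever, standing in for metadata that
never gets reclaimed.

```python
PAGE_SIZE = 64 * 1024            # 64k pages
BLOCK_SIZE = 4 * 1024            # 4k filesystem blocks
SLOTS = PAGE_SIZE // BLOCK_SIZE  # 16 buffers fit in one page


class FragPool:
    """Toy frag-style pool: carves pages front to back, never reuses holes."""

    def __init__(self):
        self.live = {}      # page id -> number of live buffers on it
        self.current = -1
        self.used = SLOTS   # forces a fresh page on the first alloc

    def alloc(self):
        if self.used == SLOTS:
            self.current += 1
            self.live[self.current] = 0
            self.used = 0
        self.used += 1
        self.live[self.current] += 1
        return self.current

    def free(self, page):
        self.live[page] -= 1   # the hole is not handed out again

    def pages_held(self):
        return sum(1 for n in self.live.values() if n > 0)


class SlabPool:
    """Toy slab-style pool: freed slots go back on a free list."""

    def __init__(self):
        self.live = {}
        self.free_slots = []   # one page id per free slot
        self.next_page = 0

    def alloc(self):
        if not self.free_slots:
            self.live[self.next_page] = 0
            self.free_slots = [self.next_page] * SLOTS
            self.next_page += 1
        page = self.free_slots.pop()
        self.live[page] += 1
        return page

    def free(self, page):
        self.live[page] -= 1
        self.free_slots.append(page)

    def pages_held(self):
        return sum(1 for n in self.live.values() if n > 0)


def run(pool, rounds=4, batch=160, keep_every=16):
    """Allocate in batches; keep every 16th buffer, free the rest."""
    pinned = []
    for _ in range(rounds):
        handles = [pool.alloc() for _ in range(batch)]
        for i, page in enumerate(handles):
            if i % keep_every == 0:
                pinned.append(page)   # long-lived, never freed
            else:
                pool.free(page)
    return pool.pages_held(), len(pinned)


frag_pages, frag_pinned = run(FragPool())
slab_pages, slab_pinned = run(SlabPool())
print("frag model: pages held =", frag_pages, "for", frag_pinned, "buffers")
print("slab model: pages held =", slab_pages, "for", slab_pinned, "buffers")
```

Under this workload the frag model ends up holding one whole 64k page
for each 4k buffer that is never freed, while the slab model keeps
filling the holes and holds far fewer pages for the same buffers.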

> > i.e. when we consider 64k page machines and 4k block sizes (i.e.
> > default config), every single metadata allocation is a sub-page
> > allocation and so will use this new page frag mechanism. IOWs, it
> > will result in fragmenting memory severely and typical memory
> > reclaim not being able to fix it because the metadata that pins each
> > page is largely unreclaimable...
> 
> It can be an issue in case of IO timeout & retry.

This makes no sense to me. Exactly how does filesystem memory
allocation affect IO timeouts and any retries the filesystem might
issue?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
