All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, eguan@redhat.com, darrick.wong@oracle.com
Subject: Re: [PATCH 1/4] xfs: fix bogus minleft manipulations
Date: Wed, 14 Dec 2016 16:51:33 -0500	[thread overview]
Message-ID: <20161214215133.GA26688@bfoster.bfoster> (raw)
In-Reply-To: <20161214193626.GA12106@lst.de>

On Wed, Dec 14, 2016 at 08:36:26PM +0100, Christoph Hellwig wrote:
> On Wed, Dec 14, 2016 at 12:35:07PM -0500, Brian Foster wrote:
> > I'm still trying to grok the minleft/dop_low stuff, but doesn't this
> > increase the risk of bmbt block allocation failure?
> 
> No.  If working correctly it reduces the chance of a failing bmbt block
> allocation in transaction to zero, at the expense of potentially
> failing whole block allocations while we still have a few blocks left.
> But given that the first leads to a fs shutdown and the latter just to
> an ENOSPC it seems worthwhile.
> 

Indeed, I completely agree in principle with the series. It's much
better to leave some free space efficiency on the table for greater
reliability.  I just want to make sure I understand how this change
works and whether it does so correctly...

> > E.g., args.minleft
> > is set to the transaction reservation earlier in the function. AFAIK the
> > res is not guaranteed to be in a particular AG, but in that case we're
> > just setting a start AG and ultimately cycling through the AGs. If that
> > fails, this hunk resets minleft, restarts at fsbno 0 and activates the
> > sequential agno allocation requirement for subsequent allocs. Doesn't
> > removing this logic mean we can successfully reserve blocks in a write
> > transaction that will never ultimately be able to allocate bmbt blocks,
> > even if the data blocks can all be allocated..?
> 
> We only set it to res if no firstblock was set.  The only case where
> that is possible during the first bmbt block allocation in a reflink
> operation - for actual direct I/O writes, fallocate calls or unwritten
> extent conversion we will always have firstblock set from the allocation
> of the actual data extent.  And that allocation now has the proper minleft
> set after this series so that we are guaranteed we have enough space
> in one single AG for the whole allocation.
> 

Ok, I need to stare at that codepath a bit more..

In the meantime, what about the delalloc codepath as well? We reserve
indlen out of the global pool at write time and track the res in the
in-core extents. Writeback comes around later and we try to allocate
real blocks without any need for transaction reservation. Now that we
set 'args.minleft = xfs_bmap_worst_indlen()' in xfs_bmap_btalloc() and
never reset it, what guarantees any particular AG can satisfy that
request based on the global reservation?

I'm basically just a little nervous that we might rely on some of these
minleft hacks to avoid serious failures like in bmbt allocation or
writeback, despite the fact that the broader mechanism may be bogus...

Brian

> In the reflink case where firstblock wasn't set the minleft ensures
> that we fail the first allocation if we don't have enough blocks
> in a single AG.
> retryloop to try the allocation 
> 
> > The other thing to note here is that minleft is initialized to 0 and
> > only set to the transaction res conditionally, so if firstblock is
> > already set we won't use minleft at all.
> 
> Yes, that's intentional - we already did our minleft check when allocating
> the data extent.

  reply	other threads:[~2016-12-14 21:51 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-13 15:59 minleft fixes Christoph Hellwig
2016-12-13 15:59 ` [PATCH 1/4] xfs: fix bogus minleft manipulations Christoph Hellwig
2016-12-14 17:35   ` Brian Foster
2016-12-14 19:36     ` Christoph Hellwig
2016-12-14 21:51       ` Brian Foster [this message]
     [not found]         ` <20161215143430.GB29477@bfoster.bfoster>
2016-12-16  8:21           ` Christoph Hellwig
2016-12-19 11:38           ` Christoph Hellwig
2016-12-20 14:17             ` Brian Foster
2016-12-20 21:45               ` Dave Chinner
2016-12-15 22:09   ` Dave Chinner
2016-12-16  8:20     ` Christoph Hellwig
2017-01-04  6:32       ` Darrick J. Wong
2017-01-04  8:50         ` Christoph Hellwig
2016-12-13 15:59 ` [PATCH 2/4] xfs: calculate minleft correctly for bmap allocations Christoph Hellwig
2016-12-14 18:24   ` Brian Foster
2016-12-13 15:59 ` [PATCH 3/4] xfs: adjust allocation length in xfs_alloc_space_available Christoph Hellwig
2016-12-14 18:24   ` Brian Foster
2016-12-14 19:37     ` Christoph Hellwig
2016-12-15 20:41   ` Libor Klepáč
2016-12-16  8:20     ` Christoph Hellwig
2016-12-16  0:28   ` Dave Chinner
2016-12-16  8:25     ` Christoph Hellwig
2016-12-18 23:55       ` Dave Chinner
2016-12-13 15:59 ` [PATCH 4/4] xfs: don't rely on ->total " Christoph Hellwig
2016-12-14 18:30   ` Brian Foster
2016-12-14 19:38     ` Christoph Hellwig
2016-12-14 21:51       ` Brian Foster
2016-12-15  8:55         ` Christoph Hellwig
2016-12-15 12:00           ` Brian Foster
2016-12-14 10:51 ` minleft fixes Eryu Guan
2016-12-15 10:24   ` Eryu Guan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161214215133.GA26688@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=eguan@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.