All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Ming Lin <mlin@kernel.org>, Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org,
	Ceph Development <ceph-devel@vger.kernel.org>,
	"LIU, Fei" <james.liu@alibaba-inc.com>,
	xiongwei.jiang@alibaba-inc.com, boqian.zy@alibaba-inc.com,
	sheng.qiu@alibaba-inc.com, Sage Weil <sweil@redhat.com>
Subject: Re: xlog_write: reservation ran out
Date: Thu, 4 May 2017 10:34:35 +1000	[thread overview]
Message-ID: <20170504003435.GC17542@dastard> (raw)
In-Reply-To: <20170503141030.GA11549@bfoster.bfoster>

On Wed, May 03, 2017 at 10:10:32AM -0400, Brian Foster wrote:
> On Wed, May 03, 2017 at 09:51:52AM +1000, Dave Chinner wrote:
> > On Mon, May 01, 2017 at 10:41:06AM -0700, Ming Lin wrote:
> > > On 5/1/2017 6:12 AM, Brian Foster wrote:
> > > > On Sun, Apr 30, 2017 at 11:10:15PM -0700, Ming Lin wrote:
> > > >>
> > > >> On 4/28/2017 1:56 PM, Ming Lin wrote:
> > > >>> I'm new to xfs code.
> > > >>>
> > > >>> Search  XFS_TRANS_INACTIVE and the usage is like below,
> > > >>>
> > > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
> > > >>>
> > > >>> xfs_trans_alloc(mp, XFS_TRANS_INACTIVE);
> > > >>> xfs_trans_reserve(tp, &M_RES(mp)->tr_ifree, XFS_IFREE_SPACE_RES(mp), 0);
> > > >>>
> > > >>> seems tr_remove is not related.
> > > >>> I'll just try to enlarge the reservation for tr_itruncate and tr_ifree.
> > > >>
> > > >> Now things are a little bit more clear. I tried below debug patch.
> > > >> The t_decrease[] array was used to track where the space was used.
> > > >>
> > > >>  fs/xfs/libxfs/xfs_trans_resv.c |  4 ++--
> > > >>  fs/xfs/xfs_log.c               | 23 ++++++++++++++++++++---
> > > >>  fs/xfs/xfs_log_cil.c           |  8 ++++++++
> > > >>  fs/xfs/xfs_log_priv.h          |  3 +++
> > > >>  fs/xfs/xfs_super.c             |  1 +
> > > >>  5 files changed, 34 insertions(+), 5 deletions(-)
> > > >>
> > > > ...
> > > >>  277 static void
> > > >>  278 xlog_cil_insert_items(
> > > >>  279         struct xlog             *log,
> > > >>  280         struct xfs_trans        *tp)
> > > >>  281 {
> > > >>
> > > >> ....
> > > >>
> > > >>  340         /* do we need space for more log record headers? */
> > > >>  341         iclog_space = log->l_iclog_size - log->l_iclog_hsize;
> > > >>  342         if (len > 0 && (ctx->space_used / iclog_space !=
> > > >>  343                                 (ctx->space_used + len) / iclog_space)) {
> > > >>  344                 int hdrs;
> > > >>  345
> > > >>  346                 hdrs = (len + iclog_space - 1) / iclog_space;
> > > >>  347                 /* need to take into account split region headers, too */
> > > >>  348                 hdrs *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
> > > >>  349                 ctx->ticket->t_unit_res += hdrs;
> > > >>  350                 ctx->ticket->t_curr_res += hdrs;
> > > >>  351                 tp->t_ticket->t_curr_res -= hdrs;
> > > >>  352                 tp->t_ticket->t_decrease[6] = hdrs;
> > > >>  353                 ASSERT(tp->t_ticket->t_curr_res >= len);
> > > >>  354         }
> > > >>  355         tp->t_ticket->t_curr_res -= len;
> > > >>  356         tp->t_ticket->t_decrease[7] = len;
> > > >>  357         ctx->space_used += len;
> > > >>  358
> > > >>  359         spin_unlock(&cil->xc_cil_lock);
> > > >>  360 }
> > > >>
> > > >> Any idea why it used so many reservation space here?
> > > >>
> > > > 
> > > > Nothing really rings a bell for me atm. Perhaps others might have ideas.
> > > > That does appear to be a sizable overrun, as opposed to a few bytes that
> > > > could more likely be attributed to rounding, header accounting issues or
> > > > something of that nature.
> > > 
> > > FYI, here are some numbers.
> > > 
> > > The original "unit res" is 83024. I made it x2 larger, so now it's 166048
> > > "unit res" - "current res" = the reservation space already used
> > > 
> > > XFS (nvme10n1p1): xlog_write: reservation summary:
> > >   trans type  = INACTIVE (3)
> > >   unit res    = 166048 bytes
> > >   current res = 77088 bytes
> > >   total reg   = 0 bytes (o/flow = 0 bytes)
> > >   ophdrs      = 0 (ophdr space = 0 bytes)
> > >   ophdr + reg = 0 bytes
> > >   num regions = 0
> > 
> > So the problem here is that the ticket summary is being printed
> > after everything useful has been removed from the ticket and moved
> > to the CIL. i.e. for more useful debug, this overrun condition
> > needs to be checked between the xlog_cil_insert_format_items() call
> > and moving the items off the transaction in xfs_log_commit_cil().
> > 
> > That way xlog_print_tic_res() will actually tell us how many regions
> > are being logged and give us a much better idea of the scope of
> > changes logged in the transaction that lead to the overrun.
> > 
> 
> Hmm.. interesting idea, but I'm not sure that just moving the ticket
> summary prior to the CIL insertion tells us much. For one, this is a
> transaction ticket and afaict the region count bits are only really used
> for a CIL ticket, unless I'm missing something..?

Seems like it - I was talking about dumping the logged regions in the
transaction log items, not the formatted log vectors passed to
xlog_write() from the CIL.

We've got the log items and their vectors still attached to the
transaction pointer after the call to xlog_cil_insert_format_items()
(i.e. on the tp->t_items list), and we know what the formatted
length is going to be and so we can directly check for overrun
before doing anything else. i.e. something like:

....
	xlog_cil_insert_format_items(log, tp, &len, &diff_iovecs);

	/* account for space used by new iovec headers  */
	len += diff_iovecs * sizeof(xlog_op_header_t);

	/* trivial overrun check */
	if (tp->t_ticket->t_curr_res - len < 0) {
		/* factor printing into helper */

		xfs_warn("transaction reservation overrun <transaction info>")
		/* dump transaction/ticket info */

		list_for_each_entry(lidp, &tp->t_items, lid_trans) {
			struct xfs_log_item     *lip = lidp->lid_item;
			struct xfs_log_iovec	*lv = lip->li_lv;

			/* dump log item info */
			xfs_warn("Log item type, flags, ...");

			/* dump lv array info */

			/* dump formatted log vector regions */
			for (i = 0; i < lv->lv_niovecs; i++) {
				/* dump individual lv info */
				/* dump first 32 bytes */
			}
		}
	}

> IOWs, it seems to me
> that if we had such pre-CIL format overrun detection, we'd probably want
> to add new code to dump information about what is attached to the
> transaction so we can analyze what was logged against the actual
> reservation.

Yes, that's exactly what I'm suggesting needs to be done.

Keep in mind this is what xlog_print_tic_res() did prior to the
introduction of delayed logging. i.e. I broke the detailed region
logging of transaction overruns 7 years ago when changing it to
provide detailed logging of CIL ticket reservation overruns, and
this is the first time since then that we've needed transaction
overrun debugging...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2017-05-04  0:34 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-28 20:15 xlog_write: reservation ran out Ming Lin
2017-04-28 20:24 ` Ming Lin
2017-04-28 20:56   ` Ming Lin
2017-05-01  6:10     ` Ming Lin
2017-05-01 13:12       ` Brian Foster
2017-05-01 17:41         ` Ming Lin
2017-05-01 18:48           ` Brian Foster
2017-05-01 20:18             ` Ming Lin
2017-05-02 23:51           ` Dave Chinner
2017-05-03 14:10             ` Brian Foster
2017-05-03 15:53               ` Darrick J. Wong
2017-05-03 16:15                 ` Brian Foster
2017-05-04  0:37                   ` Dave Chinner
2017-05-04  0:34               ` Dave Chinner [this message]
2017-05-04 11:51                 ` Brian Foster
2017-05-05  3:16                   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170504003435.GC17542@dastard \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=boqian.zy@alibaba-inc.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=james.liu@alibaba-inc.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mlin@kernel.org \
    --cc=sheng.qiu@alibaba-inc.com \
    --cc=sweil@redhat.com \
    --cc=xiongwei.jiang@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.