From: Eric Farman <farman@linux.ibm.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@fb.com>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	axboe@kernel.dk, Kernel Team <Kernel-team@fb.com>,
	hch@lst.de, bvanassche@acm.org, damien.lemoal@opensource.wdc.com,
	ebiggers@kernel.org, pankydev8@gmail.com,
	Halil Pasic <pasic@linux.ibm.com>
Subject: Re: [PATCHv6 11/11] iomap: add support for dma aligned direct-io
Date: Mon, 27 Jun 2022 11:21:20 -0400
Message-ID: <c5affe3096fd7b7996cb5fbcb0c41bbf3dde028e.camel@linux.ibm.com>
In-Reply-To: <e0038866ac54176beeac944c9116f7a9bdec7019.camel@linux.ibm.com>

On Thu, 2022-06-23 at 17:34 -0400, Eric Farman wrote:
> On Thu, 2022-06-23 at 16:32 -0400, Eric Farman wrote:
> > On Thu, 2022-06-23 at 13:11 -0600, Keith Busch wrote:
> > > On Thu, Jun 23, 2022 at 12:51:08PM -0600, Keith Busch wrote:
> > > > On Thu, Jun 23, 2022 at 02:29:13PM -0400, Eric Farman wrote:
> > > > > On Fri, 2022-06-10 at 12:58 -0700, Keith Busch wrote:
> > > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > > 
> > > > > > > Use the address alignment requirements from the block_device
> > > > > > > for direct io instead of requiring addresses be aligned to the
> > > > > > > block size.
> > > > > 
> > > > > Hi Keith,
> > > > > 
> > > > > > Our s390 PV guests recently started failing to boot from a -next
> > > > > > host, and git blame brought me here.
> > > > > > 
> > > > > > As near as I have been able to tell, we start tripping up on this
> > > > > > code from patch 9 [1] that gets invoked with this patch:
> > > > > 
> > > > > > 	for (k = 0; k < i->nr_segs; k++, skip = 0) {
> > > > > > 		size_t len = i->iov[k].iov_len - skip;
> > > > > > 
> > > > > > 		if (len > size)
> > > > > > 			len = size;
> > > > > > 		if (len & len_mask)
> > > > > > 			return false;
> > > > > 
> > > > > > The iovec we're failing on has two segments, one with a len of
> > > > > > x200 (and base of x...000) and another with a len of xe00 (and a
> > > > > > base of x...200), while len_mask is of course xfff.
> > > > > > 
> > > > > > So before I go any further on what we might have broken, do you
> > > > > > happen to have any suggestions what might be going on here, or
> > > > > > something I should try?
> > > > 
> > > > > Thanks for the notice, sorry for the trouble. This check wasn't
> > > > > intended to have any difference from the previous code with respect
> > > > > to the vector lengths.
> > > > > 
> > > > > Could you tell me if you're accessing this through the block device
> > > > > direct-io, or through iomap filesystem?
> > 
> > Reasonably certain the failure's on iomap. I'd reverted the subject
> > patch from next-20220622 and got things in working order.
> > 
> > > If using iomap, the previous check was this:
> > > 
> > > 	unsigned int blkbits =
> > > blksize_bits(bdev_logical_block_size(iomap->bdev));
> > > 	unsigned int align = iov_iter_alignment(dio->submit.iter);
> > > 	...
> > > 	if ((pos | length | align) & ((1 << blkbits) - 1))
> > > 		return -EINVAL;
> > > 
> > > 
> > ...
> > > > The result of "iov_iter_alignment()" would include "0xe00 | 0x200" in
> > > > your example, and checked against 0xfff should have been failing
> > > > prior to this patch. Unless I'm missing something...
> > 
> > > Nope, you're not. I didn't look back at what the old check was doing,
> > > just saw "0xe00 and 0x200" and thought "oh there's one page" instead of
> > > noting the code was or'ing them. My bad.
> > > 
> > > That was the last entry in my trace before the guest gave up, as
> > > everything else through this code up to that point seemed okay. I'll
> > > pick up the working case and see if I can get a clearer picture between
> > > the two.
> 
> > Looking over the trace again, I realize I did dump iov_iter_alignment()
> > as a comparator, and I see one pass through that had a non-zero response
> > but bdev_iter_is_aligned() returned true...
> 
> count = x1000
> iov_offset = x0
> nr_segs = 1
> iov_len = x1000	(len_mask = xfff)
> iov_base = x...200 (addr_mask = x1ff)
> 
> > That particular pass through is in the middle of the stuff it tried to
> > do, so I don't know if that's the cause or not but it strikes me as
> > unusual. Will look into that tomorrow and report back.
> 

Apologies, it took me an extra day to get back to this, but it is
indeed this pass through that's causing our boot failures. I note that
the old code (in iomap_dio_bio_iter) did:

        if ((pos | length | align) & ((1 << blkbits) - 1))
                return -EINVAL;

With blkbits equal to 12, the resulting mask is 0x0fff, and testing it
against an align value (from iov_iter_alignment) of x200 kicks us out.

The new code (in iov_iter_aligned_iovec), meanwhile, compares this:

                if ((unsigned long)(i->iov[k].iov_base + skip) & addr_mask)
                        return false;

iov_base (and the output of the old iov_iter_alignment() routine) is
x200, but since addr_mask is x1ff, this check gives a different answer
than the old one did: x200 & x1ff is zero, so the buffer now passes.
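
To make the difference concrete, here is a minimal userspace sketch of
the two checks. It is not the kernel code: struct seg, old_check_ok()
and new_check_ok() are hypothetical names made up for illustration, the
pos/length and skip/size handling is left out, and only the low bits of
iov_base are modelled because the trace elides the rest.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one iovec segment (not struct iovec). */
struct seg {
	uintptr_t base;	/* low bits of iov_base only */
	size_t len;	/* iov_len */
};

/*
 * Old-style check: OR every base and length together, as
 * iov_iter_alignment() does, then test against the block-size mask.
 */
static bool old_check_ok(const struct seg *s, size_t n, unsigned int blkbits)
{
	uintptr_t align = 0;

	for (size_t k = 0; k < n; k++)
		align |= s[k].base | s[k].len;

	return (align & (((uintptr_t)1 << blkbits) - 1)) == 0;
}

/*
 * New-style check: lengths are tested against len_mask and addresses
 * against a separate (possibly smaller) addr_mask.
 */
static bool new_check_ok(const struct seg *s, size_t n,
			 uintptr_t addr_mask, size_t len_mask)
{
	for (size_t k = 0; k < n; k++) {
		if (s[k].len & len_mask)
			return false;
		if (s[k].base & addr_mask)
			return false;
	}
	return true;
}
```

For the single segment from the trace (base x200, len x1000),
old_check_ok() with blkbits 12 returns false, while new_check_ok() with
addr_mask x1ff and len_mask xfff returns true. Passing xfff as the
address mask too, as in my quick test, makes it return false again.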

To check this, I changed the comparator to len_mask (almost certainly
not the right answer since addr_mask is then unused, but it was good
for a quick test), and our PV guests are able to boot again with -next
running in the host.

Thanks,
Eric

