linux-kernel.vger.kernel.org archive mirror
From: Boaz Harrosh <boaz@plexistor.com>
To: Dave Chinner <david@fromorbit.com>,
	Linda Knippers <linda.knippers@hp.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
	"matthew r. wilcox" <matthew.r.wilcox@intel.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX reads/writes to block devices"
Date: Thu, 06 Aug 2015 10:52:47 +0300
Message-ID: <55C3124F.3020602@plexistor.com>
In-Reply-To: <20150806032421.GA16638@dastard>

On 08/06/2015 06:24 AM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 09:42:54PM -0400, Linda Knippers wrote:
>> On 08/05/2015 06:01 PM, Dave Chinner wrote:
>>> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
<>
>>>>
>>>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>>>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>>>> from the last sector of the device.  This results in dax_io trying to do
>>>> a page-sized I/O at 512 bytes from the end of the device.
>>>

This part I do not understand. How is mkfs.xfs reading the sector?
Is it through open("/dev/pmem0", ...)? With O_DIRECT?

If so, then yes, the inode of /dev/pmem0 is IS_DAX() and the I/O will
go through the dax.c code. (I think so; which kernel is this?)

Which means this is a bug.

>>> Right - we have to be able to do IO to that last sector, so this is
>>> a sanity check to tell if the block dev is large enough. The XFS
>>> kernel code does the same end-of-device sector read when the
>>> filesystem is mounted, too.
>>>
>>>> bdev_direct_access, receiving this bogus pos/size combo, returns
>>>> -ERANGE:
>>>>
>>>> 	if ((sector + DIV_ROUND_UP(size, 512)) >
>>>> 					part_nr_sects_read(bdev->bd_part))
>>>> 		return -ERANGE;
>>>>
>>>> Given that file systems supporting dax refuse to mount with a blocksize
>>>> != page size, I'm guessing this is sort of expected behavior.  However,
>>>> we really shouldn't be breaking direct I/O on pmem devices.
>>>

No, this is a BUG. A buffered or direct read/write to an IS_DAX() inode
should work at any alignment and any size, because with DAX buffered and
direct I/O share the exact same code path, and buffered I/O expects I/O of
any size to work.

This is probably a bug in the DAX handling of the bdev-inode. Let me
test this. I will send a fix ASAP.
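
To make the failure concrete, here is a small userspace sketch of the
range check quoted above. It is only a model for illustration, not the
kernel code: range_check() and div_round_up() are hypothetical helpers,
and nr_sects is a plain parameter standing in for
part_nr_sects_read(bdev->bd_part).

```c
#include <errno.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Round-up division, same idea as the kernel's DIV_ROUND_UP(). */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Userspace model of the quoted check (hypothetical helper).
 * sector:   first 512-byte sector of the request
 * size:     request length in bytes
 * nr_sects: device size in 512-byte sectors
 */
static int range_check(uint64_t sector, uint64_t size, uint64_t nr_sects)
{
	if (sector + div_round_up(size, SECTOR_SIZE) > nr_sects)
		return -ERANGE;
	return 0;
}
```

Assuming a made-up device of 2048 sectors, range_check(2047, 4096, 2048)
returns -ERANGE because 2047 + 8 exceeds 2048, while a 512-byte request
at the same sector, range_check(2047, 512, 2048), passes. That matches
the report: a page-sized I/O starting 512 bytes from the end of the
device is rejected, although the last sector itself is readable.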

<>
>>> the output of:
>>>
>>> 	/sys/block/pmem0/queue/logical_block_size
>> 512
>>
>>> 	/sys/block/pmem0/queue/physical_block_size
>> 512
>>

There is a pending fix for this.
Do you need it sent to stable?

>>> 	/sys/block/pmem0/queue/hw_sector_size
>> 512
>>
>>> 	/sys/block/pmem0/queue/minimum_io_size
>> 512
>>
>>> 	/sys/block/pmem0/queue/optimal_io_size
>> 0

Thanks
Boaz



