From: Linda Knippers <linda.knippers@hp.com>
To: Dave Chinner <david@fromorbit.com>, Jeff Moyer <jmoyer@redhat.com>
Cc: "matthew r. wilcox" <matthew.r.wilcox@intel.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX reads/writes to block devices"
Date: Wed, 05 Aug 2015 21:42:54 -0400
Message-ID: <55C2BB9E.3040709@hp.com>
In-Reply-To: <20150805220113.GC3902@dastard>
On 08/05/2015 06:01 PM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>> Hi, Matthew,
>>
>> Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:
>>
>> # mkfs -t xfs -f /dev/pmem0
>> meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
>>          =                       sectsz=512   attr=2, projid32bit=1
>>          =                       crc=0        finobt=0
>> data     =                       bsize=4096   blocks=2097152, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>> log      =internal log           bsize=4096   blocks=2560, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> mkfs.xfs: read failed: Numerical result out of range
>>
>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>> from the last sector of the device. This results in dax_io trying to do
>> a page-sized I/O at 512 bytes from the end of the device.
>
> Right - we have to be able to do IO to that last sector, so this is
> a sanity check to tell if the block dev is large enough. The XFS
> kernel code does the same end-of-device sector read when the
> filesystem is mounted, too.
>
>> bdev_direct_access, receiving this bogus pos/size combo, returns
>> -ERANGE:
>>
>> 	if ((sector + DIV_ROUND_UP(size, 512)) >
>> 			part_nr_sects_read(bdev->bd_part))
>> 		return -ERANGE;
>>
>> Given that file systems supporting dax refuse to mount with a blocksize
>> != page size, I'm guessing this is sort of expected behavior. However,
>> we really shouldn't be breaking direct I/O on pmem devices.
>
> If the device is advertising 512 byte sector size support, then this
> needs to work, especially as DAX is completely transparent on the
> block device. Remember that DAX through a filesystem works on
> filesystem data block size boundaries, so a 512 byte sector/4k block
> size filesystem will be able to use DAX for mmapped files just fine.
>
>> So, what do you want to do? We could make the pmem device's logical
>> block size fixed at the system page size. Or, we could modify the dax
>> code to work with blocksize < pagesize. Or, we could continue using the
>> direct I/O codepath for direct block device access. What do you think?
>
> I don't know how the pmem device sets up its limits. Can you post
> the output of:
>
> /sys/block/pmem0/queue/logical_block_size
512
> /sys/block/pmem0/queue/physical_block_size
512
> /sys/block/pmem0/queue/hw_sector_size
512
> /sys/block/pmem0/queue/minimum_io_size
512
> /sys/block/pmem0/queue/optimal_io_size
0
Let me know if you need anything else.
-- ljk
> As these all affect how mkfs.xfs configures the filesystem being
> made and so influences the size and alignment of the IO it does....
>
> Cheers,
>
> Dave.
>
Thread overview: 26+ messages
2015-08-05 20:19 regression introduced by "block: Add support for DAX reads/writes to block devices" Jeff Moyer
2015-08-05 22:01 ` Dave Chinner
2015-08-06 1:42 ` Linda Knippers [this message]
2015-08-06 3:24 ` Dave Chinner
2015-08-06 7:52 ` Boaz Harrosh
2015-08-06 20:34 ` Dave Chinner
2015-08-09 8:52 ` Boaz Harrosh
2015-08-10 16:32 ` Linda Knippers
2015-08-10 21:27 ` Dave Chinner
2015-08-10 23:04 ` Linda Knippers
2015-08-06 14:21 ` Wilcox, Matthew R
2015-08-06 15:33 ` Jeff Moyer
2015-08-06 15:51 ` Wilcox, Matthew R
2015-08-06 21:30 ` Jeff Moyer
2015-08-07 18:11 ` Wilcox, Matthew R
2015-08-07 20:41 ` Jeff Moyer
2015-08-10 7:42 ` Boaz Harrosh
2015-08-12 21:11 ` Jeff Moyer
2015-08-13 5:32 ` Boaz Harrosh
2015-08-13 14:00 ` Jeff Moyer
2015-08-13 16:42 ` Linda Knippers
2015-08-13 17:14 ` Jeff Moyer
2015-08-13 17:52 ` Linda Knippers
2015-08-13 18:19 ` Jeff Moyer
2015-08-13 19:32 ` Wilcox, Matthew R
2015-08-14 16:28 ` Dan Williams