linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linda Knippers <linda.knippers@hp.com>
To: Boaz Harrosh <boaz@plexistor.com>, Dave Chinner <david@fromorbit.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
	"matthew r. wilcox" <matthew.r.wilcox@intel.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Vishal Verma <vishal.l.verma@intel.com>
Subject: Re: regression introduced by "block: Add support for DAX reads/writes to block devices"
Date: Mon, 10 Aug 2015 12:32:08 -0400	[thread overview]
Message-ID: <55C8D208.1070903@hp.com> (raw)
In-Reply-To: <55C714D0.8070003@plexistor.com>

On 8/9/2015 4:52 AM, Boaz Harrosh wrote:
> On 08/06/2015 11:34 PM, Dave Chinner wrote:
>> On Thu, Aug 06, 2015 at 10:52:47AM +0300, Boaz Harrosh wrote:
>>> On 08/06/2015 06:24 AM, Dave Chinner wrote:
>>>> On Wed, Aug 05, 2015 at 09:42:54PM -0400, Linda Knippers wrote:
>>>>> On 08/05/2015 06:01 PM, Dave Chinner wrote:
>>>>>> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>>> <>
>>>>>>>
>>>>>>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>>>>>>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>>>>>>> from the last sector of the device.  This results in dax_io trying to do
>>>>>>> a page-sized I/O at 512 bytes from the end of the device.
>>>>>>
>>>
>>> This part I do not understand. how is mkfs.xfs reading the sector?
>>> Is it through open(/dev/pmem0,...) ? O_DIRECT?
>>
>> mkfs.xfs uses O_DIRECT. Only if open(O_DIRECT) fails or mkfs.xfs is
>> told that it is working on an image file does it fall back to
>> buffered IO. All of the XFS userspace tools work this way to prevent
>> page cache pollution issues with read-once or write-once data during
>> operation.
>>
> 
> Thanks, yes makes sense. This is a bug at the DAX implementation of
> bdev. Since as you know with DAX there is no difference between
> O_DIRECT and buffered, we must support any aligned IO. I bet it
> should be something with bdev not giving 4K buffer-heads to dax.c.
> 
> Or ... It might just be the infamous bug where the actual partition
> they used was not 4k aligned on its start sector. So the last sector IO
> after partition translation came out wrong. This bug then should be
> fixed by: https://lists.01.org/pipermail/linux-nvdimm/2015-July/001555.html
> by:Vishal Verma
> 
> Vishal I think we should add CC: stable@vger.kernel.org to your patch
> because of these fdisk bugs.

That patch does cause 'mkfs -t xfs' to work.

Before:
$ sudo mkfs -t xfs -f /dev/pmem3
meta-data=/dev/pmem3             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: read failed: Numerical result out of range

After:

$ sudo mkfs -t xfs -f /dev/pmem3
meta-data=/dev/pmem3             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

$ cat /sys/block/pmem3/queue/logical_block_size
512
$ cat /sys/block/pmem3/queue/physical_block_size
4096
$ cat /sys/block/pmem3/queue/hw_sector_size
512
$ cat /sys/block/pmem3/queue/minimum_io_size
4096

Previously physical_block_size was 512 and minimum_io_size was 0.
What about logical_block_size and hw_sector_size still being 512?

So do we want to change pmem rather than changing DAX?

-- ljk
> 
>> Cheers,
>> Dave.
> 
> Thanks
> Boaz
> 


  reply	other threads:[~2015-08-10 16:32 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-05 20:19 regression introduced by "block: Add support for DAX reads/writes to block devices" Jeff Moyer
2015-08-05 22:01 ` Dave Chinner
2015-08-06  1:42   ` Linda Knippers
2015-08-06  3:24     ` Dave Chinner
2015-08-06  7:52       ` Boaz Harrosh
2015-08-06 20:34         ` Dave Chinner
2015-08-09  8:52           ` Boaz Harrosh
2015-08-10 16:32             ` Linda Knippers [this message]
2015-08-10 21:27               ` Dave Chinner
2015-08-10 23:04                 ` Linda Knippers
2015-08-06 14:21 ` Wilcox, Matthew R
2015-08-06 15:33   ` Jeff Moyer
2015-08-06 15:51     ` Wilcox, Matthew R
2015-08-06 21:30   ` Jeff Moyer
2015-08-07 18:11     ` Wilcox, Matthew R
2015-08-07 20:41       ` Jeff Moyer
2015-08-10  7:42         ` Boaz Harrosh
2015-08-12 21:11           ` Jeff Moyer
2015-08-13  5:32             ` Boaz Harrosh
2015-08-13 14:00               ` Jeff Moyer
2015-08-13 16:42                 ` Linda Knippers
2015-08-13 17:14                   ` Jeff Moyer
2015-08-13 17:52                     ` Linda Knippers
2015-08-13 18:19                       ` Jeff Moyer
2015-08-13 19:32                         ` Wilcox, Matthew R
2015-08-14 16:28                           ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55C8D208.1070903@hp.com \
    --to=linda.knippers@hp.com \
    --cc=boaz@plexistor.com \
    --cc=david@fromorbit.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matthew.r.wilcox@intel.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).