From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754179AbbHFHww (ORCPT <rfc822;w@1wt.eu>);
	Thu, 6 Aug 2015 03:52:52 -0400
Received: from mail-wi0-f177.google.com ([209.85.212.177]:37063 "EHLO
	mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751142AbbHFHwu (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 6 Aug 2015 03:52:50 -0400
Message-ID: <55C3124F.3020602@plexistor.com>
Date: Thu, 06 Aug 2015 10:52:47 +0300
From: Boaz Harrosh <boaz@plexistor.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Dave Chinner <david@fromorbit.com>, Linda Knippers <linda.knippers@hp.com>
CC: Jeff Moyer <jmoyer@redhat.com>,
        "matthew r. wilcox" <matthew.r.wilcox@intel.com>,
        linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX reads/writes
 to block devices"
References: <x49k2t9ij43.fsf@segfault.boston.devel.redhat.com> <20150805220113.GC3902@dastard> <55C2BB9E.3040709@hp.com> <20150806032421.GA16638@dastard>
In-Reply-To: <20150806032421.GA16638@dastard>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/06/2015 06:24 AM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 09:42:54PM -0400, Linda Knippers wrote:
>> On 08/05/2015 06:01 PM, Dave Chinner wrote:
>>> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
<>
>>>>
>>>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>>>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>>>> from the last sector of the device.  This results in dax_io trying to do
>>>> a page-sized I/O at 512 bytes from the end of the device.
>>>

This part I do not understand. how is mkfs.xfs reading the sector?
Is it through open(/dev/pmem0,...) ? O_DIRECT?

If so then yes the inode of /dev/pmem0 is IS_DAX() and will try
to use the dax.c stuff. (I think, which Kernel?)

Which means this is a bug.

>>> Right - we have to be able to do IO to that last sector, so this is
>>> a sanity check to tell if the block dev is large enough. The XFS
>>> kernel code does the same end-of-device sector read when the
>>> filesystem is mounted, too.
>>>
>>>> bdev_direct_access, receiving this bogus pos/size combo, returns
>>>> -ERANGE:
>>>>
>>>> 	if ((sector + DIV_ROUND_UP(size, 512)) >
>>>> 					part_nr_sects_read(bdev->bd_part))
>>>> 		return -ERANGE;
>>>>
>>>> Given that file systems supporting dax refuse to mount with a blocksize
>>>> != page size, I'm guessing this is sort of expected behavior.  However,
>>>> we really shouldn't be breaking direct I/O on pmem devices.
>>>

No this is a BUG. read/write buffered/direct to an IS_DAX() inode should
be able to be of any alignment size. Since with DAX buffered/direct is
exact same code path and buffered IO expects any size IO.

This is probably a bug in the DAX handling of the bdev-inode. Let me
test this. I will send a fix ASAP.

<>
>>> the output of:
>>>
>>> 	/sys/block/pmem0/queue/logical_block_size
>> 512
>>
>>> 	/sys/block/pmem0/queue/physical_block_size
>> 512
>>

There is a pending fix for this.
Do you need it sent to stable ?

>>> 	/sys/block/pmem0/queue/hw_sector_size
>> 512
>>
>>> 	/sys/block/pmem0/queue/minimum_io_size
>> 512
>>
>>> 	/sys/block/pmem0/queue/optimal_io_size
>> 0

Thanks
Boaz