From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754179AbbHFHww (ORCPT );
        Thu, 6 Aug 2015 03:52:52 -0400
Received: from mail-wi0-f177.google.com ([209.85.212.177]:37063 "EHLO
        mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751142AbbHFHwu (ORCPT );
        Thu, 6 Aug 2015 03:52:50 -0400
Message-ID: <55C3124F.3020602@plexistor.com>
Date: Thu, 06 Aug 2015 10:52:47 +0300
From: Boaz Harrosh
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Dave Chinner, Linda Knippers
CC: Jeff Moyer, "matthew r. wilcox", linux-kernel@vger.kernel.org,
        linux-fsdevel@vger.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX reads/writes to block devices"
References: <20150805220113.GC3902@dastard> <55C2BB9E.3040709@hp.com> <20150806032421.GA16638@dastard>
In-Reply-To: <20150806032421.GA16638@dastard>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/06/2015 06:24 AM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 09:42:54PM -0400, Linda Knippers wrote:
>> On 08/05/2015 06:01 PM, Dave Chinner wrote:
>>> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
<>
>>>>
>>>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>>>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>>>> from the last sector of the device. This results in dax_io trying to do
>>>> a page-sized I/O at 512 bytes from the end of the device.
>>>

This part I do not understand. How is mkfs.xfs reading the sector?
Is it through open(/dev/pmem0,...)? With O_DIRECT?
If so, then yes, the inode of /dev/pmem0 is IS_DAX() and the read will
go through the dax.c code. (I think. Which kernel?)
Which means this is a bug.

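To make the question concrete, this is roughly what I assume the check
boils down to in userspace. It is a guess, not code taken from xfsprogs:
read_last_sector() and SECTOR_SIZE are made-up names, and the open flags
(with or without O_DIRECT) are left to the caller, since that is exactly
what I am asking about. BLKBSZSET and BLKGETSIZE64 are the ioctls from
<linux/fs.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKBSZSET, BLKGETSIZE64 */

#define SECTOR_SIZE 512

/*
 * Hypothetical helper, not taken from mkfs.xfs: read the last 512-byte
 * sector of 'path' into 'buf'. 'buf' must hold SECTOR_SIZE bytes and, if
 * the caller passes O_DIRECT in 'flags', must be sector aligned.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_last_sector(const char *path, int flags, void *buf)
{
	struct stat st;
	uint64_t bytes = 0;
	int bsz = SECTOR_SIZE;
	ssize_t ret = -1;
	int fd;

	assert(buf != NULL);

	fd = open(path, flags);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0)
		goto out;

	if (S_ISBLK(st.st_mode)) {
		/* Block device: set the soft block size, then query the size. */
		if (ioctl(fd, BLKBSZSET, &bsz) < 0 ||
		    ioctl(fd, BLKGETSIZE64, &bytes) < 0)
			goto out;
	} else {
		/* Regular file: lets the helper be tried without a pmem device. */
		bytes = (uint64_t)st.st_size;
	}
	if (bytes < SECTOR_SIZE)
		goto out;

	/* The sanity read: one sector ending exactly at the end of the device. */
	ret = pread(fd, buf, SECTOR_SIZE, (off_t)(bytes - SECTOR_SIZE));
out:
	close(fd);
	return ret;
}
```

Calling it as read_last_sector("/dev/pmem0", O_RDONLY | O_DIRECT, buf)
with a 512-byte-aligned buf should exercise the direct path; dropping
O_DIRECT should exercise the buffered one.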
>>> Right - we have to be able to do IO to that last sector, so this is
>>> a sanity check to tell if the block dev is large enough. The XFS
>>> kernel code does the same end-of-device sector read when the
>>> filesystem is mounted, too.
>>>
>>>> bdev_direct_access, receiving this bogus pos/size combo, returns
>>>> -ERANGE:
>>>>
>>>>         if ((sector + DIV_ROUND_UP(size, 512)) >
>>>>                         part_nr_sects_read(bdev->bd_part))
>>>>                 return -ERANGE;
>>>>
>>>> Given that file systems supporting dax refuse to mount with a blocksize
>>>> != page size, I'm guessing this is sort of expected behavior. However,
>>>> we really shouldn't be breaking direct I/O on pmem devices.
>>>

No, this is a BUG. A buffered or direct read/write to an IS_DAX() inode
should work at any alignment and any size. With DAX, buffered and direct
I/O take the exact same code path, and buffered I/O has to accept I/O of
any size.

This is probably a bug in the DAX handling of the bdev inode. Let me test
this. I will send a fix ASAP.

<>
>>> the output of:
>>>
>>> /sys/block/pmem0/queue/logical_block_size
>> 512
>>
>>> /sys/block/pmem0/queue/physical_block_size
>> 512
>>

There is a pending fix for this. Do you need it sent to stable?

>>> /sys/block/pmem0/queue/hw_sector_size
>> 512
>>
>>> /sys/block/pmem0/queue/minimum_io_size
>> 512
>>
>>> /sys/block/pmem0/queue/optimal_io_size
>> 0

Thanks
Boaz
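
P.S. For anyone following along, the arithmetic behind the -ERANGE in the
quoted check can be modelled in userspace. This is a sketch only:
model_range_check() is a made-up name, nr_sects stands in for
part_nr_sects_read(bdev->bd_part), and the 4096-byte page size used in
the example below is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Userspace model of the quoted range check (hypothetical name).
 * 'sector' is the starting 512-byte sector, 'size' the request length in
 * bytes, 'nr_sects' the number of 512-byte sectors on the device.
 */
static int model_range_check(uint64_t sector, uint64_t size, uint64_t nr_sects)
{
	assert(size > 0);

	/* Reject any request whose last sector would lie past the device end. */
	if (sector + DIV_ROUND_UP(size, 512) > nr_sects)
		return -ERANGE;
	return 0;
}
```

On a 2048-sector device, a 512-byte read at sector 2047 gives
2047 + 1 = 2048 and passes. A page-sized request at the same sector gives
2047 + 8 = 2055 > 2048 and is rejected, which matches the failure Jeff
describes.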