From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753960AbbHFBnJ (ORCPT ); Wed, 5 Aug 2015 21:43:09 -0400
Received: from g4t3425.houston.hp.com ([15.201.208.53]:43146 "EHLO g4t3425.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752368AbbHFBnH (ORCPT ); Wed, 5 Aug 2015 21:43:07 -0400
Message-ID: <55C2BB9E.3040709@hp.com>
Date: Wed, 05 Aug 2015 21:42:54 -0400
From: Linda Knippers
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1
MIME-Version: 1.0
To: Dave Chinner, Jeff Moyer
CC: "matthew r. wilcox", linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: regression introduced by "block: Add support for DAX reads/writes to block devices"
References: <20150805220113.GC3902@dastard>
In-Reply-To: <20150805220113.GC3902@dastard>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/05/2015 06:01 PM, Dave Chinner wrote:
> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>> Hi, Matthew,
>>
>> Linda Knippers noticed that commit (bbab37ddc20b) breaks mkfs.xfs:
>>
>> # mkfs -t xfs -f /dev/pmem0
>> meta-data=/dev/pmem0             isize=256    agcount=4, agsize=524288 blks
>>          =                       sectsz=512   attr=2, projid32bit=1
>>          =                       crc=0        finobt=0
>> data     =                       bsize=4096   blocks=2097152, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>> log      =internal log           bsize=4096   blocks=2560, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> mkfs.xfs: read failed: Numerical result out of range
>>
>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>> from the last sector of the device.
>> This results in dax_io trying to do
>> a page-sized I/O at 512 bytes from the end of the device.
>
> Right - we have to be able to do IO to that last sector, so this is
> a sanity check to tell if the block dev is large enough. The XFS
> kernel code does the same end-of-device sector read when the
> filesystem is mounted, too.
>
>> bdev_direct_access, receiving this bogus pos/size combo, returns
>> -ERANGE:
>>
>> 	if ((sector + DIV_ROUND_UP(size, 512)) >
>> 			part_nr_sects_read(bdev->bd_part))
>> 		return -ERANGE;
>>
>> Given that file systems supporting dax refuse to mount with a blocksize
>> != page size, I'm guessing this is sort of expected behavior. However,
>> we really shouldn't be breaking direct I/O on pmem devices.
>
> If the device is advertising 512 byte sector size support, then this
> needs to work, especially as DAX is completely transparent on the
> block device. Remember that DAX through a filesystem works on
> filesystem data block size boundaries, so a 512 byte sector/4k block
> size filesystem will be able to use DAX for mmapped files just fine.
>
>> So, what do you want to do? We could make the pmem device's logical
>> block size fixed at the system page size. Or, we could modify the dax
>> code to work with blocksize < pagesize. Or, we could continue using the
>> direct I/O codepath for direct block device access. What do you think?
>
> I don't know how the pmem device sets up its limits. Can you post
> the output of:
>
> /sys/block/pmem0/queue/logical_block_size 512
> /sys/block/pmem0/queue/physical_block_size 512
> /sys/block/pmem0/queue/hw_sector_size 512
> /sys/block/pmem0/queue/minimum_io_size 512
> /sys/block/pmem0/queue/optimal_io_size 0

Let me know if you need anything else.

-- ljk

> As these all affect how mkfs.xfs configures the filesystem being
> made and so influences the size and alignment of the IO it does....
>
> Cheers,
>
> Dave.
>