From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754525AbbHJQcr (ORCPT ); Mon, 10 Aug 2015 12:32:47 -0400
Received: from g4t3425.houston.hp.com ([15.201.208.53]:53770 "EHLO g4t3425.houston.hp.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753826AbbHJQcK (ORCPT ); Mon, 10 Aug 2015 12:32:10 -0400
Subject: Re: regression introduced by "block: Add support for DAX reads/writes to block devices"
To: Boaz Harrosh , Dave Chinner
References: <20150805220113.GC3902@dastard> <55C2BB9E.3040709@hp.com>
	<20150806032421.GA16638@dastard> <55C3124F.3020602@plexistor.com>
	<20150806203450.GB16638@dastard> <55C714D0.8070003@plexistor.com>
Cc: Jeff Moyer , "matthew r. wilcox" ,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Vishal Verma
From: Linda Knippers
X-Enigmail-Draft-Status: N1110
Message-ID: <55C8D208.1070903@hp.com>
Date: Mon, 10 Aug 2015 12:32:08 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <55C714D0.8070003@plexistor.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 8/9/2015 4:52 AM, Boaz Harrosh wrote:
> On 08/06/2015 11:34 PM, Dave Chinner wrote:
>> On Thu, Aug 06, 2015 at 10:52:47AM +0300, Boaz Harrosh wrote:
>>> On 08/06/2015 06:24 AM, Dave Chinner wrote:
>>>> On Wed, Aug 05, 2015 at 09:42:54PM -0400, Linda Knippers wrote:
>>>>> On 08/05/2015 06:01 PM, Dave Chinner wrote:
>>>>>> On Wed, Aug 05, 2015 at 04:19:08PM -0400, Jeff Moyer wrote:
>>> <>
>>>>>>>
>>>>>>> I sat down with Linda to look into it, and the problem is that mkfs.xfs
>>>>>>> sets the blocksize of the device to 512 (via BLKBSZSET), and then reads
>>>>>>> from the last sector of the device. This results in dax_io trying to do
>>>>>>> a page-sized I/O at 512 bytes from the end of the device.
>>>>>>
>>>
>>> This part I do not understand. how is mkfs.xfs reading the sector?
>>> Is it through open(/dev/pmem0,...) ? O_DIRECT?
>>
>> mkfs.xfs uses O_DIRECT. Only if open(O_DIRECT) fails or mkfs.xfs is
>> told that it is working on an image file does it fall back to
>> buffered IO. All of the XFS userspace tools work this way to prevent
>> page cache pollution issues with read-once or write-once data during
>> operation.
>>
>
> Thanks, yes makes sense. This is a bug at the DAX implementation of
> bdev. Since as you know with DAX there is no difference between
> O_DIRECT and buffered, we must support any aligned IO. I bet it
> should be something with bdev not giving 4K buffer-heads to dax.c.
>
> Or ... It might just be the infamous bug where the actual partition
> they used was not 4k aligned on its start sector. So the last sector IO
> after partition translation came out wrong. This bug then should be
> fixed by: https://lists.01.org/pipermail/linux-nvdimm/2015-July/001555.html
> by:Vishal Verma
>
> Vishal I think we should add CC: stable@vger.kernel.org to your patch
> because of these fdisk bugs.

That patch does cause 'mkfs -t xfs' to work.
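The three ideas in the quoted discussion (opening with O_DIRECT and falling back, partition start alignment, and a page-sized read starting at the last 512-byte sector) can be sketched in C. This is a minimal illustration under stated assumptions, not xfsprogs or kernel code: `open_prefer_direct()`, `partition_start_4k_aligned()` and `overrun_bytes()` are hypothetical helpers. The fallback assumes, as the open(2) man page documents, that a file system without O_DIRECT support fails the open with EINVAL; it ignores the "image file" case Dave mentions. The alignment check assumes partition starts are counted in 512-byte sectors, and the overrun arithmetic assumes a 4096-byte page.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: open path read-only, preferring O_DIRECT.
 * Only if open() fails with EINVAL (O_DIRECT unsupported) do we retry
 * with buffered I/O. *direct reports which mode was used. Returns -1 on
 * any other error, with errno left as set by the failing open(). */
static int open_prefer_direct(const char *path, bool *direct)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        *direct = true;
        return fd;
    }
    if (errno != EINVAL)
        return -1;
    *direct = false;
    return open(path, O_RDONLY);
}

/* Hypothetical helper: start_sector is in 512-byte units. The partition
 * is 4 KiB aligned only if its byte offset is a multiple of 4096. */
static bool partition_start_4k_aligned(uint64_t start_sector)
{
    return (start_sector * 512) % 4096 == 0;
}

/* Hypothetical helper: bytes by which an I/O of io_len bytes starting at
 * byte offset off runs past the end of a dev_size-byte device
 * (0 if the whole I/O fits inside the device). */
static uint64_t overrun_bytes(uint64_t dev_size, uint64_t off, uint64_t io_len)
{
    uint64_t end = off + io_len;
    return end > dev_size ? end - dev_size : 0;
}
```

For example, take the 8 GiB device implied by the mkfs.xfs output below (2097152 blocks of 4096 bytes, 8589934592 bytes). A 512-byte read at offset 8589934080 fits exactly, while a 4096-byte read starting at the same offset would run 3584 bytes past the end. A partition starting at sector 63 (the old DOS-style default) is not 4 KiB aligned; one starting at sector 2048 is.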

Before:
$ sudo mkfs -t xfs -f /dev/pmem3
meta-data=/dev/pmem3             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: read failed: Numerical result out of range

After:
$ sudo mkfs -t xfs -f /dev/pmem3
meta-data=/dev/pmem3             isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ cat /sys/block/pmem3/queue/logical_block_size
512
$ cat /sys/block/pmem3/queue/physical_block_size
4096
$ cat /sys/block/pmem3/queue/hw_sector_size
512
$ cat /sys/block/pmem3/queue/minimum_io_size
4096

Previously, physical_block_size was 512 and minimum_io_size was 0.
What about logical_block_size and hw_sector_size still being 512?
So do we want to change pmem rather than changing DAX?

-- ljk
>
>> Cheers,
>> Dave.
>
> Thanks
> Boaz
>
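The question of logical_block_size versus physical_block_size comes down to which queue limits pmem advertises. Below is a hedged sketch with hypothetical helper names, not taken from the pmem driver or xfsprogs. It reads those limits from the same sysfs files shown above. It also checks a request against the usual O_DIRECT rule: the offset and length must be multiples of the logical block size. That rule is an assumption taken from the open(2) man page, and buffer alignment is ignored here. While logical_block_size stays at 512, a 512-byte O_DIRECT read of the last sector is still a legal request. This is consistent with Boaz's point that the DAX block-device path must handle any aligned I/O.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read an unsigned value from
 * /sys/block/<dev>/queue/<attr> (e.g. "logical_block_size").
 * Returns 0 on success, -1 if the path is too long, the file cannot be
 * opened, or its contents do not parse as a number. */
static int read_queue_limit(const char *dev, const char *attr, unsigned long *out)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%lu", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: true if an O_DIRECT request at byte offset off
 * of len bytes satisfies logical-block alignment (buffer alignment,
 * which O_DIRECT also requires, is not checked here). */
static bool dio_request_aligned(uint64_t off, uint64_t len, uint64_t logical_bs)
{
    return off % logical_bs == 0 && len % logical_bs == 0;
}
```

On the system above, `read_queue_limit("pmem3", "logical_block_size", &v)` would be expected to set `v` to 512. `dio_request_aligned(8589934080, 512, 512)` is true, while the same last-sector request checked against a 4096-byte logical block size is not.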