From: Vivek Goyal <vgoyal@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	virtio-fs@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jeff Moyer <jmoyer@redhat.com>
Subject: Re: [PATCH 01/19] dax: remove block device dependencies
Date: Wed, 15 Jan 2020 14:56:17 -0500
Message-ID: <20200115195617.GA4133@redhat.com>
In-Reply-To: <CAPcyv4igrs40uWuCB163PPBLqyGVaVbaNfE=kCfHRPRuvZdxQA@mail.gmail.com>

On Tue, Jan 14, 2020 at 02:23:04PM -0800, Dan Williams wrote:
> On Tue, Jan 14, 2020 at 1:28 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Jan 14, 2020 at 12:39:00PM -0800, Dan Williams wrote:
> > > On Tue, Jan 14, 2020 at 12:31 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Thu, Jan 09, 2020 at 12:03:01PM -0800, Dan Williams wrote:
> > > > > On Thu, Jan 9, 2020 at 3:27 AM Jan Kara <jack@suse.cz> wrote:
> > > > > >
> > > > > > On Tue 07-01-20 10:49:55, Dan Williams wrote:
> > > > > > > On Tue, Jan 7, 2020 at 10:33 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > > > > W.r.t partitioning, bdev_dax_pgoff() seems to be the pain point where
> > > > > > > > dax code refers back to block device to figure out partition offset in
> > > > > > > > dax device. If we create a dax object corresponding to "struct block_device"
> > > > > > > > and store sector offset in that, then we could pass that object to dax
> > > > > > > > code and not worry about referring back to bdev. I have written some
> > > > > > > > proof of concept code and called that object "dax_handle". I can post
> > > > > > > > that code if there is interest.
> > > > > > >
> > > > > > > I don't think it's worth it in the end especially considering
> > > > > > > filesystems are looking to operate on /dev/dax devices directly and
> > > > > > > remove block entanglements entirely.
> > > > > > >
> > > > > > > > IMHO, it feels useful to be able to partition and use a dax capable
> > > > > > > > block device in same way as non-dax block device. It will be really
> > > > > > > > odd to think that if filesystem is on /dev/pmem0p1, then dax can't
> > > > > > > > be enabled but if filesystem is on /dev/mapper/pmem0p1, then dax
> > > > > > > > will work.
> > > > > > >
> > > > > > > That can already happen today. If you do not properly align the
> > > > > > > partition then dax operations will be disabled. This proposal just
> > > > > > > extends that existing failure domain to make all partitions fail to
> > > > > > > support dax.
> > > > > >
> > > > > > Well, I have some sympathy with the sysadmin that has /dev/pmem0 device,
> > > > > > decides to create partitions on it for whatever (possibly misguided)
> > > > > > reason and then ponders why the hell DAX is not working? And PAGE_SIZE
> > > > > > partition alignment is so obvious and widespread that I don't count it as a
> > > > > > realistic error case sysadmins would be pondering about currently.
> > > > > >
> > > > > > So I'd find two options reasonably consistent:
> > > > > > 1) Keep status quo where partitions are created and support DAX.
> > > > > > > 2) Stop partition creation altogether, if anyone wants to split pmem
> > > > > > device further, he can use dm-linear for that (i.e., kpartx).
> > > > > >
> > > > > > But I'm not sure if the ship hasn't already sailed for option 2) to be
> > > > > > feasible without angry users and Linus reverting the change.
> > > > >
> > > > > Christoph? I feel myself leaning more and more to the "keep pmem
> > > > > partitions" camp.
> > > > >
> > > > > I don't see "drop partition support" effort ending well given the long
> > > > > standing "ext4 fails to mount when dax is not available" precedent.
> > > > >
> > > > > I think the next least bad option is to have a dax_get_by_host()
> > > > > variant that passes an offset and length pair rather than requiring a
> > > > > later bdev_dax_pgoff() to recall the offset. This also prevents
> > > > > needing to add another dax-device object representation.
> > > >
> > > > I am wondering what the conclusion on this is. I want this to make
> > > > progress in some direction so that I can make progress on virtiofs DAX
> > > > support.
> > >
> > > I think we should at least try to delete the partition support and see
> > > if anyone screams. Have a module option to revert the behavior so
> > > people are not stuck waiting for the revert to land, but if it stays
> > > quiet then we're in a better place with that support pushed out of the
> > > dax core.
> >
> > Hi Dan,
> >
> > So basically keep the partition support code, but disable it by default
> > and enable it via some knob, say a kernel command line option or module
> > option?
> 
> Yes.
> 
> > At what point in time will we remove that code completely? I mean, what
> > if people scream after two kernel releases, after we have removed the
> > code?
> 
> I'd follow the typical timelines of Documentation/ABI/obsolete which
> is a year or more.
> 
> >
> > Also, from a distribution's perspective, we might not hear from our
> > customers for a very long time (till we backport that code into
> > existing releases or release this new code in the next major release). From
> > that viewpoint I would not like to break existing user-visible behavior.
> >
> > How bad is it to keep partition support around? To me it feels reasonably
> > simple where we just have to store the offset into the dax device in another
> > dax object:
> 
> If we end up keeping partition support, we're not adding another object.
> 
> > and pass that object around (instead of dax_device). If that's
> > the case, I am not sure why to even venture into a direction where some
> > user's setup might be broken.
> 
> It was a mistake to support them. If that mistake can be undone
> without breaking existing deployments the code base is better off
> without the concept.
> 
> > Also from an application perspective, /dev/pmem is a block device, so it
> > should behave like a block device, (including kernel partition table support).
> > From that view, dax looks like just an additional feature of that device
> > which can be enabled by passing option "-o dax".
> 
> dax via block devices was a crutch that we leaned on too heavily, and
> the implementation has slowly been moving away from it ever since.
> 
> > IOW, can we reconsider the idea of not supporting kernel partition tables
> > for dax capable block devices? I can only see downsides to removing kernel
> > partition table support, and the only upside seems to be a little cleanup of
> > dax core code.
> 
> Can you help find end users that depend on it?

I can't think of a real user at this point. I am just concerned that
once the change goes in, somebody will be affected at a later point and
come out complaining, and this change will be seen as breaking user
space and hence a regression.

> Even the Red Hat
> installation guide example shows mounting on pmem0 directly. [1]

Below that example it also says:

"When creating partitions on a pmem device to be used for direct access,
partitions must be aligned on page boundaries. On the Intel 64 and AMD64
architecture, at least 4KiB alignment for the start and end of the
partition, but 2MiB is the preferred alignment. By default, the parted
tool aligns partitions on 1MiB boundaries. For the first partition,
specify 2MiB as the start of the partition. If the size of the partition
is a multiple of 2MiB, all other partitions are also aligned."

So the documentation clearly says dax will work with partitions as well,
and some user might decide to do just that.
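
To make the alignment rule quoted above concrete, here is a minimal
userspace sketch (not kernel code). It assumes 512-byte sectors and 4KiB
pages. The helper names part_is_aligned() and part_dax_pgoff() are
hypothetical; the second one only mirrors the idea behind
bdev_dax_pgoff(), i.e. adding the partition start to the sector and
rejecting anything that is not page aligned.

```c
#include <stdint.h>

#define SECTOR_SIZE  512ULL                 /* assumed logical sector size */
#define PAGE_SIZE_4K 4096ULL                /* assumed base page size */
#define HUGE_2M      (2ULL * 1024 * 1024)   /* preferred 2MiB alignment */

/*
 * Hypothetical helper: return 1 if a partition that starts at
 * start_sect and spans nr_sects sectors begins and ends on an
 * align-byte boundary, 0 otherwise.
 */
static int part_is_aligned(uint64_t start_sect, uint64_t nr_sects,
			   uint64_t align)
{
	uint64_t start = start_sect * SECTOR_SIZE;
	uint64_t len = nr_sects * SECTOR_SIZE;

	return (start % align == 0) && (len % align == 0);
}

/*
 * Hypothetical helper: translate a sector inside a partition into a
 * page offset within the whole device. Returns 0 and fills *pgoff on
 * success, -1 if the byte offset or the size is not page aligned.
 */
static int part_dax_pgoff(uint64_t part_start_sect, uint64_t sector,
			  uint64_t size, uint64_t *pgoff)
{
	uint64_t phys_off = (part_start_sect + sector) * SECTOR_SIZE;

	if (phys_off % PAGE_SIZE_4K || size % PAGE_SIZE_4K)
		return -1;

	*pgoff = phys_off / PAGE_SIZE_4K;
	return 0;
}
```

With these assumptions, a partition starting at sector 2048 (the 1MiB
parted default) passes the 4KiB check but not the 2MiB one, while a
start at sector 4096 (2MiB) passes both.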

Thanks
Vivek

