Linux-NVDIMM Archive on lore.kernel.org
From: Vivek Goyal <vgoyal@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	virtio-fs@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 01/19] dax: remove block device dependencies
Date: Tue, 7 Jan 2020 13:33:07 -0500
Message-ID: <20200107183307.GD15920@redhat.com>
In-Reply-To: <CAPcyv4gmdoqpwwwy4dS3D2eZFjmJ_Zi39k=1a4wn-_ksm-UV4A@mail.gmail.com>

On Tue, Jan 07, 2020 at 10:07:18AM -0800, Dan Williams wrote:
> On Tue, Jan 7, 2020 at 10:02 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Jan 07, 2020 at 09:29:17AM -0800, Dan Williams wrote:
> > > On Tue, Jan 7, 2020 at 9:08 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > > >
> > > > On Tue, Jan 07, 2020 at 06:22:54AM -0800, Dan Williams wrote:
> > > > > On Tue, Jan 7, 2020 at 4:52 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > > > >
> > > > > > On Mon, Dec 16, 2019 at 01:10:14PM -0500, Vivek Goyal wrote:
> > > > > > > > Agree. In retrospect it was my laziness in the dax-device
> > > > > > > > implementation to expect the block-device to be available.
> > > > > > > >
> > > > > > > > It looks like fs_dax_get_by_bdev() is an intercept point where a
> > > > > > > > dax_device could be dynamically created to represent the subset range
> > > > > > > > indicated by the block-device partition. That would open up more
> > > > > > > > cleanup opportunities.
> > > > > > >
> > > > > > > Hi Dan,
> > > > > > >
> > > > > > > After a long time I got time to look at it again. Want to work on this
> > > > > > > > cleanup so that I can make progress with virtiofs DAX patches.
> > > > > > >
> > > > > > > I am not sure I understand the requirements fully. I see that right now
> > > > > > > dax_device is created per device and all block partitions refer to it. If
> > > > > > > we want to create one dax_device per partition, then it looks like this
> > > > > > > will be structured more along the lines how block layer handles disk and
> > > > > > > partitions. (One gendisk for disk and block_devices for partitions,
> > > > > > > including partition 0). That probably means state belong to whole device
> > > > > > > will be in common structure say dax_device_common, and per partition state
> > > > > > > will be in dax_device and dax_device can carry a pointer to
> > > > > > > dax_device_common.
> > > > > > >
> > > > > > > I am also not sure what does it mean to partition dax devices. How will
> > > > > > > partitions be exported to user space.
> > > > > >
> > > > > > Dan, last time we talked you agreed that partitioned dax devices are
> > > > > > rather pointless IIRC.  Should we just deprecate partitions on DAX
> > > > > > devices and then remove them after a cycle or two?
> > > > >
> > > > > That does seem a better plan than trying to force partition support
> > > > > where it is not needed.
> > > >
> > > > Question: if one /did/ have a partitioned DAX device and used kpartx to
> > > > create dm-linear devices for each partition, will DAX still work through
> > > > that?
> > >
> > > The device-mapper support will continue, but it will be limited to
> > > whole device sub-components. I.e. you could use kpartx to carve up
> > > /dev/pmem0 and still have dax, but not partitions of /dev/pmem0.
> >
> > So we can't use fdisk/parted to partition /dev/pmem0. Given /dev/pmem0
> > is a block device, I thought tools will expect it to be partitioned.
> > Sometimes I create those partitions and use /dev/pmem0. So what's
> > the replacement for this. People often have tools/scripts which might
> > want to partition the device and these will start failing.
> 
> Partitioning will still work, but dax operation will be declined and
> fall back to page-cache.

OK, so if I mount /dev/pmem0p1 with dax enabled, the mount might fail or
the filesystem will fall back to using the page cache. Either way, dax
will not be enabled.

> 
> > IOW, I do not understand that why being able to partition /dev/pmem0
> > (which is a block device from user space point of view), is pointless.
> 
> How about s/pointless/redundant/. Persistent memory can already be
> "partitioned" via namespace boundaries.

But that's an entirely different way of partitioning. To me, being able
to use block devices (with dax capability) in the same way as any other
block device makes sense.

> Block device partitioning is
> then redundant and needlessly complicates, as you have found, the
> kernel implementation.

It does complicate the kernel implementation. But is it too hard to
solve the problem in the kernel?

W.r.t. partitioning, bdev_dax_pgoff() seems to be the pain point, where
the dax code refers back to the block device to figure out the partition
offset in the dax device. If we create a dax object corresponding to
"struct block_device" and store the sector offset in it, then we could
pass that object to the dax code and not worry about referring back to
the bdev. I have written some proof of concept code and called that
object "dax_handle". I can post that code if there is interest.
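
To illustrate the idea, here is a minimal userspace sketch. It is not the
proof of concept code itself: the layout of "struct dax_handle", the field
name start_sect, the helper dax_handle_pgoff() and the SKETCH_* constants
are all made up for illustration, and a 4096-byte page size is assumed.
The point is only that the partition offset is cached once in the handle,
so the offset calculation never has to look at the block device.

```c
/* Userspace sketch only; "dax_handle" and everything in it is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u		/* bytes per sector */
#define SKETCH_PAGE_SIZE   4096u	/* assumed page size */

/* One handle per block device (whole disk or partition). */
struct dax_handle {
	uint64_t start_sect;	/* partition start, in 512-byte sectors */
};

/*
 * Translate a partition-relative sector into a page offset within the
 * dax device, using only the offset cached in the handle.
 * Returns 0 on success, or -EINVAL if the resulting byte offset or the
 * requested size is not page aligned.
 */
int dax_handle_pgoff(const struct dax_handle *h, uint64_t sector,
		     size_t size, uint64_t *pgoff)
{
	uint64_t phys_off;

	assert(h != NULL);
	phys_off = (h->start_sect + sector) * SKETCH_SECTOR_SIZE;

	if (pgoff)
		*pgoff = phys_off / SKETCH_PAGE_SIZE;
	if (phys_off % SKETCH_PAGE_SIZE || size % SKETCH_PAGE_SIZE)
		return -EINVAL;
	return 0;
}
```

With a partition starting at sector 2048, sector 8 of that partition maps
to page offset 257 of the device. A partition that starts at sector 2049
is not page aligned, so the helper returns -EINVAL.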

IMHO, it is useful to be able to partition and use a dax capable
block device in the same way as a non-dax block device. It would be
really odd if dax can't be enabled when the filesystem is on
/dev/pmem0p1, but works when the filesystem is on /dev/mapper/pmem0p1.

Thanks
Vivek

> 
> The problem will be people that were on dax+ext4 on partitions. Those
> people will see a hard failure at mount whereas XFS will fallback to
> page cache with a warning in the log. I think ext4 must convert to the
> xfs dax handling model before partition support is dropped.
> 
