linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3] libnvdimm, region: sysfs trigger for nvdimm_flush()
Date: Thu, 27 Apr 2017 14:36:15 -0700	[thread overview]
Message-ID: <CAPcyv4hErApEjUWnRz9ywak8RP6BDremBKzG2wDvWO=wHdAwPg@mail.gmail.com> (raw)
In-Reply-To: <CAPcyv4hthct-P=W9QLbjswvco8Fcewpf13ZX6-h2PXpHwRtAKg@mail.gmail.com>

On Thu, Apr 27, 2017 at 1:02 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Apr 27, 2017 at 12:40 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>>> On Thu, Apr 27, 2017 at 11:41 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>>> Dan Williams <dan.j.williams@intel.com> writes:
>>>>
>>>>>> The sentiment is that programs shouldn't have to grovel around in sysfs
>>>>>> to do stuff related to an open file descriptor or mapping.  I don't take
>>>>>> issue with the name.  I do worry that something like 'wpq_drain' may be
>>>>>> too platform specific, though.  The NVM Programming Model specification
>>>>>> is going to call this "deep flush", so maybe that will give you
>>>>>> some inspiration if you do want to change the name.
>>>>>
>>>>> I'll change to "deep_flush", and I quibble that this is related to a
>>>>> single open file descriptor or mapping. It really is a "region flush"
>>>>> for giving extra protection for global metadata, but the persistence
>>>>> of individual fds or mappings is handled by ADR. I think an ioctl
>>>>> might give the false impression that every time you flush a cacheline
>>>>> to persistence you need to call the ioctl.
>>>>
>>>> fsync, for example, may affect more than one fd--all data in the drive
>>>> write cache will be flushed.  I don't see how this is so different.  I
>>>> think a sysfs file is awkward because it requires an application to
>>>> chase down the correct file in the sysfs hierarchy.  If the application
>>>> already has an open fd or a mapping, it should be able to operate on
>>>> that.
>>>
>>> I'm teetering, but still leaning towards sysfs. The use case that
>>> needs this is device-dax because we otherwise silently do this behind
>>> the application's back on filesystem-dax for fsync / msync.
>>
>> We may yet get file system support for flush from userspace (NOVA, for
>> example).  So I don't think we should restrict ourselves to only
>> thinking about the device dax use case.
>>
>>> A device-dax ioctl would be straightforward, but 'deep flush' assumes
>>> that the device-dax instance is fronting persistent memory.  There's
>>> nothing persistent memory specific about device-dax except that today
>>> only the nvdimm sub-system knows how to create them, but there's
>>> nothing that prevents other memory regions from being mapped this way.
>>
>> You're concerned that applications operating on device dax instances
>> that are not backed by pmem will try to issue a deep flush?  Why would
>> they do that, and why can't you just return failure from the ioctl?
>>
>>> So I'd rather this persistent memory specific mechanism stay with the
>>> persistent memory specific portion of the interface rather than plumb
>>> persistent memory details out through the generic device-dax interface
>>> since we have no other intercept point like we do in the
>>> filesystem-dax case to hide this flush.
>>
>> Look at the block layer.  You can issue an ioctl on a block device, and
>> if the generic block layer can handle it, it does.  If not, it gets
>> passed down to lower layers until either it gets handled, or it bubbles
>> back up because nobody knew what to do with it.  I think you can do the
>> same thing here, and that solves your layering violation.
>
> So this is where I started. I was going to follow the block layer.
> Except recently the block layer has been leaning away from ioctls and
> implementing support for syscalls directly. The same approach for
> device-dax fallocate() support got NAKd, so I opted for sysfs out of
> the gate.
>
> However, since there really is no analog for "deep flush" in the
> syscall namespace lets (*gag*) implement an ioctl for this.

So I think about it for 2 seconds and now I'm veering back to sysfs.
We don't want applications calling this ioctl on filesystem-dax fds. I
simply can't bring myself to do the work to pick a unique ioctl number
that is known to be unique for the full filsystem and block-layer
paths when we can put this interface right where it belongs in nvdimm
specific sysfs.

  reply	other threads:[~2017-04-27 21:36 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-24 23:49 [PATCH v2 0/2] libnvdimm: export wpq flush interface Dan Williams
2017-04-24 23:49 ` [PATCH v2 1/2] libnvdimm, region: fix flush hint detection crash Dan Williams
2017-04-26 19:43   ` Jeff Moyer
2017-04-26 20:04     ` Dan Williams
2017-04-24 23:50 ` [PATCH v2 2/2] libnvdimm, region: sysfs trigger for nvdimm_flush() Dan Williams
2017-04-25 16:37   ` Ross Zwisler
2017-04-25 16:38     ` Dan Williams
2017-04-25 20:17   ` [PATCH v3] " Dan Williams
2017-04-26 20:38     ` Jeff Moyer
2017-04-26 23:00       ` Dan Williams
2017-04-27 13:45         ` Jeff Moyer
2017-04-27 16:56           ` Dan Williams
2017-04-27 18:41             ` Jeff Moyer
2017-04-27 19:17               ` Dan Williams
2017-04-27 19:21                 ` Dan Williams
2017-04-27 19:43                   ` Jeff Moyer
2017-04-27 19:40                 ` Jeff Moyer
2017-04-27 20:02                   ` Dan Williams
2017-04-27 21:36                     ` Dan Williams [this message]
2017-04-26 23:36       ` [PATCH v4] " Dan Williams
2017-04-27 22:17         ` [PATCH v5] " Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4hErApEjUWnRz9ywak8RP6BDremBKzG2wDvWO=wHdAwPg@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=m.mizuma@jp.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).