From: Dan Williams <dan.j.williams@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, Jan Kara <jack@suse.cz>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Linux MM <linux-mm@kvack.org>, Jeff Moyer <jmoyer@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v2 0/4] dax: require 'struct page' and other fixups
Date: Sun, 1 Oct 2017 14:22:08 -0700
Message-ID: <CAPcyv4hLgGb0sO1=qGxt83zumKt82RA8dUr=_1Gaqew7hxajXg@mail.gmail.com>
In-Reply-To: <20171001211147.GE15067@dastard>

On Sun, Oct 1, 2017 at 2:11 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Oct 01, 2017 at 10:58:06AM -0700, Dan Williams wrote:
>> On Sun, Oct 1, 2017 at 12:57 AM, Christoph Hellwig <hch@lst.de> wrote:
>> > While this looks like a really nice cleanup of the code and removes
>> > nasty race conditions I'd like to understand the tradeoffs.
>> >
>> > This now requires every dax device that is used with a file system
>> > to have a struct page backing, which not only means we'd break
>> > existing setups, but is also a sharp turn from previous policy.
>> >
>> > Unless I misremember it was you Intel guys that heavily pushed for
>> > the page-less version, so I'd like to understand why you've changed
>> > your mind.
>>
>> Sure, here's a quick recap of the story so far of how we got here:
>>
>> * In support of page-less I/O operations envisioned by Matthew I
>> introduced pfn_t as a proposal for converting the block layer and
>> other sub-systems to use pfns instead of pages [1]. You helped out on
>> that patch set with some work on the DMA api. [2]
>>
>> * The DMA api conversion effort came to a halt when it came time to
>> touch sparc paths and DaveM said [3]: "Generally speaking, I think
>> that all actual physical memory the kernel operates on should have a
>> struct page backing it."
>>
>> * ZONE_DEVICE was created to solve the DMA problem, and in developing /
>> testing that I discovered plenty of proof for Dave's assertion (no fork,
>> no ptrace, etc). We should have made the switch to require struct page
>> at that point, but I was persuaded by the argument that changing the
>> dax policy may break existing assumptions, and that there were larger
>> issues to go solve at the time.
>>
>> What changed recently was the discussions around what the dax mount
>> option means and the assertion that we can, in general, make some
>> policy changes on our way to removing the "experimental" designation
>> from filesystem-dax. It is clear that the page-less dax path remains
>> experimental given all the ways it fails in several kernel paths, and
>> there have been no patches for several months to revive the effort.
>> Meanwhile the page-less path continues to generate maintenance
>> overhead. The recent gymnastics (new ->post_mmap file_operation) to
>> make sure ->vm_flags are safely manipulated when dynamically changing
>> the dax mode of a file was the final straw for me to pull the trigger
>> on this series.
>>
>> In terms of what breaks by changing this policy it should be noted
>> that we automatically create pages for "legacy" pmem devices, and the
>> default for "ndctl create-namespace" is to allocate pages. I have yet
>> to see a bug report where someone was surprised by fork failing or
>> direct-I/O causing a SIGBUS. So, I think the defaults are working, and
>> it is unlikely that there are environments dependent on page-less
>> behavior.
>
> Does this imply that the hardware vendors won't have
> tens of terabytes of pmem in systems in the near to medium term?
> That's what we were originally told to expect by the 2018-19 timeframe
> (i.e. 5 years in), and that's kinda what we've been working towards.
> Indeed, supporting systems with a couple of orders of magnitude more
> pmem than ram was the big driver for page-less DAX mappings in the
> first place. i.e. it was needed to avoid the static RAM overhead of
> all the static struct pages for such large amounts of physical
> memory.
>
> If we decide that we must have struct pages for pmem, then we're
> essentially throwing away the ability to support the very systems
> the hardware vendors were telling us we needed to design the pmem
> infrastructure for.  If that reality has changed, then I'd suggest
> that we need to determine what the long term replacement for
> pageless IO on large pmem systems will be before we throw what we
> have away.

No, we can support large pmem with struct page capacity reserved from
pmem itself rather than ram. A 1.5% capacity tax does not appear to be
prohibitive.
