linux-kernel.vger.kernel.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jeff Moyer <jmoyer@redhat.com>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>, X86 ML <x86@kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Jan Kara <jack@suse.com>
Subject: Re: [PATCH 0/2] "big hammer" for DAX msync/fsync correctness
Date: Fri, 6 Nov 2015 15:17:27 -0800
Message-ID: <CAPcyv4iptRuGb0O13+LN0Qv7XUDdJYG6TCJNOVMASDTLw90gtw@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.11.1511061832590.4032@nanos>

On Fri, Nov 6, 2015 at 9:35 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 6 Nov 2015, Dan Williams wrote:
>> On Fri, Nov 6, 2015 at 12:06 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > Just for the record. Such a flush mechanism with
>> >
>> >      on_each_cpu()
>> >         wbinvd()
>> >         ...
>> >
>> > will make that stuff completely unusable on Real-Time systems. We've
>> > been there with the big hammer approach of the intel graphics
>> > driver.
>>
>> Noted.  This means RT systems either need to disable DAX or avoid
>> fsync.  Yes, this is a wart, but not an unexpected one in a first
>> generation persistent memory platform.
>
> And it's not just only RT. The folks who are aiming for 100%
> undisturbed user space (NOHZ_FULL) will be massively unhappy about
> that as well.
>
> Is it really required to do that on all cpus?
>

I believe it is, but I'll double check.

I assume the folks that want undisturbed userspace are ok with the
mitigation of modifying their applications to flush individual cache
lines if they want to use DAX without fsync.  At least until the
platform can provide a cheaper fsync implementation.

The option to drive cache flushing from the radix tree is at least
interruptible, but it might be long running depending on how much
virtual address space is dirty.  Altogether, the options in the
current generation are:

1/ wbinvd driven: quick flush O(size of cache), but long interrupt-off latency

2/ radix driven: long flush O(size of dirty range), but at least preempt-able

3/ DAX without calling fsync: userspace takes direct responsibility
for cache management of DAX mappings

4/ DAX disabled: fsync is the standard page cache writeback latency
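
To make option 3 concrete, here is a minimal userspace sketch of
flushing a dirty range one cache line at a time.  It is an illustration
under stated assumptions, not code from this patch set: flush_range()
is a hypothetical helper, the 64-byte line size is assumed rather than
queried from the CPU, and whether a flush plus store fence is enough
for durability depends on the platform.  On builds without SSE2 the
flush compiles to a no-op and only the line walk remains.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_clflush(), _mm_sfence() */
#define HAVE_CLFLUSH 1
#endif

/* Assumption: 64-byte cache lines; real code should query the CPU. */
#define CACHE_LINE 64u

/*
 * Hypothetical helper: flush every cache line that overlaps
 * [addr, addr + len).  Returns the number of lines visited, so the
 * cost is proportional to the size of the dirty range.
 */
static size_t flush_range(const void *addr, size_t len)
{
	uintptr_t start, end, p;
	size_t lines = 0;

	if (len == 0)
		return 0;

	/* Round down to the start of the first cache line. */
	start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
	end = (uintptr_t)addr + len;

	for (p = start; p < end; p += CACHE_LINE) {
#ifdef HAVE_CLFLUSH
		_mm_clflush((const void *)p);
#endif
		lines++;
	}

#ifdef HAVE_CLFLUSH
	/* Store fence after the flushes; sufficiency is platform dependent. */
	_mm_sfence();
#endif
	return lines;
}
```

An application would call flush_range() on each region of a DAX
mapping it has dirtied, in place of msync/fsync.  Unlike wbinvd this
touches only the calling CPU and only the lines named, which is the
property the undisturbed-userspace folks care about.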

We could potentially argue about 1 vs 2 ad nauseam, but I wonder if
there is room to punt it to a configuration option or make it
dynamic?  My stance is do 1 with the hope of riding options 3 and 4
until the platform happens to provide a better alternative.
