From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1034020AbbKFXRa (ORCPT );
	Fri, 6 Nov 2015 18:17:30 -0500
Received: from mail-wi0-f180.google.com ([209.85.212.180]:35161 "EHLO
	mail-wi0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1033956AbbKFXR2 (ORCPT );
	Fri, 6 Nov 2015 18:17:28 -0500
MIME-Version: 1.0
In-Reply-To:
References: <1446070176-14568-1-git-send-email-ross.zwisler@linux.intel.com>
	<20151028225112.GA30284@linux.intel.com>
Date: Fri, 6 Nov 2015 15:17:27 -0800
Message-ID:
Subject: Re: [PATCH 0/2] "big hammer" for DAX msync/fsync correctness
From: Dan Williams
To: Thomas Gleixner
Cc: Ross Zwisler, Jeff Moyer, linux-nvdimm, X86 ML, Dave Chinner,
	"linux-kernel@vger.kernel.org", Ingo Molnar, "H. Peter Anvin",
	Jan Kara
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 6, 2015 at 9:35 AM, Thomas Gleixner wrote:
> On Fri, 6 Nov 2015, Dan Williams wrote:
>> On Fri, Nov 6, 2015 at 12:06 AM, Thomas Gleixner wrote:
>> > Just for the record. Such a flush mechanism with
>> >
>> >     on_each_cpu()
>> >        wbinvd()
>> >     ...
>> >
>> > will make that stuff completely unusable on Real-Time systems. We've
>> > been there with the big hammer approach of the intel graphics
>> > driver.
>>
>> Noted.  This means RT systems either need to disable DAX or avoid
>> fsync.  Yes, this is a wart, but not an unexpected one in a first
>> generation persistent memory platform.
>
> And it's not just RT.  The folks who are aiming for 100% undisturbed
> user space (NOHZ_FULL) will be massively unhappy about that as well.
>
> Is it really required to do that on all cpus?

I believe it is, but I'll double check.
I assume the folks who want undisturbed userspace are OK with the
mitigation of modifying their applications to flush individual cache
lines if they want to use DAX without fsync, at least until the
platform can provide a cheaper fsync implementation.  The option to
drive cache flushing from the radix tree is at least interruptible,
but it may be long running depending on how much virtual address space
is dirty.

Altogether, the options in the current generation are:

1/ wbinvd driven: quick flush, O(size of cache), but long
   interrupt-off latency

2/ radix driven: long flush, O(size of dirty range), but at least
   preempt-able

3/ DAX without calling fsync: userspace takes direct responsibility
   for cache management of DAX mappings

4/ DAX disabled: fsync incurs the standard page cache writeback
   latency

We could argue about 1 vs 2 ad nauseam, but I wonder if there is room
to punt it to a configuration option, or to make it dynamic?  My
stance is to do 1, with the hope of riding options 3 and 4 until the
platform provides a better alternative.