Date: Sat, 7 Nov 2015 09:38:36 +0100 (CET)
From: Thomas Gleixner
To: Dan Williams
Cc: "H. Peter Anvin", Ross Zwisler, Jeff Moyer, linux-nvdimm, X86 ML,
    Dave Chinner, linux-kernel@vger.kernel.org, Ingo Molnar, Jan Kara
Subject: Re: [PATCH 0/2] "big hammer" for DAX msync/fsync correctness

On Sat, 7 Nov 2015, Dan Williams wrote:
> On Fri, Nov 6, 2015 at 10:50 PM, Thomas Gleixner wrote:
> > On Fri, 6 Nov 2015, H. Peter Anvin wrote:
> >> On 11/06/15 15:17, Dan Williams wrote:
> >> >>
> >> >> Is it really required to do that on all cpus?
> >> >
> >> > I believe it is, but I'll double check.
> >> >
> >>
> >> It's required on all CPUs on which the DAX memory may have been
> >> dirtied. This is similar to the way we flush TLBs.
> >
> > Right. And that's exactly the problem: "may have been dirtied"
> >
> > If DAX is used on 50% of the CPUs and the other 50% are plugging
> > away happily in user space or run low latency RT tasks w/o ever
> > touching it, then having an unconditional flush on ALL CPUs is
> > just wrong, because you penalize the uninvolved cores with a
> > completely pointless SMP function call and drain their caches.
>
> It's not wrong and pointless, it's all we have available outside of
> having the kernel remember every virtual address that might have
> been touched since the last fsync and sit in a loop flushing those
> virtual addresses cache line by cache line.
>
> There is a crossover point where wbinvd is better than a clwb loop
> that needs to be determined.

This is a totally different issue, and I'm well aware that there is a
tradeoff between wbinvd() and a clwb loop. wbinvd() might be more
efficient performance-wise above some number of cache lines, but then
again it drains all the unrelated stuff as well, which can result in
an even larger performance hit.

What really concerns me more is that you just unconditionally flush
on all CPUs, whether they were involved in that DAX stuff or not.

Assume a DAX-using application on CPUs 0-3 and some other, unrelated
workload on CPUs 4-7. That flush will:

 - interrupt CPUs 4-7 for no reason (whether you use clwb or wbinvd)

 - drain the cache of CPUs 4-7 for no reason if done with wbinvd()

 - render Cache Allocation useless if done with wbinvd()

And we are not talking about a few microseconds here. Assume that
CPUs 4-7 have cache allocated and it's mostly dirty. We measured the
wbinvd() impact on RT back when the graphics folks used it as a big
hammer. The maximum latency spike was way above one millisecond.
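Just to make the tradeoff concrete, here is a sketch of the two
variants. The CROSSOVER_LINES number is completely made up and would
need real measurements; clwb(), boot_cpu_data and
wbinvd_on_all_cpus() are the existing x86 primitives:

	#include <linux/smp.h>		/* wbinvd_on_all_cpus() */
	#include <asm/special_insns.h>	/* clwb() */
	#include <asm/processor.h>	/* boot_cpu_data */

	/* Flush a known-dirty virtual range line by line with CLWB. */
	static void flush_range_clwb(void *vaddr, size_t size)
	{
		unsigned long clsize = boot_cpu_data.x86_clflush_size;
		void *vend = vaddr + size;
		void *p;

		/* Align down to a cache line and walk the range */
		for (p = (void *)((unsigned long)vaddr & ~(clsize - 1));
		     p < vend; p += clsize)
			clwb(p);	/* write back, line may stay cached */
		wmb();			/* order the write backs */
	}

	/* Made-up crossover: above some number of lines the big hammer
	 * is faster on the local CPU, but it IPIs _all_ CPUs and drains
	 * _all_ caches, which is exactly the problem described above. */
	#define CROSSOVER_LINES	(32 * 1024)

	static void flush_for_fsync(void *vaddr, size_t size)
	{
		if (size / boot_cpu_data.x86_clflush_size < CROSSOVER_LINES)
			flush_range_clwb(vaddr, size);
		else
			wbinvd_on_all_cpus();	/* on_each_cpu() big hammer */
	}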
We have similar issues with TLB flushing, but there we

 - are tracking where the mapping was used and never flush on
   innocent CPUs

 - one can design the application in a way that it uses different
   processes, so cross-CPU flushing does not happen

I know that this is not an easy problem to solve, but you should be
aware that various application scenarios are going to be massively
unhappy about that.

Thanks,

	tglx
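P.S.: For comparison, the TLB side gets this right by tracking the
users: flush_tlb_mm_range() consults mm_cpumask() and only IPIs the
CPUs which actually ran the mm. A minimal sketch of the analogous
"track the users" approach for the cache flush; dax_flush_cpu() and
tying the tracking to the mm are made up for illustration, while
smp_call_function_many() and mm_cpumask() are the existing
primitives:

	#include <linux/smp.h>		/* smp_call_function_many() */
	#include <linux/mm_types.h>	/* struct mm_struct */
	#include <asm/special_insns.h>	/* wbinvd() */

	static void dax_flush_cpu(void *info)
	{
		wbinvd();	/* or a clwb loop over the tracked ranges */
	}

	/* Interrupt only the CPUs which ran tasks of this mm, the same
	 * way the TLB flush code does. Innocent CPUs are left alone. */
	static void dax_flush_tracked(struct mm_struct *mm)
	{
		preempt_disable();
		smp_call_function_many(mm_cpumask(mm), dax_flush_cpu,
				       NULL, 1);
		dax_flush_cpu(NULL);	/* _many() skips the calling CPU */
		preempt_enable();
	}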