From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1761399AbdACVO0 (ORCPT );
	Tue, 3 Jan 2017 16:14:26 -0500
Received: from mail-oi0-f48.google.com ([209.85.218.48]:33456 "EHLO
	mail-oi0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751257AbdACVOM (ORCPT );
	Tue, 3 Jan 2017 16:14:12 -0500
MIME-Version: 1.0
In-Reply-To: <20170102050927.GY1555@ZenIV.linux.org.uk>
References: <20161026155021.20892-2-brian.boylston@hpe.com>
	<58110959.90901@plexistor.com>
	<5818A5C8.6040300@plexistor.com>
	<20161228234321.GA27417@ZenIV.linux.org.uk>
	<20161230035252.GV1555@ZenIV.linux.org.uk>
	<20161231022558.GW1555@ZenIV.linux.org.uk>
	<20170102050927.GY1555@ZenIV.linux.org.uk>
From: Dan Williams
Date: Tue, 3 Jan 2017 13:14:11 -0800
Message-ID:
Subject: Re: [RFC] memcpy_nocache() and memcpy_writethrough()
To: Al Viro
Cc: "Elliott, Robert (Persistent Memory)" ,
	Boaz Harrosh ,
	"linux-nvdimm@lists.01.org" ,
	"Moreno, Oliver" ,
	"x86@kernel.org" ,
	"linux-kernel@vger.kernel.org" ,
	Ingo Molnar ,
	"H. Peter Anvin" ,
	Thomas Gleixner ,
	"boylston@burromesa.net" ,
	Linus Torvalds
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 1, 2017 at 9:09 PM, Al Viro wrote:
> On Mon, Jan 02, 2017 at 02:35:36AM +0000, Elliott, Robert (Persistent Memory) wrote:
>> > -----Original Message-----
>> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>> > owner@vger.kernel.org] On Behalf Of Al Viro
>> > Sent: Friday, December 30, 2016 8:26 PM
>> > Subject: [RFC] memcpy_nocache() and memcpy_writethrough()
>> >
>> ...
>> > Why does pmem need writethrough warranties, anyway?
>>
>> Using either
>> * nontemporal store instructions; or
>> * following regular store instructions with a sequence of cache flush
>>   and store fence instructions (e.g., clflushopt or clwb + sfence)
>>
>> ensures that write data has reached an "ADR-safe zone" that the system
>> promises will be persistent even if there is a surprise power loss or
>> a CPU suffers from an error that isn't totally catastrophic (e.g., the
>> CPU getting disconnected from the SDRAM will always lose data on an
>> NVDIMM-N).
>
> Wait a sec... In which places do you need sfence in all that? movnt*
> itself can be reordered, right? So using that for copying and storing
> the pointer afterwards would still need sfence inbetween, unless I'm
> seriously misunderstanding the situation...

Robert was describing the overall flow / mechanics, but I think it is
easier to visualize the sfence as a flush command sent to a disk
device with a volatile cache. In fact, that's how we implemented it in
the pmem block device driver. The pmem block device registers itself
as requiring REQ_FLUSH to be sent to persist writes. The driver issues
sfence on the assumption that all writes to pmem have either bypassed
the cache with movnt, or are scheduled for write-back via one of the
flush instructions (clflush, clwb, or clflushopt).

>> Newly written data becomes globally visible before it becomes ADR-safe.
>> This means software could act on the new data before a power loss, then
>> see the old data reappear after the power loss - not good. Software
>> needs to understand that any data in the process of being written is
>> indeterminate until the persistence guarantee is met. The BTT shows
>> one way that software can avoid that problem.
>
> Joy. What happens in terms of latency? I.e. how much of a stall does
> clwb inflict?

Unlike clflush, clwb is unordered, so it has lower overhead. It
schedules writeback, but does not wait for it to complete.
The clflushopt instruction is also unordered, but in addition to writeback it also invalidates the line.
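
For illustration, the flow described above can be sketched in userspace C.
This is only a rough sketch under stated assumptions, not the pmem driver
code: the sketch_* helpers and struct sketch_record are names made up for
the example, clflush stands in for clwb / clflushopt (which need newer CPUs
and extra compiler flags), and ordinary DRAM stands in for pmem, so nothing
here is actually persistent. On non-x86-64 builds it falls back to plain
stores and a compiler fence builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* SSE2: _mm_stream_si64, _mm_clflush, _mm_sfence */
#define HAVE_X86_PMEM_OPS 1
#else
#define HAVE_X86_PMEM_OPS 0
#endif

#define SKETCH_CACHELINE 64	/* assumed cache line size */

/* Hypothetical helper: copy with non-temporal (movnti) stores.
 * Assumes dst is 8-byte aligned and len is a multiple of 8. */
static void sketch_copy_nocache(void *dst, const void *src, size_t len)
{
	assert(((uintptr_t)dst % 8) == 0 && (len % 8) == 0);
#if HAVE_X86_PMEM_OPS
	size_t i;

	for (i = 0; i < len / 8; i++) {
		long long v;

		memcpy(&v, (const char *)src + i * 8, sizeof(v));
		_mm_stream_si64((long long *)dst + i, v);
	}
#else
	memcpy(dst, src, len);	/* fallback: ordinary cached stores */
#endif
}

/* Hypothetical helper: regular stores, then flush every touched line.
 * clflush is used here as a stand-in for clwb / clflushopt. */
static void sketch_copy_flush(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
#if HAVE_X86_PMEM_OPS
	uintptr_t p = (uintptr_t)dst & ~(uintptr_t)(SKETCH_CACHELINE - 1);
	uintptr_t end = (uintptr_t)dst + len;

	for (; p < end; p += SKETCH_CACHELINE)
		_mm_clflush((const void *)p);
#endif
}

/* The "flush command": a store fence ordering earlier movnt / flush
 * operations ahead of later stores. */
static void sketch_persist_barrier(void)
{
#if HAVE_X86_PMEM_OPS
	_mm_sfence();
#else
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

struct sketch_record {
	uint64_t payload[8];
	uint64_t valid;		/* set only after payload is fenced */
};

/* Write the payload, fence, then publish it by setting the valid flag. */
static void sketch_publish(struct sketch_record *rec, const uint64_t *data)
{
	const uint64_t one = 1;

	sketch_copy_nocache(rec->payload, data, sizeof(rec->payload));
	sketch_persist_barrier();	/* payload ordered before the flag */
	sketch_copy_flush(&rec->valid, &one, sizeof(one));
	sketch_persist_barrier();	/* flag flush ordered before return */
}
```

The first sketch_persist_barrier() call is the sfence Al asks about: movnt
stores are weakly ordered, so without the fence the flag store could become
visible ahead of the payload it is meant to publish.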