In-Reply-To: <20160125165209.GH2948@linux.intel.com>
References: <1414185652-28663-1-git-send-email-matthew.r.wilcox@intel.com> <1414185652-28663-11-git-send-email-matthew.r.wilcox@intel.com> <100D68C7BA14664A8938383216E40DE0421657C5@fmsmsx111.amr.corp.intel.com> <20160125165209.GH2948@linux.intel.com>
Date: Mon, 25 Jan 2016 13:18:55 -0800
Subject: Re: [PATCH v12 10/20] dax: Replace XIP documentation with DAX documentation
From: Jared Hulbert
To: Matthew Wilcox
Cc: "Wilcox, Matthew R", Linux FS Devel, LKML, Linux Memory Management List, Andrew Morton, Carsten Otte, Chris Brandt
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 25, 2016 at 8:52 AM, Matthew Wilcox wrote:
> On Sun, Jan 24, 2016 at 01:03:49AM -0800, Jared Hulbert wrote:
>> In our defense we didn't know we were sinning at the time.
>
> Fair enough. Cache flushing is Hard.
>
>> Can you walk me through the cache flushing hole? How is it okay on
>> X86 but not VIVT archs? I'm missing something obvious here.
>>
>> I thought earlier that vm_insert_mixed() handled the necessary
>> flushing. Is that even the part you are worried about?
>
> No, that part should be fine. My concern is about write() calls to files
> which are also mmaped. See Documentation/cachetlb.txt around line 229,
> starting with "There exists another whole class of cpu cache issues" ...

oh wow. So aren't all the copy_to/from_user() variants specifically
supposed to handle such cases?

>> What flushing functions would you call if you did have a cache page.
>
> Well, that's the problem; they don't currently exist.
>
>> There are all kinds of cache flushing functions that work without a
>> struct page. If nothing else the specialized ASM instructions that do
>> the various flushes don't use struct page as a parameter. This isn't
>> the first I've run into the lack of a sane cache API. Grep for
>> inval_cache in the mtd drivers, should have been much easier. Isn't
>> the proper solution to fix update_mmu_cache() or build out a pageless
>> cache flushing API?
>>
>> I don't get the explicit mapping solution. What are you mapping
>> where? What addresses would be SHMLBA? Phys, kernel, userspace?
>
> The problem comes in dax_io() where the kernel stores to an alias of the
> user address (or reads from an alias of the user address). Theoretically,
> we should flush user addresses before we read from the kernel's alias,
> and flush the kernel's alias after we store to it.

Reasoning this out loud here. Please correct.

For the dax read case:
- kernel virt is mapped to pfn
- data is memcpy'd from kernel virt

For the dax write case:
- kernel virt is mapped to pfn
- data is memcpy'd to kernel virt
- user virt map to pfn attempts to read

Is that right?

I see the x86 does a nocache copy_to/from operation. I'm not familiar
with the semantics of that call, and it would take me a while to
understand the assembly, but I assume it's doing some magic opcodes that
force the writes down to physical memory with each load/store. Does
the caching model of the x86 arch update the cache entries tied to the
physical memory on update?

For architectures that don't do auto coherency magic...

For reads:
- User dcaches need flushing before kernel virtual mapping to ensure
  kernel reads latest data. If the user has unflushed data in the
  dcache it would not be reflected in the read copy. This failure mode
  is only a problem if the filesystem is RW.

For writes:
- Unlike the read case we don't need up to date data for the user's
  mapping of a pfn. However, the user will need the caches invalidated
  to get fresh data, so we should make sure to write back any affected
  lines in the user caches so they don't get lost if we do an
  invalidate. I suppose uncommitted data might corrupt the new data
  written from the kernel mapping if the cachelines get flushed later.
- After the data is memcpy'ed to the kernel virt map, the cache, and
  possibly the write buffers, should be flushed. Without this flush the
  data might never get to the user mapped versions.
- Assuming the user maps were all flushed at the outset they should be
  reloaded with fresh data on access.

Do I get it more or less?

> But if we create a new address for the kernel to use which lands on the
> same cache line as the user's address (and this is what SHMLBA is used
> to indicate), there is no incoherency between the kernel's view and the
> user's view. And no new cache flushing API is needed.

So... how exactly would one force the kernel address to be at the
SHMLBA boundary?

> Is that clearer? I'm not always good at explaining these things in a
> way which makes sense to other people :-(

Yeah. I think I'm at 80% comprehension here. Or at least I think I am.

Thanks.

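On the SHMLBA question, a minimal userspace sketch of just the address
arithmetic may help. It assumes, as the quoted text suggests, that two
virtual addresses which agree modulo SHMLBA land on the same cache lines,
so the kernel alias only needs the same offset within an SHMLBA-sized
window as the user address. same_colour_addr() is a hypothetical name
made up for this illustration, not a kernel function, and the SHMLBA
value is passed in as a plain parameter rather than taken from any
architecture header. How the kernel would actually reserve and map such
an address is not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: return the lowest address at or
 * above `base` whose offset within an SHMLBA-sized window matches that
 * of `uaddr`. Assumption: equal offsets modulo SHMLBA mean the two
 * virtual addresses index the same cache lines.
 */
uintptr_t same_colour_addr(uintptr_t base, uintptr_t uaddr, uintptr_t shmlba)
{
    assert(shmlba != 0);

    uintptr_t want = uaddr % shmlba;  /* offset of the user mapping */
    uintptr_t have = base % shmlba;   /* offset of the candidate base */

    /* Forward distance from `have` to `want`, wrapping within one window. */
    uintptr_t delta = (want + shmlba - have) % shmlba;

    /* Overflow near the top of the address space is not handled here. */
    return base + delta;
}
```

For example, with a 16 KiB SHMLBA (0x4000), a user address of 0x40003000
and a search starting at 0xc0000000, the helper picks 0xc0003000, which
shares the 0x3000 offset with the user address.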