Date: Mon, 1 Feb 2016 05:07:38 +1100
From: Matthew Wilcox
To: Dan Williams
Cc: Ross Zwisler, linux-nvdimm, Dave Chinner, "linux-kernel@vger.kernel.org", Christoph Hellwig, Alexander Viro, Jan Kara, linux-fsdevel, Andrew Morton
Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences
Message-ID: <20160131180738.GB2948@linux.intel.com>
References: <1454009704-25959-1-git-send-email-ross.zwisler@linux.intel.com>
 <1454009704-25959-2-git-send-email-ross.zwisler@linux.intel.com>
 <20160128213858.GA29114@infradead.org>
 <20160129182815.GB5224@linux.intel.com>
 <20160130052833.GY2948@linux.intel.com>
 <20160131023247.GZ2948@linux.intel.com>
 <33D2FA63-0EBC-4E5F-B337-F06A8A846166@gmail.com>
 <20160131105518.GA2948@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 31, 2016 at 08:38:20AM -0800, Dan Williams wrote:
> On Sun, Jan 31, 2016 at 2:55 AM, Matthew Wilcox wrote:
> > On Sat, Jan 30, 2016 at 11:12:12PM -0700, Ross Zwisler wrote:
> >> Is there a reason to store pfns instead of kaddrs in the radix tree?
> >
> > Once ARM, MIPS and SPARC get supported, they're going to need temporary
> > kernel addresses assigned to PFNs rather than permanent ones. Also,
> > it'll be easier for teardown to delete PFNs associated with a particular
> > device than kaddrs associated with a particular device.
> > And it lets us support more persistent memory on a 32-bit machine (also
> > on a 64-bit machine, but that's mostly theoretical).
> >
> > +/*
> > + * DAX uses the 'exceptional' entries to store PFNs in the radix tree.
> > + * Bit 0 is clear (the radix tree uses this for its own purposes). Bit
> > + * 1 is set (to indicate an exceptional entry). Bits 2 & 3 are PFN_DEV
> > + * and PFN_MAP. The top two bits denote the size of the entry (PTE, PMD,
> > + * PUD, one reserved). That leaves us 26 bits on 32-bit systems and 58
> > + * bits on 64-bit systems, able to address 256GB and 1024EB respectively.
> > + */
> >
> > It's also pretty cheap to look up the kaddr from the pfn, at least on
> > 64-bit architectures without cache aliasing problems:
> >
> > +static void *dax_map_pfn(pfn_t pfn, unsigned long index)
> > +{
> > +	preempt_disable();
> > +	pagefault_disable();
> > +	return pfn_to_kaddr(pfn_t_to_pfn(pfn));
>
> pfn_to_kaddr() assumes persistent memory is direct mapped which is not
> always the case.

Yes. This is just the default implementation of dax_map_pfn(), which works
for most situations. We can introduce more complex implementations of
dax_map_pfn() as necessary. You make another excellent point for why we
should store PFNs in the radix tree instead of kaddrs :-)

One option that I've been looking at (primarily for x86-32) is having an
rbtree of PFN ranges that drivers add to when they register persistent
memory. That would let us use the io_mapping_create_wc() /
io_mapping_map_atomic_wc() API. But having great support for persistent
memory with 32-bit x86 kernels is very very low on my priority list.
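The bit layout in the quoted comment can be checked with a small userspace sketch, along with the 26/58-bit and 256GB/1024EB figures it gives. The sketch models only the 64-bit layout using a plain uint64_t and assumes 4 KiB pages. The ENTRY_* macros, enum entry_size, entry_encode(), entry_pfn(), entry_get_size(), pfn_bits_for_word() and addressable_log2() are hypothetical names for illustration. They are not the kernel's radix-tree or pfn_t definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the 64-bit entry layout from the quoted
 * comment (names are illustrative, not kernel definitions):
 *   bit 0       clear (reserved for the radix tree itself)
 *   bit 1       set (marks an exceptional entry)
 *   bit 2       PFN_DEV flag
 *   bit 3       PFN_MAP flag
 *   bits 4-61   the PFN (58 bits)
 *   bits 62-63  size class (PTE, PMD, PUD, reserved)
 */
#define ENTRY_EXCEPTIONAL (UINT64_C(1) << 1)
#define ENTRY_PFN_DEV     (UINT64_C(1) << 2)
#define ENTRY_PFN_MAP     (UINT64_C(1) << 3)
#define ENTRY_PFN_SHIFT   4
#define ENTRY_SIZE_SHIFT  62
#define ENTRY_PFN_MASK    ((UINT64_C(1) << (ENTRY_SIZE_SHIFT - ENTRY_PFN_SHIFT)) - 1)

enum entry_size { SIZE_PTE, SIZE_PMD, SIZE_PUD, SIZE_RESERVED };

/* Pack a PFN, its two flags and a size class into one entry word. */
static uint64_t entry_encode(uint64_t pfn, int dev, int map, enum entry_size size)
{
	assert(pfn <= ENTRY_PFN_MASK);	/* PFN must fit in the 58 free bits */
	uint64_t e = ENTRY_EXCEPTIONAL;	/* bit 1 set, bit 0 left clear */

	if (dev)
		e |= ENTRY_PFN_DEV;
	if (map)
		e |= ENTRY_PFN_MAP;
	e |= pfn << ENTRY_PFN_SHIFT;
	e |= (uint64_t)size << ENTRY_SIZE_SHIFT;
	return e;
}

/* Recover the PFN, discarding the flag bits below and size bits above it. */
static uint64_t entry_pfn(uint64_t e)
{
	return (e >> ENTRY_PFN_SHIFT) & ENTRY_PFN_MASK;
}

/* The size class lives in the top two bits. */
static enum entry_size entry_get_size(uint64_t e)
{
	return (enum entry_size)(e >> ENTRY_SIZE_SHIFT);
}

/* PFN bits left after 2 radix-tree bits, 2 flag bits and 2 size bits. */
static unsigned pfn_bits_for_word(unsigned word_bits)
{
	return word_bits - 6;
}

/* log2 of the bytes those PFNs can address, assuming 4 KiB (2^12) pages. */
static unsigned addressable_log2(unsigned word_bits)
{
	return pfn_bits_for_word(word_bits) + 12;
}
```

For example, entry_encode(0x123456, 1, 0, SIZE_PMD) yields an entry with bit 0 clear, bits 1 and 2 set, and a PFN that entry_pfn() returns unchanged. With 26 PFN bits, addressable_log2(32) is 38, i.e. 2^38 bytes = 256 GiB. With 58 PFN bits, addressable_log2(64) is 70, i.e. 2^70 bytes = 1024 EiB. Both results match the comment.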