Date: Tue, 2 Feb 2016 08:47:30 +1100
From: Dave Chinner
To: Jan Kara
Cc: Matthew Wilcox, Ross Zwisler, Christoph Hellwig, linux-kernel@vger.kernel.org, Alexander Viro, Andrew Morton, Dan Williams, Jan Kara, linux-fsdevel@vger.kernel.org, linux-nvdimm@ml01.01.org
Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences
Message-ID: <20160201214730.GR20456@dastard>
References: <1454009704-25959-1-git-send-email-ross.zwisler@linux.intel.com> <1454009704-25959-2-git-send-email-ross.zwisler@linux.intel.com> <20160128213858.GA29114@infradead.org> <20160129182815.GB5224@linux.intel.com> <20160130052833.GY2948@linux.intel.com> <20160201145147.GD13740@quack.suse.cz>
In-Reply-To: <20160201145147.GD13740@quack.suse.cz>

On Mon, Feb 01, 2016 at 03:51:47PM +0100, Jan Kara wrote:
> On Sat 30-01-16 00:28:33, Matthew Wilcox wrote:
> > On Fri, Jan 29, 2016 at 11:28:15AM -0700, Ross Zwisler wrote:
> > > I guess I need to go off and understand if we can have DAX mappings on
> > > such a device.
> > > If we can, we may have a problem - we can get the block_device from
> > > get_block() in the I/O path and the various fault paths, but we don't
> > > have access to get_block() when flushing via
> > > dax_writeback_mapping_range(). We avoid needing it in the normal case by
> > > storing the sector results from get_block() in the radix tree.
> >
> > I think we're doing it wrong by storing the sector in the radix tree; we'd
> > really need to store both the sector and the bdev, which is too much data.
> >
> > If we store the PFN of the underlying page instead, we don't have this
> > problem. Instead, we have a different problem: the device going
> > away under us. I'm trying to find the code which tears down PTEs when
> > the device goes away, and I'm not seeing it. What do we do about user
> > mappings of the device?
>
> So I don't have a strong opinion whether storing PFN or sector is better.
> Maybe PFN is somewhat more generic but OTOH turning DAX off for special
> cases like inodes on XFS RT devices would be IMHO fine.

We need to support alternate devices. There is a strong case for using
the XFS RT device with DAX, especially for applications that know they
are always going to use large/huge/giant pages to access their data
files. The XFS RT device can guarantee allocation is always aligned to
large/huge/giant page constraints right up to ENOSPC and throughout the
production life of the filesystem. We have no other filesystem capable
of providing such guarantees, which means the XFS RT device is uniquely
suited to certain applications with DAX...

> I'm somewhat concerned that there are several things in flight (page fault
> rework, invalidation on device removal, issues with DAX access to block
> devices Ross found) and this is IMHO the smallest trouble we have and
> changing this seems relatively invasive. So could we settle the fault code
> and similar stuff first and look into this somewhat later?
> Because frankly I have some trouble following how all the pieces are
> going to fit together and I'm afraid we'll introduce some non-trivial
> bugs when several fundamental things are in flux in parallel.

Yup, there are way too many balls in the air at the moment.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com