From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753515AbcBBGq2 (ORCPT ); Tue, 2 Feb 2016 01:46:28 -0500 Received: from mail-yk0-f178.google.com ([209.85.160.178]:34684 "EHLO mail-yk0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752254AbcBBGq0 (ORCPT ); Tue, 2 Feb 2016 01:46:26 -0500 MIME-Version: 1.0 In-Reply-To: References: <1454009704-25959-1-git-send-email-ross.zwisler@linux.intel.com> <1454009704-25959-2-git-send-email-ross.zwisler@linux.intel.com> <20160128213858.GA29114@infradead.org> <20160129182815.GB5224@linux.intel.com> <20160130052833.GY2948@linux.intel.com> <20160201145147.GD13740@quack.suse.cz> <20160201214730.GR20456@dastard> Date: Mon, 1 Feb 2016 22:46:25 -0800 Message-ID: Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences From: Dan Williams To: Jared Hulbert Cc: Dave Chinner , Jan Kara , Matthew Wilcox , Ross Zwisler , Christoph Hellwig , LKML , Alexander Viro , Andrew Morton , Jan Kara , Linux FS Devel , linux-nvdimm Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 1, 2016 at 10:06 PM, Jared Hulbert wrote: > On Mon, Feb 1, 2016 at 1:47 PM, Dave Chinner wrote: >> On Mon, Feb 01, 2016 at 03:51:47PM +0100, Jan Kara wrote: >>> On Sat 30-01-16 00:28:33, Matthew Wilcox wrote: >>> > On Fri, Jan 29, 2016 at 11:28:15AM -0700, Ross Zwisler wrote: >>> > > I guess I need to go off and understand if we can have DAX mappings on such a >>> > > device. If we can, we may have a problem - we can get the block_device from >>> > > get_block() in I/O path and the various fault paths, but we don't have access >>> > > to get_block() when flushing via dax_writeback_mapping_range(). We avoid >>> > > needing it the normal case by storing the sector results from get_block() in >>> > > the radix tree. >>> > >>> > I think we're doing it wrong by storing the sector in the radix tree; we'd >>> > really need to store both the sector and the bdev which is too much data. >>> > >>> > If we store the PFN of the underlying page instead, we don't have this >>> > problem. Instead, we have a different problem; of the device going >>> > away under us. I'm trying to find the code which tears down PTEs when >>> > the device goes away, and I'm not seeing it. What do we do about user >>> > mappings of the device? >>> >>> So I don't have a strong opinion whether storing PFN or sector is better. >>> Maybe PFN is somewhat more generic but OTOH turning DAX off for special >>> cases like inodes on XFS RT devices would be IMHO fine. >> >> We need to support alternate devices. > > Embedded devices trying to use NOR Flash to free up RAM was > historically one of the more prevalent real world uses of the old > filemap_xip.c code although the users never made it to mainline. So I > spent some time last week trying to figure out how to make a subset of > DAX not depend on CONFIG_BLOCK. It was a very frustrating and > unfruitful experience. I discarded my main conclusion as impractical, > but now that I see the difficultly DAX faces in dealing with > "alternate devices" especially some of the crazy stuff btrfs can do, I > wonder if it's not so crazy after all. > > Lets stop calling bdev_direct_access() directly from DAX. Let the > filesystems do it. > > Sure we could enable generic_dax_direct_access() helper for the > filesystems that only support single devices to make it easy. But XFS > and btrfs for example, have to do the work of figuring out what bdev > is required and then calling bdev_direct_access(). > > My reasoning is that the filesystem knows how to map inodes and > offsets to devices and sectors, no matter how complex that is. It > would even enable a filesystem to intelligently use a mix of > direct_access and regular block devices down the road. Of course it > would also make the block-less solution doable. > > Good idea? Stupid idea? The CONFIG_BLOCK=y case isn't going anywhere, so if anything it seems the CONFIG_BLOCK=n is an incremental feature in its own right. What driver and what filesystem are looking to enable this XIP support in?