From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756856AbcBBSqw (ORCPT ); Tue, 2 Feb 2016 13:46:52 -0500 Received: from mga01.intel.com ([192.55.52.88]:31772 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756435AbcBBSqt (ORCPT ); Tue, 2 Feb 2016 13:46:49 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.22,385,1449561600"; d="scan'208";a="875376995" Date: Tue, 2 Feb 2016 13:46:43 -0500 From: Matthew Wilcox To: Dan Williams Cc: Ross Zwisler , Jan Kara , Dave Chinner , Christoph Hellwig , "linux-kernel@vger.kernel.org" , Alexander Viro , Andrew Morton , Jan Kara , linux-fsdevel , linux-nvdimm Subject: Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences Message-ID: <20160202184643.GA3260@linux.intel.com> References: <20160129182815.GB5224@linux.intel.com> <20160130052833.GY2948@linux.intel.com> <20160201145147.GD13740@quack.suse.cz> <20160201214730.GR20456@dastard> <20160202111723.GD12574@quack.suse.cz> <20160202164642.GE12574@quack.suse.cz> <20160202173456.GB23963@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 02, 2016 at 09:46:21AM -0800, Dan Williams wrote: > What a about a super_operation? That seems the right level, given > we're currently doing: > > inode->i_sb->s_bdev > > ...it does not seem terrible to instead do: > > inode->i_sb->s_op->get_block() The point is that filesystems have lots of different get_block operations, and the right one to use depends not just on the inode, but also upon what VFS function is being called, and in some filesystems the phase of the moon, or the file open flags (so even inode->i_ops->get_block is wrong; file->f_ops->get_block would be better, but of course we've lost that by the point we're doing writeback). I now realise that basing DAX around get_block & buffer_heads was a mistake. I think the Right Solution (not for 4.5) is to ask filesystems to populate the radix tree. A flow somewhat like this: 1. VFS or VM calls filesystem (eg ->fault()) 2. Filesystem calls DAX (eg dax_fault()) 3. DAX looks in radix tree, finds no information. 4. DAX calls (NEW!) mapping->a_ops->populate_pfns 5. Filesystem looks up its internal data structure (eg extent tree) and calls dax_create_pfns() (see giant patch from yesterday, only instead of passing a get_block_t, the filesystem has already filled in a bh which describes the entire extent that this access happens to land in). 6. DAX continues to take care of calling bdev_direct_access() from dax_create_pfns(). After we have that step done, we can look at what it would take to avoid calling bdev_direct_access for non-block-based filesystems. That looks to me like just calling dax_add_pfn_entries() from their ->populate_pfns implementation. And we introduce a CONFIG_BLOCK ifdef around dax_create_pfns(), dax_clear_blocks() and dax_zero_page_range(). Or maybe modify dax_zero_page_range() to use the radix tree as above, since it's probably a useful helper function. Once we have buffer_head usage confined to a fairly small part of DAX, we can look at replacing it with a more appropriate data structure with better-defined contents.