From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S932210AbcFNUTX (ORCPT ); Tue, 14 Jun 2016 16:19:23 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60177 "EHLO mx1.redhat.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S932085AbcFNUTV convert rfc822-to-8bit (ORCPT );
    Tue, 14 Jun 2016 16:19:21 -0400
From: Jeff Moyer
To: Mike Snitzer
Cc: "Kani\, Toshimitsu", "axboe\@kernel.dk",
    "linux-nvdimm\@lists.01.org", "linux-kernel\@vger.kernel.org",
    "linux-raid\@vger.kernel.org", "dm-devel\@redhat.com",
    "viro\@zeniv.linux.org.uk", "dan.j.williams\@intel.com",
    "ross.zwisler\@linux.intel.com", "agk\@redhat.com"
Subject: Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
References: <1465856497-19698-1-git-send-email-toshi.kani@hpe.com>
    <1465861755.3504.185.camel@hpe.com>
    <20160614154131.GB25876@redhat.com>
X-PGP-KeyID: 1F78E1B4
X-PGP-CertKey: F6FE 280D 8293 F72C 65FD 5A58 1FF8 A7CA 1F78 E1B4
X-PCLoadLetter: What the f**k does that mean?
Date: Tue, 14 Jun 2016 16:19:19 -0400
In-Reply-To: <20160614154131.GB25876@redhat.com> (Mike Snitzer's message of
    "Tue, 14 Jun 2016 11:41:31 -0400")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
    (mx1.redhat.com [10.5.110.32]); Tue, 14 Jun 2016 20:19:21 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Mike Snitzer writes:

> On Tue, Jun 14 2016 at 9:50am -0400,
> Jeff Moyer wrote:
>
>> "Kani, Toshimitsu" writes:
>>
>> >> I had dm-linear and md-raid0 support on my list of things to look at,
>> >> did you have raid0 in your plans?
>> >
>> > Yes, I hope to extend further and raid0 is a good candidate.
>>
>> dm-flakey would allow more xfstests test cases to run.
>> I'd say that's more important than linear or raid0. ;-)
>
> Regardless of which target(s) grow DAX support the most pressing initial
> concern is getting the DM device stacking correct. And verifying that
> IO that cross pmem device boundaries are being properly split by DM
> core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to
> max_io_len).

That was a tongue-in-cheek comment. You're reading way too much into
it.

>> Also, the next step in this work is to then decide how to determine on
>> what numa node an LBA resides. We had discussed this at a prior
>> plumbers conference, and I think the consensus was to use xattrs.
>> Toshi, do you also plan to do that work?
>
> How does the associated NUMA node relate to this? Does the
> DM requests_queue need to be setup to only allocate from the NUMA node
> the pmem device is attached to? I recently added support for this to
> DM. But there will likely be some code need to propagate the NUMA node
> id accordingly.

I assume you mean allocate memory (the volatile kind). That should work
the same between pmem and regular block devices, no?

What I was getting at was that applications may want to know on which
node their data resides. Right now, it's easy to tell because a single
device cannot span numa nodes, or, if it does, it does so via an
interleave, so numa information isn't interesting. However, once data
on a single file system can be placed on multiple different numa nodes,
applications may want to query and/or control that placement.

Here's a snippet from a blog post I never finished:

There are two essential questions that need to be answered regarding
persistent memory and NUMA: first, would an application benefit from
being able to query the NUMA locality of its data, and second, would
an application benefit from being able to specify a placement policy
for its data? This article is an attempt to summarize the current
state of hardware and software in order to consider the above two
questions.
We begin with a short list of use cases for these interfaces, which
will frame the discussion.

First, let's consider an interface that allows an application to query
the NUMA placement of existing data. With such information, an
application may want to perform the following actions:

- relocate application processes to the same NUMA node as their data.
  (Interfaces for moving a process are readily available.)
- specify a memory (RAM) allocation policy so that memory allocations
  come from the same NUMA node as the data.

Second, we consider an interface that allows an application to specify
a placement policy for new data. Using this interface, an application
may:

- ensure data is stored on the same NUMA node as the one on which the
  application is running
- ensure data is stored on the same NUMA node as an I/O adapter such as
  a network card, that is a producer of data stored to NVM.
- ensure data is stored on a different NUMA node:
  - so that the data is stored on the same NUMA node as related data
  - because the data does not need the faster access afforded by local
    NUMA placement. Presumably this is a trade-off, and other data
    will require local placement to meet the performance goals of the
    application.

Cheers,
Jeff
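
For illustration only, the first use case above (moving a process to
the NUMA node that holds its data) could look roughly like the sketch
below. It assumes the node number is already known, which is exactly
the query interface that does not exist yet. The helper names
(parse_cpulist, cpus_of_node, move_to_node) are made up for this
example. It uses the per-node cpulist file in Linux sysfs and Python's
os.sched_setaffinity(), and it only changes CPU placement, not the
memory allocation policy.

```python
import os


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty cpulist means the node has no CPUs
        first, _, last = chunk.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def cpus_of_node(node):
    """Read the CPUs that belong to a NUMA node from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def move_to_node(node, pid=0):
    """Restrict pid (0 means the calling process) to the CPUs of `node`.

    Hypothetical helper: the caller has to supply `node`; nothing here
    discovers where the data actually lives.
    """
    os.sched_setaffinity(pid, cpus_of_node(node))
```

A caller would do something like move_to_node(data_node), where
data_node is a placeholder for whatever a future query interface
returns. If the node has no CPUs of its own, the CPU set is empty and
the affinity call fails, so a real tool would have to pick a nearby
node instead.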