linux-kernel.vger.kernel.org archive mirror
From: Jeff Moyer <jmoyer@redhat.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: "Kani, Toshimitsu" <toshi.kani@hpe.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"agk@redhat.com" <agk@redhat.com>
Subject: Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
Date: Tue, 14 Jun 2016 16:19:19 -0400	[thread overview]
Message-ID: <x49inxbzfp4.fsf@segfault.boston.devel.redhat.com> (raw)
In-Reply-To: <20160614154131.GB25876@redhat.com> (Mike Snitzer's message of "Tue, 14 Jun 2016 11:41:31 -0400")

Mike Snitzer <snitzer@redhat.com> writes:

> On Tue, Jun 14 2016 at  9:50am -0400,
> Jeff Moyer <jmoyer@redhat.com> wrote:
>
>> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes:
>> 
>> >> I had dm-linear and md-raid0 support on my list of things to look at,
>> >> did you have raid0 in your plans?
>> >
>> > Yes, I hope to extend further and raid0 is a good candidate.   
>> 
>> dm-flakey would allow more xfstests test cases to run.  I'd say that's
>> more important than linear or raid0.  ;-)
>
> Regardless of which target(s) grow DAX support, the most pressing initial
> concern is getting the DM device stacking correct, and verifying that
> IO that crosses pmem device boundaries is being properly split by DM
> core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to
> max_io_len).

That was a tongue-in-cheek comment.  You're reading way too much into
it.
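For the record, the boundary arithmetic Mike is referring to is simple
enough to sketch.  This is a hedged Python model, not the kernel code
(max_io_len() in drivers/md/dm.c is the real thing); it only shows the
invariant that no piece of an IO may cross a target boundary:

```python
# Sketch (not kernel code): how DM core limits an IO so no piece
# crosses a linear target's boundary, mirroring the role that
# max_io_len() plays in drivers/md/dm.c.

def max_io_len(sector, target_begin, target_len):
    """Sectors remaining from `sector` to the end of the target."""
    return target_begin + target_len - sector

def split_io(sector, nr_sectors, targets):
    """Split (sector, nr_sectors) into per-target pieces.

    `targets` is a sorted list of (begin, len) tuples covering the
    mapped device, in sectors.
    """
    pieces = []
    while nr_sectors > 0:
        # Find the target containing `sector`.
        begin, length = next((b, l) for b, l in targets
                             if b <= sector < b + l)
        # Never issue more than remains in this target.
        chunk = min(nr_sectors, max_io_len(sector, begin, length))
        pieces.append((sector, chunk))
        sector += chunk
        nr_sectors -= chunk
    return pieces

# An 8-sector IO starting at sector 6, on two 8-sector targets,
# splits at the boundary (sector 8):
print(split_io(6, 8, [(0, 8), (8, 8)]))  # → [(6, 2), (8, 6)]
```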

>> Also, the next step in this work is to then decide how to determine on
>> what numa node an LBA resides.  We had discussed this at a prior
>> plumbers conference, and I think the consensus was to use xattrs.
>> Toshi, do you also plan to do that work?
>
> How does the associated NUMA node relate to this?  Does the
> DM request_queue need to be set up to only allocate from the NUMA node
> the pmem device is attached to?  I recently added support for this to
> DM, but there will likely be some code needed to propagate the NUMA
> node id accordingly.

I assume you mean allocate memory (the volatile kind).  That should work
the same between pmem and regular block devices, no?

What I was getting at was that applications may want to know on which
node their data resides.  Right now, it's easy to tell because a single
device cannot span NUMA nodes, or, if it does, it does so via an
interleave, so NUMA information isn't interesting.  However, once data
on a single file system can be placed on multiple different NUMA nodes,
applications may want to query and/or control that placement.
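To make the xattr idea from plumbers concrete: the attribute name
"user.numa_node" below is purely hypothetical (no kernel exposes it
today), but os.getxattr() is the real syscall wrapper such an
interface would ride on.  A minimal sketch of the query side:

```python
# Hedged sketch of a query interface.  "user.numa_node" is an invented
# attribute name for illustration; os.getxattr() is real.
import os

def data_numa_node(path):
    """Return the NUMA node `path`'s data resides on, or None if the
    (hypothetical) attribute is absent or xattrs are unsupported."""
    try:
        return int(os.getxattr(path, "user.numa_node"))
    except OSError:
        return None
```

An application could then compare the result against its own node and
decide whether to migrate itself or its allocations.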

Here's a snippet from a blog post I never finished:

There are two essential questions that need to be answered regarding
persistent memory and NUMA: first, would an application benefit from
being able to query the NUMA locality of its data, and second, would
an application benefit from being able to specify a placement policy
for its data?  This article is an attempt to summarize the current
state of hardware and software in order to consider the above two
questions.  We begin with a short list of use cases for these
interfaces, which will frame the discussion.

First, let's consider an interface that allows an application to query
the NUMA placement of existing data.  With such information, an
application may want to perform the following actions:

- relocate application processes to the same NUMA node as their data.
  (Interfaces for moving a process are readily available.)
- specify a memory (RAM) allocation policy so that memory allocations
  come from the same NUMA node as the data.
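The first bullet is doable today.  A hedged sketch, assuming a Linux
sysfs layout: os.sched_setaffinity() is the real interface, and
/sys/devices/system/node/node<N>/cpulist is the stock path for a
node's CPU list.

```python
# Sketch: relocate the calling process to the CPUs of a given NUMA
# node.  Linux-only; assumes the standard sysfs node layout.
import os

def parse_cpulist(text):
    """Parse a sysfs cpulist string like "0-3,8" into a set of CPUs."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            cpus.update(range(lo, hi + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def move_to_node(node):
    """Pin the calling process to the CPUs of `node`."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        os.sched_setaffinity(0, parse_cpulist(f.read()))
```

The second bullet (RAM allocation policy) would use set_mempolicy(2)
or libnuma rather than CPU affinity, but the node lookup is the same.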

Second, we consider an interface that allows an application to specify
a placement policy for new data.  Using this interface, an application
may:

- ensure data is stored on the same NUMA node as the one on which the
  application is running
- ensure data is stored on the same NUMA node as an I/O adapter, such
  as a network card, that produces the data being stored to NVM.
- ensure data is stored on a different NUMA node:
  - so that the data is stored on the same NUMA node as related data
  - because the data does not need the faster access afforded by local
    NUMA placement.  Presumably this is a trade-off, and other data
    will require local placement to meet the performance goals of the
    application.
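For completeness, here is what the write side of such a policy
interface might look like from userspace.  Everything here is
speculative: the attribute name "user.numa_policy" and the "bind:N"
value format are invented for illustration, and only os.setxattr()
itself is real.

```python
# Hedged sketch of a (hypothetical) placement-policy hint for new
# data, carried via xattrs.  No kernel honors this attribute today.
import os

def request_placement(path, node):
    """Ask (hypothetically) that future allocations for `path` land on
    `node`.  Returns True if the hint was recorded, False if xattrs
    are unavailable on this filesystem."""
    try:
        os.setxattr(path, "user.numa_policy", f"bind:{node}".encode())
        return True
    except OSError:
        return False
```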

Cheers,
Jeff


Thread overview: 40+ messages
2016-06-13 22:21 [PATCH 0/6] Support DAX for device-mapper dm-linear devices Toshi Kani
2016-06-13 22:21 ` [PATCH 1/6] genhd: Add GENHD_FL_DAX to gendisk flags Toshi Kani
2016-06-13 22:21 ` [PATCH 2/6] block: Check GENHD_FL_DAX for DAX capability Toshi Kani
2016-06-13 22:21 ` [PATCH 3/6] dm: Add dm_blk_direct_access() for mapped device Toshi Kani
2016-06-13 22:21 ` [PATCH 4/6] dm-linear: Add linear_direct_access() Toshi Kani
2016-06-13 22:21 ` [PATCH 5/6] dm, dm-linear: Add dax_supported to dm_target Toshi Kani
2016-06-13 22:21 ` [PATCH 6/6] dm: Enable DAX support for mapper device Toshi Kani
2016-06-13 22:57 ` [PATCH 0/6] Support DAX for device-mapper dm-linear devices Mike Snitzer
2016-06-20 18:00   ` Mike Snitzer
2016-06-20 18:31     ` Kani, Toshimitsu
2016-06-20 19:40       ` Mike Snitzer
2016-06-20 19:52         ` Mike Snitzer
2016-06-20 20:11           ` Kani, Toshimitsu
2016-06-20 21:28             ` Kani, Toshimitsu
2016-06-20 22:22               ` Mike Snitzer
2016-06-21 13:41                 ` Mike Snitzer
2016-06-21 15:44                   ` Kani, Toshimitsu
2016-06-21 15:50                     ` Kani, Toshimitsu
2016-06-21 16:25                     ` Dan Williams
2016-06-21 16:35                       ` Kani, Toshimitsu
2016-06-21 16:45                         ` Dan Williams
2016-06-21 16:56                           ` Kani, Toshimitsu
2016-06-21 18:17                     ` Mike Snitzer
2016-06-22 17:44                       ` Kani, Toshimitsu
2016-06-22 19:15                         ` Dan Williams
2016-06-22 20:16                           ` Kani, Toshimitsu
2016-06-22 22:38                             ` Mike Snitzer
2016-06-22 22:59                               ` Kani, Toshimitsu
2016-06-13 23:18 ` Dan Williams
2016-06-13 23:59   ` Kani, Toshimitsu
2016-06-14  0:02     ` Dan Williams
2016-06-14  7:30       ` Dan Williams
2016-06-14 13:50     ` Jeff Moyer
2016-06-14 15:41       ` Mike Snitzer
2016-06-14 18:00         ` Kani, Toshimitsu
2016-06-14 20:19         ` Jeff Moyer [this message]
2016-06-15  1:46           ` Mike Snitzer
2016-06-15  2:07             ` Dan Williams
2016-06-15  2:35               ` Mike Snitzer
2016-06-14 15:53       ` Kani, Toshimitsu
