linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ira Weiny <ira.weiny@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
	"Navneet Singh" <navneet.singh@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	<linux-btrfs@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 16/26] cxl/extent: Realize extent devices
Date: Sun, 5 May 2024 21:35:39 -0700	[thread overview]
Message-ID: <66385e1b63173_25842129417@iweiny-mobl.notmuch> (raw)
In-Reply-To: <663401ba2e237_1384629438@dwillia2-mobl3.amr.corp.intel.com.notmuch>

Dan Williams wrote:
> Ira Weiny wrote:
> > Jonathan Cameron wrote:
> > > On Sun, 24 Mar 2024 16:18:19 -0700
> > > ira.weiny@intel.com wrote:
> > > 
> > > > From: Navneet Singh <navneet.singh@intel.com>
> > > > 
> > > > Once all extents of an interleave set are present a region must
> > > > surface an extent to the region.
> > > > 
> > > > Without interleaving; endpoint decoder and region extents have a 1:1
> > > > relationship.  Future support for IW > 1 will maintain a N:1
> > > > relationship between the device extents and region extents.
> > > > 
> > > > Create a region extent device for every device extent found.  Release of
> > > > the extent device triggers a response to the underlying hardware extent.
> > > > 
> > > > There is no strong use case to support the addition of extents which
> > > > overlap previously accepted extent ranges.  Reject such new extents
> > > > until such time as a good use case emerges.
> > > > 
> > > > Expose the necessary details of region extents by creating the following
> > > > sysfs entries.
> > > > 
> > > > 	/sys/bus/cxl/devices/dax_regionX/extentY
> > > > 	/sys/bus/cxl/devices/dax_regionX/extentY/offset
> > > > 	/sys/bus/cxl/devices/dax_regionX/extentY/length
> > > > 	/sys/bus/cxl/devices/dax_regionX/extentY/label
> > > 
> > > Docs?
> > 
> > That is a good idea.
> > 
> > > The label in particular worries me a little as I'm not sure what
> > > is in it.
> > 
> > I envisioned a pass through of the tag.
> > 
> > >
> > > If it's the tag one possible format is a uuid (not a coincidence
> > > that it is the same length) and interpreting that as characters isn't
> > > going to get us far.  I wonder if we have to treat it as a binary attr
> > > given we have no idea what it is.
> > 
> > In thinking about this more (and running some experiments): none of these are
> > strictly necessary in this initial implementation.  No code currently uses
> > them directly.
> > 
> > I questioned these in the past and I've done so again over the weekend.
> > 
> > I was about to rip them out entirely when I remembered Gregory Price's
> > comments on Discord.  There he indicating a desire to very carefully place
> > dax devices.  Without at least the offset and length above (and to a
> > lesser extent the label) this can't be done.
> 
> Careful placement of dax-devices physically requires an entirely new
> allocation ABI. There is the mapping_store() interface that was added
> for a specific kexec / VMM fast restore use case, but that never
> envisioned the sparse region case. So I do think it is worthwhile to
> punt on that question to a later add-on feature.

Agreed.

>  
> [..]
> > I don't think this is exactly what the user is going to expect.  This can
> > be resolved by by looking at the dax device mappings though.[0]  So I'm going
> > to leave this for now.  But I expect some additional porcelain is going to
> > be required to fully meet Gregory's requirements.
> 
> Not sure what the exact requirement is, but if it's the typical, "I want
> to allocate by tag",

I'm extrapolating that it will be.  I want to allocate on the first extent.
Tags were removed from the series a while ago.

> then I think there is another potential coarse
> grained solution that probably covers most cases. Allow multiple
> dax_regions per cxl_dcd_regions, where each dax_region manages an
> exclusive set of tags.

I'll have to think on that because don't dax regions map to specific dpas on
creation?

> 
> The host negotiates a dax_region tag layout with the orchestrator and
> can then trust that all of the extents that show up in a given dax_region
> belong to a given tag or set of tags.
> 
> This is not something that needs to be considered in the initial
> enabling, but is potentially a way to avoid bolting-on a new fine grained
> allocation api after the fact.
> 

The point is I'm not trying to bolt anything on.  Just trying to explain what
can and can't be done.   The purpose of these entries was to give the user the
ability to see what extents existed and by correlating the dax mappings could
see where their dax mappings landed.  Careful allocation of dax devices could
result in the use of some extents and not others.  But this was __not__ at all
intended to be done initially.  Just use the space as it come available without
any tag use at all.

> 
> > [0]
> > 	/sys/bus/dax/devices/daxX.Y/mapping[0..N]/start
> > 	/sys/bus/dax/devices/daxX.Y/mapping[0..N]/end
> > 
> > Back to the label field:  It is currently just the 'tag' of the individual
> > extent (because no interleaving).  My vision for the interleave case would
> > be for the kernel to assemble device extents into a region extent only if
> > the tags match and export that.
> > 
> > Thinking on it more though we should leave label out for now.  This is the
> > second time it has been questioned.
> 
> I don't understand the issue. That is a critical piece of information
> and it is at the cxl device level
> 
> /sys/bus/cxl/devices/dax_regionX/extentY/label
> 
> ...now I would just call that "tag" and UUID format it (to Jonathan's
> point), but I see no rationale to hide what is most likely the most
> useful information about an extent.

The rationale is that the user was not going to use it.  So no use case == no
reason to have it...  yet.  I had code a while back to allocate dax devices on
specific tags.  But that was deemed to different from the current dax
allocation mechanism so it was scrapped...  for now.

I can add it back...  But I'm just getting a bit testy about who wants what and
how this is all going to get used.

Ira

  reply	other threads:[~2024-05-06  4:35 UTC|newest]

Thread overview: 160+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-24 23:18 [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD) ira.weiny
2024-03-24 23:18 ` [PATCH 01/26] cxl/mbox: Flag " ira.weiny
2024-03-25 16:11   ` Jonathan Cameron
2024-03-25 22:16   ` fan
2024-03-25 22:56   ` Davidlohr Bueso
2024-04-02 22:26     ` Ira Weiny
2024-03-26 16:34   ` Dave Jiang
2024-04-02 22:30     ` Ira Weiny
2024-04-10 18:15   ` Alison Schofield
2024-03-24 23:18 ` [PATCH 02/26] cxl/core: Separate region mode from decoder mode ira.weiny
2024-03-25 16:20   ` Jonathan Cameron
2024-04-02 23:24     ` Ira Weiny
2024-03-25 23:18   ` Davidlohr Bueso
2024-03-28  5:22     ` Ira Weiny
2024-03-28 20:09       ` Dave Jiang
2024-04-02 23:27         ` Ira Weiny
2024-04-24 17:58         ` Ira Weiny
2024-04-02 23:25     ` Ira Weiny
2024-04-10 18:49   ` Alison Schofield
2024-03-24 23:18 ` [PATCH 03/26] cxl/mem: Read dynamic capacity configuration from the device ira.weiny
2024-03-25 17:40   ` Jonathan Cameron
2024-04-03 22:22     ` Ira Weiny
2024-03-25 23:36   ` fan
2024-04-03 22:41     ` Ira Weiny
2024-04-02 11:41   ` Jørgen Hansen
2024-04-05 18:09     ` Ira Weiny
2024-04-09  8:42       ` Jørgen Hansen
2024-04-09  2:00   ` Alison Schofield
2024-03-24 23:18 ` [PATCH 04/26] cxl/region: Add dynamic capacity decoder and region modes ira.weiny
2024-03-25 17:42   ` Jonathan Cameron
2024-03-26 16:17   ` fan
2024-03-27 15:43   ` Dave Jiang
2024-04-05 18:19     ` Ira Weiny
2024-04-06  0:01       ` Dave Jiang
2024-05-14  2:40   ` Zhijian Li (Fujitsu)
2024-03-24 23:18 ` [PATCH 05/26] cxl/core: Simplify cxl_dpa_set_mode() Ira Weiny
2024-03-25 17:46   ` Jonathan Cameron
2024-03-25 21:38   ` Davidlohr Bueso
2024-03-26 16:25   ` fan
2024-03-26 17:46   ` Dave Jiang
2024-04-05 19:21     ` Ira Weiny
2024-04-06  0:02       ` Dave Jiang
2024-04-09  0:43   ` Alison Schofield
2024-05-03 19:09     ` Ira Weiny
2024-05-03 20:33       ` Alison Schofield
2024-05-04  1:19       ` Dan Williams
2024-05-06  4:06         ` Ira Weiny
2024-05-04  4:13       ` Dan Williams
2024-05-06  3:46         ` Ira Weiny
2024-03-24 23:18 ` [PATCH 06/26] cxl/port: Add Dynamic Capacity mode support to endpoint decoders ira.weiny
2024-03-26 16:35   ` fan
2024-04-05 19:50     ` Ira Weiny
2024-03-26 17:58   ` Dave Jiang
2024-04-05 20:34     ` Ira Weiny
2024-04-04  8:32   ` Jonathan Cameron
2024-04-05 20:56     ` Ira Weiny
2024-05-06 16:22       ` Dan Williams
2024-05-10  5:31         ` Ira Weiny
2024-04-10 20:33   ` Alison Schofield
2024-03-24 23:18 ` [PATCH 07/26] cxl/port: Add dynamic capacity size " ira.weiny
2024-04-05 13:54   ` Jonathan Cameron
2024-05-03 17:09     ` Ira Weiny
2024-05-03 17:21       ` Dan Williams
2024-05-06  4:07         ` Ira Weiny
2024-04-10 22:50   ` Alison Schofield
2024-03-24 23:18 ` [PATCH 08/26] cxl/mem: Expose device dynamic capacity capabilities ira.weiny
2024-03-25 23:40   ` Davidlohr Bueso
2024-03-26 18:30     ` fan
2024-04-04  8:44       ` Jonathan Cameron
2024-04-04  8:51   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 09/26] cxl/region: Add Dynamic Capacity CXL region support ira.weiny
2024-03-26 22:31   ` fan
2024-04-10  4:25     ` Ira Weiny
2024-03-27 17:27   ` Dave Jiang
2024-04-10  4:35     ` Ira Weiny
2024-04-04 10:26   ` Jonathan Cameron
2024-04-10  4:40     ` Ira Weiny
2024-03-24 23:18 ` [PATCH 10/26] cxl/events: Factor out event msgnum configuration Ira Weiny
2024-03-27 17:38   ` Dave Jiang
2024-04-04 15:07   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 11/26] cxl/pci: Delay event buffer allocation Ira Weiny
2024-03-25 22:26   ` Davidlohr Bueso
2024-03-27 17:38   ` Dave Jiang
2024-04-04 15:08   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 12/26] cxl/pci: Factor out interrupt policy check Ira Weiny
2024-03-27 17:41   ` Dave Jiang
2024-04-04 15:10   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 13/26] cxl/mem: Configure dynamic capacity interrupts ira.weiny
2024-03-26 23:12   ` fan
2024-04-10  4:48     ` Ira Weiny
2024-03-27 17:54   ` Dave Jiang
2024-04-10  5:26     ` Ira Weiny
2024-04-04 15:22   ` Jonathan Cameron
2024-04-10  5:34     ` Ira Weiny
2024-04-10 23:23   ` Alison Schofield
2024-05-06 16:56   ` Dan Williams
2024-03-24 23:18 ` [PATCH 14/26] cxl/region: Read existing extents on region creation ira.weiny
2024-03-26 23:27   ` fan
2024-04-10  5:46     ` Ira Weiny
2024-03-27 17:45   ` fan
2024-04-10  6:19     ` Ira Weiny
2024-03-27 18:31   ` Dave Jiang
2024-04-10  6:09     ` Ira Weiny
2024-04-02 13:57   ` Jørgen Hansen
2024-04-10  6:29     ` Ira Weiny
2024-04-04 16:04   ` Jonathan Cameron
2024-04-04 16:13   ` Jonathan Cameron
2024-04-10 17:44   ` Alison Schofield
2024-05-06 18:34   ` Dan Williams
2024-03-24 23:18 ` [PATCH 15/26] range: Add range_overlaps() Ira Weiny
2024-03-25 18:33   ` David Sterba
2024-03-25 21:24   ` Davidlohr Bueso
2024-03-26 12:51   ` Johannes Thumshirn
2024-03-27 17:36   ` fan
2024-03-28 20:09   ` Dave Jiang
2024-04-04 16:06   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 16/26] cxl/extent: Realize extent devices ira.weiny
2024-03-27 22:34   ` fan
2024-03-28 21:11   ` Dave Jiang
2024-04-24 19:57     ` Ira Weiny
2024-04-04 16:32   ` Jonathan Cameron
2024-04-30  3:23     ` Ira Weiny
2024-05-02 21:12       ` Dan Williams
2024-05-06  4:35         ` Ira Weiny [this message]
2024-04-11  0:09   ` Alison Schofield
2024-05-07  1:30   ` Dan Williams
2024-03-24 23:18 ` [PATCH 17/26] dax/region: Create extent resources on DAX region driver load ira.weiny
2024-04-04 16:36   ` Jonathan Cameron
2024-04-09 16:22   ` fan
2024-05-07  2:31   ` Dan Williams
2024-03-24 23:18 ` [PATCH 18/26] cxl/mem: Handle DCD add & release capacity events ira.weiny
2024-04-04 17:03   ` Jonathan Cameron
2024-05-07  5:04   ` Dan Williams
2024-03-24 23:18 ` [PATCH 19/26] dax/bus: Factor out dev dax resize logic Ira Weiny
2024-04-04 17:15   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 20/26] dax: Document dax dev range tuple Ira Weiny
2024-04-01 17:06   ` Dave Jiang
2024-04-04 17:19   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 21/26] dax/region: Prevent range mapping allocation on sparse regions Ira Weiny
2024-04-01 17:07   ` Dave Jiang
2024-04-10 23:02   ` Alison Schofield
2024-03-24 23:18 ` [PATCH 22/26] dax/region: Support DAX device creation on sparse DAX regions Ira Weiny
2024-04-04 17:36   ` Jonathan Cameron
2024-03-24 23:18 ` [PATCH 23/26] cxl/mem: Trace Dynamic capacity Event Record ira.weiny
2024-04-01 17:56   ` Dave Jiang
2024-04-04 17:38   ` Jonathan Cameron
2024-04-10 17:03   ` Alison Schofield
2024-03-24 23:18 ` [PATCH 24/26] tools/testing/cxl: Make event logs dynamic Ira Weiny
2024-03-24 23:18 ` [PATCH 25/26] tools/testing/cxl: Add DC Regions to mock mem data Ira Weiny
2024-03-24 23:18 ` [PATCH 26/26] tools/testing/cxl: Add Dynamic Capacity events Ira Weiny
2024-03-25 19:24 ` [PATCH 00/26] DCD: Add support for Dynamic Capacity Devices (DCD) fan
2024-03-28  5:20   ` Ira Weiny
2024-04-03 20:39     ` Jonathan Cameron
2024-04-04 10:20 ` Jonathan Cameron
2024-04-04 17:49 ` Jonathan Cameron
2024-05-01 23:49   ` Ira Weiny
2024-05-03  9:20     ` Jonathan Cameron
2024-05-06  4:24       ` Ira Weiny
2024-05-08 14:43         ` Jonathan Cameron
2024-04-10 18:01 ` Alison Schofield

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66385e1b63173_25842129417@iweiny-mobl.notmuch \
    --to=ira.weiny@intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=fan.ni@samsung.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=navneet.singh@intel.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).