linux-nvme.lists.infradead.org archive mirror
From: Keith Busch <kbusch@kernel.org>
To: Hannes Reinecke <hare@suse.de>
Cc: Nilay Shroff <nilay@linux.ibm.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, hch@lst.de, gjoyce@linux.ibm.com,
	axboe@fb.com
Subject: Re: [PATCH] nvme: find numa distance only if controller has valid numa id
Date: Mon, 15 Apr 2024 10:56:07 -0600	[thread overview]
Message-ID: <Zh1cJwMo4cntWoHS@kbusch-mbp> (raw)
In-Reply-To: <05dbae65-2cc2-40d7-9066-a83cdfdc47be@suse.de>

On Mon, Apr 15, 2024 at 04:39:45PM +0200, Hannes Reinecke wrote:
> > For calculating the distance between two nodes we invoke the function __node_distance().
> > This function accesses the numa distance table, which is typically an array whose valid
> > indices start from 0. So accessing this table with an index of -1 would dereference an
> > incorrect memory location. Dereferencing an incorrect memory location might have side
> > effects, including a panic (though I didn't encounter one). Furthermore, in such a case
> > the calculated node distance could be incorrect, and that might cause nvme multipath to
> > choose a suboptimal IO path.
> > 
> > This patch may not help choose the optimal IO path (we assume the node distance is
> > LOCAL_DISTANCE when the nvme controller's numa node id is -1), but it ensures that we
> > don't access an invalid memory location when calculating the node distance.
> > 
> Hmm. One wonders: how does such a system work?
> The systems I know always have the PCI slots attached to the CPU
> sockets, so if the CPU is not present the NVMe device on that
> slot will be non-functional. In fact, it wouldn't be visible at
> all as the PCI lanes are not powered up.
> In your system the PCI lanes clearly are powered up, as the NVMe
> device shows up in the PCI enumeration.
> Which means you are running a rather different PCI configuration.
> Question now is: does the NVMe device _work_?
> If it does, shouldn't the NUMA node continue to be present (some kind of
> memory-less, CPU-less NUMA node ...)?
> As a side note, we'll need this kind of configuration anyway once
> CXL switches become available...

I recall systems with the IO controller attached in a shared manner to all
sockets, so memory is UMA from the IO device's perspective (it may still be
NUMA from the CPU's). I don't think you need to consider memory-only NUMA
nodes unless there are additional distances to consider (at which point
it's no longer UMA).
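
As a rough sketch of the check described in the patch description above
(illustrative only, not the exact patch; NUMA_NO_NODE is the kernel's -1
sentinel, and the nvme multipath field names are assumed to match the
path-selection code):

	/*
	 * Sketch: only consult the NUMA distance table when the controller
	 * actually reports a NUMA node; otherwise fall back to
	 * LOCAL_DISTANCE instead of indexing the table with -1.
	 */
	if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA &&
	    ns->ctrl->numa_node != NUMA_NO_NODE)
		distance = node_distance(node, ns->ctrl->numa_node);
	else
		distance = LOCAL_DISTANCE;

With a guard like this, a controller whose numa_node is -1 is simply treated
as local rather than feeding -1 into __node_distance().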



Thread overview: 11+ messages
2024-04-13  9:04 [PATCH] nvme: find numa distance only if controller has valid numa id Nilay Shroff
2024-04-14  8:30 ` Sagi Grimberg
2024-04-14 11:02   ` Nilay Shroff
2024-04-15  8:55     ` Sagi Grimberg
2024-04-15  9:30       ` Nilay Shroff
2024-04-15 10:04         ` Sagi Grimberg
2024-04-15 14:39         ` Hannes Reinecke
2024-04-15 16:56           ` Keith Busch [this message]
2024-04-16  8:06           ` Nilay Shroff
2024-04-15  7:25 ` Christoph Hellwig
2024-04-15  7:54   ` Nilay Shroff
