From: Andrew Jones <drjones@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: peter.maydell@linaro.org, Gavin Shan <gshan@redhat.com>,
	ehabkost@redhat.com, qemu-devel@nongnu.org, qemu-arm@nongnu.org,
	shan.gavin@gmail.com
Subject: Re: [PATCH 1/2] numa: Set default distance map if needed
Date: Tue, 12 Oct 2021 15:13:08 +0200	[thread overview]
Message-ID: <20211012131308.45j7ofd4xwk42epv@gator> (raw)
In-Reply-To: <20211012142754.1c4e5071@redhat.com>

On Tue, Oct 12, 2021 at 02:27:54PM +0200, Igor Mammedov wrote:
> On Tue, 12 Oct 2021 12:37:54 +0200
> Andrew Jones <drjones@redhat.com> wrote:
> 
> > On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:
> > > On Wed,  6 Oct 2021 18:22:08 +0800
> > > Gavin Shan <gshan@redhat.com> wrote:
> > >   
> > > > The following option is used to specify the distance map. It's
> > > > possible that the user doesn't provide it. In this case, the
> > > > distance map isn't populated and exposed to the platform. On the
> > > > other hand, empty NUMA nodes, where no memory resides, are
> > > > allowed on the ARM64 virt platform. For these empty NUMA nodes,
> > > > the corresponding device-tree nodes aren't populated, but
> > > > their NUMA IDs should be included in the "/distance-map"
> > > > device-tree node, so that the kernel can probe them properly if
> > > > device-tree is used.
> > > > 
> > > >   -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > 
> > > > So when the user doesn't specify a distance map, we need to generate
> > > > the default distance map, where the local and remote distances
> > > > are 10 and 20, respectively. This adds an extra parameter to the
> > > > existing complete_init_numa_distance() to generate the default
> > > > distance map for this case.
> > > > 
> > > > Signed-off-by: Gavin Shan <gshan@redhat.com>  
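A minimal C sketch of the default-filling step described in the patch, assuming a fixed-size matrix in which 0 marks a distance the user did not supply. The names fill_default_distances() and MAX_NODES are hypothetical and for illustration only; this is not QEMU's actual complete_init_numa_distance().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 8        /* hypothetical node limit for this sketch */
#define LOCAL_DISTANCE 10  /* default distance from a node to itself */
#define REMOTE_DISTANCE 20 /* default distance between different nodes */

/*
 * Fill in every distance the user did not give (0 = unspecified).
 * If only the reverse direction was given, mirror it; otherwise
 * fall back to 10 (same node) or 20 (different nodes).
 */
static void fill_default_distances(uint8_t dist[MAX_NODES][MAX_NODES],
                                   int nb_nodes)
{
    assert(nb_nodes > 0 && nb_nodes <= MAX_NODES);

    for (int src = 0; src < nb_nodes; src++) {
        for (int dst = 0; dst < nb_nodes; dst++) {
            if (dist[src][dst] != 0) {
                continue;                        /* user-provided value */
            }
            if (dist[dst][src] != 0) {
                dist[src][dst] = dist[dst][src]; /* mirror reverse path */
            } else {
                dist[src][dst] = (src == dst) ? LOCAL_DISTANCE
                                              : REMOTE_DISTANCE;
            }
        }
    }
}
```

User-supplied values always win here; the 10/20 defaults only fill the gaps, which is the behavior the commit message asks for when no "-numa dist" option is given.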
> > > 
> > > 
> > > How about erroring out if the distance map is required but
> > > not provided explicitly by the user, and asking the user to fix
> > > the command line?
> > > 
> > > The reasoning behind this is that defaults are hard to maintain,
> > > will require compat hacks, and become roadblocks down the road.
> > > The approach I've been taking with generic NUMA code is deprecating
> > > defaults and replacing them with sanity checks, which bail
> > > out on incorrect configuration and ask the user to correct the
> > > command line. Hence I dislike the approach taken in this patch.
> > > 
> > > If you really wish to provide a default, push it out of
> > > generic code into ARM-specific code
> > > (then I won't oppose it as much; I think PPC does
> > > some magic like this).
> > > Also, the behavior seems to be ARM-specific, so generic
> > > NUMA code isn't the place for it anyway.
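The "sanity check instead of default" approach proposed above could look roughly like this C sketch. It assumes a hypothetical policy in which a distance map is required only when some node has no memory, and that "complete" means every (src, dst) pair was given; check_numa_config() and distance_map_complete() are hypothetical names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 8 /* hypothetical node limit for this sketch */

/* True if the user supplied a distance (non-zero) for every pair. */
static bool distance_map_complete(uint8_t dist[MAX_NODES][MAX_NODES],
                                  int nb_nodes)
{
    for (int src = 0; src < nb_nodes; src++) {
        for (int dst = 0; dst < nb_nodes; dst++) {
            if (dist[src][dst] == 0) {
                return false;
            }
        }
    }
    return true;
}

/*
 * Hypothetical check: if any node has no memory, the DT must carry a
 * distance map, so refuse to start unless the user gave one explicitly.
 * Returns 0 on success, -1 (after printing a hint) on a bad config.
 */
static int check_numa_config(uint8_t dist[MAX_NODES][MAX_NODES],
                             const uint64_t *node_mem, int nb_nodes)
{
    assert(nb_nodes > 0 && nb_nodes <= MAX_NODES);
    bool has_empty_node = false;

    for (int i = 0; i < nb_nodes; i++) {
        if (node_mem[i] == 0) {
            has_empty_node = true;
        }
    }
    if (has_empty_node && !distance_map_complete(dist, nb_nodes)) {
        fprintf(stderr, "memory-less NUMA nodes need an explicit distance "
                "map; add -numa dist,src=X,dst=Y,val=Z for each node pair\n");
        return -1;
    }
    return 0;
}
```

Configurations without empty nodes pass untouched, so only users who actually need the map are asked to spell it out.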
> > 
> > The distance-map DT node and the default 10/20 distance-map values
> > aren't arch-specific. RISCV uses them too.
> > 
> > I'm on the fence with this. I see erroring out to require users
> > to provide explicit command lines as a good thing, but I also
> > see it as a potentially unnecessary burden for those who want
> > the default map anyway. The optional nature of the distance-map
> > node and the specification of the default map are described here [1]
> > 
> > [1] Linux source: Documentation/devicetree/bindings/numa.txt
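In the binding cited in [1], the distance-map node (compatible "numa-distance-map-v1") carries a "distance-matrix" property made of <src dst distance> triplets. The C sketch below encodes such a property for every node pair, memory-less nodes included, as big-endian 32-bit cells. build_distance_matrix() and MAX_NODES are hypothetical names; the resulting buffer could then be handed to a setter such as libfdt's fdt_setprop().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8 /* hypothetical node limit for this sketch */

/* FDT cells are 32-bit big-endian values. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Encode a "distance-matrix" property: one <src dst distance> triplet
 * per node pair. Returns bytes written, or 0 if buf is too small.
 */
static size_t build_distance_matrix(uint8_t *buf, size_t len,
                                    uint8_t dist[MAX_NODES][MAX_NODES],
                                    int nb_nodes)
{
    assert(nb_nodes > 0 && nb_nodes <= MAX_NODES);

    size_t need = (size_t)nb_nodes * nb_nodes * 3 * sizeof(uint32_t);
    if (len < need) {
        return 0;
    }

    uint8_t *p = buf;
    for (int src = 0; src < nb_nodes; src++) {
        for (int dst = 0; dst < nb_nodes; dst++) {
            put_be32(p, (uint32_t)src);
            put_be32(p + 4, (uint32_t)dst);
            put_be32(p + 8, dist[src][dst]);
            p += 12; /* three 4-byte cells per triplet */
        }
    }
    return need;
}
```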
> 
> Looking at the proposed Linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
> using the optional distance table as a source of numa-node-ids
> looks like a hack around the kernel's inability to fish them out
> of CPU and/or PCI nodes (using those nodes as a source should
> cover the memory-less node use case).
> 
> I consider including the optional node a policy decision.
> So the user should include it explicitly on the QEMU command line
> if necessary (that works just fine for x86), or the guest OS
> can make up defaults on its own in the absence of data.
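Igor's alternative, collecting node IDs from CPU and PCI nodes rather than from the distance map, amounts to taking the union of every numa-node-id value the guest finds. The C sketch below shows that idea with a 64-bit mask; add_node_ids() is a hypothetical helper and not the Linux kernel's actual DT NUMA code, and it assumes the numa-node-id values have already been read from the tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical guest-side helper: OR every numa-node-id found in one
 * class of DT nodes (memory, cpu, PCI host bridge) into a bitmask of
 * known nodes. IDs >= 64 are ignored in this sketch.
 */
static uint64_t add_node_ids(uint64_t mask, const uint32_t *ids, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        if (ids[i] < 64) {
            mask |= UINT64_C(1) << ids[i];
        }
    }
    return mask;
}
```

Nodes present in the final mask but absent from the memory-node mask are the memory-less ones.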

OK, so erroring out on configs that must provide distance-maps, rather
than automatically generating them for all configs, is better.

> 
> > So, my r-b stands for this patch, but I also wouldn't complain
> > about respinning it to error out instead.
> 
> > I would complain about
> > moving the logic to Arm specific code, though, since RISCV would
> > then need to duplicate it.
> 
> Instead of putting workarounds in QEMU and then making them generic,
> I'd prefer to:
>  1. make QEMU able to generate DT with memory-less nodes

How? DT syntax doesn't allow this, because each node needs a unique
name, which is derived from its base address, and an empty NUMA
node doesn't have one.
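To make the naming constraint concrete: a memory node's unit address (the part after '@') comes from its base address, so a node with no memory has nothing to put there. The C sketch below formats names in the "memory@<hex base>" style and refuses when the size is zero; memory_node_name() is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build a DT memory node name of the form "memory@<base in hex>".
 * A NUMA node without memory has no base address, so there is no
 * unit address to put after '@' and no unique name can be formed.
 */
static bool memory_node_name(char *buf, size_t len, uint64_t base,
                             uint64_t size)
{
    assert(buf != NULL && len > 0);
    if (size == 0) {
        return false;
    }
    int n = snprintf(buf, len, "memory@%" PRIx64, base);
    return n > 0 && (size_t)n < len;
}
```

Two empty nodes would otherwise have to share a made-up address, which is exactly the uniqueness problem described above.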

>  2. fix the guest to get numa-node-id from CPU/PCI nodes if
>     the memory node isn't present,

I'm not sure that's possible with DT. If it is, then proposing it
upstream to Linux DT maintainers would be the next step.

> or use ACPI tables, which can
>     describe memory-less NUMA nodes, if fixing how DT is
>     parsed is infeasible.

We already use ACPI for our guests, but we also generate a DT (which
edk2 consumes). We can't generate a valid DT when empty NUMA nodes
are put on the command line unless we follow a DT spec saying how
to do that. The current spec says we should have a distance-map
that contains those nodes.
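A check of that requirement could be sketched in C as follows: given the flattened <src dst distance> triplets of a distance-matrix, confirm that every node ID from 0 to nb_nodes-1 appears in at least one triplet, so that empty nodes remain discoverable. distance_map_covers_nodes() is a hypothetical name, and the sketch assumes at most 64 nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * cells holds `count` flattened <src dst distance> triplets. Returns
 * true if every node ID 0..nb_nodes-1 appears as a src or dst.
 */
static bool distance_map_covers_nodes(const uint32_t *cells, int count,
                                      int nb_nodes)
{
    assert(nb_nodes > 0 && nb_nodes <= 64);
    uint64_t seen = 0;

    for (int i = 0; i < count; i++) {
        uint32_t src = cells[3 * i];
        uint32_t dst = cells[3 * i + 1];
        if (src < 64) {
            seen |= UINT64_C(1) << src;
        }
        if (dst < 64) {
            seen |= UINT64_C(1) << dst;
        }
    }
    uint64_t want = (nb_nodes == 64) ? UINT64_MAX
                                     : (UINT64_C(1) << nb_nodes) - 1;
    return (seen & want) == want;
}
```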

Thanks,
drew




Thread overview: 24+ messages
2021-10-06 10:22 [PATCH 0/2] hw/arm/virt: Fix qemu booting failure on device-tree Gavin Shan
2021-10-06 10:22 ` [PATCH 1/2] numa: Set default distance map if needed Gavin Shan
2021-10-06 10:35   ` Andrew Jones
2021-10-06 11:03     ` Gavin Shan
2021-10-06 11:56       ` Andrew Jones
2021-10-07 23:51         ` Gavin Shan
2021-10-08  6:07           ` Andrew Jones
2021-10-12  6:13             ` Gavin Shan
2021-10-12  9:40   ` Igor Mammedov
2021-10-12 10:31     ` Gavin Shan
2021-10-12 11:18       ` Igor Mammedov
2021-10-12 11:48       ` Andrew Jones
2021-10-12 12:34         ` Igor Mammedov
2021-10-12 13:05           ` Andrew Jones
2021-10-12 22:59             ` Gavin Shan
2021-10-12 10:37     ` Andrew Jones
2021-10-12 12:27       ` Igor Mammedov
2021-10-12 13:13         ` Andrew Jones [this message]
2021-10-12 13:53           ` Igor Mammedov
2021-10-12 23:32             ` Gavin Shan
2021-10-13  9:32               ` Igor Mammedov
2021-10-13  6:29             ` Andrew Jones
2021-10-06 10:22 ` [PATCH 2/2] hw/arm/virt: Don't create device-tree node for empty NUMA node Gavin Shan
2021-10-06 10:36   ` Andrew Jones
