From: Jan Beulich <jbeulich@suse.com>
To: Wei Chen <Wei.Chen@arm.com>
Cc: nd@arm.com, "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	"George Dunlap" <george.dunlap@citrix.com>,
	"Julien Grall" <julien@xen.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v5 2/6] xen/x86: move generically usable NUMA code from x86 to common
Date: Thu, 29 Sep 2022 14:14:20 +0200	[thread overview]
Message-ID: <ce60f432-fed5-0fbb-c544-36b767c22130@suse.com> (raw)
In-Reply-To: <72691b9b-761e-a89b-97df-afd5cf0ddebb@arm.com>

On 29.09.2022 09:43, Wei Chen wrote:
> On 2022/9/27 16:19, Jan Beulich wrote:
>> On 20.09.2022 11:12, Wei Chen wrote:
>>> +        nodes_used++;
>>> +        if ( epdx > memtop )
>>> +            memtop = epdx;
>>> +    }
>>> +
>>> +    if ( nodes_used <= 1 )
>>> +        i = BITS_PER_LONG - 1;
>>
>> Is this actually going to be correct for all architectures? Aiui
>> Arm64 has only up to 48 physical address bits, but what about an
>> architecture allowing the use of all 64 bits? I think at the very
>> least we want BUILD_BUG_ON(PADDR_BITS >= BITS_PER_LONG) here.
>>
> 
> OK, I will add the BUILD_BUG_ON() above. I also have a question: why
> can't we use PADDR_BITS here directly?

Well, if you used PADDR_BITS, then you would use it without subtracting
1, and you'd be in trouble again when PADDR_BITS == BITS_PER_LONG. What
may be possible to do instead of BUILD_BUG_ON() is

    if ( nodes_used <= 1 )
        i = min(PADDR_BITS, BITS_PER_LONG - 1);

>>> +    else
>>> +        i = find_first_bit(&bitfield, sizeof(unsigned long) * 8);
>>> +
>>> +    memnodemapsize = (memtop >> i) + 1;
>>
>> Again perhaps the subject of a separate patch: Isn't there an off-by-1
>> mistake here? memtop is the maximum of all epdx-es, which are
>> calculated to be the first PDX following the region. Hence I'd expect
>>
>>      memnodemapsize = ((memtop - 1) >> i) + 1;
>>
>> here. I guess I'll make patches for both issues, which you may then
>> need to re-base over.
>>
> 
> Thanks, I will wait for your patches.

Already sent out yesterday.

>>> +static void cf_check dump_numa(unsigned char key)
>>> +{
>>> +    s_time_t now = NOW();
>>> +    unsigned int i, j, n;
>>> +    struct domain *d;
>>> +    const struct page_info *page;
>>> +    unsigned int page_num_node[MAX_NUMNODES];
>>> +    const struct vnuma_info *vnuma;
>>> +
>>> +    printk("'%c' pressed -> dumping numa info (now = %"PRI_stime")\n", key,
>>> +           now);
>>> +
>>> +    for_each_online_node ( i )
>>> +    {
>>> +        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
>>> +
>>> +        printk("NODE%u start->%lu size->%lu free->%lu\n",
>>> +               i, node_start_pfn(i), node_spanned_pages(i),
>>> +               avail_node_heap_pages(i));
>>> +        /* Sanity check phys_to_nid() */
>>> +        if ( phys_to_nid(pa) != i )
>>> +            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
>>> +                   pa, phys_to_nid(pa), i);
>>> +    }
>>> +
>>> +    j = cpumask_first(&cpu_online_map);
>>> +    n = 0;
>>> +    for_each_online_cpu ( i )
>>> +    {
>>> +        if ( i != j + n || cpu_to_node[j] != cpu_to_node[i] )
>>> +        {
>>> +            if ( n > 1 )
>>> +                printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
>>> +            else
>>> +                printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
>>> +            j = i;
>>> +            n = 1;
>>> +        }
>>> +        else
>>> +            ++n;
>>> +    }
>>> +    if ( n > 1 )
>>> +        printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
>>> +    else
>>> +        printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
>>> +
>>> +    rcu_read_lock(&domlist_read_lock);
>>> +
>>> +    printk("Memory location of each domain:\n");
>>> +    for_each_domain ( d )
>>> +    {
>>> +        process_pending_softirqs();
>>> +
>>> +        printk("Domain %u (total: %u):\n", d->domain_id, domain_tot_pages(d));
>>> +
>>> +        for_each_online_node ( i )
>>> +            page_num_node[i] = 0;
>>
>> I'd be inclined to suggest to use memset() here, but I won't insist
>> on you doing this "on the fly". Along with this would likely go the
>> request to limit the scope of page_num_node[] (and then perhaps also
>> vnuma and page).
>>
> 
> Using memset for page_num_node makes sense; I will do it before
> for_each_domain ( d ).

That won't be right - array elements need clearing on every iteration.
Plus ...

> About limiting the scope, did you mean we should move:
> 
> "const struct page_info *page;
> unsigned int page_num_node[MAX_NUMNODES];
> const struct vnuma_info *vnuma;"
> 
> to the block of for_each_domain ( d )?

... this limiting of scope (yes to your question) would also conflict
with the movement you suggest. It is exactly such a mistaken movement
(among other things) that the narrower scope is intended to prevent.

Jan


Thread overview: 23+ messages
2022-09-20  9:12 [PATCH v5 0/6] Device tree based NUMA support for Arm - Part#2 Wei Chen
2022-09-20  9:12 ` [PATCH v5 1/6] xen/x86: Provide helpers for common code to access acpi_numa Wei Chen
2022-09-27  7:37   ` Jan Beulich
2022-09-29  6:29     ` Wei Chen
2022-09-20  9:12 ` [PATCH v5 2/6] xen/x86: move generically usable NUMA code from x86 to common Wei Chen
2022-09-27  8:19   ` Jan Beulich
2022-09-27  9:39     ` Jan Beulich
2022-09-29  7:58       ` Wei Chen
2022-09-29  7:43     ` Wei Chen
2022-09-29 12:14       ` Jan Beulich [this message]
2022-09-30  1:45         ` Wei Chen
2022-09-20  9:12 ` [PATCH v5 3/6] xen/x86: Use ASSERT instead of VIRTUAL_BUG_ON for phys_to_nid Wei Chen
2022-09-20  9:12 ` [PATCH v5 4/6] xen/x86: use arch_get_ram_range to get information from E820 map Wei Chen
2022-09-20  9:12 ` [PATCH v5 5/6] xen/x86: move NUMA scan nodes codes from x86 to common Wei Chen
2022-09-27 15:48   ` Jan Beulich
2022-09-29  8:21     ` Wei Chen
2022-09-29 12:21       ` Jan Beulich
2022-09-30  1:40         ` Wei Chen
2022-09-30  6:03           ` Jan Beulich
2022-10-09  7:25             ` Wei Chen
2022-10-10  7:03               ` Wei Chen
2022-10-10  8:25                 ` Jan Beulich
2022-09-20  9:12 ` [PATCH v5 6/6] xen: introduce a Kconfig option to configure NUMA nodes number Wei Chen
