From: Michal Hocko <mhocko@kernel.org>
To: Kemi Wang <kemi.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Christopher Lameter <cl@linux.com>,
	YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Pavel Tatashin <pasha.tatashin@oracle.com>,
	David Rientjes <rientjes@google.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Dave <dave.hansen@linux.intel.com>,
	Andi Kleen <andi.kleen@intel.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Ying Huang <ying.huang@intel.com>, Aaron Lu <aaron.lu@intel.com>,
	Aubrey Li <aubrey.li@intel.com>, Linux MM <linux-mm@kvack.org>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement
Date: Wed, 29 Nov 2017 13:17:40 +0100
Message-ID: <20171129121740.f6drkbktc43l5ib6@dhcp22.suse.cz>
In-Reply-To: <1511848824-18709-1-git-send-email-kemi.wang@intel.com>

On Tue 28-11-17 14:00:23, Kemi Wang wrote:
> The existed implementation of NUMA counters is per logical CPU along with
> zone->vm_numa_stat[] separated by zone, plus a global numa counter array
> vm_numa_stat[]. However, unlike the other vmstat counters, numa stats don't
> effect system's decision and are only read from /proc and /sys, it is a
> slow path operation and likely tolerate higher overhead. Additionally,
> usually nodes only have a single zone, except for node 0. And there isn't
> really any use where you need these hits counts separated by zone.
> 
> Therefore, we can migrate the implementation of numa stats from per-zone to
> per-node, and get rid of these global numa counters. It's good enough to
> keep everything in a per cpu ptr of type u64, and sum them up when need, as
> suggested by Andi Kleen. That's helpful for code cleanup and enhancement
> (e.g. save more than 130+ lines code).

I agree. Keeping these stats per zone is an overcomplication. The
only consumer is /proc/zoneinfo, and I would argue that this doesn't
justify the additional complexity. Who really needs to know the
per-zone breakdown?

Anyway, I haven't checked your implementation too deeply, but why don't
you simply define a static percpu array for each NUMA node?
[...]
> +extern u64 __percpu *vm_numa_stat;
[...]
> +#ifdef CONFIG_NUMA
> +	size = sizeof(u64) * num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS;
> +	align = __alignof__(u64[num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS]);
> +	vm_numa_stat = (u64 __percpu *)__alloc_percpu(size, align);
> +#endif
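
To illustrate the question above, here is a minimal userspace sketch of
statically sized per-CPU, per-node counters that are summed only when
read. It is not the patch's code and not kernel code: SKETCH_NR_CPUS,
SKETCH_MAX_NODES, numa_stat_inc() and numa_stat_sum() are hypothetical
names, and a plain array indexed by CPU stands in for what would
presumably be a DEFINE_PER_CPU array in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time limits for this sketch only. */
#define SKETCH_NR_CPUS   4
#define SKETCH_MAX_NODES 2

enum numa_stat_item {
	NUMA_HIT,
	NUMA_MISS,
	NUMA_FOREIGN,
	NUMA_INTERLEAVE_HIT,
	NUMA_LOCAL,
	NUMA_OTHER,
	NR_VM_NUMA_STAT_ITEMS
};

/*
 * Statically sized storage: one [node][item] table per CPU.
 * No runtime allocation and no alignment computation is needed.
 */
static uint64_t vm_numa_stat[SKETCH_NR_CPUS][SKETCH_MAX_NODES][NR_VM_NUMA_STAT_ITEMS];

/* Fast path: bump the counter that belongs to the given CPU. */
static void numa_stat_inc(int cpu, int node, enum numa_stat_item item)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	vm_numa_stat[cpu][node][item]++;
}

/* Slow path (e.g. a /proc read): sum one node's counter over all CPUs. */
static uint64_t numa_stat_sum(int node, enum numa_stat_item item)
{
	uint64_t sum = 0;

	assert(node >= 0 && node < SKETCH_MAX_NODES);
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += vm_numa_stat[cpu][node][item];
	return sum;
}
```

The trade-off in this sketch is that the table is sized for the maximum
node count at build time rather than for num_possible_nodes() at boot,
in exchange for dropping the allocation step.
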
-- 
Michal Hocko
SUSE Labs

