linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] add MemAvailable to per-node meminfo
@ 2021-12-16 12:46 Qi Zheng
  2021-12-16 12:46 ` [PATCH 1/2] mm: " Qi Zheng
  2021-12-16 12:46 ` [PATCH 2/2] mm: reimplement si_mem_available() Qi Zheng
  0 siblings, 2 replies; 12+ messages in thread
From: Qi Zheng @ 2021-12-16 12:46 UTC (permalink / raw)
  To: akpm, gregkh, rafael; +Cc: linux-mm, linux-kernel, songmuchun, Qi Zheng

Hi,

This patch series aims to add "MemAvailable" to per-node meminfo.

This series is based on next-20211215.

Comments and suggestions are welcome.

Thanks,
Qi

Qi Zheng (2):
  mm: add MemAvailable to per-node meminfo
  mm: reimplement si_mem_available()

 drivers/base/node.c    |  4 ++++
 include/linux/mm.h     |  1 +
 include/linux/mmzone.h |  5 +++++
 mm/page_alloc.c        | 44 +++++++++++++++++++++++++++++++-------------
 4 files changed, 41 insertions(+), 13 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] mm: add MemAvailable to per-node meminfo
  2021-12-16 12:46 [PATCH 0/2] add MemAvailable to per-node meminfo Qi Zheng
@ 2021-12-16 12:46 ` Qi Zheng
  2021-12-16 13:16   ` Greg KH
  2021-12-16 12:46 ` [PATCH 2/2] mm: reimplement si_mem_available() Qi Zheng
  1 sibling, 1 reply; 12+ messages in thread
From: Qi Zheng @ 2021-12-16 12:46 UTC (permalink / raw)
  To: akpm, gregkh, rafael; +Cc: linux-mm, linux-kernel, songmuchun, Qi Zheng

In /proc/meminfo, we can show the sum of all the available memory
as "MemAvailable". Add the same counter also to per-node meminfo
under /sys.

With this counter, some processes that bind nodes can make some
decisions by reading the "MemAvailable" of the corresponding nodes
directly.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 drivers/base/node.c    |  4 ++++
 include/linux/mm.h     |  1 +
 include/linux/mmzone.h |  5 +++++
 mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 59 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 87acc47e8951..deb2a7965ae4 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -375,8 +375,10 @@ static ssize_t node_read_meminfo(struct device *dev,
 	struct sysinfo i;
 	unsigned long sreclaimable, sunreclaimable;
 	unsigned long swapcached = 0;
+	long available;
 
 	si_meminfo_node(&i, nid);
+	available = si_mem_available_node(&i, nid);
 	sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
 	sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
 #ifdef CONFIG_SWAP
@@ -386,6 +388,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			    "Node %d MemTotal:       %8lu kB\n"
 			    "Node %d MemFree:        %8lu kB\n"
 			    "Node %d MemUsed:        %8lu kB\n"
+			    "Node %d MemAvailable:   %8lu kB\n"
 			    "Node %d SwapCached:     %8lu kB\n"
 			    "Node %d Active:         %8lu kB\n"
 			    "Node %d Inactive:       %8lu kB\n"
@@ -398,6 +401,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			    nid, K(i.totalram),
 			    nid, K(i.freeram),
 			    nid, K(i.totalram - i.freeram),
+			    nid, K(available),
 			    nid, K(swapcached),
 			    nid, K(node_page_state(pgdat, NR_ACTIVE_ANON) +
 				   node_page_state(pgdat, NR_ACTIVE_FILE)),
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1d4f731a8e18..34a5f5df388b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2532,6 +2532,7 @@ extern void mem_init(void);
 extern void __init mmap_init(void);
 extern void show_mem(unsigned int flags, nodemask_t *nodemask);
 extern long si_mem_available(void);
+extern long si_mem_available_node(struct sysinfo *val, int nid);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
 #ifdef __HAVE_ARCH_RESERVED_KERNEL_PAGES
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 936dc0b6c226..321c12f6272f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1132,6 +1132,11 @@ extern struct zone *next_zone(struct zone *zone);
 			; /* do nothing */		\
 		else
 
+#define for_each_pgdat_zone(pgdat, zone)				\
+	for (zone = (pgdat)->node_zones;				\
+	     zone < (pgdat)->node_zones + MAX_NR_ZONES - 1 && zone;	\
+	     zone++)
+
 static inline struct zone *zonelist_zone(struct zoneref *zoneref)
 {
 	return zoneref->zone;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edfd6c81af82..31f5e3e335cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5731,6 +5731,55 @@ static inline void show_node(struct zone *zone)
 		printk("Node %d ", zone_to_nid(zone));
 }
 
+/**
+ * si_mem_available_node - helper to calculate the size of available memory
+ *			   of the given node
+ * @val: pointer to struct sysinfo
+ * @nid: the node id
+ */
+long si_mem_available_node(struct sysinfo *val, int nid)
+{
+	long available;
+	unsigned long pagecache;
+	unsigned long reclaimable;
+	unsigned long wmark_low = 0;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	struct zone *zone;
+
+	for_each_pgdat_zone(pgdat, zone)
+		wmark_low += low_wmark_pages(zone);
+
+	/*
+	 * Estimate the amount of memory available for userspace allocations,
+	 * without causing swapping.
+	 */
+	available = val->freeram - pgdat->totalreserve_pages;
+
+	/*
+	 * Not all the page cache can be freed, otherwise the system will
+	 * start swapping. Assume at least half of the page cache, or the
+	 * low watermark worth of cache, needs to stay.
+	 */
+	pagecache = node_page_state(pgdat, NR_ACTIVE_FILE) +
+		    node_page_state(pgdat, NR_INACTIVE_FILE);
+	pagecache -= min(pagecache / 2, wmark_low);
+	available += pagecache;
+
+	/*
+	 * Part of the reclaimable slab and other kernel memory consists of
+	 * items that are in use, and cannot be freed. Cap this estimate at the
+	 * low watermark.
+	 */
+	reclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B) +
+		      node_page_state(pgdat, NR_KERNEL_MISC_RECLAIMABLE);
+	reclaimable -= min(reclaimable / 2, wmark_low);
+	available += reclaimable;
+
+	if (available < 0)
+		available = 0;
+	return available;
+}
+
 long si_mem_available(void)
 {
 	long available;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] mm: reimplement si_mem_available()
  2021-12-16 12:46 [PATCH 0/2] add MemAvailable to per-node meminfo Qi Zheng
  2021-12-16 12:46 ` [PATCH 1/2] mm: " Qi Zheng
@ 2021-12-16 12:46 ` Qi Zheng
  2021-12-16 13:17   ` Greg KH
                     ` (3 more replies)
  1 sibling, 4 replies; 12+ messages in thread
From: Qi Zheng @ 2021-12-16 12:46 UTC (permalink / raw)
  To: akpm, gregkh, rafael; +Cc: linux-mm, linux-kernel, songmuchun, Qi Zheng

Reimplement si_mem_available() by reusing si_mem_available_node().

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/page_alloc.c | 45 +++++++--------------------------------------
 1 file changed, 7 insertions(+), 38 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 31f5e3e335cf..0982372c8e49 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5782,46 +5782,15 @@ long si_mem_available_node(struct sysinfo *val, int nid)
 
 long si_mem_available(void)
 {
-	long available;
-	unsigned long pagecache;
-	unsigned long wmark_low = 0;
-	unsigned long pages[NR_LRU_LISTS];
-	unsigned long reclaimable;
-	struct zone *zone;
-	int lru;
-
-	for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
-		pages[lru] = global_node_page_state(NR_LRU_BASE + lru);
-
-	for_each_zone(zone)
-		wmark_low += low_wmark_pages(zone);
-
-	/*
-	 * Estimate the amount of memory available for userspace allocations,
-	 * without causing swapping.
-	 */
-	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
-
-	/*
-	 * Not all the page cache can be freed, otherwise the system will
-	 * start swapping. Assume at least half of the page cache, or the
-	 * low watermark worth of cache, needs to stay.
-	 */
-	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
-	pagecache -= min(pagecache / 2, wmark_low);
-	available += pagecache;
+	long available = 0;
+	struct sysinfo i;
+	int nid;
 
-	/*
-	 * Part of the reclaimable slab and other kernel memory consists of
-	 * items that are in use, and cannot be freed. Cap this estimate at the
-	 * low watermark.
-	 */
-	reclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B) +
-		global_node_page_state(NR_KERNEL_MISC_RECLAIMABLE);
-	available += reclaimable - min(reclaimable / 2, wmark_low);
+	for_each_online_node(nid) {
+		si_meminfo_node(&i, nid);
+		available += si_mem_available_node(&i, nid);
+	}
 
-	if (available < 0)
-		available = 0;
 	return available;
 }
 EXPORT_SYMBOL_GPL(si_mem_available);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm: add MemAvailable to per-node meminfo
  2021-12-16 12:46 ` [PATCH 1/2] mm: " Qi Zheng
@ 2021-12-16 13:16   ` Greg KH
  2021-12-16 15:31     ` Qi Zheng
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-12-16 13:16 UTC (permalink / raw)
  To: Qi Zheng; +Cc: akpm, rafael, linux-mm, linux-kernel, songmuchun

On Thu, Dec 16, 2021 at 08:46:54PM +0800, Qi Zheng wrote:
> In /proc/meminfo, we can show the sum of all the available memory
> as "MemAvailable". Add the same counter also to per-node meminfo
> under /sys.
> 
> With this counter, some processes that bind nodes can make some
> decisions by reading the "MemAvailable" of the corresponding nodes
> directly.
> 
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>  drivers/base/node.c    |  4 ++++
>  include/linux/mm.h     |  1 +
>  include/linux/mmzone.h |  5 +++++
>  mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 59 insertions(+)
> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 87acc47e8951..deb2a7965ae4 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -375,8 +375,10 @@ static ssize_t node_read_meminfo(struct device *dev,
>  	struct sysinfo i;
>  	unsigned long sreclaimable, sunreclaimable;
>  	unsigned long swapcached = 0;
> +	long available;
>  
>  	si_meminfo_node(&i, nid);
> +	available = si_mem_available_node(&i, nid);
>  	sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
>  	sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
>  #ifdef CONFIG_SWAP
> @@ -386,6 +388,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>  			    "Node %d MemTotal:       %8lu kB\n"
>  			    "Node %d MemFree:        %8lu kB\n"
>  			    "Node %d MemUsed:        %8lu kB\n"
> +			    "Node %d MemAvailable:   %8lu kB\n"

You just changed a user/kernel api without documenting it anywhere, or
ensuring that you did not just break anything.

Also, this api is crazy, and not ok, please never add anything new to
it, it is broken as-is.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm: reimplement si_mem_available()
  2021-12-16 12:46 ` [PATCH 2/2] mm: reimplement si_mem_available() Qi Zheng
@ 2021-12-16 13:17   ` Greg KH
  2021-12-16 15:39     ` Qi Zheng
  2021-12-16 15:06   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-12-16 13:17 UTC (permalink / raw)
  To: Qi Zheng; +Cc: akpm, rafael, linux-mm, linux-kernel, songmuchun

On Thu, Dec 16, 2021 at 08:46:55PM +0800, Qi Zheng wrote:
> Reimplement si_mem_available() by reusing si_mem_available_node().

That says what you did, but not why.  Why are you doing this?

And what does it have to do with patch 1/2?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm: reimplement si_mem_available()
  2021-12-16 12:46 ` [PATCH 2/2] mm: reimplement si_mem_available() Qi Zheng
  2021-12-16 13:17   ` Greg KH
@ 2021-12-16 15:06   ` kernel test robot
  2021-12-16 18:21   ` kernel test robot
  2021-12-16 21:05   ` kernel test robot
  3 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2021-12-16 15:06 UTC (permalink / raw)
  To: Qi Zheng, akpm, gregkh, rafael
  Cc: kbuild-all, linux-mm, linux-kernel, songmuchun, Qi Zheng

Hi Qi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]
[also build test ERROR on driver-core/driver-core-testing linux/master linus/master v5.16-rc5 next-20211215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qi-Zheng/add-MemAvailable-to-per-node-meminfo/20211216-204828
base:   https://github.com/hnaz/linux-mm master
config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20211216/202112162327.HwWtuYUQ-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/7980664f23d619d15a3931fe1ab7d1dbafad7c88
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qi-Zheng/add-MemAvailable-to-per-node-meminfo/20211216-204828
        git checkout 7980664f23d619d15a3931fe1ab7d1dbafad7c88
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   /usr/bin/ld: mm/page_alloc.o: in function `si_mem_available':
>> page_alloc.c:(.text+0x3083): undefined reference to `si_meminfo_node'
   collect2: error: ld returned 1 exit status

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm: add MemAvailable to per-node meminfo
  2021-12-16 13:16   ` Greg KH
@ 2021-12-16 15:31     ` Qi Zheng
  2021-12-16 15:37       ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Qi Zheng @ 2021-12-16 15:31 UTC (permalink / raw)
  To: Greg KH; +Cc: akpm, rafael, linux-mm, linux-kernel, songmuchun



On 12/16/21 9:16 PM, Greg KH wrote:
> On Thu, Dec 16, 2021 at 08:46:54PM +0800, Qi Zheng wrote:
>> In /proc/meminfo, we can show the sum of all the available memory
>> as "MemAvailable". Add the same counter also to per-node meminfo
>> under /sys.
>>
>> With this counter, some processes that bind nodes can make some
>> decisions by reading the "MemAvailable" of the corresponding nodes
>> directly.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   drivers/base/node.c    |  4 ++++
>>   include/linux/mm.h     |  1 +
>>   include/linux/mmzone.h |  5 +++++
>>   mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 59 insertions(+)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 87acc47e8951..deb2a7965ae4 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -375,8 +375,10 @@ static ssize_t node_read_meminfo(struct device *dev,
>>   	struct sysinfo i;
>>   	unsigned long sreclaimable, sunreclaimable;
>>   	unsigned long swapcached = 0;
>> +	long available;
>>   
>>   	si_meminfo_node(&i, nid);
>> +	available = si_mem_available_node(&i, nid);
>>   	sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
>>   	sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
>>   #ifdef CONFIG_SWAP
>> @@ -386,6 +388,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>>   			    "Node %d MemTotal:       %8lu kB\n"
>>   			    "Node %d MemFree:        %8lu kB\n"
>>   			    "Node %d MemUsed:        %8lu kB\n"
>> +			    "Node %d MemAvailable:   %8lu kB\n"
> 
> You just changed a user/kernel api without documenting it anywhere, or
> ensuring that you did not just break anything.

Hi greg k-h,

The MemAvailable has long existed in the /proc/meminfo, it's meaning
has been described in the Documentation/filesystems/proc.rst. Since
the semantics of per-node MemAvailable has not changed, so I did not
add a new document description.

> 
> Also, this api is crazy, and not ok, please never add anything new to
> it, it is broken as-is.

The consideration of adding per-node MemAvailable is that some processes
that bind nodes need this information to do some decisions.

Now their approach is to read other information in per-node meminfo
and /proc/sys/vm/watermark_scale_factor, and then approximate this
value. With this counter, they can directly read
/sys/devices/system/node/node*/meminfo to get the MemAvailable
information of each node.

And MemTotal, MemFree and SReclaimable(etc.) all have corresponding
per-node versions, so I think that adding per-node MemAvailable might
also make sense. :)

Thanks,
Qi

> 
> thanks,
> 
> greg k-h
> 

-- 
Thanks,
Qi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm: add MemAvailable to per-node meminfo
  2021-12-16 15:31     ` Qi Zheng
@ 2021-12-16 15:37       ` Greg KH
  2021-12-16 15:43         ` Qi Zheng
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-12-16 15:37 UTC (permalink / raw)
  To: Qi Zheng; +Cc: akpm, rafael, linux-mm, linux-kernel, songmuchun

On Thu, Dec 16, 2021 at 11:31:36PM +0800, Qi Zheng wrote:
> 
> 
> On 12/16/21 9:16 PM, Greg KH wrote:
> > On Thu, Dec 16, 2021 at 08:46:54PM +0800, Qi Zheng wrote:
> > > In /proc/meminfo, we can show the sum of all the available memory
> > > as "MemAvailable". Add the same counter also to per-node meminfo
> > > under /sys.
> > > 
> > > With this counter, some processes that bind nodes can make some
> > > decisions by reading the "MemAvailable" of the corresponding nodes
> > > directly.
> > > 
> > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> > > ---
> > >   drivers/base/node.c    |  4 ++++
> > >   include/linux/mm.h     |  1 +
> > >   include/linux/mmzone.h |  5 +++++
> > >   mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >   4 files changed, 59 insertions(+)
> > > 
> > > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > > index 87acc47e8951..deb2a7965ae4 100644
> > > --- a/drivers/base/node.c
> > > +++ b/drivers/base/node.c
> > > @@ -375,8 +375,10 @@ static ssize_t node_read_meminfo(struct device *dev,
> > >   	struct sysinfo i;
> > >   	unsigned long sreclaimable, sunreclaimable;
> > >   	unsigned long swapcached = 0;
> > > +	long available;
> > >   	si_meminfo_node(&i, nid);
> > > +	available = si_mem_available_node(&i, nid);
> > >   	sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
> > >   	sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
> > >   #ifdef CONFIG_SWAP
> > > @@ -386,6 +388,7 @@ static ssize_t node_read_meminfo(struct device *dev,
> > >   			    "Node %d MemTotal:       %8lu kB\n"
> > >   			    "Node %d MemFree:        %8lu kB\n"
> > >   			    "Node %d MemUsed:        %8lu kB\n"
> > > +			    "Node %d MemAvailable:   %8lu kB\n"
> > 
> > You just changed a user/kernel api without documenting it anywhere, or
> > ensuring that you did not just break anything.
> 
> Hi greg k-h,
> 
> The MemAvailable has long existed in the /proc/meminfo, it's meaning
> has been described in the Documentation/filesystems/proc.rst. Since
> the semantics of per-node MemAvailable has not changed, so I did not
> add a new document description.

This is not a proc file, it is in sysfs.

And it violates all of the sysfs rules, and has all of the problems that
proc files have.  So the worst of both worlds :(

> > Also, this api is crazy, and not ok, please never add anything new to
> > it, it is broken as-is.
> 
> The consideration of adding per-node MemAvailable is that some processes
> that bind nodes need this information to do some decisions.
> 
> Now their approach is to read other information in per-node meminfo
> and /proc/sys/vm/watermark_scale_factor, and then approximate this
> value. With this counter, they can directly read
> /sys/devices/system/node/node*/meminfo to get the MemAvailable
> information of each node.
> 
> And MemTotal, MemFree and SReclaimable(etc.) all have corresponding
> per-node versions, so I think that adding per-node MemAvailable might
> also make sense. :)

Please no, I do not want new things added to this file, as you might
break parsers of this file.

Also, again, you did not document this at all in Documentation/ABI/ so
for that reason alone it is not ok.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm: reimplement si_mem_available()
  2021-12-16 13:17   ` Greg KH
@ 2021-12-16 15:39     ` Qi Zheng
  0 siblings, 0 replies; 12+ messages in thread
From: Qi Zheng @ 2021-12-16 15:39 UTC (permalink / raw)
  To: Greg KH; +Cc: akpm, rafael, linux-mm, linux-kernel, songmuchun



On 12/16/21 9:17 PM, Greg KH wrote:
> On Thu, Dec 16, 2021 at 08:46:55PM +0800, Qi Zheng wrote:
>> Reimplement si_mem_available() by reusing si_mem_available_node().
> 
> That says what you did, but not why.  Why are you doing this?

Oh, sorry, I will add that.

> 
> And what does it have to do with patch 1/2?

With [PATCH 1/2], we can simply sum the MemAvailable of each node
to get the gobal MemAvailable. Essentially this is a cleanup, no
functional changes.

Thanks,
Qi

> 
> thanks,
> 
> greg k-h
> 

-- 
Thanks,
Qi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] mm: add MemAvailable to per-node meminfo
  2021-12-16 15:37       ` Greg KH
@ 2021-12-16 15:43         ` Qi Zheng
  0 siblings, 0 replies; 12+ messages in thread
From: Qi Zheng @ 2021-12-16 15:43 UTC (permalink / raw)
  To: Greg KH; +Cc: akpm, rafael, linux-mm, linux-kernel, songmuchun



On 12/16/21 11:37 PM, Greg KH wrote:
> On Thu, Dec 16, 2021 at 11:31:36PM +0800, Qi Zheng wrote:
>>
>>
>> On 12/16/21 9:16 PM, Greg KH wrote:
>>> On Thu, Dec 16, 2021 at 08:46:54PM +0800, Qi Zheng wrote:
>>>> In /proc/meminfo, we can show the sum of all the available memory
>>>> as "MemAvailable". Add the same counter also to per-node meminfo
>>>> under /sys.
>>>>
>>>> With this counter, some processes that bind nodes can make some
>>>> decisions by reading the "MemAvailable" of the corresponding nodes
>>>> directly.
>>>>
>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>>>> ---
>>>>    drivers/base/node.c    |  4 ++++
>>>>    include/linux/mm.h     |  1 +
>>>>    include/linux/mmzone.h |  5 +++++
>>>>    mm/page_alloc.c        | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    4 files changed, 59 insertions(+)
>>>>
>>>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>>>> index 87acc47e8951..deb2a7965ae4 100644
>>>> --- a/drivers/base/node.c
>>>> +++ b/drivers/base/node.c
>>>> @@ -375,8 +375,10 @@ static ssize_t node_read_meminfo(struct device *dev,
>>>>    	struct sysinfo i;
>>>>    	unsigned long sreclaimable, sunreclaimable;
>>>>    	unsigned long swapcached = 0;
>>>> +	long available;
>>>>    	si_meminfo_node(&i, nid);
>>>> +	available = si_mem_available_node(&i, nid);
>>>>    	sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
>>>>    	sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
>>>>    #ifdef CONFIG_SWAP
>>>> @@ -386,6 +388,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>>>>    			    "Node %d MemTotal:       %8lu kB\n"
>>>>    			    "Node %d MemFree:        %8lu kB\n"
>>>>    			    "Node %d MemUsed:        %8lu kB\n"
>>>> +			    "Node %d MemAvailable:   %8lu kB\n"
>>>
>>> You just changed a user/kernel api without documenting it anywhere, or
>>> ensuring that you did not just break anything.
>>
>> Hi greg k-h,
>>
>> The MemAvailable has long existed in the /proc/meminfo, it's meaning
>> has been described in the Documentation/filesystems/proc.rst. Since
>> the semantics of per-node MemAvailable has not changed, so I did not
>> add a new document description.
> 
> This is not a proc file, it is in sysfs.
> 
> And it violates all of the sysfs rules, and has all of the problems that
> proc files have.  So the worst of both worlds :(
> 
>>> Also, this api is crazy, and not ok, please never add anything new to
>>> it, it is broken as-is.
>>
>> The consideration of adding per-node MemAvailable is that some processes
>> that bind nodes need this information to do some decisions.
>>
>> Now their approach is to read other information in per-node meminfo
>> and /proc/sys/vm/watermark_scale_factor, and then approximate this
>> value. With this counter, they can directly read
>> /sys/devices/system/node/node*/meminfo to get the MemAvailable
>> information of each node.
>>
>> And MemTotal, MemFree and SReclaimable(etc.) all have corresponding
>> per-node versions, so I think that adding per-node MemAvailable might
>> also make sense. :)
> 
> Please no, I do not want new things added to this file, as you might
> break parsers of this file.

OK, Got it.

Thank,
Qi

> 
> Also, again, you did not document this at all in Documentation/ABI/ so
> for that reason alone it is not ok.
> 
> thanks,
> 
> greg k-h
> 

-- 
Thanks,
Qi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm: reimplement si_mem_available()
  2021-12-16 12:46 ` [PATCH 2/2] mm: reimplement si_mem_available() Qi Zheng
  2021-12-16 13:17   ` Greg KH
  2021-12-16 15:06   ` kernel test robot
@ 2021-12-16 18:21   ` kernel test robot
  2021-12-16 21:05   ` kernel test robot
  3 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2021-12-16 18:21 UTC (permalink / raw)
  To: Qi Zheng, akpm, gregkh, rafael
  Cc: llvm, kbuild-all, linux-mm, linux-kernel, songmuchun, Qi Zheng

Hi Qi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]
[also build test ERROR on driver-core/driver-core-testing linux/master linus/master v5.16-rc5 next-20211215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qi-Zheng/add-MemAvailable-to-per-node-meminfo/20211216-204828
base:   https://github.com/hnaz/linux-mm master
config: hexagon-randconfig-r045-20211216 (https://download.01.org/0day-ci/archive/20211217/202112170216.SkyvfZzv-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project dd245bab9fbb364faa1581e4f92ba3119a872fba)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/7980664f23d619d15a3931fe1ab7d1dbafad7c88
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qi-Zheng/add-MemAvailable-to-per-node-meminfo/20211216-204828
        git checkout 7980664f23d619d15a3931fe1ab7d1dbafad7c88
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: si_meminfo_node
   >>> referenced by page_alloc.c
   >>>               page_alloc.o:(si_mem_available) in archive mm/built-in.a
   >>> referenced by page_alloc.c
   >>>               page_alloc.o:(si_mem_available) in archive mm/built-in.a

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] mm: reimplement si_mem_available()
  2021-12-16 12:46 ` [PATCH 2/2] mm: reimplement si_mem_available() Qi Zheng
                     ` (2 preceding siblings ...)
  2021-12-16 18:21   ` kernel test robot
@ 2021-12-16 21:05   ` kernel test robot
  3 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2021-12-16 21:05 UTC (permalink / raw)
  To: Qi Zheng, akpm, gregkh, rafael
  Cc: kbuild-all, linux-mm, linux-kernel, songmuchun, Qi Zheng

Hi Qi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]
[also build test ERROR on driver-core/driver-core-testing linux/master linus/master v5.16-rc5 next-20211215]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Qi-Zheng/add-MemAvailable-to-per-node-meminfo/20211216-204828
base:   https://github.com/hnaz/linux-mm master
config: arc-randconfig-r043-20211216 (https://download.01.org/0day-ci/archive/20211217/202112170401.UyUM1wQ5-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/7980664f23d619d15a3931fe1ab7d1dbafad7c88
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Qi-Zheng/add-MemAvailable-to-per-node-meminfo/20211216-204828
        git checkout 7980664f23d619d15a3931fe1ab7d1dbafad7c88
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arc-elf-ld: mm/page_alloc.o: in function `si_mem_available':
>> (.text+0x8248): undefined reference to `si_meminfo_node'
>> arc-elf-ld: (.text+0x8248): undefined reference to `si_meminfo_node'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-12-16 21:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-16 12:46 [PATCH 0/2] add MemAvailable to per-node meminfo Qi Zheng
2021-12-16 12:46 ` [PATCH 1/2] mm: " Qi Zheng
2021-12-16 13:16   ` Greg KH
2021-12-16 15:31     ` Qi Zheng
2021-12-16 15:37       ` Greg KH
2021-12-16 15:43         ` Qi Zheng
2021-12-16 12:46 ` [PATCH 2/2] mm: reimplement si_mem_available() Qi Zheng
2021-12-16 13:17   ` Greg KH
2021-12-16 15:39     ` Qi Zheng
2021-12-16 15:06   ` kernel test robot
2021-12-16 18:21   ` kernel test robot
2021-12-16 21:05   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).