From: Jay Patel <jaypatel@linux.ibm.com>
To: Vlastimil Babka <vbabka@suse.cz>, linux-mm@kvack.org
Cc: cl@linux.com, penberg@kernel.org, rientjes@google.com,
iamjoonsoo.kim@lge.com, akpm@linux-foundation.org,
aneesh.kumar@linux.ibm.com, tsahu@linux.ibm.com,
piyushs@linux.ibm.com
Subject: Re: [PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage
Date: Thu, 20 Jul 2023 16:00:56 +0530 [thread overview]
Message-ID: <d841547b0bca28ee1ee7dd3b4dfc6a6dfa403755.camel@linux.ibm.com> (raw)
In-Reply-To: <a3bbb264-6d04-6917-f9b6-eade87a50707@suse.cz>
On Wed, 2023-07-12 at 15:06 +0200, Vlastimil Babka wrote:
> On 6/28/23 11:57, Jay Patel wrote:
> > In the previous version [1], we were able to reduce slub memory
> > wastage, but total memory consumption was also increasing, so to
> > solve this problem I have modified the patch as follows:
> >
> > 1) If min_objects * object_size > (PAGE_SIZE <<
> > PAGE_ALLOC_COSTLY_ORDER), then return PAGE_ALLOC_COSTLY_ORDER.
> > 2) Similarly, if min_objects * object_size <= PAGE_SIZE, then
> > return slub_min_order.
> > 3) Additionally, I changed slub_max_order to 2. There is no specific
> > reason for using the value 2, but it provided the best results in
> > terms of performance without any noticeable impact.
> >
> > [1]
> >
>
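The three rules above reduce to two early exits plus a capped search. A minimal userspace sketch, where the constants and the helper name order_shortcut() are illustrative assumptions rather than kernel code:

```c
#include <assert.h>

/* Illustrative values for a 4K-page system; real kernel values vary. */
#define SKETCH_PAGE_SIZE               4096u
#define SKETCH_PAGE_ALLOC_COSTLY_ORDER 3u
#define SKETCH_SLUB_MIN_ORDER          0u

/*
 * Hypothetical helper mirroring the two early returns described above:
 * cap the order at PAGE_ALLOC_COSTLY_ORDER when the minimum working set
 * of objects is large, fall back to slub_min_order when it fits in a
 * single page. Returns -1 when neither shortcut applies and the full
 * order search must run.
 */
static int order_shortcut(unsigned int min_objects, unsigned int size)
{
	if (min_objects * size > (SKETCH_PAGE_SIZE << SKETCH_PAGE_ALLOC_COSTLY_ORDER))
		return (int)SKETCH_PAGE_ALLOC_COSTLY_ORDER;
	if (min_objects * size <= SKETCH_PAGE_SIZE)
		return (int)SKETCH_SLUB_MIN_ORDER;
	return -1; /* no shortcut; run the calc_slab_order() loop */
}
```

For example, 16 objects of 4096 bytes exceed the costly-order budget and get capped, while 4 objects of 1024 bytes fit in one page and take the minimum order.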
> Hi,
>
> thanks for the v2. A process note: the changelog should be
> self-contained, as it will become the commit description in git log.
> What this would mean here is to take the v1 changelog, adjust the
> description to how v2 is implemented, and of course replace the v1
> measurements with new ones.
>
> The "what changed since v1" can be summarized in the area after the
> sign-off and "---", before the diffstat. This helps those who looked
> at v1 previously, but doesn't become part of the git log.
>
> Now, my impression is that v1 made a sensible tradeoff for 4K pages:
> the wastage was reduced, yet overall slab consumption didn't increase
> much. But for 64K the tradeoff looked rather bad. I think it's because
> with 64K pages and a certain object size you can e.g. get less waste
> with order-3 than order-2, but the difference will be a relatively
> tiny part of the 64KB, so it's not worth the increase in order, while
> with 4KB you can get a larger reduction of waste, both in absolute
> amount and especially relative to the 4KB page size.
>
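The effect described here is easy to check with arithmetic. A toy helper, where the 700-byte object size used below is just an example, not a size from the patch:

```c
#include <assert.h>

/*
 * Leftover bytes in an order-N slab for a given object size. Page size
 * is a parameter so the same sketch covers 4K and 64K systems.
 */
static unsigned int slab_waste(unsigned int page_size, unsigned int order,
			       unsigned int size)
{
	unsigned int slab_size = page_size << order;

	return slab_size % size;
}
```

For a hypothetical 700-byte object, a 4K order-0 slab wastes 596 of 4096 bytes (~15%), while an order-2 slab wastes only 284 of 16384 (~1.7%), so raising the order pays off. A 64K order-0 slab already wastes just 436 of 65536 bytes (~0.7%), so a higher order buys almost nothing.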
> So I think ideally the calculation would somehow take this into
> account. The changes done in v2 as described above are different. It
> seems as a result we can now calculate lower orders on 4K systems than
> before the patch, probably due to conditions 2) or 3)? I think it
> would be best if the patch resulted only in the same or higher order.
> It should be enough to tweak some thresholds for when it makes sense
> to pay the price of a higher order - whether the reduction of wastage
> is worth it, in a way that takes the page size into account.
>
> Thanks,
> Vlastimil
Hi Vlastimil,
Indeed, I aim to optimize memory allocation in the SLUB allocator [1]
by targeting larger page sizes with minimal modifications, resulting in
reduced memory consumption.

[1] https://lore.kernel.org/linux-mm/20230720102337.2069722-1-jaypatel@linux.ibm.com/

Thanks,
Jay Patel
>
> > I have conducted tests on systems with 160 CPUs and 16 CPUs using
> > 4K and 64K page sizes. The tests showed that the patch successfully
> > reduces both total slab memory and wastage without any noticeable
> > performance degradation in the hackbench test.
> >
> > Test Results are as follows:
> > 1) On 160 CPUs with 4K Page size
> >
> > +----------------+----------------+----------------+
> > > Total wastage in slub memory |
> > +----------------+----------------+----------------+
> > > | After Boot | After Hackbench|
> > > Normal | 2090 Kb | 3204 Kb |
> > > With Patch | 1825 Kb | 3088 Kb |
> > > Wastage reduce | ~12% | ~4% |
> > +----------------+----------------+----------------+
> >
> > +-----------------+----------------+----------------+
> > > Total slub memory |
> > +-----------------+----------------+----------------+
> > > | After Boot | After Hackbench|
> > > Normal | 500572 | 713568 |
> > > With Patch | 482036 | 688312 |
> > > Memory reduce | ~4% | ~3% |
> > +-----------------+----------------+----------------+
> >
> > hackbench-process-sockets
> > +-------+-----+----------+----------+-----------+
> > > | Normal |With Patch| |
> > +-------+-----+----------+----------+-----------+
> > > Amean | 1 | 1.3237 | 1.2737 | ( 3.78%) |
> > > Amean | 4 | 1.5923 | 1.6023 | ( -0.63%) |
> > > Amean | 7 | 2.3727 | 2.4260 | ( -2.25%) |
> > > Amean | 12 | 3.9813 | 4.1290 | ( -3.71%) |
> > > Amean | 21 | 6.9680 | 7.0630 | ( -1.36%) |
> > > Amean | 30 | 10.1480 | 10.2170 | ( -0.68%) |
> > > Amean | 48 | 16.7793 | 16.8780 | ( -0.59%) |
> > > Amean | 79 | 28.9537 | 28.8187 | ( 0.47%) |
> > > Amean | 110 | 39.5507 | 40.0157 | ( -1.18%) |
> > > Amean | 141 | 51.5670 | 51.8200 | ( -0.49%) |
> > > Amean | 172 | 62.8710 | 63.2540 | ( -0.61%) |
> > > Amean | 203 | 74.6417 | 75.2520 | ( -0.82%) |
> > > Amean | 234 | 86.0853 | 86.5653 | ( -0.56%) |
> > > Amean | 265 | 97.9203 | 98.4617 | ( -0.55%) |
> > > Amean | 296 | 108.6243 | 109.8770 | ( -1.15%) |
> > +-------+-----+----------+----------+-----------+
> >
> > 2) On 160 CPUs with 64K Page size
> > +-----------------+----------------+----------------+
> > > Total wastage in slub memory |
> > +-----------------+----------------+----------------+
> > > | After Boot |After Hackbench |
> > > Normal | 919 Kb | 1880 Kb |
> > > With Patch | 807 Kb | 1684 Kb |
> > > Wastage reduce | ~12% | ~10% |
> > +-----------------+----------------+----------------+
> >
> > +-----------------+----------------+----------------+
> > > Total slub memory |
> > +-----------------+----------------+----------------+
> > > | After Boot | After Hackbench|
> > > Normal | 1862592 | 3023744 |
> > > With Patch | 1644416 | 2675776 |
> > > Memory reduce | ~12% | ~11% |
> > +-----------------+----------------+----------------+
> >
> > hackbench-process-sockets
> > +-------+-----+----------+----------+-----------+
> > > | Normal |With Patch| |
> > +-------+-----+----------+----------+-----------+
> > > Amean | 1 | 1.2547 | 1.2677 | ( -1.04%) |
> > > Amean | 4 | 1.5523 | 1.5783 | ( -1.67%) |
> > > Amean | 7 | 2.4157 | 2.3883 | ( 1.13%) |
> > > Amean | 12 | 3.9807 | 3.9793 | ( 0.03%) |
> > > Amean | 21 | 6.9687 | 6.9703 | ( -0.02%) |
> > > Amean | 30 | 10.1403 | 10.1297 | ( 0.11%) |
> > > Amean | 48 | 16.7477 | 16.6893 | ( 0.35%) |
> > > Amean | 79 | 27.9510 | 28.0463 | ( -0.34%) |
> > > Amean | 110 | 39.6833 | 39.5687 | ( 0.29%) |
> > > Amean | 141 | 51.5673 | 51.4477 | ( 0.23%) |
> > > Amean | 172 | 62.9643 | 63.1647 | ( -0.32%) |
> > > Amean | 203 | 74.6220 | 73.7900 | ( 1.11%) |
> > > Amean | 234 | 85.1783 | 85.3420 | ( -0.19%) |
> > > Amean | 265 | 96.6627 | 96.7903 | ( -0.13%) |
> > > Amean | 296 | 108.2543 | 108.2253 | ( 0.03%) |
> > +-------+-----+----------+----------+-----------+
> >
> > 3) On 16 CPUs with 4K Page size
> > +-----------------+----------------+------------------+
> > > Total wastage in slub memory |
> > +-----------------+----------------+------------------+
> > > | After Boot | After Hackbench |
> > > Normal | 491 Kb | 727 Kb |
> > > With Patch | 483 Kb | 670 Kb |
> > > Wastage reduce | ~1% | ~8% |
> > +-----------------+----------------+------------------+
> >
> > +-----------------+----------------+----------------+
> > > Total slub memory |
> > +-----------------+----------------+----------------+
> > > | After Boot | After Hackbench|
> > > Normal | 105340 | 153116 |
> > > With Patch | 103620 | 147412 |
> > > Memory reduce | ~1.6% | ~4% |
> > +-----------------+----------------+----------------+
> >
> > hackbench-process-sockets
> > +-------+-----+----------+----------+-----------+
> > > | Normal |With Patch| |
> > +-------+-----+----------+----------+-----------+
> > > Amean | 1 | 1.0963 | 1.1070 | ( -0.97%) |
> > > Amean | 4 | 3.7963 | 3.7957 | ( 0.02%) |
> > > Amean | 7 | 6.5947 | 6.6017 | ( -0.11%) |
> > > Amean | 12 | 11.1993 | 11.1730 | ( 0.24%) |
> > > Amean | 21 | 19.4097 | 19.3647 | ( 0.23%) |
> > > Amean | 30 | 27.7023 | 27.6040 | ( 0.35%) |
> > > Amean | 48 | 44.1287 | 43.9630 | ( 0.38%) |
> > > Amean | 64 | 58.8147 | 58.5753 | ( 0.41%) |
> > +-------+-----+----------+----------+-----------+
> >
> > 4) On 16 CPUs with 64K Page size
> > +----------------+----------------+----------------+
> > > Total wastage in slub memory |
> > +----------------+----------------+----------------+
> > > | After Boot | After Hackbench|
> > > Normal | 194 Kb | 349 Kb |
> > > With Patch | 191 Kb | 344 Kb |
> > > Wastage reduce | ~1% | ~1% |
> > +----------------+----------------+----------------+
> >
> > +-----------------+----------------+----------------+
> > > Total slub memory |
> > +-----------------+----------------+----------------+
> > > | After Boot | After Hackbench|
> > > Normal | 330304 | 472960 |
> > > With Patch | 319808 | 458944 |
> > > Memory reduce | ~3% | ~3% |
> > +-----------------+----------------+----------------+
> >
> > hackbench-process-sockets
> > +-------+----+----------+----------+----------+
> > > | Normal |With Patch| |
> > +-------+----+----------+----------+----------+
> > > Amean | 1 | 1.9030 | 1.8967 | ( 0.33%) |
> > > Amean | 4 | 7.2117 | 7.1283 | ( 1.16%) |
> > > Amean | 7 | 12.5247 | 12.3460 | ( 1.43%) |
> > > Amean | 12 | 21.7157 | 21.4753 | ( 1.11%) |
> > > Amean | 21 | 38.2693 | 37.6670 | ( 1.57%) |
> > > Amean | 30 | 54.5930 | 53.8657 | ( 1.33%) |
> > > Amean | 48 | 87.6700 | 86.3690 | ( 1.48%) |
> > > Amean | 64 | 117.1227 | 115.4893 | ( 1.39%) |
> > +-------+----+----------+----------+----------+
> >
> > Signed-off-by: Jay Patel <jaypatel@linux.ibm.com>
> > ---
> > mm/slub.c | 52 +++++++++++++++++++++++++---------------------------
> > 1 file changed, 25 insertions(+), 27 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index c87628cd8a9a..0a1090c528da 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4058,7 +4058,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> > */
> > static unsigned int slub_min_order;
> > static unsigned int slub_max_order =
> > - IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER;
> > + IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : 2;
> > static unsigned int slub_min_objects;
> >
> > /*
> > @@ -4087,11 +4087,10 @@ static unsigned int slub_min_objects;
> > * the smallest order which will fit the object.
> > */
> > static inline unsigned int calc_slab_order(unsigned int size,
> > - unsigned int min_objects, unsigned int max_order,
> > - unsigned int fract_leftover)
> > + unsigned int min_objects, unsigned int max_order)
> > {
> > unsigned int min_order = slub_min_order;
> > - unsigned int order;
> > + unsigned int order, min_wastage = size, min_wastage_order = MAX_ORDER+1;
> >
> > if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
> > return get_order(size * MAX_OBJS_PER_PAGE) - 1;
> > @@ -4104,11 +4103,17 @@ static inline unsigned int calc_slab_order(unsigned int size,
> >
> > rem = slab_size % size;
> >
> > - if (rem <= slab_size / fract_leftover)
> > - break;
> > + if (rem < min_wastage) {
> > + min_wastage = rem;
> > + min_wastage_order = order;
> > + }
> > }
> >
> > - return order;
> > + if (min_wastage_order <= slub_max_order)
> > + return min_wastage_order;
> > + else
> > + return order;
> > +
> > }
> >
> > static inline int calculate_order(unsigned int size)
> > @@ -4142,35 +4147,28 @@ static inline int calculate_order(unsigned int size)
> > nr_cpus = nr_cpu_ids;
> > min_objects = 4 * (fls(nr_cpus) + 1);
> > }
> > +
> > + if ((min_objects * size) > (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> > + return PAGE_ALLOC_COSTLY_ORDER;
> > +
> > + if ((min_objects * size) <= PAGE_SIZE)
> > + return slub_min_order;
> > +
> > max_objects = order_objects(slub_max_order, size);
> > min_objects = min(min_objects, max_objects);
> >
> > - while (min_objects > 1) {
> > - unsigned int fraction;
> > -
> > - fraction = 16;
> > - while (fraction >= 4) {
> > - order = calc_slab_order(size, min_objects,
> > - slub_max_order, fraction);
> > - if (order <= slub_max_order)
> > - return order;
> > - fraction /= 2;
> > - }
> > + while (min_objects >= 1) {
> > + order = calc_slab_order(size, min_objects,
> > + slub_max_order);
> > + if (order <= slub_max_order)
> > + return order;
> > min_objects--;
> > }
> >
> > - /*
> > - * We were unable to place multiple objects in a slab. Now
> > - * lets see if we can place a single object there.
> > - */
> > - order = calc_slab_order(size, 1, slub_max_order, 1);
> > - if (order <= slub_max_order)
> > - return order;
> > -
> > /*
> > * Doh this slab cannot be placed using slub_max_order.
> > */
> > - order = calc_slab_order(size, 1, MAX_ORDER, 1);
> > + order = calc_slab_order(size, 1, MAX_ORDER);
> > if (order <= MAX_ORDER)
> > return order;
> > return -ENOSYS;
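Outside the kernel, the patch's reworked order search can be modeled roughly as follows. This is a simplified sketch under stated assumptions: it ignores the MAX_OBJS_PER_PAGE clamp and the fallback-to-last-order path of the real calc_slab_order(), and min_waste_order() is a made-up name:

```c
#include <assert.h>

/*
 * Pick the order in [min_order, max_order] whose slab leaves the least
 * leftover space for the given object size. Mirrors the v2 loop: track
 * the smallest remainder seen instead of accepting the first order
 * whose leftover falls under a fraction threshold. Ties go to the
 * lowest order, since only a strictly smaller remainder updates best.
 */
static unsigned int min_waste_order(unsigned int page_size, unsigned int size,
				    unsigned int min_order,
				    unsigned int max_order)
{
	unsigned int order, best_order = min_order;
	unsigned int min_wastage = size; /* leftover is always < one object */

	for (order = min_order; order <= max_order; order++) {
		unsigned int rem = (page_size << order) % size;

		if (rem < min_wastage) {
			min_wastage = rem;
			best_order = order;
		}
	}
	return best_order;
}
```

With a hypothetical 700-byte object this picks order 2 on a 4K-page system but only order 1 on a 64K-page system, which illustrates Vlastimil's point that the worthwhile order depends on the page size.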