All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pratik Sampat <psampat@linux.ibm.com>
To: Roman Gushchin <guro@fb.com>
Cc: Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	pratik.r.sampat@gmail.com
Subject: Re: [PATCH v3 0/6] percpu: partial chunk depopulation
Date: Sat, 17 Apr 2021 01:14:03 +0530	[thread overview]
Message-ID: <2a0d371d-79f6-e7aa-6dcd-3b29264e1feb@linux.ibm.com> (raw)
In-Reply-To: <YHng5nAPSLJHnRY9@carbon.dhcp.thefacebook.com>



On 17/04/21 12:39 am, Roman Gushchin wrote:
> On Sat, Apr 17, 2021 at 12:11:37AM +0530, Pratik Sampat wrote:
>>
>> On 17/04/21 12:04 am, Roman Gushchin wrote:
>>> On Fri, Apr 16, 2021 at 11:57:03PM +0530, Pratik Sampat wrote:
>>>> On 16/04/21 10:43 pm, Roman Gushchin wrote:
>>>>> On Fri, Apr 16, 2021 at 08:58:33PM +0530, Pratik Sampat wrote:
>>>>>> Hello Dennis,
>>>>>>
>>>>>> I apologize for the clutter of logs before, I'm pasting the logs of before and
>>>>>> after the percpu test in the case of the patchset being applied on 5.12-rc6 and
>>>>>> the vanilla kernel 5.12-rc6.
>>>>>>
>>>>>> On 16/04/21 7:48 pm, Dennis Zhou wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Fri, Apr 16, 2021 at 06:26:15PM +0530, Pratik Sampat wrote:
>>>>>>>> Hello Roman,
>>>>>>>>
>>>>>>>> I've tried the v3 patch series on a POWER9 and an x86 KVM setup.
>>>>>>>>
>>>>>>>> My results of the percpu_test are as follows:
>>>>>>>> Intel KVM 4CPU:4G
>>>>>>>> Vanilla 5.12-rc6
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             1952 kB
>>>>>>>> Percpu:           219648 kB
>>>>>>>> Percpu:           219648 kB
>>>>>>>>
>>>>>>>> 5.12-rc6 + with patchset applied
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             2080 kB
>>>>>>>> Percpu:           219712 kB
>>>>>>>> Percpu:            72672 kB
>>>>>>>>
>>>>>>>> I'm able to see improvement comparable to that of what you're see too.
>>>>>>>>
>>>>>>>> However, on POWERPC I'm unable to reproduce these improvements with the patchset in the same configuration
>>>>>>>>
>>>>>>>> POWER9 KVM 4CPU:4G
>>>>>>>> Vanilla 5.12-rc6
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             5888 kB
>>>>>>>> Percpu:           118272 kB
>>>>>>>> Percpu:           118272 kB
>>>>>>>>
>>>>>>>> 5.12-rc6 + with patchset applied
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             6144 kB
>>>>>>>> Percpu:           119040 kB
>>>>>>>> Percpu:           119040 kB
>>>>>>>>
>>>>>>>> I'm wondering if there's any architectural specific code that needs plumbing
>>>>>>>> here?
>>>>>>>>
>>>>>>> There shouldn't be. Can you send me the percpu_stats debug output before
>>>>>>> and after?
>>>>>> I'll paste the whole debug stats before and after here.
>>>>>> 5.12-rc6 + patchset
>>>>>> -----BEFORE-----
>>>>>> Percpu Memory Statistics
>>>>>> Allocation Info:
>>>>> Hm, this looks highly suspicious. Here is your stats in a more compact form:
>>>>>
>>>>> Vanilla
>>>>>
>>>>> nr_alloc            :         9038         nr_alloc            :        97046
>>>>> nr_dealloc          :         6992	   nr_dealloc          :        94237
>>>>> nr_cur_alloc        :         2046	   nr_cur_alloc        :         2809
>>>>> nr_max_alloc        :         2178	   nr_max_alloc        :        90054
>>>>> nr_chunks           :            3	   nr_chunks           :           11
>>>>> nr_max_chunks       :            3	   nr_max_chunks       :           47
>>>>> min_alloc_size      :            4	   min_alloc_size      :            4
>>>>> max_alloc_size      :         1072	   max_alloc_size      :         1072
>>>>> empty_pop_pages     :            5	   empty_pop_pages     :           29
>>>>>
>>>>>
>>>>> Patched
>>>>>
>>>>> nr_alloc            :         9040         nr_alloc            :        97048
>>>>> nr_dealloc          :         6994	   nr_dealloc          :        95002
>>>>> nr_cur_alloc        :         2046	   nr_cur_alloc        :         2046
>>>>> nr_max_alloc        :         2208	   nr_max_alloc        :        90054
>>>>> nr_chunks           :            3	   nr_chunks           :           48
>>>>> nr_max_chunks       :            3	   nr_max_chunks       :           48
>>>>> min_alloc_size      :            4	   min_alloc_size      :            4
>>>>> max_alloc_size      :         1072	   max_alloc_size      :         1072
>>>>> empty_pop_pages     :           12	   empty_pop_pages     :           61
>>>>>
>>>>>
>>>>> So it looks like the number of chunks got bigger, as well as the number of
>>>>> empty_pop_pages? This contradicts to what you wrote, so can you, please, make
>>>>> sure that the data is correct and we're not messing two cases?
>>>>>
>>>>> So it looks like for some reason sidelined (depopulated) chunks are not getting
>>>>> freed completely. But I struggle to explain why the initial empty_pop_pages is
>>>>> bigger with the same amount of chunks.
>>>>>
>>>>> So, can you, please, apply the following patch and provide an updated statistics?
>>>> Unfortunately, I'm not completely well versed in this area, but yes the empty
>>>> pop pages number doesn't make sense to me either.
>>>>
>>>> I re-ran the numbers trying to make sure my experiment setup is sane but
>>>> results remain the same.
>>>>
>>>> Vanilla
>>>> nr_alloc            :         9040         nr_alloc            :        97048
>>>> nr_dealloc          :         6994	   nr_dealloc          :        94404
>>>> nr_cur_alloc        :         2046	   nr_cur_alloc        :         2644
>>>> nr_max_alloc        :         2169	   nr_max_alloc        :        90054
>>>> nr_chunks           :            3	   nr_chunks           :           10
>>>> nr_max_chunks       :            3	   nr_max_chunks       :           47
>>>> min_alloc_size      :            4	   min_alloc_size      :            4
>>>> max_alloc_size      :         1072	   max_alloc_size      :         1072
>>>> empty_pop_pages     :            4	   empty_pop_pages     :           32
>>>>
>>>> With the patchset + debug patch the results are as follows:
>>>> Patched
>>>>
>>>> nr_alloc            :         9040         nr_alloc            :        97048
>>>> nr_dealloc          :         6994	   nr_dealloc          :        94349
>>>> nr_cur_alloc        :         2046	   nr_cur_alloc        :         2699
>>>> nr_max_alloc        :         2194	   nr_max_alloc        :        90054
>>>> nr_chunks           :            3	   nr_chunks           :           48
>>>> nr_max_chunks       :            3	   nr_max_chunks       :           48
>>>> min_alloc_size      :            4	   min_alloc_size      :            4
>>>> max_alloc_size      :         1072	   max_alloc_size      :         1072
>>>> empty_pop_pages     :           12	   empty_pop_pages     :           54
>>>>
>>>> With the extra tracing I can see 39 entries of "Chunk (sidelined)"
>>>> after the test was run. I don't see any entries for "Chunk (to depopulate)"
>>>>
>>>> I've snipped the results of slidelined chunks because they went on for ~600
>>>> lines, if you need the full logs let me know.
>>> Yes, please! That's the most interesting part!
>> Got it. Pasting the full logs of after the percpu experiment was completed
> Thanks!
>
> Would you mind to apply the following patch and test again?
>
> --
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index ded3a7541cb2..532c6a7ebdfd 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2296,6 +2296,9 @@ void free_percpu(void __percpu *ptr)
>                                  need_balance = true;
>                                  break;
>                          }
> +
> +               chunk->depopulated = false;
> +               pcpu_chunk_relocate(chunk, -1);
>          } else if (chunk != pcpu_first_chunk && chunk != pcpu_reserved_chunk &&
>                     !chunk->isolated &&
>                     (pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] >
>
Sure thing.

I see much lower sideline chunks. In one such test run I saw zero occurrences
of slidelined chunks

Pasting the full logs as an example:

BEFORE
Percpu Memory Statistics
Allocation Info:
----------------------------------------
   unit_size           :       655360
   static_size         :       608920
   reserved_size       :            0
   dyn_size            :        46440
   atom_size           :        65536
   alloc_size          :       655360

Global Stats:
----------------------------------------
   nr_alloc            :         9038
   nr_dealloc          :         6992
   nr_cur_alloc        :         2046
   nr_max_alloc        :         2200
   nr_chunks           :            3
   nr_max_chunks       :            3
   min_alloc_size      :            4
   max_alloc_size      :         1072
   empty_pop_pages     :           12

Per Chunk Stats:
----------------------------------------
Chunk: <- First Chunk
   nr_alloc            :         1092
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :        16247
   free_bytes          :            4
   contig_bytes        :            4
   sum_frag            :            4
   max_frag            :            4
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :         1072
   memcg_aware         :            0

Chunk:
   nr_alloc            :          594
   max_alloc_size      :          992
   empty_pop_pages     :            8
   first_bit           :          456
   free_bytes          :       645008
   contig_bytes        :       319984
   sum_frag            :       325024
   max_frag            :       318680
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :          424
   memcg_aware         :            0

Chunk:
   nr_alloc            :          360
   max_alloc_size      :         1072
   empty_pop_pages     :            4
   first_bit           :        26595
   free_bytes          :       506640
   contig_bytes        :       506540
   sum_frag            :          100
   max_frag            :           32
   cur_min_alloc       :            4
   cur_med_alloc       :          156
   cur_max_alloc       :         1072
   memcg_aware         :            1


AFTER
Percpu Memory Statistics
Allocation Info:
----------------------------------------
   unit_size           :       655360
   static_size         :       608920
   reserved_size       :            0
   dyn_size            :        46440
   atom_size           :        65536
   alloc_size          :       655360

Global Stats:
----------------------------------------
   nr_alloc            :        97046
   nr_dealloc          :        94304
   nr_cur_alloc        :         2742
   nr_max_alloc        :        90054
   nr_chunks           :           11
   nr_max_chunks       :           47
   min_alloc_size      :            4
   max_alloc_size      :         1072
   empty_pop_pages     :           18

Per Chunk Stats:
----------------------------------------
Chunk: <- First Chunk
   nr_alloc            :         1092
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :        16247
   free_bytes          :            4
   contig_bytes        :            4
   sum_frag            :            4
   max_frag            :            4
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :         1072
   memcg_aware         :            0

Chunk:
   nr_alloc            :          838
   max_alloc_size      :         1072
   empty_pop_pages     :            7
   first_bit           :          464
   free_bytes          :       640476
   contig_bytes        :       290672
   sum_frag            :       349804
   max_frag            :       304344
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :         1072
   memcg_aware         :            0

Chunk:
   nr_alloc            :           90
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :          536
   free_bytes          :       595752
   contig_bytes        :        26164
   sum_frag            :       575132
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :         1072
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           90
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       597428
   contig_bytes        :        26164
   sum_frag            :       596848
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           92
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       595284
   contig_bytes        :        26164
   sum_frag            :       590360
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           92
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       595284
   contig_bytes        :        26164
   sum_frag            :       583768
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :          360
   max_alloc_size      :         1072
   empty_pop_pages     :            7
   first_bit           :        26595
   free_bytes          :       506640
   contig_bytes        :       506540
   sum_frag            :          100
   max_frag            :           32
   cur_min_alloc       :            4
   cur_med_alloc       :          156
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           12
   max_alloc_size      :         1072
   empty_pop_pages     :            3
   first_bit           :            0
   free_bytes          :       647524
   contig_bytes        :       563492
   sum_frag            :        57872
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :            0
   max_alloc_size      :         1072
   empty_pop_pages     :            1
   first_bit           :            0
   free_bytes          :       655360
   contig_bytes        :       655360
   sum_frag            :            0
   max_frag            :            0
   cur_min_alloc       :            0
   cur_med_alloc       :            0
   cur_max_alloc       :            0
   memcg_aware         :            1

Chunk (sidelined):
   nr_alloc            :           72
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       608344
   contig_bytes        :       145552
   sum_frag            :       590340
   max_frag            :       145552
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk (sidelined):
   nr_alloc            :            4
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       652748
   contig_bytes        :       426720
   sum_frag            :       426720
   max_frag            :       426720
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1






  reply	other threads:[~2021-04-16 19:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08  3:57 [PATCH v3 0/6] percpu: partial chunk depopulation Roman Gushchin
2021-04-08  3:57 ` [PATCH v3 1/6] percpu: fix a comment about the chunks ordering Roman Gushchin
2021-04-16 21:06   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 2/6] percpu: split __pcpu_balance_workfn() Roman Gushchin
2021-04-16 21:06   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 3/6] percpu: make pcpu_nr_empty_pop_pages per chunk type Roman Gushchin
2021-04-16 21:08   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 4/6] percpu: generalize pcpu_balance_populated() Roman Gushchin
2021-04-16 21:09   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 5/6] percpu: factor out pcpu_check_chunk_hint() Roman Gushchin
2021-04-16 21:15   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 6/6] percpu: implement partial chunk depopulation Roman Gushchin
2021-04-16 12:56 ` [PATCH v3 0/6] percpu: " Pratik Sampat
2021-04-16 14:18   ` Dennis Zhou
2021-04-16 15:28     ` Pratik Sampat
2021-04-16 17:13       ` Roman Gushchin
2021-04-16 18:27         ` Pratik Sampat
2021-04-16 18:34           ` Roman Gushchin
2021-04-16 18:41             ` Pratik Sampat
2021-04-16 19:09               ` Roman Gushchin
2021-04-16 19:44                 ` Pratik Sampat [this message]
2021-04-16 20:03                   ` Roman Gushchin
2021-04-17  7:08                     ` Pratik Sampat
2021-04-16 21:47                   ` Dennis Zhou
2021-04-17  7:14                     ` Pratik Sampat
2021-04-16 16:21     ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2a0d371d-79f6-e7aa-6dcd-3b29264e1feb@linux.ibm.com \
    --to=psampat@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dennis@kernel.org \
    --cc=guro@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pratik.r.sampat@gmail.com \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.