linux-kernel.vger.kernel.org archive mirror
From: Pratik Sampat <psampat@linux.ibm.com>
To: Roman Gushchin <guro@fb.com>
Cc: Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	pratik.r.sampat@gmail.com
Subject: Re: [PATCH v3 0/6] percpu: partial chunk depopulation
Date: Sat, 17 Apr 2021 01:14:03 +0530
Message-ID: <2a0d371d-79f6-e7aa-6dcd-3b29264e1feb@linux.ibm.com>
In-Reply-To: <YHng5nAPSLJHnRY9@carbon.dhcp.thefacebook.com>



On 17/04/21 12:39 am, Roman Gushchin wrote:
> On Sat, Apr 17, 2021 at 12:11:37AM +0530, Pratik Sampat wrote:
>>
>> On 17/04/21 12:04 am, Roman Gushchin wrote:
>>> On Fri, Apr 16, 2021 at 11:57:03PM +0530, Pratik Sampat wrote:
>>>> On 16/04/21 10:43 pm, Roman Gushchin wrote:
>>>>> On Fri, Apr 16, 2021 at 08:58:33PM +0530, Pratik Sampat wrote:
>>>>>> Hello Dennis,
>>>>>>
>>>>>> I apologize for the cluttered logs earlier. I'm pasting the logs from before and
>>>>>> after the percpu test, both with the patchset applied on 5.12-rc6 and on
>>>>>> the vanilla 5.12-rc6 kernel.
>>>>>>
>>>>>> On 16/04/21 7:48 pm, Dennis Zhou wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Fri, Apr 16, 2021 at 06:26:15PM +0530, Pratik Sampat wrote:
>>>>>>>> Hello Roman,
>>>>>>>>
>>>>>>>> I've tried the v3 patch series on a POWER9 and an x86 KVM setup.
>>>>>>>>
>>>>>>>> My results of the percpu_test are as follows:
>>>>>>>> Intel KVM 4CPU:4G
>>>>>>>> Vanilla 5.12-rc6
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             1952 kB
>>>>>>>> Percpu:           219648 kB
>>>>>>>> Percpu:           219648 kB
>>>>>>>>
>>>>>>>> 5.12-rc6 with patchset applied
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             2080 kB
>>>>>>>> Percpu:           219712 kB
>>>>>>>> Percpu:            72672 kB
>>>>>>>>
>>>>>>>> I'm able to see an improvement comparable to what you're seeing.
>>>>>>>>
>>>>>>>> However, on POWERPC I'm unable to reproduce these improvements with the patchset in the same configuration.
>>>>>>>>
>>>>>>>> POWER9 KVM 4CPU:4G
>>>>>>>> Vanilla 5.12-rc6
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             5888 kB
>>>>>>>> Percpu:           118272 kB
>>>>>>>> Percpu:           118272 kB
>>>>>>>>
>>>>>>>> 5.12-rc6 with patchset applied
>>>>>>>> # ./percpu_test.sh
>>>>>>>> Percpu:             6144 kB
>>>>>>>> Percpu:           119040 kB
>>>>>>>> Percpu:           119040 kB
>>>>>>>>
>>>>>>>> I'm wondering if there's any architecture-specific code that needs plumbing
>>>>>>>> here?
>>>>>>>>
>>>>>>> There shouldn't be. Can you send me the percpu_stats debug output before
>>>>>>> and after?
>>>>>> I'll paste the full debug stats from before and after the test here.
>>>>>> 5.12-rc6 + patchset
>>>>>> -----BEFORE-----
>>>>>> Percpu Memory Statistics
>>>>>> Allocation Info:
>>>>> Hm, this looks highly suspicious. Here are your stats in a more compact form:
>>>>>
>>>>> Vanilla
>>>>>
>>>>> nr_alloc            :         9038    nr_alloc            :        97046
>>>>> nr_dealloc          :         6992    nr_dealloc          :        94237
>>>>> nr_cur_alloc        :         2046    nr_cur_alloc        :         2809
>>>>> nr_max_alloc        :         2178    nr_max_alloc        :        90054
>>>>> nr_chunks           :            3    nr_chunks           :           11
>>>>> nr_max_chunks       :            3    nr_max_chunks       :           47
>>>>> min_alloc_size      :            4    min_alloc_size      :            4
>>>>> max_alloc_size      :         1072    max_alloc_size      :         1072
>>>>> empty_pop_pages     :            5    empty_pop_pages     :           29
>>>>>
>>>>>
>>>>> Patched
>>>>>
>>>>> nr_alloc            :         9040    nr_alloc            :        97048
>>>>> nr_dealloc          :         6994    nr_dealloc          :        95002
>>>>> nr_cur_alloc        :         2046    nr_cur_alloc        :         2046
>>>>> nr_max_alloc        :         2208    nr_max_alloc        :        90054
>>>>> nr_chunks           :            3    nr_chunks           :           48
>>>>> nr_max_chunks       :            3    nr_max_chunks       :           48
>>>>> min_alloc_size      :            4    min_alloc_size      :            4
>>>>> max_alloc_size      :         1072    max_alloc_size      :         1072
>>>>> empty_pop_pages     :           12    empty_pop_pages     :           61
>>>>>
>>>>>
>>>>> So it looks like the number of chunks got bigger, as well as the number of
>>>>> empty_pop_pages? This contradicts what you wrote, so could you please make
>>>>> sure that the data is correct and that we're not mixing up two cases?
>>>>>
>>>>> So it looks like for some reason sidelined (depopulated) chunks are not getting
>>>>> freed completely. But I struggle to explain why the initial empty_pop_pages is
>>>>> bigger with the same number of chunks.
>>>>>
>>>>> So, could you please apply the following patch and provide updated statistics?
>>>> Unfortunately, I'm not well versed in this area, but yes, the empty_pop_pages
>>>> number doesn't make sense to me either.
>>>>
>>>> I re-ran the numbers to make sure my experiment setup is sane, but the
>>>> results remain the same.
>>>>
>>>> Vanilla
>>>> nr_alloc            :         9040    nr_alloc            :        97048
>>>> nr_dealloc          :         6994    nr_dealloc          :        94404
>>>> nr_cur_alloc        :         2046    nr_cur_alloc        :         2644
>>>> nr_max_alloc        :         2169    nr_max_alloc        :        90054
>>>> nr_chunks           :            3    nr_chunks           :           10
>>>> nr_max_chunks       :            3    nr_max_chunks       :           47
>>>> min_alloc_size      :            4    min_alloc_size      :            4
>>>> max_alloc_size      :         1072    max_alloc_size      :         1072
>>>> empty_pop_pages     :            4    empty_pop_pages     :           32
>>>>
>>>> With the patchset + debug patch the results are as follows:
>>>> Patched
>>>>
>>>> nr_alloc            :         9040    nr_alloc            :        97048
>>>> nr_dealloc          :         6994    nr_dealloc          :        94349
>>>> nr_cur_alloc        :         2046    nr_cur_alloc        :         2699
>>>> nr_max_alloc        :         2194    nr_max_alloc        :        90054
>>>> nr_chunks           :            3    nr_chunks           :           48
>>>> nr_max_chunks       :            3    nr_max_chunks       :           48
>>>> min_alloc_size      :            4    min_alloc_size      :            4
>>>> max_alloc_size      :         1072    max_alloc_size      :         1072
>>>> empty_pop_pages     :           12    empty_pop_pages     :           54
>>>>
>>>> With the extra tracing I can see 39 entries of "Chunk (sidelined)"
>>>> after the test was run. I don't see any entries for "Chunk (to depopulate)".
>>>>
>>>> I've snipped the sidelined-chunk entries because they ran to ~600
>>>> lines; if you need the full logs, let me know.
>>> Yes, please! That's the most interesting part!
>> Got it. Pasting the full logs from after the percpu experiment completed.
> Thanks!
>
> Would you mind applying the following patch and testing again?
>
> --
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index ded3a7541cb2..532c6a7ebdfd 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2296,6 +2296,9 @@ void free_percpu(void __percpu *ptr)
>                                  need_balance = true;
>                                  break;
>                          }
> +
> +               chunk->depopulated = false;
> +               pcpu_chunk_relocate(chunk, -1);
>          } else if (chunk != pcpu_first_chunk && chunk != pcpu_reserved_chunk &&
>                     !chunk->isolated &&
>                     (pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] >
>
Sure thing.

I see far fewer sidelined chunks. In one such test run I saw zero occurrences
of sidelined chunks.

Pasting the full logs as an example:

BEFORE
Percpu Memory Statistics
Allocation Info:
----------------------------------------
   unit_size           :       655360
   static_size         :       608920
   reserved_size       :            0
   dyn_size            :        46440
   atom_size           :        65536
   alloc_size          :       655360

Global Stats:
----------------------------------------
   nr_alloc            :         9038
   nr_dealloc          :         6992
   nr_cur_alloc        :         2046
   nr_max_alloc        :         2200
   nr_chunks           :            3
   nr_max_chunks       :            3
   min_alloc_size      :            4
   max_alloc_size      :         1072
   empty_pop_pages     :           12

Per Chunk Stats:
----------------------------------------
Chunk: <- First Chunk
   nr_alloc            :         1092
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :        16247
   free_bytes          :            4
   contig_bytes        :            4
   sum_frag            :            4
   max_frag            :            4
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :         1072
   memcg_aware         :            0

Chunk:
   nr_alloc            :          594
   max_alloc_size      :          992
   empty_pop_pages     :            8
   first_bit           :          456
   free_bytes          :       645008
   contig_bytes        :       319984
   sum_frag            :       325024
   max_frag            :       318680
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :          424
   memcg_aware         :            0

Chunk:
   nr_alloc            :          360
   max_alloc_size      :         1072
   empty_pop_pages     :            4
   first_bit           :        26595
   free_bytes          :       506640
   contig_bytes        :       506540
   sum_frag            :          100
   max_frag            :           32
   cur_min_alloc       :            4
   cur_med_alloc       :          156
   cur_max_alloc       :         1072
   memcg_aware         :            1


AFTER
Percpu Memory Statistics
Allocation Info:
----------------------------------------
   unit_size           :       655360
   static_size         :       608920
   reserved_size       :            0
   dyn_size            :        46440
   atom_size           :        65536
   alloc_size          :       655360

Global Stats:
----------------------------------------
   nr_alloc            :        97046
   nr_dealloc          :        94304
   nr_cur_alloc        :         2742
   nr_max_alloc        :        90054
   nr_chunks           :           11
   nr_max_chunks       :           47
   min_alloc_size      :            4
   max_alloc_size      :         1072
   empty_pop_pages     :           18

Per Chunk Stats:
----------------------------------------
Chunk: <- First Chunk
   nr_alloc            :         1092
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :        16247
   free_bytes          :            4
   contig_bytes        :            4
   sum_frag            :            4
   max_frag            :            4
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :         1072
   memcg_aware         :            0

Chunk:
   nr_alloc            :          838
   max_alloc_size      :         1072
   empty_pop_pages     :            7
   first_bit           :          464
   free_bytes          :       640476
   contig_bytes        :       290672
   sum_frag            :       349804
   max_frag            :       304344
   cur_min_alloc       :            4
   cur_med_alloc       :            8
   cur_max_alloc       :         1072
   memcg_aware         :            0

Chunk:
   nr_alloc            :           90
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :          536
   free_bytes          :       595752
   contig_bytes        :        26164
   sum_frag            :       575132
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :         1072
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           90
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       597428
   contig_bytes        :        26164
   sum_frag            :       596848
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           92
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       595284
   contig_bytes        :        26164
   sum_frag            :       590360
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           92
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       595284
   contig_bytes        :        26164
   sum_frag            :       583768
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :          360
   max_alloc_size      :         1072
   empty_pop_pages     :            7
   first_bit           :        26595
   free_bytes          :       506640
   contig_bytes        :       506540
   sum_frag            :          100
   max_frag            :           32
   cur_min_alloc       :            4
   cur_med_alloc       :          156
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :           12
   max_alloc_size      :         1072
   empty_pop_pages     :            3
   first_bit           :            0
   free_bytes          :       647524
   contig_bytes        :       563492
   sum_frag            :        57872
   max_frag            :        26164
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk:
   nr_alloc            :            0
   max_alloc_size      :         1072
   empty_pop_pages     :            1
   first_bit           :            0
   free_bytes          :       655360
   contig_bytes        :       655360
   sum_frag            :            0
   max_frag            :            0
   cur_min_alloc       :            0
   cur_med_alloc       :            0
   cur_max_alloc       :            0
   memcg_aware         :            1

Chunk (sidelined):
   nr_alloc            :           72
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       608344
   contig_bytes        :       145552
   sum_frag            :       590340
   max_frag            :       145552
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Chunk (sidelined):
   nr_alloc            :            4
   max_alloc_size      :         1072
   empty_pop_pages     :            0
   first_bit           :            0
   free_bytes          :       652748
   contig_bytes        :       426720
   sum_frag            :       426720
   max_frag            :       426720
   cur_min_alloc       :          156
   cur_med_alloc       :          312
   cur_max_alloc       :         1072
   memcg_aware         :            1

Thread overview: 26+ messages
2021-04-08  3:57 [PATCH v3 0/6] percpu: partial chunk depopulation Roman Gushchin
2021-04-08  3:57 ` [PATCH v3 1/6] percpu: fix a comment about the chunks ordering Roman Gushchin
2021-04-16 21:06   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 2/6] percpu: split __pcpu_balance_workfn() Roman Gushchin
2021-04-16 21:06   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 3/6] percpu: make pcpu_nr_empty_pop_pages per chunk type Roman Gushchin
2021-04-16 21:08   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 4/6] percpu: generalize pcpu_balance_populated() Roman Gushchin
2021-04-16 21:09   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 5/6] percpu: factor out pcpu_check_chunk_hint() Roman Gushchin
2021-04-16 21:15   ` Dennis Zhou
2021-04-08  3:57 ` [PATCH v3 6/6] percpu: implement partial chunk depopulation Roman Gushchin
2021-04-16 12:56 ` [PATCH v3 0/6] percpu: partial chunk depopulation Pratik Sampat
2021-04-16 14:18   ` Dennis Zhou
2021-04-16 15:28     ` Pratik Sampat
2021-04-16 17:13       ` Roman Gushchin
2021-04-16 18:27         ` Pratik Sampat
2021-04-16 18:34           ` Roman Gushchin
2021-04-16 18:41             ` Pratik Sampat
2021-04-16 19:09               ` Roman Gushchin
2021-04-16 19:44                 ` Pratik Sampat [this message]
2021-04-16 20:03                   ` Roman Gushchin
2021-04-17  7:08                     ` Pratik Sampat
2021-04-16 21:47                   ` Dennis Zhou
2021-04-17  7:14                     ` Pratik Sampat
2021-04-16 16:21     ` Roman Gushchin
