From: Gavin Shan <gshan@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	shan.gavin@gmail.com
Subject: Re: [PATCH v2 2/3] mm/page_reporting: Allow driver to specify threshold
Date: Wed, 23 Jun 2021 10:43:48 +1000
Message-ID: <d2e0c1a4-1ef1-d895-300b-179d33b83b41@redhat.com>
In-Reply-To: <CAKgT0UfXfqUhm5P-2J4ftgU9xmy9GixJdFnpUP-ppXih1yZXyQ@mail.gmail.com>

On 6/23/21 3:39 AM, Alexander Duyck wrote:
> On Mon, Jun 21, 2021 at 10:48 PM Gavin Shan <gshan@redhat.com> wrote:
>>
>> The page reporting threshold is currently fixed at @pageblock_order.
>> On some configurations page reporting is effectively never triggered,
>> because freeing a page rarely produces a free area that large. The
>> situation gets worse when system memory is heavily fragmented.
>>
>> For example, the following configuration is used on ARM64 when the
>> 64KB base page size is enabled. In this case, page reporting is not
>> triggered until a freed page merges into a 512MB free area. That
>> condition is hard to meet, especially when system memory is heavily
>> fragmented.
>>
>>     PAGE_SIZE:          64KB
>>     HPAGE_SIZE:         512MB
>>     pageblock_order:    13       (512MB)
>>     MAX_ORDER:          14
>>
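For reference, the 512MB figure in the table above is just the base page size shifted left by the order. A minimal sketch of that arithmetic follows; order_to_bytes() is a hypothetical helper written for illustration, not a kernel function:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the kernel): number of bytes covered
 * by a free area of the given order, for a given base page size.
 */
static inline uint64_t order_to_bytes(uint64_t page_size, unsigned int order)
{
	/* A free area of order N spans 2^N contiguous base pages. */
	return page_size << order;
}
```

With 64KB pages, order 13 gives 64KB * 8192 = 512MB. With 4KB pages and an order of 9, the same formula gives 2MB, which is a far easier threshold for the buddy allocator to reach.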
>> This patch allows drivers to specify the threshold when the page
>> reporting device is registered. The threshold falls back to
>> @pageblock_order if the driver does not specify it. The existing
>> users (hv_balloon and virtio_balloon) don't specify a threshold,
>> so @pageblock_order is still used as their page reporting order.
>> So this shouldn't introduce functional changes.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   include/linux/page_reporting.h |  3 +++
>>   mm/page_reporting.c            | 14 ++++++++++----
>>   mm/page_reporting.h            | 10 ++--------
>>   3 files changed, 15 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>> --- a/include/linux/page_reporting.h
>> +++ b/include/linux/page_reporting.h
>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>
>>          /* Current state of page reporting */
>>          atomic_t state;
>> +
>> +       /* Minimal order of page reporting */
>> +       unsigned int order;
>>   };
>>
>>   /* Tear-down and bring-up for page reporting devices */
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index df9c5054e1b4..27670360bae6 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
> 
> <snip>
> 
>> @@ -324,6 +324,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>                  goto err_out;
>>          }
>>
>> +       /*
>> +        * We need to choose the minimal order of page reporting if it's
>> +        * not specified by the driver.
>> +        */
>> +       prdev->order = prdev->order ? prdev->order : pageblock_order;
>> +
>>          /* initialize state and work structures */
>>          atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
>>          INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
> 
> Rather than using prdev->order directly it might be better to have a
> reporting_order value you could export for use by
> page_reporting_notify_free. That way you avoid the overhead of having
> to make a function call per page freed.
> 

Yes, I missed the point about avoiding the overhead of a function call
on every freed page. In the next revision, I will introduce
@page_reporting_order for this. It will also be exported as a module
parameter so that it can be changed dynamically, as David suggested
before.
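
A rough userspace sketch of the idea, not the actual kernel code: the threshold lives in one global that the inline free-path helper can compare against, so the out-of-line notify is only reached for orders at or above the threshold. All model_* names, PAGEBLOCK_ORDER and the plain bool standing in for the static key are assumptions made for illustration; only page_reporting_order is a name taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: pageblock_order on arm64 with 64KB base pages. */
#define PAGEBLOCK_ORDER 13u

/* Global threshold that the inline helper can read without a call. */
static unsigned int page_reporting_order = PAGEBLOCK_ORDER;

/* Plain bool standing in for the page_reporting_enabled static key. */
static bool model_reporting_enabled;

/* Counts how often the out-of-line slow path is reached. */
static unsigned int model_notify_calls;

/* Hypothetical stand-in for struct page_reporting_dev_info. */
struct model_dev_info {
	unsigned int order;	/* 0 means "not specified by the driver" */
};

/* Stand-in for the out-of-line notify function. */
static void model_notify_slowpath(void)
{
	model_notify_calls++;
}

/* Registration picks the driver's order, or falls back to the default. */
static void model_register(const struct model_dev_info *dev)
{
	assert(dev != NULL);
	page_reporting_order = dev->order ? dev->order : PAGEBLOCK_ORDER;
	model_reporting_enabled = true;
}

/* Inline check: low-order frees only pay for the two comparisons. */
static inline void model_notify_free(unsigned int order)
{
	if (!model_reporting_enabled)
		return;

	/* Determine if we have crossed the reporting threshold */
	if (order < page_reporting_order)
		return;

	model_notify_slowpath();
}
```

In this model, a driver that leaves order at 0 keeps the pageblock_order behaviour, while a driver that sets order to 5 causes order-5 frees to reach the slow path. Frees below the threshold return before any function call is made.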

>> diff --git a/mm/page_reporting.h b/mm/page_reporting.h
>> index 2c385dd4ddbd..d9f972e72649 100644
>> --- a/mm/page_reporting.h
>> +++ b/mm/page_reporting.h
>> @@ -10,11 +10,9 @@
>>   #include <linux/pgtable.h>
>>   #include <linux/scatterlist.h>
>>
>> -#define PAGE_REPORTING_MIN_ORDER       pageblock_order
>> -
>>   #ifdef CONFIG_PAGE_REPORTING
>>   DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
>> -void __page_reporting_notify(void);
>> +void __page_reporting_notify(unsigned int order);
>>
>>   static inline bool page_reported(struct page *page)
>>   {
>> @@ -37,12 +35,8 @@ static inline void page_reporting_notify_free(unsigned int order)
>>          if (!static_branch_unlikely(&page_reporting_enabled))
>>                  return;
>>
>> -       /* Determine if we have crossed reporting threshold */
>> -       if (order < PAGE_REPORTING_MIN_ORDER)
>> -               return;
>> -
>>          /* This will add a few cycles, but should be called infrequently */
>> -       __page_reporting_notify();
>> +       __page_reporting_notify(order);
>>   }
>>   #else /* CONFIG_PAGE_REPORTING */
>>   #define page_reported(_page)   false
> 
> With us making the function call per page freed we are likely to have
> a much more significant impact on performance with page reporting
> enabled. Ideally we want to limit this impact so that we only take the
> cost for the conditional check on the lower order pages.
> 

Yep, thanks for the explanation, Alex.

Thanks,
Gavin




Thread overview: 8+ messages
2021-06-22  7:49 [PATCH v2 0/3] mm/page_reporting: Make page reporting work on arm64 with 64KB page size Gavin Shan
2021-06-22  7:49 ` [PATCH v2 1/3] mm/page_reporting: Fix code style in __page_reporting_request() Gavin Shan
2021-06-22  7:49 ` [PATCH v2 2/3] mm/page_reporting: Allow driver to specify threshold Gavin Shan
2021-06-22 17:39   ` Alexander Duyck
2021-06-23  0:43     ` Gavin Shan [this message]
2021-06-22  7:49 ` [PATCH v2 3/3] virtio_balloon: Specify page reporting order if needed Gavin Shan
2021-06-22 17:44   ` Alexander Duyck
2021-06-23  0:47     ` Gavin Shan
