linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	John Dias <joaodias@google.com>,
	huww98@outlook.com, John Hubbard <jhubbard@nvidia.com>
Subject: Re: [RFC v2] mm: introduce page pin owner
Date: Fri, 14 Jan 2022 17:51:24 +0100	[thread overview]
Message-ID: <8f02e71b-5de7-4342-7371-a7fe19b114b5@redhat.com> (raw)
In-Reply-To: <YeGnSG3BS5np9mUa@google.com>

On 14.01.22 17:39, Minchan Kim wrote:
> On Fri, Jan 14, 2022 at 02:31:43PM +0100, David Hildenbrand wrote:
>> On 12.01.22 21:41, Minchan Kim wrote:
>>> On Wed, Jan 12, 2022 at 06:42:21PM +0100, David Hildenbrand wrote:
>>>>>>
>>>>>> What about something like:
>>>>>>
>>>>>> "mm: selective tracing of page reference holders on unref"
>>>>>>
>>>>>> PAGE_EXT_PIN_OWNER -> PAGE_EXT_TRACE_UNREF
>>>>>>
>>>>>> $whatever feature/user can then set the bit, for example, when migration
>>>>>> fails.
>>>>>
>>>>> I couldn't imagine put_page tracking is generally useful except
>>>>> migration failure. Do you have reasonable usecase in your mind
>>>>> to make the feature general to be used?
>>>>
>>>> HWpoison etc. purposes maybe? Trace who still held a reference a page
>>>> that was poisoned and couldn't be removed?  Or in general, tracking
>>>
>>> I am not familiar with hwpoison so here dumb question goes:
>>> Is that different one with __soft_offline_page?
>>> It uses migrate_pages so current interface supports it with filter.
>>
>> __soft_offline_page() won't kill the target and try to migrate because
>> the pages are about to be damaged and we can still access them.
>>
>> ordinary memory errors mean we kill the target because we cannot access
>> the page anymore without triggering MCEs (or worse IIUC) again.
>>
>> So in my thinking, after memory_failure(), it could eventually be
>> helpful to figure out who still has a (temporary) reference to such a
>> broken page, even after killing the process. But that's just one idea I
>> quickly came up with.
> 
> 
> Thanks for the clarification. Is the trace best fit in the case?
> Presumably you know the broken page, can't you find who owns the page
> using /proc/pid/pagemap?
> Furthermore, page_get/put operations commonly could happen in
> different contexts regardless of page's owner so the trace from
> different context is still helpful?
> 
> If it's helpful, could you tell what you want to make the interface to
> set the bit of broken page? For example, as-is case for page migration,
> report_page_pinners is called to mark failed page at unmap_and_move.
> Do you want to add something similar(maybe, set_page_ref_track under
> rename) in memory-failure.c?
> 
> It would be very helpful to design the feature's interface(surely,
> naming as well) and write description to convince others "yeah,
> sounds like so useful for the case and that's best fit than other way".

I currently don't have time to explore this further, it was just a
random thought how else this might be useful.


>>
>>>
>>> echo "memory_failure" > $trace_dir/events/page_pin_owner/report_page_pinners/filter
>>>
>>>> references to something that should have a refcount of 0 because it
>>>> should have been freed, but for some reason there are still references
>>>> around?
>>>
>>> Sounds like you are talking about memory leak? What's the purpose
>>> with trace, not using other existing tool to find memory leak?
>>>
>>
>> IIRC, kmemleak can find objects that are no longer referenced, and we
>> cannot track buddy allocations, but only kmalloc and friends.
> 
> PageOwner is your good buddy.

Page owner tracks who owns a page but not who else holds a reference
(some buffer, gup, whatsoever), right?

> 
>>
>>>>
>>>>> Otherwise, I'd like to have feature naming more higher level
>>>>> to represent page migration failure and then tracking unref of
>>>>> the page. In the sense, PagePinOwner John suggested was good
>>>>> candidate(Even, my original naming PagePinner was worse) since
>>>>
>>>> Personally, I dislike both variants.
>>>>
>>>>> I was trouble to abstract the feature with short word.
>>>>> If we approach "what feature is doing" rather than "what's
>>>>> the feature's goal"(I feel the your suggestion would be close
>>>>> to what feature is doing), I'd like to express "unreference on
>>>>> migraiton failed page" so PAGE_EXT_UNMIGRATED_UNREF
>>>>> (However, I prefer the feature naming more "what we want to achieve")
>>>>>
>>>> E.g., PAGE_EXT_TRACE_UNREF will trace unref to the page once the bit is
>>>> set. The functionality itself is completely independent of migration
>>>> failures. That's just the code that sets it to enable the underlying
>>>> tracing for that specific page.
>>>
>>> I agree that make something general is great but I also want to avoid
>>> create something too big from the beginning with just imagination.
>>> So, I'd like to hear more concrete and appealing usecases and then
>>> we could think over this trace approach is really the best one to
>>> achieve the goal. Once it's agreed, the naming you suggested would
>>> make sense. 
>>
>> At least for me it's a lot cleaner if a feature clearly expresses what
>> it actually does. Staring at PAGE_EXT_PIN_OWNER I initially had no clue.
>> I was assuming we would actually track (not trace!) all active FOLL_PIN
>> (not unref callers!). Maybe that makes it clearer why I'd prefer a
>> clearer name.
> 
> I totally agree PagePinOwner is not 100% straightforward. I'm open for
> other better name. Currently we are discussing how we could generalize
> and whether it's useful or not. Depending on the discussion, the design/
> interface as well as naming could be changed. No problem.

PagePinOwner is just highly misleading. Because that's not what the
feature does. Having that said, i hope we'll get other opinions as well.

>>
>>>>
>>>> Makes sense, I was expecting the output to be large, but possible it's
>>>> going to be way too large.
>>>>
>>>> Would it also make sense to track for a flagged page new taken
>>>> references, such that you can differentiate between new (e.g.,
>>>> temporary) ones and previous ones? Feels like a reasonable addition.
>>>
>>> I actually tried it and it showed 2x times bigger output.
>>
>> Is 2x that bad? Or would it be worth making it configurable?
> 
> For my goal, 2x was bad because I need to minimize the trace buffer.
> Furthermore, the new get operation was not helpful but just noisy.
> If some usecase is not enough with only put callsite, we add get
> easily. Implementation is not hard. The matter is how it's useful
> in real practice since we would expose the interface to the user,
> I guess.

Right. but it's a debugging feature after all, so I asume we care little
about KABI. We can always add it on top, I agree.

> 
>>
>>> For me to debug CMA alloation failure, the new get_page callstack
>>> after migration failure were waste since they repeated from lru
>>> adding, isolate from the LRU something. Rather than get callsite,
>>> I needed only put call sites since I could deduce where the pair-get
>>> came from.
>>
>> Could maybe some filters help that exclude such LRU activity?
> 
> For my case, the filter wouldn't be helpful because I shouldn't
> miss the LRU activity since sometime they makes the allocation
> failure. Thus, I could deduce the lru's get site from put callback.
> get callsite is not helpful for my case but just wasted memory
> and perf.
> 

Makes sense.

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2022-01-14 16:51 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-28 17:59 [RFC v2] mm: introduce page pin owner Minchan Kim
2022-01-06 18:14 ` Minchan Kim
2022-01-06 22:27 ` John Hubbard
2022-01-06 23:24   ` Minchan Kim
2022-01-12 12:25 ` David Hildenbrand
2022-01-12 16:22   ` Minchan Kim
2022-01-12 17:42     ` David Hildenbrand
2022-01-12 20:41       ` Minchan Kim
2022-01-14 13:31         ` David Hildenbrand
2022-01-14 16:39           ` Minchan Kim
2022-01-14 16:51             ` David Hildenbrand [this message]
2022-01-14 18:47               ` David Hildenbrand
2022-01-14 18:57                 ` Minchan Kim
2022-01-21 21:59                   ` Minchan Kim
2022-01-14 18:55               ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f02e71b-5de7-4342-7371-a7fe19b114b5@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=huww98@outlook.com \
    --cc=jhubbard@nvidia.com \
    --cc=joaodias@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=surenb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).