linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	joaodias@google.com
Subject: Re: [PATCH] mm: be more verbose for alloc_contig_range faliures
Date: Thu, 4 Mar 2021 17:28:32 +0100
Message-ID: <d3095ead-a762-61cd-0990-702e14e03d10@redhat.com>
In-Reply-To: <YEEJf0itS/8vn8Iy@google.com>

On 04.03.21 17:23, Minchan Kim wrote:
> On Thu, Mar 04, 2021 at 05:10:52PM +0100, David Hildenbrand wrote:
>> On 04.03.21 17:01, Minchan Kim wrote:
>>> On Tue, Mar 02, 2021 at 09:23:49AM -0800, Minchan Kim wrote:
>>>> On Fri, Feb 19, 2021 at 10:28:12AM +0100, Michal Hocko wrote:
>>>>> On Thu 18-02-21 08:19:50, Minchan Kim wrote:
>>>>>> On Thu, Feb 18, 2021 at 10:43:21AM +0100, David Hildenbrand wrote:
>>>>>>> On 18.02.21 10:35, Michal Hocko wrote:
>>>>>>>> On Thu 18-02-21 10:02:43, David Hildenbrand wrote:
>>>>>>>>> On 18.02.21 09:56, Michal Hocko wrote:
>>>>>>>>>> On Wed 17-02-21 08:36:03, Minchan Kim wrote:
>>>>>>>>>>> alloc_contig_range is usually used on cma area or movable zone.
>>>>>>>>>>> It's critical if page migration fails in those areas, so
>>>>>>>>>>> dump more debugging messages, like memory_hotplug does, unless the
>>>>>>>>>>> user specifies __GFP_NOWARN.
>>>>>>>>>>
>>>>>>>>>> I agree with David that this has a potential to generate a lot of output
>>>>>>>>>> and it is not really clear whether it is worth it. Page isolation code
>>>>>>>>>> already has REPORT_FAILURE mode which is currently used only for the memory
>>>>>>>>>> hotplug because this was just too noisy from the CMA path - d381c54760dc
>>>>>>>>>> ("mm: only report isolation failures when offlining memory").
>>>>>>>>>>
>>>>>>>>>> Maybe migrations are less likely to fail but still.
>>>>>>>>>
>>>>>>>>> Side note: I really dislike that uncontrolled error reporting on memory
>>>>>>>>> offlining path we have enabled as default. Yeah, it might be useful for
>>>>>>>>> ZONE_MOVABLE in some cases, but otherwise it's just noise.
>>>>>>>>>
>>>>>>>>> Just do a "sudo stress-ng --memhotplug 1" and see the log getting flooded
>>>>>>>>
>>>>>>>> Anyway we can discuss this in a separate thread but I think this is not
>>>>>>>> a representative workload.
>>>>>>>
>>>>>>> Sure, but the essence is "this is noise", and we'll have more noise on
>>>>>>> alloc_contig_range() as we see these calls more frequently. There should be
>>>>>>> an explicit way to enable such *debug* messages.
>>>>>>
>>>>>> alloc_contig_range already has gfp_mask and it respects __GFP_NOWARN.
>>>>>> Why shouldn't people use it if they don't care about the failure?
>>>>>> Semantically, it makes sense to me.
>>>>
>>>> Sorry for the late response.
>>>>
>>>>>
>>>>> Well, alloc_contig_range doesn't really have to implement all the gfp
>>>>> flags. This is a matter of practicality. alloc_contig_range is quite
>>>>> different from the page allocator because it is to be expected that it
>>>>> can fail the request. This is a very optimistic allocation request. That
>>>>> would suggest that complaining about allocation failures is rather
>>>>> noisy.
>>>>
>>>> That was why I'd like to go with a per-call-site indicator using
>>>> __GFP_NOWARN. Even for allocations from CMA, a failure isn't
>>>> critical for some call sites, so those wouldn't care about
>>>> the failure. cma_alloc already carried a "bool no_warn",
>>>> which was changed into gfp_t recently. What alloc_contig_range
>>>> should do is honor that request.
>>>>
>>>>>
>>>>> Now I do understand that some users would like to see why those
>>>>> allocations have failed. The question is whether that information is
>>>>> generally useful or it is more of a debugging aid. The amount of
>>>>> information is also an important aspect. It would be rather unfortunate
>>>>> to dump thousands of pages just because they cannot be migrated.
>>>>
>>>> Totally agree, dumping thousands of pages as a debugging aid is bad.
>>>> Couldn't we simply ratelimit them like other places?
>>>>
>>>>>
>>>>> I do not have a strong opinion here. We can make all alloc_contig_range
>>>>> users use GFP_NOWARN by default and only skip the flag from the cma
>>>>> allocator but I am slowly leaning towards (ab)using dynamic debugging
>>>>
>>>> I agree the rest of the places should be GFP_NOWARN by default, except CMA,
>>>> if they treat alloc_contig_range as an optimistic allocation request.
>>>> However, I'd like to tweak it for CMA - accept gfp_t from cma_alloc
>>>> and honor __GFP_NOWARN, since some CMA call sites could be
>>>> fault tolerant and so don't need the warning.
>>>
>>> Any thoughts on how to proceed?
>>
>> IMHO, add some proper debug mechanisms and don't try squeezing debug
>> messages into "WARN" semantics.
>>
>> Any alloc_contig_range() user can benefit from that.
> 
> So the point is how we could add a proper debug mechanism here.
> Suppose call site A is not critical on failure but is
> called very frequently. Call site B is critical on failure
> but called very rarely, so we turn on system-wide dynamic debugging.
> You could see a lot of debug messages from A even though we
> don't want them. It could even hide B's debugging messages
> through ratelimiting.
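
To make the scenario above concrete, here is a rough userspace sketch of
one rate limit shared by a frequent call site A and a rare call site B.
Everything in it is an assumption for illustration: the struct, the
function names, the tick-based clock and the interval/burst values are
hypothetical and are not the kernel's ratelimit implementation. It only
models "at most `burst` messages per `interval` ticks".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a shared rate limit: at most `burst`
 * messages are let through per window of `interval` ticks.
 */
struct ratelimit {
	long interval;     /* window length in ticks */
	int burst;         /* messages allowed per window */
	long window_start; /* tick at which the current window began */
	int printed;       /* messages already let through in this window */
};

/* Return true if a message arriving at tick `now` may be printed. */
static bool ratelimit_allow(struct ratelimit *rl, long now)
{
	assert(rl->interval > 0 && rl->burst > 0);

	/* Open a fresh window once the previous one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/*
 * Site A reports a failure on every tick; site B reports exactly one
 * failure at tick `b_tick`. Both go through the same limiter.
 * Returns true if B's single message was let through.
 */
static bool site_b_message_survives(long b_tick, long total_ticks)
{
	struct ratelimit rl = { .interval = 100, .burst = 10 };
	bool b_logged = false;

	for (long t = 0; t < total_ticks; t++) {
		(void)ratelimit_allow(&rl, t);          /* site A, every tick */
		if (t == b_tick)
			b_logged = ratelimit_allow(&rl, t); /* site B, once */
	}
	return b_logged;
}
```

With these assumed numbers, A uses up the burst in the first 10 ticks of
each 100-tick window, so B's message only gets through if it happens to
land early in a window, e.g. site_b_message_survives(5, 200) succeeds
while site_b_message_survives(50, 200) does not.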

Do you have a real-life example of how this would be an issue? This sounds 
like a purely theoretical construct.

You want to debug something, so you try triggering it and capturing 
debug data. There are not that many alloc_contig_range() users such that 
this would really be an issue to isolate ...

Strictly speaking: any allocation failure on ZONE_MOVABLE or CMA is 
problematic (putting NORETRY logic and similar aside). So any such 
page you hit is worth investigating and, therefore, worth getting logged 
for debugging purposes.

-- 
Thanks,

David / dhildenb




Thread overview: 36+ messages
2021-02-17 16:36 [PATCH] mm: be more verbose for alloc_contig_range faliures Minchan Kim
2021-02-17 16:51 ` David Hildenbrand
2021-02-17 17:26   ` Minchan Kim
2021-02-17 17:34     ` David Hildenbrand
2021-02-17 17:45       ` Minchan Kim
2021-02-18  8:56 ` Michal Hocko
2021-02-18  9:02   ` David Hildenbrand
2021-02-18  9:35     ` Michal Hocko
2021-02-18  9:43       ` David Hildenbrand
2021-02-18  9:59         ` Michal Hocko
2021-02-18 16:19         ` Minchan Kim
2021-02-18 16:26           ` David Hildenbrand
2021-02-18 16:47             ` Minchan Kim
2021-02-18 16:53               ` David Hildenbrand
2021-02-19  9:28           ` Michal Hocko
2021-02-19  9:30             ` David Hildenbrand
2021-02-19 10:02               ` Michal Hocko
2021-02-19 10:34                 ` David Hildenbrand
     [not found]             ` <YD50pcPuwV456vwm@google.com>
2021-03-04 16:01               ` Minchan Kim
2021-03-04 16:10                 ` David Hildenbrand
2021-03-04 16:23                   ` Minchan Kim
2021-03-04 16:28                     ` David Hildenbrand [this message]
2021-03-04 17:11                       ` Minchan Kim
2021-03-04 17:23                         ` David Hildenbrand
2021-03-04 18:11                           ` Minchan Kim
2021-03-04 18:22                             ` Minchan Kim
2021-03-08 12:49                               ` Michal Hocko
2021-03-08 13:22                                 ` David Hildenbrand
2021-03-08 14:11                                   ` Michal Hocko
2021-03-08 14:13                                     ` David Hildenbrand
2021-03-08 15:42                                       ` Michal Hocko
2021-03-08 15:58                                         ` Minchan Kim
2021-03-08 16:21                                           ` Michal Hocko
2021-03-08 17:01                                             ` Minchan Kim
2021-03-08 20:27                                           ` Minchan Kim
2021-02-18 16:10   ` Minchan Kim
