linux-mm.kvack.org archive mirror
From: Nikolay Borisov <kernel@kyup.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-mm@kvack.org, mgorman@suse.de,
	Andrew Morton <akpm@linux-foundation.org>,
	Marian Marinov <mm@1h.com>,
	SiteGround Operations <operations@siteground.com>,
	Jan Kara <jack@suse.cz>
Subject: Re: Making per-cpu lists draining dependant on a flag
Date: Wed, 14 Oct 2015 12:06:21 +0300
Message-ID: <561E1B0D.9050809@kyup.com>
In-Reply-To: <20151014083710.GF28333@dhcp22.suse.cz>



On 10/14/2015 11:37 AM, Michal Hocko wrote:
> On Fri 09-10-15 14:00:31, Nikolay Borisov wrote:
>> Hello mm people,
>>
>>
>> I want to ask you the following question which stemmed from analysing
>> and chasing this particular deadlock:
>> http://permalink.gmane.org/gmane.linux.kernel/2056730
> 
> This link doesn't seem to work properly for me. Could you post a
> http://lkml.kernel.org/r/$msg_id link please?
> 
>> To summarise it:
>>
>> For simplicity I will use the following nomenclature:
>> t1 - kworker/u96:0
>> t2 - kworker/u98:39
>> t3 - kworker/u98:7
>>
>> t1 issues drain_all_pages which generates IPIs, at the same time
>> however,
> 
> OK, as per
> http://lkml.kernel.org/r/1444318308-27560-1-git-send-email-kernel%40kyup.com
> drain_all_pages is called from the __alloc_pages_nodemask called from
> slab allocator. There is no stack leading to the allocation but then you
> are saying
> 
>> t2 has already started doing async write of pages
>> as part of its normal operation but is blocked on t1's completion of
>> its IPI (generated from drain_all_pages) since they both work on the
>> same dm-thin volume.
> 
> which I read as the allocator is holding the same dm_bufio_lock, right?
> 
>> At the same time again, t3 is executing
>> ext4_finish_bio, which disables interrupts, yet is dependent on t2
>> completing its writes.
> 
> That would be a bug on its own because ext4_finish_bio seems to be
> called from SoftIRQ context so it cannot wait for a regular scheduling
> context. Whoever is holding that lock BH_Uptodate_Lock has to be in
> (soft)IRQ context.
> 
> <found the original thread on linux-mm finally - the threading got
> broken on the way>
> http://lkml.kernel.org/r/20151013131453.GA1332%40quack.suse.cz
> 
> So Jack (CCed) thinks this is a non-atomic update of flags and that
> indeed sounds plausible.
> 
>> But since it has disabled interrupts, it won't
>> respond to t1's IPI and at this point a hard lockup occurs. This
>> happens, since drain_all_pages calls on_each_cpu_mask with the last
>> argument equal to "true" meaning "wait until the IPI handler has
>> finished", which of course will never happen in the described situation.
>>
>> Based on that I was wondering whether avoiding such situation might
>> merit making drain_all_pages invocation from
>> __alloc_pages_direct_reclaim dependent on a particular GFP being passed
>> e.g. GFP_NOPCPDRAIN or something along those lines?
> 
> I do not think so. Even if the dependency was real it would be a clear
> deadlock even without drain_all_pages AFAICS.
> 
>> Alternatively, would it be possible to make the IPI asynchronous e.g.
>> calling on_each_cpu_mask with the last argument equal to false?
> 
> Strictly speaking the allocation path doesn't really depend on the sync
> behavior. We are just trying to release pages on pcp lists and retry the
> allocation. Even if the allocation context was faster than other CPUs
> and failed the request then we would try again without triggering the OOM
> because the reclaim has apparently made some progress.
> 
> Other callers might be more sensitive. Anyway this is called only if the
> allocator issues a sleeping allocation request so I think that waiting
> here is perfectly acceptable.
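
The wait=true versus wait=false behaviour discussed in the quoted text can
be illustrated with a small userspace model. This is only a sketch under
stated assumptions, not kernel code: struct sim_cpu, sim_tick(),
sim_any_pending() and sim_drain_all_pages() are hypothetical names made up
for the illustration, the "IPI" is just a flag, and the model does not
treat the calling CPU specially. It shows only that a synchronous sender
never gets control back while some target CPU keeps interrupts disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one CPU; not a kernel structure. */
struct sim_cpu {
	bool irqs_disabled; /* models a CPU spinning with interrupts off */
	bool ipi_pending;   /* drain request sent but not yet handled */
	int pcp_pages;      /* pages parked on this CPU's per-cpu list */
};

/* One time step: only CPUs with interrupts enabled can run the handler. */
static void sim_tick(struct sim_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].ipi_pending && !cpus[i].irqs_disabled) {
			cpus[i].pcp_pages = 0; /* handler drains the list */
			cpus[i].ipi_pending = false;
		}
	}
}

/* True while at least one CPU has not yet handled its drain request. */
static bool sim_any_pending(const struct sim_cpu *cpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].ipi_pending)
			return true;
	return false;
}

/*
 * Models the "wait" argument of on_each_cpu_mask(). Returns true if the
 * caller gets control back within max_ticks steps, false if it would
 * still be spinning (standing in for the hard lockup).
 */
static bool sim_drain_all_pages(struct sim_cpu *cpus, size_t n, bool wait,
				int max_ticks)
{
	assert(cpus != NULL && max_ticks >= 0);

	for (size_t i = 0; i < n; i++)
		cpus[i].ipi_pending = true;

	if (!wait)
		return true; /* fire and forget: do not wait for handlers */

	for (int t = 0; t < max_ticks; t++) {
		sim_tick(cpus, n);
		if (!sim_any_pending(cpus, n))
			return true;
	}
	return false;
}
```

With two simulated CPUs where the second has irqs_disabled set,
sim_drain_all_pages(cpus, 2, true, 1000) returns false and leaves the
second CPU's pcp_pages untouched, while the same call with wait set to
false returns true immediately. As Michal points out above, making the
call asynchronous would not remove an underlying dependency if one really
existed; the model only shows why the synchronous caller appears stuck.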

Thanks for taking the time to look over the issue. Indeed, I guess I
have been misled as to who the real culprit is, though the call traces
seemed to make the issue apparent. But kernel land seems to be a lot
more subtle :)

In any case I will test with Jack's patch and hopefully report that
everything is okay.

Nikolay

