linux-next.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	linux-mm@kvack.org, linux-next@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup
Date: Fri, 3 Jun 2016 16:46:00 +0200	[thread overview]
Message-ID: <20160603144600.GK20676@dhcp22.suse.cz> (raw)
In-Reply-To: <20160603135154.GD29930@redhat.com>

On Fri 03-06-16 15:51:54, Andrea Arcangeli wrote:
> On Thu, Jun 02, 2016 at 02:21:10PM +0200, Michal Hocko wrote:
> > Testing with the patch makes some sense as well, but I would like to
> > hear from Andrea whether the approach is good because I am wondering why
> > he hasn't done that before - it feels so much simpler than the current
> > code.
> 
> The down_write in the exit path comes from __ksm_exit. If you don't
> like it there I'd suggest to also remove it from __ksm_exit.

I see

> This is a proposed cleanup correct?

yes this is a cleanup but also a robustness thing, see below.
 
> The first thing that I can notice is that khugepaged_test_exit() then
> can only be called and provide the expected retval, after
> atomic_inc_not_zero(mm_users). Also note mmget_not_zero() should be
> used instead.

I didn't get used to mmget_not_zero yet, but true a helper would be
better.

[...]

> To me the fewer time we hold the mm_users the better and I don't see
> an obvious runtime improvement coming from this change. It's a bit
> simpler yes, but the down_write in the exit path is well understood,
> ksm does the same thing and it's in a slow path (it only happens if
> the mm that exited is the current one under scan by either ksmd or
> khugepaged, so normally the down_write is not executed in the exit
> path and the "mm" is collected right away both as a mm_users and
> mm_count).

OK, I see your point. I wasn't aware that the mmap_sem is dropped
before the allocation request. Then the original code indeed might
get into exit_mmap earlier wrt. to the patch.

The reason I dislike taking write lock in the __mmput is basically
for the same reason you have pointed out. exit_mmap might be delayed
for an unbounded amount of time. khugepaged resp. ksmd might be well
behaved and release their read lock for costly operations or when they
detect the mm is dead but it is hard to guarantee that all potential
kernel users/drivers are behaving the same way. It is not really trivial
to check whether we have such users (there are 100+ users outside of mm/
as per my quick git grep).

The exit path should be as simple as possible with the amount of
external dependencies reduced to the bare minimum.

> In short I think it's a tradeoff: pros) removes down_write in a slow
> path of the the mm exit which may simplify the code a bit, cons) it
> could increase the latency in freeing memory as result of a task
> exiting or being killed during the khugepaged scan, for example while
> the THP is being allocated. While compaction runs to allocate the THP
> in collapse_huge_page, if the task is killed currently the memory is
> released right away, without waiting for the allocation to succeed or
> fail.

Are those latencies a real problem. The allocation itself shouldn't
really take a long time.

> I don't see a big enough problem with the down_write in a slow path of
> khugepaged_exit to justify the increased latency in releasing memory.

What do you think about the external dependencies mentioned above. Do
you think this is a sufficient argument wrt. occasional higher
latencies?
[...]

> If prefer instead to remove the down_write, you probably could move
> the test_exit before the down_read/write to bail out before taking the
> lock: you don't need the mmap_sem to do test_exit anymore. The only
> reason the text_exit would remain in fact is just to reduce the
> latency of the memory freeing, it then becomes a voluntary preempt
> cond_resched() to release the memory to make a parallel ;), but unable
> to let the kernel free the memory while the THP allocation runs.

OK, I will think about that as well.

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-06-03 14:46 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-01  3:11 linux-next: Tree for Jun 1 Stephen Rothwell
2016-06-02  1:48 ` [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup Sergey Senozhatsky
2016-06-02  9:21   ` Michal Hocko
2016-06-02 12:08     ` Sergey Senozhatsky
2016-06-02 12:21       ` Michal Hocko
2016-06-03 13:51         ` Andrea Arcangeli
2016-06-03 14:46           ` Michal Hocko [this message]
2016-06-03 15:10             ` Andrea Arcangeli
2016-06-07  7:34               ` Michal Hocko
2016-06-08  8:19               ` Vlastimil Babka
2016-06-03  7:15     ` Sergey Senozhatsky
2016-06-03  7:25       ` Michal Hocko
2016-06-03  8:43         ` Sergey Senozhatsky
2016-06-03  9:55           ` Michal Hocko
2016-06-03 10:05             ` Michal Hocko
2016-06-03 13:38               ` Sergey Senozhatsky
2016-06-03 13:45                 ` Michal Hocko
2016-06-03 13:49                   ` Michal Hocko
2016-06-04  7:51                     ` Sergey Senozhatsky
2016-06-06  8:39                       ` Michal Hocko
2016-06-02 13:24   ` Vlastimil Babka
2016-06-02 18:58     ` Ebru Akagunduz
2016-06-03  1:00       ` Sergey Senozhatsky
2016-06-03  1:29         ` Sergey Senozhatsky
2016-06-03  4:14           ` Sergey Senozhatsky
2016-06-03 12:28     ` [PATCH] mm, thp: fix locking inconsistency in collapse_huge_page Ebru Akagunduz
2016-06-06 13:05       ` Vlastimil Babka
2016-06-09  3:51         ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160603144600.GK20676@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-next@vger.kernel.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=sfr@canb.auug.org.au \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).