All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: aryabinin@virtuozzo.com, linux-mm@kvack.org
Cc: willy@infradead.org, hch@lst.de, jszhang@marvell.com,
	joelaf@google.com, chris@chris-wilson.co.uk, joaodias@google.com,
	tglx@linutronix.de, hpa@zytor.com, mingo@elte.hu
Subject: Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().
Date: Sat, 25 Mar 2017 07:47:52 +0900	[thread overview]
Message-ID: <201703250747.IAJ39022.HJSOtFFLVOFOMQ@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <fe511b26-f2e5-0a0e-09cc-303d38d2ad05@virtuozzo.com>

Andrey Ryabinin wrote:
> >>> ___might_sleep() via cond_resched_lock() in __purge_vmap_area_lazy() from
> >>> try_purge_vmap_area_lazy() from free_vmap_area_noflush() from
> >>> free_unmap_vmap_area() from remove_vm_area() which might trigger same
> >>> false positive warning is remaining. But so far we haven't heard about
> >>> warning from that path.
> >>
> >> And why that would be a false positive?
> >>
> > 
> > #define cond_resched_lock(lock) ({                              \
> > 	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
> > 	__cond_resched_lock(lock);                              \
> > 	})
> > 
> > cond_resched_lock() calls ___might_sleep() even when
> > __cond_resched_lock() will not call preempt_schedule_common()
> > because should_resched() returns false due to preemption counter
> > being already elevated by holding &(&tfile->lock)->rlock spinlock.
>  
> That is true only for preemptible kernel. On non-preempt kernel should_resched()
> might return true under spin_lock.

Excuse me, but I did not catch what you mean.
I think

  spin_lock(&lock1);
  cond_resched_lock(&lock1); /* <= might sleep */
  spin_unlock(&lock1);

is true. Are you saying that

  spin_lock(&lock0);
  spin_lock(&lock1);
  cond_resched_lock(&lock1); /* <= might sleep */
  spin_unlock(&lock1);
  spin_unlock(&lock0);

is also true for non-preempt kernel?

> 
> Just fix the drm code. There is zero point in releasing memory under spinlock.

If above answer is "yes", I don't think fixing the drm code is what we can do now.
Rather, we need below change because commit 763b218ddfaf5676 ("mm: add preempt
points into __purge_vmap_area_lazy()") suddenly broke things without verifying
all callers.

>From 281cfbc722cd0f4ff83f880a2dd487fd42226e2d Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 24 Mar 2017 11:24:18 +0900
Subject: [PATCH] mm: Allow calling vfree() from non-schedulable context.

Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
as potentially sleeping") added might_sleep() to remove_vm_area() from
vfree(), and is causing

[    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
[    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
[    2.616156] 2 locks held by plymouthd/341:
[    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
[    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
[    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
[    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[    2.616273] Call Trace:
[    2.616281]  dump_stack+0x86/0xc3
[    2.616285]  ___might_sleep+0x17d/0x250
[    2.616289]  __might_sleep+0x4a/0x80
[    2.616293]  remove_vm_area+0x22/0x90
[    2.616296]  __vunmap+0x2e/0x110
[    2.616299]  vfree+0x42/0x90
[    2.616304]  kvfree+0x2c/0x40
[    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
[    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
[    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
[    2.616331]  drm_release+0x290/0x3b0 [drm]
[    2.616338]  __fput+0xf8/0x210
[    2.616342]  ____fput+0xe/0x10
[    2.616345]  task_work_run+0x85/0xc0
[    2.616351]  exit_to_usermode_loop+0xb4/0xc0
[    2.616355]  do_syscall_64+0x185/0x1f0
[    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25

warning.

And commit 763b218ddfaf5676 ("mm: add preempt points into
__purge_vmap_area_lazy()") actually made vfree() potentially sleeping on
non-preemptible kernels. But we want to keep vfree() being callable from
non-schedulable context as with kfree() because vfree() is called via
kvfree().

This patch updates the condition to use __vfree_deferred() in order to
make sure that all vfree()/kvfree() users who did not notice that commit
will remain safe.

console_unlock() is a function which is prepared for being called from
non-schedulable context (e.g. spinlock held, inside RCU). It is using

  !oops_in_progress && preemptible() && !rcu_preempt_depth()

as a condition for whether it is safe to schedule. This patch uses that
condition with oops_in_progress check (which is not important for
__vunmap() case) removed.

Straightforward change will be

-		if (unlikely(in_interrupt()))
+		if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))

in vfree(). But this patch optimizes the condition as below.

-  if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))
+  if (unlikely(in_interrupt() || !preemptible() || rcu_preempt_depth()))

Then, since in_interrupt() and preemptible() are defined as

#define in_interrupt() (irq_count())
#define irq_count()    (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))
#define preemptible()  (preempt_count() == 0 && !irqs_disabled())

if CONFIG_PREEMPT_COUNT=y, this condition is

-  if (unlikely(in_interrupt() || !preemptible() || rcu_preempt_depth()))
+  if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+          !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))

and is thus

-   if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-           !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))
+	    if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+	            (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))

and is thus

-   if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-           (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))
+	    if (unlikely((preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))

and is thus finally

-   if (unlikely((preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))
+   if (unlikely(preempt_count() || irqs_disabled() || rcu_preempt_depth()))

. This patch will not break CONFIG_PREEMPT_COUNT=n case because
in_interrupt() is evaluated as false because preempt_count() is always 0.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jisheng Zhang <jszhang@marvell.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Dias <joaodias@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org> # v4.10
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0b05762..5f0eb81 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1589,7 +1589,7 @@ void vfree(const void *addr)
 
 	if (!addr)
 		return;
-	if (unlikely(in_interrupt()))
+	if (unlikely(preempt_count() || irqs_disabled() || rcu_preempt_depth()))
 		__vfree_deferred(addr);
 	else
 		__vunmap(addr, 1);
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2017-03-24 22:49 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-24 10:53 [PATCH] mm: Remove pointless might_sleep() in remove_vm_area() Tetsuo Handa
2017-03-24 12:22 ` Andrey Ryabinin
2017-03-24 12:40   ` Tetsuo Handa
2017-03-24 15:05     ` Andrey Ryabinin
2017-03-24 16:17       ` Matthew Wilcox
2017-03-27 13:26         ` Andrey Ryabinin
2017-03-27 14:06           ` Matthew Wilcox
2017-03-27 14:10           ` Thomas Hellstrom
2017-03-27 14:29             ` Tetsuo Handa
2017-03-27 14:29               ` Tetsuo Handa
2017-03-27 15:02               ` Andrey Ryabinin
2017-03-27 15:02                 ` Andrey Ryabinin
2017-03-28 10:07                 ` Tetsuo Handa
2017-03-28 10:07                   ` Tetsuo Handa
2017-03-24 22:47       ` Tetsuo Handa [this message]
2017-03-27 10:16         ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201703250747.IAJ39022.HJSOtFFLVOFOMQ@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=aryabinin@virtuozzo.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=joaodias@google.com \
    --cc=joelaf@google.com \
    --cc=jszhang@marvell.com \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.