linux-kernel.vger.kernel.org archive mirror
* kernfs: possible deadlock between of->mutex and mmap_sem
@ 2014-03-01  1:14 Sasha Levin
  2014-03-01 23:18 ` Dave Chinner
  2014-03-03 22:39 ` Tejun Heo
  0 siblings, 2 replies; 7+ messages in thread
From: Sasha Levin @ 2014-03-01  1:14 UTC (permalink / raw)
  To: Tejun Heo, Greg KH; +Cc: LKML

Hi all,

I've stumbled on the following while fuzzing with trinity inside a KVM tools guest running the
latest -next kernel.

We give files that have an mmap op a different locking class than files which don't, because
mmap_sem nesting is different for those files.

We assume that for mmap-supporting files, of->mutex will be nested inside mm->mmap_sem. However,
this is not always the case. Consider the following:

	kernfs_fop_write()
		copy_from_user()
			might_fault()

might_fault() suggests that we may lock mm->mmap_sem, which causes a reverse lock nesting of
mm->mmap_sem inside of->mutex.
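
To make the inversion concrete, here is a minimal userspace sketch (illustrative only, not kernfs
code) of the two acquisition orders, with pthread mutexes standing in for mm->mmap_sem and
of->mutex:

```c
#include <pthread.h>

/* Stand-ins for the kernel locks; only the ordering matters here. */
static pthread_mutex_t mmap_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t of_mutex = PTHREAD_MUTEX_INITIALIZER;

/* write(2) path: of->mutex is taken first, then copy_from_user() may
 * fault and take mmap_sem, so the nesting is of->mutex -> mmap_sem. */
static int write_path_order(void)
{
	pthread_mutex_lock(&of_mutex);
	pthread_mutex_lock(&mmap_sem);	/* the might_fault() step */
	pthread_mutex_unlock(&mmap_sem);
	pthread_mutex_unlock(&of_mutex);
	return 1;
}

/* mmap(2) path: the VM core already holds mmap_sem when
 * kernfs_fop_mmap() takes of->mutex, so the nesting is
 * mmap_sem -> of->mutex, the reverse order. */
static int mmap_path_order(void)
{
	pthread_mutex_lock(&mmap_sem);
	pthread_mutex_lock(&of_mutex);
	pthread_mutex_unlock(&of_mutex);
	pthread_mutex_unlock(&mmap_sem);
	return 1;
}
```

Run serially, each path completes fine; it is two tasks running one path each, in the opposite
nesting orders, that can deadlock, and that is exactly what lockdep flags below.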

I'll send a patch to fix it some time next week unless someone beats me to it :)


[ 1182.846501] ======================================================
[ 1182.847256] [ INFO: possible circular locking dependency detected ]
[ 1182.848111] 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
[ 1182.849088] -------------------------------------------------------
[ 1182.849927] trinity-c236/10658 is trying to acquire lock:
[ 1182.850094]  (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
[ 1182.850094]
[ 1182.850094] but task is already holding lock:
[ 1182.850094]  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
[ 1182.850094]
[ 1182.850094] which lock already depends on the new lock.
[ 1182.850094]
[ 1182.850094]
[ 1182.850094] the existing dependency chain (in reverse order) is:
[ 1182.850094]
-> #1 (&mm->mmap_sem){++++++}:
[ 1182.856968]        [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
[ 1182.856968]        [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
[ 1182.856968]        [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
[ 1182.856968]        [<mm/memory.c:4188>] might_fault+0x7e/0xb0
[ 1182.860975]        [<arch/x86/include/asm/uaccess.h:713 fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190
[ 1182.860975]        [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0
[ 1182.860975]        [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0
[ 1182.860975]        [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
[ 1182.860975]
-> #0 (&of->mutex#2){+.+.+.}:
[ 1182.860975]        [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
[ 1182.860975]        [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
[ 1182.860975]        [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
[ 1182.860975]        [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
[ 1182.860975]        [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
[ 1182.860975]        [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
[ 1182.860975]        [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
[ 1182.860975]        [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
[ 1182.860975]        [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
[ 1182.860975]        [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
[ 1182.860975]        [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
[ 1182.860975]        [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
[ 1182.860975]
[ 1182.860975] other info that might help us debug this:
[ 1182.860975]
[ 1182.860975]  Possible unsafe locking scenario:
[ 1182.860975]
[ 1182.860975]        CPU0                    CPU1
[ 1182.860975]        ----                    ----
[ 1182.860975]   lock(&mm->mmap_sem);
[ 1182.860975]                                lock(&of->mutex#2);
[ 1182.860975]                                lock(&mm->mmap_sem);
[ 1182.860975]   lock(&of->mutex#2);
[ 1182.860975]
[ 1182.860975]  *** DEADLOCK ***
[ 1182.860975]
[ 1182.860975] 1 lock held by trinity-c236/10658:
[ 1182.860975]  #0:  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
[ 1182.860975]
[ 1182.860975] stack backtrace:
[ 1182.860975] CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G        W 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26
[ 1182.860975]  0000000000000000 ffff88011911fa48 ffffffff8438e945 0000000000000000
[ 1182.860975]  0000000000000000 ffff88011911fa98 ffffffff811a0109 ffff88011911fab8
[ 1182.860975]  ffff88011911fab8 ffff88011911fa98 ffff880119128cc0 ffff880119128cf8
[ 1182.860975] Call Trace:
[ 1182.860975]  [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f
[ 1182.860975]  [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160
[ 1182.860975]  [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
[ 1182.860975]  [<include/linux/spinlock.h:343 mm/slub.c:1933>] ? deactivate_slab+0x511/0x550
[ 1182.860975]  [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
[ 1182.860975]  [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
[ 1182.860975]  [<mm/mmap.c:1552>] ? mmap_region+0x24a/0x5c0
[ 1182.860975]  [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
[ 1182.860975]  [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
[ 1182.860975]  [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [<kernel/sched/core.c:2477>] ? get_parent_ip+0x11/0x50
[ 1182.860975]  [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
[ 1182.860975]  [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
[ 1182.860975]  [<mm/util.c:397>] ? vm_mmap_pgoff+0x6e/0xe0
[ 1182.860975]  [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
[ 1182.860975]  [<kernel/rcu/update.c:97>] ? __rcu_read_unlock+0x44/0xb0
[ 1182.860975]  [<fs/file.c:641>] ? dup_fd+0x3c0/0x3c0
[ 1182.860975]  [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
[ 1182.860975]  [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
[ 1182.860975]  [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: kernfs: possible deadlock between of->mutex and mmap_sem
  2014-03-01  1:14 kernfs: possible deadlock between of->mutex and mmap_sem Sasha Levin
@ 2014-03-01 23:18 ` Dave Chinner
  2014-03-03 22:39 ` Tejun Heo
  1 sibling, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2014-03-01 23:18 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Tejun Heo, Greg KH, LKML

On Fri, Feb 28, 2014 at 08:14:45PM -0500, Sasha Levin wrote:
> Hi all,
> 
> I've stumbled on the following while fuzzing with trinity inside a
> KVM tools guest running the latest -next kernel.
> 
> We give files that have an mmap op a different locking class than
> files which don't, because mmap_sem nesting is different for those
> files.
> 
> We assume that for mmap supporting files, of->mutex will be nested
> inside mm->mmap_sem. However, this is not always the case. Consider
> the following:
> 
> 	kernfs_fop_write()
> 		copy_from_user()
> 			might_fault()
> 
> might_fault() suggests that we may lock mm->mmap_sem, which causes a
> reverse lock nesting of mm->mmap_sem inside of->mutex.

Yup, all filesystems have to deal with this. It's a long standing
problem caused by a very rarely seen corner case that drives us
completely batty because it prevents us from being able to serialise
filesystem IO operations against page fault driven IO...

> I'll send a patch to fix it some time next week unless someone beats me to it :)
> 
> 
> [ 1182.846501] ======================================================
> [ 1182.847256] [ INFO: possible circular locking dependency detected ]
> [ 1182.848111] 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
> [ 1182.849088] -------------------------------------------------------
> [ 1182.849927] trinity-c236/10658 is trying to acquire lock:
> [ 1182.850094]  (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
> [ 1182.850094]
> [ 1182.850094] but task is already holding lock:
> [ 1182.850094]  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
> [ 1182.850094]
> [ 1182.850094] which lock already depends on the new lock.
> [ 1182.850094]
> [ 1182.850094]
> [ 1182.850094] the existing dependency chain (in reverse order) is:
> [ 1182.850094]
> -> #1 (&mm->mmap_sem){++++++}:
> [ 1182.856968]        [<kernel/locking/lockdep.c:1945
> kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
> [ 1182.856968]        [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
> [ 1182.856968]        [<arch/x86/include/asm/current.h:14
> kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
> [ 1182.856968]        [<mm/memory.c:4188>] might_fault+0x7e/0xb0
> [ 1182.860975]        [<arch/x86/include/asm/uaccess.h:713
> fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190
> [ 1182.860975]        [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0
> [ 1182.860975]        [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0
> [ 1182.860975]        [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2

Those stack traces are an unreadable mess.  If you're going to add
extra metadata to the stack, please put it *after* the
stack functions so the stack itself is easy to read.

i.e. the stack trace is far more important than line numbers, so the
stack itself should be optimised for readability. IOWs, the stack
functions go first and are neatly aligned, everything else can make
a mess after that....

Oh, and when pasting stack traces - turn off line wrapping ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: kernfs: possible deadlock between of->mutex and mmap_sem
  2014-03-01  1:14 kernfs: possible deadlock between of->mutex and mmap_sem Sasha Levin
  2014-03-01 23:18 ` Dave Chinner
@ 2014-03-03 22:39 ` Tejun Heo
  2014-03-03 22:44   ` Sasha Levin
  1 sibling, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2014-03-03 22:39 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg KH, LKML

On Fri, Feb 28, 2014 at 08:14:45PM -0500, Sasha Levin wrote:
> Hi all,
> 
> I've stumbled on the following while fuzzing with trinity inside a
> KVM tools guest running the latest -next kernel.
> 
> We give files that have an mmap op a different locking class than
> files which don't, because mmap_sem nesting is different for those
> files.
> 
> We assume that for mmap supporting files, of->mutex will be nested
> inside mm->mmap_sem. However, this is not always the case. Consider
> the following:
> 
> 	kernfs_fop_write()
> 		copy_from_user()
> 			might_fault()
> 
> might_fault() suggests that we may lock mm->mmap_sem, which causes a
> reverse lock nesting of mm->mmap_sem inside of->mutex.
> 
> I'll send a patch to fix it some time next week unless someone beats me to it :)

How are you planning to fix it?  Prolly the right thing to do would be
caching atomic_write_len in open_file and copy data before grabbing
any locks.
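
A rough userspace sketch of that shape (hypothetical names, not the actual kernfs patch): cache
the length limit at open time, do the fault-prone copy with no locks held, and only then take the
per-file mutex, so the mutex never nests outside mmap_sem.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

static pthread_mutex_t of_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Cached from ops->atomic_write_len at open time in the proposed fix;
 * 0 would mean "no atomic-write limit".  128 is an arbitrary example. */
static size_t cached_atomic_write_len = 128;

/* Copy the caller's data into a private buffer *before* taking
 * of_mutex, so any page fault happens with no locks held. */
static ssize_t sketch_write(const char *user_buf, size_t count)
{
	size_t len = count;
	char *buf;

	if (cached_atomic_write_len && len > cached_atomic_write_len)
		return -1;		/* -E2BIG in the kernel */

	buf = malloc(len + 1);
	if (!buf)
		return -1;		/* -ENOMEM in the kernel */

	memcpy(buf, user_buf, len);	/* the copy_from_user() analogue */
	buf[len] = '\0';

	pthread_mutex_lock(&of_mutex);	/* only now serialize the op */
	/* ops->write(of, buf, len, ppos) would run here */
	pthread_mutex_unlock(&of_mutex);

	free(buf);
	return (ssize_t)len;
}
```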

Thanks.

-- 
tejun


* Re: kernfs: possible deadlock between of->mutex and mmap_sem
  2014-03-03 22:39 ` Tejun Heo
@ 2014-03-03 22:44   ` Sasha Levin
  2014-03-03 22:46     ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Sasha Levin @ 2014-03-03 22:44 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Greg KH, LKML

On 03/03/2014 05:39 PM, Tejun Heo wrote:
> On Fri, Feb 28, 2014 at 08:14:45PM -0500, Sasha Levin wrote:
>> Hi all,
>>
>> I've stumbled on the following while fuzzing with trinity inside a
>> KVM tools guest running the latest -next kernel.
>>
>> We give files that have an mmap op a different locking class than
>> files which don't, because mmap_sem nesting is different for those
>> files.
>>
>> We assume that for mmap supporting files, of->mutex will be nested
>> inside mm->mmap_sem. However, this is not always the case. Consider
>> the following:
>>
>> 	kernfs_fop_write()
>> 		copy_from_user()
>> 			might_fault()
>>
>> might_fault() suggests that we may lock mm->mmap_sem, which causes a
>> reverse lock nesting of mm->mmap_sem inside of->mutex.
>>
>> I'll send a patch to fix it some time next week unless someone beats me to it :)
>
> How are you planning to fix it?  Prolly the right thing to do would be
> caching atomic_write_len in open_file and copy data before grabbing
> any locks.

I actually didn't have a plan when I wrote that; I was just planning on putting effort into it.


Thanks,
Sasha



* Re: kernfs: possible deadlock between of->mutex and mmap_sem
  2014-03-03 22:44   ` Sasha Levin
@ 2014-03-03 22:46     ` Tejun Heo
  2014-03-04 20:38       ` [PATCH driver-core-next] kernfs: cache atomic_write_len in kernfs_open_file Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2014-03-03 22:46 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Greg KH, LKML

On Mon, Mar 03, 2014 at 05:44:52PM -0500, Sasha Levin wrote:
> >>I'll send a patch to fix it some time next week unless someone beats me to it :)
> >
> >How are you planning to fix it?  Prolly the right thing to do would be
> >caching atomic_write_len in open_file and copy data before grabbing
> >any locks.
> 
> I actually didn't have a plan when I wrote that; I was just
> planning on putting effort into it.

I'll prolly work on it tomorrow unless you beat me to it. :)

Thanks.

-- 
tejun


* [PATCH driver-core-next] kernfs: cache atomic_write_len in kernfs_open_file
  2014-03-03 22:46     ` Tejun Heo
@ 2014-03-04 20:38       ` Tejun Heo
  2014-03-05  2:50         ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2014-03-04 20:38 UTC (permalink / raw)
  To: Greg KH, LKML; +Cc: Sasha Levin

While implementing atomic_write_len, 4d3773c4bb41 ("kernfs: implement
kernfs_ops->atomic_write_len") moved data copy from userland inside
kernfs_get_active() and kernfs_open_file->mutex so that
kernfs_ops->atomic_write_len can be accessed before copying buffer
from userland; unfortunately, this could lead to locking order
inversion involving mmap_sem if copy_from_user() takes a page fault.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
  -------------------------------------------------------
  trinity-c236/10658 is trying to acquire lock:
   (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120

  but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0

  which lock already depends on the new lock.


  the existing dependency chain (in reverse order) is:

 -> #1 (&mm->mmap_sem){++++++}:
	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
	 [<mm/memory.c:4188>] might_fault+0x7e/0xb0
	 [<arch/x86/include/asm/uaccess.h:713 fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190
	 [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0
	 [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0
	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2

 -> #0 (&of->mutex#2){+.+.+.}:
	 [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
	 [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
	 [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
	 [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
	 [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
	 [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
	 [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
	 [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2

  other info that might help us debug this:

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&mm->mmap_sem);
				 lock(&of->mutex#2);
				 lock(&mm->mmap_sem);
    lock(&of->mutex#2);

   *** DEADLOCK ***

  1 lock held by trinity-c236/10658:
   #0:  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0

  stack backtrace:
  CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G        W 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26
   0000000000000000 ffff88011911fa48 ffffffff8438e945 0000000000000000
   0000000000000000 ffff88011911fa98 ffffffff811a0109 ffff88011911fab8
   ffff88011911fab8 ffff88011911fa98 ffff880119128cc0 ffff880119128cf8
  Call Trace:
   [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f
   [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160
   [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
   [<include/linux/spinlock.h:343 mm/slub.c:1933>] ? deactivate_slab+0x511/0x550
   [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
   [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
   [<mm/mmap.c:1552>] ? mmap_region+0x24a/0x5c0
   [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
   [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
   [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
   [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
   [<kernel/sched/core.c:2477>] ? get_parent_ip+0x11/0x50
   [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
   [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
   [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
   [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
   [<mm/util.c:397>] ? vm_mmap_pgoff+0x6e/0xe0
   [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
   [<kernel/rcu/update.c:97>] ? __rcu_read_unlock+0x44/0xb0
   [<fs/file.c:641>] ? dup_fd+0x3c0/0x3c0
   [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
   [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
   [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2

Fix it by caching atomic_write_len in kernfs_open_file during open so
that it can be determined without accessing kernfs_ops in
kernfs_fop_write().  This restores the structure of kernfs_fop_write()
before 4d3773c4bb41 with updated @len determination logic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
References: http://lkml.kernel.org/g/53113485.2090407@oracle.com
---
 fs/kernfs/file.c       |   63 ++++++++++++++++++++++++-------------------------
 include/linux/kernfs.h |    1 
 2 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index ddcb471..8034706 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -253,55 +253,50 @@ static ssize_t kernfs_fop_write(struct file *file, const char __user *user_buf,
 {
 	struct kernfs_open_file *of = kernfs_of(file);
 	const struct kernfs_ops *ops;
-	char *buf = NULL;
-	ssize_t len;
-
-	/*
-	 * @of->mutex nests outside active ref and is just to ensure that
-	 * the ops aren't called concurrently for the same open file.
-	 */
-	mutex_lock(&of->mutex);
-	if (!kernfs_get_active(of->kn)) {
-		mutex_unlock(&of->mutex);
-		return -ENODEV;
-	}
-
-	ops = kernfs_ops(of->kn);
-	if (!ops->write) {
-		len = -EINVAL;
-		goto out_unlock;
-	}
+	ssize_t len;
+	char *buf;
 
-	if (ops->atomic_write_len) {
+	if (of->atomic_write_len) {
 		len = count;
-		if (len > ops->atomic_write_len) {
-			len = -E2BIG;
-			goto out_unlock;
-		}
+		if (len > of->atomic_write_len)
+			return -E2BIG;
 	} else {
 		len = min_t(size_t, count, PAGE_SIZE);
 	}
 
 	buf = kmalloc(len + 1, GFP_KERNEL);
-	if (!buf) {
-		len = -ENOMEM;
-		goto out_unlock;
-	}
+	if (!buf)
+		return -ENOMEM;
 
 	if (copy_from_user(buf, user_buf, len)) {
 		len = -EFAULT;
-		goto out_unlock;
+		goto out_free;
 	}
 	buf[len] = '\0';	/* guarantee string termination */
 
-	len = ops->write(of, buf, len, *ppos);
-out_unlock:
+	/*
+	 * @of->mutex nests outside active ref and is just to ensure that
+	 * the ops aren't called concurrently for the same open file.
+	 */
+	mutex_lock(&of->mutex);
+	if (!kernfs_get_active(of->kn)) {
+		mutex_unlock(&of->mutex);
+		len = -ENODEV;
+		goto out_free;
+	}
+
+	ops = kernfs_ops(of->kn);
+	if (ops->write)
+		len = ops->write(of, buf, len, *ppos);
+	else
+		len = -EINVAL;
+
 	kernfs_put_active(of->kn);
 	mutex_unlock(&of->mutex);
 
 	if (len > 0)
 		*ppos += len;
-
+out_free:
 	kfree(buf);
 	return len;
 }
@@ -666,6 +661,12 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
 	of->file = file;
 
 	/*
+	 * Write path needs to access atomic_write_len outside active reference.
+	 * Cache it in open_file.  See kernfs_fop_write() for details.
+	 */
+	of->atomic_write_len = ops->atomic_write_len;
+
+	/*
 	 * Always instantiate seq_file even if read access doesn't use
 	 * seq_file or is not requested.  This unifies private data access
 	 * and readable regular files are the vast majority anyway.
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 649497a..65a3e5a 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -158,6 +158,7 @@ struct kernfs_open_file {
 	int			event;
 	struct list_head	list;
 
+	size_t			atomic_write_len;
 	bool			mmapped;
 	const struct vm_operations_struct *vm_ops;
 };


* Re: [PATCH driver-core-next] kernfs: cache atomic_write_len in kernfs_open_file
  2014-03-04 20:38       ` [PATCH driver-core-next] kernfs: cache atomic_write_len in kernfs_open_file Tejun Heo
@ 2014-03-05  2:50         ` Greg KH
  0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2014-03-05  2:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Sasha Levin

On Tue, Mar 04, 2014 at 03:38:46PM -0500, Tejun Heo wrote:
> While implementing atomic_write_len, 4d3773c4bb41 ("kernfs: implement
> kernfs_ops->atomic_write_len") moved data copy from userland inside
> kernfs_get_active() and kernfs_open_file->mutex so that
> kernfs_ops->atomic_write_len can be accessed before copying buffer
> from userland; unfortunately, this could lead to locking order
> inversion involving mmap_sem if copy_from_user() takes a page fault.
> 
>   ======================================================
>   [ INFO: possible circular locking dependency detected ]
>   3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
>   -------------------------------------------------------
>   trinity-c236/10658 is trying to acquire lock:
>    (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
> 
>   but task is already holding lock:
>    (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
> 
>   which lock already depends on the new lock.
> 
> 
>   the existing dependency chain (in reverse order) is:
> 
>  -> #1 (&mm->mmap_sem){++++++}:
> 	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
> 	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
> 	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
> 	 [<mm/memory.c:4188>] might_fault+0x7e/0xb0
> 	 [<arch/x86/include/asm/uaccess.h:713 fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190
> 	 [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0
> 	 [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0
> 	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
> 
>  -> #0 (&of->mutex#2){+.+.+.}:
> 	 [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
> 	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
> 	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
> 	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
> 	 [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
> 	 [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
> 	 [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
> 	 [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
> 	 [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
> 	 [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
> 	 [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
> 	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
> 
>   other info that might help us debug this:
> 
>    Possible unsafe locking scenario:
> 
> 	 CPU0                    CPU1
> 	 ----                    ----
>     lock(&mm->mmap_sem);
> 				 lock(&of->mutex#2);
> 				 lock(&mm->mmap_sem);
>     lock(&of->mutex#2);
> 
>    *** DEADLOCK ***
> 
>   1 lock held by trinity-c236/10658:
>    #0:  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
> 
>   stack backtrace:
>   CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G        W 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26
>    0000000000000000 ffff88011911fa48 ffffffff8438e945 0000000000000000
>    0000000000000000 ffff88011911fa98 ffffffff811a0109 ffff88011911fab8
>    ffff88011911fab8 ffff88011911fa98 ffff880119128cc0 ffff880119128cf8
>   Call Trace:
>    [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f
>    [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160
>    [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
>    [<include/linux/spinlock.h:343 mm/slub.c:1933>] ? deactivate_slab+0x511/0x550
>    [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
>    [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
>    [<mm/mmap.c:1552>] ? mmap_region+0x24a/0x5c0
>    [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
>    [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
>    [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
>    [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
>    [<kernel/sched/core.c:2477>] ? get_parent_ip+0x11/0x50
>    [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
>    [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
>    [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
>    [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
>    [<mm/util.c:397>] ? vm_mmap_pgoff+0x6e/0xe0
>    [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
>    [<kernel/rcu/update.c:97>] ? __rcu_read_unlock+0x44/0xb0
>    [<fs/file.c:641>] ? dup_fd+0x3c0/0x3c0
>    [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
>    [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
>    [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
> 
> Fix it by caching atomic_write_len in kernfs_open_file during open so
> that it can be determined without accessing kernfs_ops in
> kernfs_fop_write().  This restores the structure of kernfs_fop_write()
> before 4d3773c4bb41 with updated @len determination logic.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> References: http://lkml.kernel.org/g/53113485.2090407@oracle.com
> ---
>  fs/kernfs/file.c       |   63 ++++++++++++++++++++++++-------------------------
>  include/linux/kernfs.h |    1 
>  2 files changed, 33 insertions(+), 31 deletions(-)

This is for 3.15, right?

thanks,

greg k-h

