Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH -next] fork: annotate a data race in vm_area_dup()
@ 2020-02-14 20:33 Qian Cai
  2020-02-17 22:31 ` Kirill A. Shutemov
  0 siblings, 1 reply; 9+ messages in thread
From: Qian Cai @ 2020-02-14 20:33 UTC (permalink / raw)
  To: akpm; +Cc: elver, linux-mm, linux-kernel, Qian Cai

struct vm_area_struct could be accessed concurrently as noticed by
KCSAN,

 write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
  vma_interval_tree_insert+0x101/0x150:
  rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
  (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
  __vma_link_file+0x6e/0xe0
  __vma_link_file at mm/mmap.c:629
  vma_link+0xa2/0x120
  mmap_region+0x753/0xb90
  do_mmap+0x45c/0x710
  vm_mmap_pgoff+0xc0/0x130
  ksys_mmap_pgoff+0x1d1/0x300
  __x64_sys_mmap+0x33/0x40
  do_syscall_64+0x91/0xc44
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
  vm_area_dup+0x6a/0xe0
  vm_area_dup at kernel/fork.c:362
  __split_vma+0x72/0x2a0
  __split_vma at mm/mmap.c:2661
  split_vma+0x5a/0x80
  mprotect_fixup+0x368/0x3f0
  do_mprotect_pkey+0x263/0x420
  __x64_sys_mprotect+0x51/0x70
  do_syscall_64+0x91/0xc44
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The write is holding mmap_sem while changing vm_area_struct.shared.rb.
Even though the read is lockless while making a copy, the clone will
have its own shared.rb reinitialized. Thus, mark it as an intentional
data race using the data_race() macro.

Signed-off-by: Qian Cai <cai@lca.pw>
---
 kernel/fork.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 41f784b6203a..81bdc6e8a6cf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,11 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 
 	if (new) {
-		*new = *orig;
+		/*
+		 * @orig may be modified concurrently, but the clone will be
+		 * reinitialized.
+		 */
+		*new = data_race(*orig);
 		INIT_LIST_HEAD(&new->anon_vma_chain);
 	}
 	return new;
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-14 20:33 [PATCH -next] fork: annotate a data race in vm_area_dup() Qian Cai
@ 2020-02-17 22:31 ` Kirill A. Shutemov
  2020-02-18  3:59   ` Qian Cai
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2020-02-17 22:31 UTC (permalink / raw)
  To: Qian Cai; +Cc: akpm, elver, linux-mm, linux-kernel

On Fri, Feb 14, 2020 at 03:33:23PM -0500, Qian Cai wrote:
> struct vm_area_struct could be accessed concurrently as noticed by
> KCSAN,
> 
>  write to 0xffff9cf8bba08ad8 of 8 bytes by task 14263 on cpu 35:
>   vma_interval_tree_insert+0x101/0x150:
>   rb_insert_augmented_cached at include/linux/rbtree_augmented.h:58
>   (inlined by) vma_interval_tree_insert at mm/interval_tree.c:23
>   __vma_link_file+0x6e/0xe0
>   __vma_link_file at mm/mmap.c:629
>   vma_link+0xa2/0x120
>   mmap_region+0x753/0xb90
>   do_mmap+0x45c/0x710
>   vm_mmap_pgoff+0xc0/0x130
>   ksys_mmap_pgoff+0x1d1/0x300
>   __x64_sys_mmap+0x33/0x40
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
>  read to 0xffff9cf8bba08a80 of 200 bytes by task 14262 on cpu 122:
>   vm_area_dup+0x6a/0xe0
>   vm_area_dup at kernel/fork.c:362
>   __split_vma+0x72/0x2a0
>   __split_vma at mm/mmap.c:2661
>   split_vma+0x5a/0x80
>   mprotect_fixup+0x368/0x3f0
>   do_mprotect_pkey+0x263/0x420
>   __x64_sys_mprotect+0x51/0x70
>   do_syscall_64+0x91/0xc44
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> The write is holding mmap_sem while changing vm_area_struct.shared.rb.
> Even though the read is lockless while making a copy, the clone will
> have its own shared.rb reinitialized. Thus, mark it as an intentional
> data race using the data_race() macro.

I'm confused. AFAICS both sides hold mmap_sem on write:

 - vm_mmap_pgoff() takes mmap_sem for the write on the write side

 - do_mprotect_pkey() takes mmap_sem for the write on the read side


What do I miss?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-17 22:31 ` Kirill A. Shutemov
@ 2020-02-18  3:59   ` Qian Cai
  2020-02-18 10:30     ` Kirill A. Shutemov
  0 siblings, 1 reply; 9+ messages in thread
From: Qian Cai @ 2020-02-18  3:59 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Andrew Morton, elver, linux-mm, linux-kernel



> On Feb 17, 2020, at 5:31 PM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> I'm confused. AFAICS both sides hold mmap_sem on write:
> 
> - vm_mmap_pgoff() takes mmap_sem for the write on the write side
> 
> - do_mprotect_pkey() takes mmap_sem for the write on the read side
> 
> 
> What do I miss?

Ah, good catch. I missed the locking for the read there. This is interesting because Marco
did confirmed that the concurrency could happen,

https://lore.kernel.org/lkml/20191025173511.181416-1-elver@google.com/

If that means KCSAN is not at fault, then I could think of two things,

1) someone downgrades the lock.

I don’t think that a case here. Only __do_munmap() will do that but I did not see how
it will affect us here.

2) the reader and writer are two different processes.

So, they held a different mmap_sem, but I can’t see how could two processes shared
the same vm_area_struct. Also, file->f_mapping->i_mmap was also stored in the
writer, but I can’t see how it was also loaded in the reader.

Any ideas?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-18  3:59   ` Qian Cai
@ 2020-02-18 10:30     ` Kirill A. Shutemov
  2020-02-18 12:40       ` Qian Cai
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2020-02-18 10:30 UTC (permalink / raw)
  To: Qian Cai
  Cc: Andrew Morton, elver, linux-mm, linux-kernel, peterz,
	syzbot+c034966b0b02f94f7f34, syzkaller-bugs

On Mon, Feb 17, 2020 at 10:59:47PM -0500, Qian Cai wrote:
> 
> 
> > On Feb 17, 2020, at 5:31 PM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > 
> > I'm confused. AFAICS both sides hold mmap_sem on write:
> > 
> > - vm_mmap_pgoff() takes mmap_sem for the write on the write side
> > 
> > - do_mprotect_pkey() takes mmap_sem for the write on the read side
> > 
> > 
> > What do I miss?
> 
> Ah, good catch. I missed the locking for the read there. This is interesting because Marco
> did confirmed that the concurrency could happen,
> 
> https://lore.kernel.org/lkml/20191025173511.181416-1-elver@google.com/
> 
> If that means KCSAN is not at fault, then I could think of two things,
> 
> 1) someone downgrades the lock.
> 
> I don’t think that a case here. Only __do_munmap() will do that but I did not see how
> it will affect us here.
> 
> 2) the reader and writer are two different processes.
> 
> So, they held a different mmap_sem, but I can’t see how could two processes shared
> the same vm_area_struct. Also, file->f_mapping->i_mmap was also stored in the
> writer, but I can’t see how it was also loaded in the reader.
> 
> Any ideas?

I think I've got this:

vm_area_dup() blindly copies all fields of orignal VMA to the new one.
This includes coping vm_area_struct::shared.rb which is normally protected
by i_mmap_lock. But this is fine because the read value will be
overwritten on the following __vma_link_file() under proper protectection.

So the fix is correct, but justificaiton is lacking.

Also, I would like to more fine-grained annotation: marking with
data_race() 200 bytes copy may hide other issues.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-18 10:30     ` Kirill A. Shutemov
@ 2020-02-18 12:40       ` Qian Cai
  2020-02-18 14:09         ` Marco Elver
  0 siblings, 1 reply; 9+ messages in thread
From: Qian Cai @ 2020-02-18 12:40 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, elver, linux-mm, linux-kernel, peterz,
	syzbot+c034966b0b02f94f7f34, syzkaller-bugs



> On Feb 18, 2020, at 5:29 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> I think I've got this:
> 
> vm_area_dup() blindly copies all fields of orignal VMA to the new one.
> This includes coping vm_area_struct::shared.rb which is normally protected
> by i_mmap_lock. But this is fine because the read value will be
> overwritten on the following __vma_link_file() under proper protectection.

Right, multiple processes could share the same file-based address space where those vma have been linked into address_space::i_mmap via vm_area_struct::shared.rb. Thus, the reader could see its shared.rb linkage pointers got updated by other processes.

> 
> So the fix is correct, but justificaiton is lacking.
> 
> Also, I would like to more fine-grained annotation: marking with
> data_race() 200 bytes copy may hide other issues.

That is the harder part where I don’t think we have anything for that today. Macro, any suggestions? ASSERT_IGNORE_FIELD()?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-18 12:40       ` Qian Cai
@ 2020-02-18 14:09         ` Marco Elver
  2020-02-18 15:00           ` Qian Cai
  0 siblings, 1 reply; 9+ messages in thread
From: Marco Elver @ 2020-02-18 14:09 UTC (permalink / raw)
  To: Qian Cai
  Cc: Kirill A. Shutemov, Andrew Morton, Linux Memory Management List,
	LKML, Peter Zijlstra, syzbot, syzkaller-bugs

On Tue, 18 Feb 2020 at 13:40, Qian Cai <cai@lca.pw> wrote:
>
>
>
> > On Feb 18, 2020, at 5:29 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > I think I've got this:
> >
> > vm_area_dup() blindly copies all fields of orignal VMA to the new one.
> > This includes coping vm_area_struct::shared.rb which is normally protected
> > by i_mmap_lock. But this is fine because the read value will be
> > overwritten on the following __vma_link_file() under proper protectection.
>
> Right, multiple processes could share the same file-based address space where those vma have been linked into address_space::i_mmap via vm_area_struct::shared.rb. Thus, the reader could see its shared.rb linkage pointers got updated by other processes.
>
> >
> > So the fix is correct, but justificaiton is lacking.
> >
> > Also, I would like to more fine-grained annotation: marking with
> > data_race() 200 bytes copy may hide other issues.
>
> That is the harder part where I don’t think we have anything for that today. Macro, any suggestions? ASSERT_IGNORE_FIELD()?

There is no nice interface I can think of. All options will just cause
more problems, inconsistencies, or annoyances.

Ideally, to not introduce more types of macros and keep it consistent,
ASSERT_EXCLUSIVE_FIELDS_EXCEPT(var, ...) maybe what you're after:
"Check no concurrent writers to struct, except ignore the provided
fields".

This option doesn't quite work, unless you just restrict it to 1 field
(we can't use ranges, because e.g. vm_area_struct has
__randomize_layout). The next time around, you'll want 2 fields, and
it won't work. Also, do we know that 'shared.rb' is the only thing we
want to ignore?

If you wanted something similar to ASSERT_EXCLUSIVE_BITS, it'd have to
be ASSERT_EXCLUSIVE_FIELDS(var, ...), however, this is quite annoying
for structs with many fields as you'd have to list all of them. It's
similar to what you can already do currently (but I also don't
recommend because it's unmaintainable):

ASSERT_EXCLUSIVE_WRITER(orig->vm_start);
ASSERT_EXCLUSIVE_WRITER(orig->vm_end);
ASSERT_EXCLUSIVE_WRITER(orig->vm_next);
ASSERT_EXCLUSIVE_WRITER(orig->vm_prev);
... and so on ...
*new = data_race(*orig);

Also note that vm_area_struct has __randomize_layout, which makes
using ranges impossible. All in all, I don't see a terribly nice
option.

If, however, you knew that there are 1 or 2 fields only that you want
to make sure are not modified concurrently, ASSERT_EXCLUSIVE_WRITER +
data_race() would probably work well (or even ASSERT_EXCLUSIVE_ACCESS
if you want to make sure there are no writers nor _readers_).

Thanks,
-- Marco


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-18 14:09         ` Marco Elver
@ 2020-02-18 15:00           ` Qian Cai
  2020-02-18 15:18             ` Kirill A. Shutemov
  0 siblings, 1 reply; 9+ messages in thread
From: Qian Cai @ 2020-02-18 15:00 UTC (permalink / raw)
  To: Marco Elver
  Cc: Kirill A. Shutemov, Andrew Morton, Linux Memory Management List,
	LKML, Peter Zijlstra, syzbot, syzkaller-bugs

On Tue, 2020-02-18 at 15:09 +0100, Marco Elver wrote:
> On Tue, 18 Feb 2020 at 13:40, Qian Cai <cai@lca.pw> wrote:
> > 
> > 
> > 
> > > On Feb 18, 2020, at 5:29 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > 
> > > I think I've got this:
> > > 
> > > vm_area_dup() blindly copies all fields of orignal VMA to the new one.
> > > This includes coping vm_area_struct::shared.rb which is normally protected
> > > by i_mmap_lock. But this is fine because the read value will be
> > > overwritten on the following __vma_link_file() under proper protectection.
> > 
> > Right, multiple processes could share the same file-based address space where those vma have been linked into address_space::i_mmap via vm_area_struct::shared.rb. Thus, the reader could see its shared.rb linkage pointers got updated by other processes.
> > 
> > > 
> > > So the fix is correct, but justificaiton is lacking.
> > > 
> > > Also, I would like to more fine-grained annotation: marking with
> > > data_race() 200 bytes copy may hide other issues.
> > 
> > That is the harder part where I don’t think we have anything for that today. Macro, any suggestions? ASSERT_IGNORE_FIELD()?
> 
> There is no nice interface I can think of. All options will just cause
> more problems, inconsistencies, or annoyances.
> 
> Ideally, to not introduce more types of macros and keep it consistent,
> ASSERT_EXCLUSIVE_FIELDS_EXCEPT(var, ...) maybe what you're after:
> "Check no concurrent writers to struct, except ignore the provided
> fields".
> 
> This option doesn't quite work, unless you just restrict it to 1 field
> (we can't use ranges, because e.g. vm_area_struct has
> __randomize_layout). The next time around, you'll want 2 fields, and
> it won't work. Also, do we know that 'shared.rb' is the only thing we
> want to ignore?
> 
> If you wanted something similar to ASSERT_EXCLUSIVE_BITS, it'd have to
> be ASSERT_EXCLUSIVE_FIELDS(var, ...), however, this is quite annoying
> for structs with many fields as you'd have to list all of them. It's
> similar to what you can already do currently (but I also don't
> recommend because it's unmaintainable):
> 
> ASSERT_EXCLUSIVE_WRITER(orig->vm_start);
> ASSERT_EXCLUSIVE_WRITER(orig->vm_end);
> ASSERT_EXCLUSIVE_WRITER(orig->vm_next);
> ASSERT_EXCLUSIVE_WRITER(orig->vm_prev);
> ... and so on ...
> *new = data_race(*orig);
> 
> Also note that vm_area_struct has __randomize_layout, which makes
> using ranges impossible. All in all, I don't see a terribly nice
> option.
> 
> If, however, you knew that there are 1 or 2 fields only that you want
> to make sure are not modified concurrently, ASSERT_EXCLUSIVE_WRITER +
> data_race() would probably work well (or even ASSERT_EXCLUSIVE_ACCESS
> if you want to make sure there are no writers nor _readers_).

I am testing an idea that just do,

lockdep_assert_held_write(&orig->vm_mm->mmap_sem);
*new = data_race(*orig);

The idea is that as long as we have the exclusive mmap_sem held in all paths
(auditing indicated so), no writer should be able to mess up our vm_area_struct
except the "shared.rb" field which has no harm.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-18 15:00           ` Qian Cai
@ 2020-02-18 15:18             ` Kirill A. Shutemov
  2020-02-18 16:46               ` Qian Cai
  0 siblings, 1 reply; 9+ messages in thread
From: Kirill A. Shutemov @ 2020-02-18 15:18 UTC (permalink / raw)
  To: Qian Cai
  Cc: Marco Elver, Andrew Morton, Linux Memory Management List, LKML,
	Peter Zijlstra, syzbot, syzkaller-bugs

On Tue, Feb 18, 2020 at 10:00:35AM -0500, Qian Cai wrote:
> On Tue, 2020-02-18 at 15:09 +0100, Marco Elver wrote:
> > On Tue, 18 Feb 2020 at 13:40, Qian Cai <cai@lca.pw> wrote:
> > > 
> > > 
> > > 
> > > > On Feb 18, 2020, at 5:29 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > > 
> > > > I think I've got this:
> > > > 
> > > > vm_area_dup() blindly copies all fields of orignal VMA to the new one.
> > > > This includes coping vm_area_struct::shared.rb which is normally protected
> > > > by i_mmap_lock. But this is fine because the read value will be
> > > > overwritten on the following __vma_link_file() under proper protectection.
> > > 
> > > Right, multiple processes could share the same file-based address space where those vma have been linked into address_space::i_mmap via vm_area_struct::shared.rb. Thus, the reader could see its shared.rb linkage pointers got updated by other processes.
> > > 
> > > > 
> > > > So the fix is correct, but justificaiton is lacking.
> > > > 
> > > > Also, I would like to more fine-grained annotation: marking with
> > > > data_race() 200 bytes copy may hide other issues.
> > > 
> > > That is the harder part where I don’t think we have anything for that today. Macro, any suggestions? ASSERT_IGNORE_FIELD()?
> > 
> > There is no nice interface I can think of. All options will just cause
> > more problems, inconsistencies, or annoyances.
> > 
> > Ideally, to not introduce more types of macros and keep it consistent,
> > ASSERT_EXCLUSIVE_FIELDS_EXCEPT(var, ...) maybe what you're after:
> > "Check no concurrent writers to struct, except ignore the provided
> > fields".
> > 
> > This option doesn't quite work, unless you just restrict it to 1 field
> > (we can't use ranges, because e.g. vm_area_struct has
> > __randomize_layout). The next time around, you'll want 2 fields, and
> > it won't work. Also, do we know that 'shared.rb' is the only thing we
> > want to ignore?
> > 
> > If you wanted something similar to ASSERT_EXCLUSIVE_BITS, it'd have to
> > be ASSERT_EXCLUSIVE_FIELDS(var, ...), however, this is quite annoying
> > for structs with many fields as you'd have to list all of them. It's
> > similar to what you can already do currently (but I also don't
> > recommend because it's unmaintainable):
> > 
> > ASSERT_EXCLUSIVE_WRITER(orig->vm_start);
> > ASSERT_EXCLUSIVE_WRITER(orig->vm_end);
> > ASSERT_EXCLUSIVE_WRITER(orig->vm_next);
> > ASSERT_EXCLUSIVE_WRITER(orig->vm_prev);
> > ... and so on ...
> > *new = data_race(*orig);
> > 
> > Also note that vm_area_struct has __randomize_layout, which makes
> > using ranges impossible. All in all, I don't see a terribly nice
> > option.
> > 
> > If, however, you knew that there are 1 or 2 fields only that you want
> > to make sure are not modified concurrently, ASSERT_EXCLUSIVE_WRITER +
> > data_race() would probably work well (or even ASSERT_EXCLUSIVE_ACCESS
> > if you want to make sure there are no writers nor _readers_).
> 
> I am testing an idea that just do,
> 
> lockdep_assert_held_write(&orig->vm_mm->mmap_sem);
> *new = data_race(*orig);
> 
> The idea is that as long as we have the exclusive mmap_sem held in all paths
> (auditing indicated so), no writer should be able to mess up our vm_area_struct
> except the "shared.rb" field which has no harm.

Well, some fields protected by page_table_lock and can be written to
without exclusive mmap_sem. Probably even without any mmap_sem: pin
mm_struct + page_table_lock should be enough.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -next] fork: annotate a data race in vm_area_dup()
  2020-02-18 15:18             ` Kirill A. Shutemov
@ 2020-02-18 16:46               ` Qian Cai
  0 siblings, 0 replies; 9+ messages in thread
From: Qian Cai @ 2020-02-18 16:46 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Marco Elver, Andrew Morton, Linux Memory Management List, LKML,
	Peter Zijlstra, syzbot, syzkaller-bugs

On Tue, 2020-02-18 at 18:18 +0300, Kirill A. Shutemov wrote:
> On Tue, Feb 18, 2020 at 10:00:35AM -0500, Qian Cai wrote:
> > On Tue, 2020-02-18 at 15:09 +0100, Marco Elver wrote:
> > > On Tue, 18 Feb 2020 at 13:40, Qian Cai <cai@lca.pw> wrote:
> > > > 
> > > > 
> > > > 
> > > > > On Feb 18, 2020, at 5:29 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > > > > 
> > > > > I think I've got this:
> > > > > 
> > > > > vm_area_dup() blindly copies all fields of orignal VMA to the new one.
> > > > > This includes coping vm_area_struct::shared.rb which is normally protected
> > > > > by i_mmap_lock. But this is fine because the read value will be
> > > > > overwritten on the following __vma_link_file() under proper protectection.
> > > > 
> > > > Right, multiple processes could share the same file-based address space where those vma have been linked into address_space::i_mmap via vm_area_struct::shared.rb. Thus, the reader could see its shared.rb linkage pointers got updated by other processes.
> > > > 
> > > > > 
> > > > > So the fix is correct, but justificaiton is lacking.
> > > > > 
> > > > > Also, I would like to more fine-grained annotation: marking with
> > > > > data_race() 200 bytes copy may hide other issues.
> > > > 
> > > > That is the harder part where I don’t think we have anything for that today. Macro, any suggestions? ASSERT_IGNORE_FIELD()?
> > > 
> > > There is no nice interface I can think of. All options will just cause
> > > more problems, inconsistencies, or annoyances.
> > > 
> > > Ideally, to not introduce more types of macros and keep it consistent,
> > > ASSERT_EXCLUSIVE_FIELDS_EXCEPT(var, ...) maybe what you're after:
> > > "Check no concurrent writers to struct, except ignore the provided
> > > fields".
> > > 
> > > This option doesn't quite work, unless you just restrict it to 1 field
> > > (we can't use ranges, because e.g. vm_area_struct has
> > > __randomize_layout). The next time around, you'll want 2 fields, and
> > > it won't work. Also, do we know that 'shared.rb' is the only thing we
> > > want to ignore?
> > > 
> > > If you wanted something similar to ASSERT_EXCLUSIVE_BITS, it'd have to
> > > be ASSERT_EXCLUSIVE_FIELDS(var, ...), however, this is quite annoying
> > > for structs with many fields as you'd have to list all of them. It's
> > > similar to what you can already do currently (but I also don't
> > > recommend because it's unmaintainable):
> > > 
> > > ASSERT_EXCLUSIVE_WRITER(orig->vm_start);
> > > ASSERT_EXCLUSIVE_WRITER(orig->vm_end);
> > > ASSERT_EXCLUSIVE_WRITER(orig->vm_next);
> > > ASSERT_EXCLUSIVE_WRITER(orig->vm_prev);
> > > ... and so on ...
> > > *new = data_race(*orig);
> > > 
> > > Also note that vm_area_struct has __randomize_layout, which makes
> > > using ranges impossible. All in all, I don't see a terribly nice
> > > option.
> > > 
> > > If, however, you knew that there are 1 or 2 fields only that you want
> > > to make sure are not modified concurrently, ASSERT_EXCLUSIVE_WRITER +
> > > data_race() would probably work well (or even ASSERT_EXCLUSIVE_ACCESS
> > > if you want to make sure there are no writers nor _readers_).
> > 
> > I am testing an idea that just do,
> > 
> > lockdep_assert_held_write(&orig->vm_mm->mmap_sem);
> > *new = data_race(*orig);
> > 
> > The idea is that as long as we have the exclusive mmap_sem held in all paths
> > (auditing indicated so), no writer should be able to mess up our vm_area_struct
> > except the "shared.rb" field which has no harm.
> 
> Well, some fields protected by page_table_lock and can be written to
> without exclusive mmap_sem. Probably even without any mmap_sem: pin
> mm_struct + page_table_lock should be enough.
> 

How about this?

diff --git a/kernel/fork.c b/kernel/fork.c
index cb2ae49e497e..68f273e0ebf4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -359,7 +359,10 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct
*orig)
        struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep,
GFP_KERNEL);
 
        if (new) {
-               *new = *orig;
+               ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+               ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+
+               *new = data_race(*orig);
                INIT_LIST_HEAD(&new->anon_vma_chain);
                new->vm_next = new->vm_prev = NULL;
        }


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-14 20:33 [PATCH -next] fork: annotate a data race in vm_area_dup() Qian Cai
2020-02-17 22:31 ` Kirill A. Shutemov
2020-02-18  3:59   ` Qian Cai
2020-02-18 10:30     ` Kirill A. Shutemov
2020-02-18 12:40       ` Qian Cai
2020-02-18 14:09         ` Marco Elver
2020-02-18 15:00           ` Qian Cai
2020-02-18 15:18             ` Kirill A. Shutemov
2020-02-18 16:46               ` Qian Cai

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git