All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni
@ 2019-03-31  4:00 Jan Harkes
  2019-03-31 18:14 ` Waiman Long
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Harkes @ 2019-03-31  4:00 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Alexander Viro,
	Pedro Cuadra Chamorro, linux-kernel

On Fri, Mar 29, 2019 at 05:53:22PM +0000, Waiman Long wrote:
> On 03/29/2019 12:10 PM, Jan Harkes wrote:
> > I knew I definitely had never seen this problem with the stable kernel
> > on Ubuntu xenial (4.4) so I bisected between v4.4 and v5.1-rc2 and ended
> > up at
> >
> >     # first bad commit: [925b9cd1b89a94b7124d128c80dfc48f78a63098]
> >     # locking/rwsem: Make owner store task pointer of last owning reader
> >
> > When I revert this particular commit on 5.1-rc2, I am not able to
> > reproduce the problem anymore.
> 
> Without CONFIG_DEBUG_RWSEMS, the only behavioral change of this patch is
> to do an unconditional write of task_structure pointer into sem->owner
> after acquiring the read lock in down_read(). Before this patch, it does

I tried with just that change, but that is not at fault. It is also hard
to believe we have a use-after-free issue, because we are using a
spinlock on the inode that is held in place by the file we are releasing.

After trying various variations the minimal change that fixes the soft
lockup is as follows. Without this patch I get a reliable lockup, with
the patch everything works as expected.


    diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
    index bad2bca..0cc437d 100644
    --- a/kernel/locking/rwsem.h
    +++ b/kernel/locking/rwsem.h
    @@ -61,8 +61,7 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
     static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
                                                struct task_struct *owner)
     {
    -	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
    -						 | RWSEM_ANONYMOUSLY_OWNED;
    +	unsigned long val = RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED;
     
            WRITE_ONCE(sem->owner, (struct task_struct *)val);
     }


I'll continue digging if I can find a reason why. So far I've only found
one place where rwsem->owner is modified while not holding a lock, but
changing that doesn't make a difference for my particular case.


    diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
    index 03cb4b6f842e..fe696a8b57f3 100644
    --- a/include/linux/percpu-rwsem.h
    +++ b/include/linux/percpu-rwsem.h
    @@ -114,11 +114,11 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
     static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
                                            bool read, unsigned long ip)
     {
    -	lock_release(&sem->rw_sem.dep_map, 1, ip);
     #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
            if (!read)
                    sem->rw_sem.owner = RWSEM_OWNER_UNKNOWN;
     #endif
    +	lock_release(&sem->rw_sem.dep_map, 1, ip);
     }
     
     static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,


Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni
  2019-03-31  4:00 fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni Jan Harkes
@ 2019-03-31 18:14 ` Waiman Long
  2019-03-31 19:13   ` Jan Harkes
  0 siblings, 1 reply; 5+ messages in thread
From: Waiman Long @ 2019-03-31 18:14 UTC (permalink / raw)
  To: Jan Harkes
  Cc: Ingo Molnar, Peter Zijlstra, Alexander Viro,
	Pedro Cuadra Chamorro, linux-kernel

On 03/31/2019 12:00 AM, Jan Harkes wrote:
> On Fri, Mar 29, 2019 at 05:53:22PM +0000, Waiman Long wrote:
>> On 03/29/2019 12:10 PM, Jan Harkes wrote:
>>> I knew I definitely had never seen this problem with the stable kernel
>>> on Ubuntu xenial (4.4) so I bisected between v4.4 and v5.1-rc2 and ended
>>> up at
>>>
>>>     # first bad commit: [925b9cd1b89a94b7124d128c80dfc48f78a63098]
>>>     # locking/rwsem: Make owner store task pointer of last owning reader
>>>
>>> When I revert this particular commit on 5.1-rc2, I am not able to
>>> reproduce the problem anymore.
>> Without CONFIG_DEBUG_RWSEMS, the only behavioral change of this patch is
>> to do an unconditional write of task_structure pointer into sem->owner
>> after acquiring the read lock in down_read(). Before this patch, it does
> I tried with just that change, but that is not at fault. It is also hard
> to believe we have a use-after-free issue, because we are using a
> spinlock on the inode that is held in place by the file we are releasing.

One possibility is that there is a previous reference to the memory
currently occupied by the spinlock. If the memory location is previously
part of a rwsem structure and someone is still using it, you may get
memory corruption.

> After trying various variations the minimal change that fixes the soft
> lockup is as follows. Without this patch I get a reliable lockup, with
> the patch everything works as expected.
>
>
>     diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
>     index bad2bca..0cc437d 100644
>     --- a/kernel/locking/rwsem.h
>     +++ b/kernel/locking/rwsem.h
>     @@ -61,8 +61,7 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
>      static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
>                                                 struct task_struct *owner)
>      {
>     -	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
>     -						 | RWSEM_ANONYMOUSLY_OWNED;
>     +	unsigned long val = RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED;
>      
>             WRITE_ONCE(sem->owner, (struct task_struct *)val);
>      }

The change above clears all the upper bytes of owner to 0, while the
original code will write some non-zero values to the upper bytes.

>
> I'll continue digging if I can find a reason why. So far I've only found
> one place where rwsem->owner is modified while not holding a lock, but
> changing that doesn't make a difference for my particular case.
>
>
>     diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
>     index 03cb4b6f842e..fe696a8b57f3 100644
>     --- a/include/linux/percpu-rwsem.h
>     +++ b/include/linux/percpu-rwsem.h
>     @@ -114,11 +114,11 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
>      static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
>                                             bool read, unsigned long ip)
>      {
>     -	lock_release(&sem->rw_sem.dep_map, 1, ip);
>      #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>             if (!read)
>                     sem->rw_sem.owner = RWSEM_OWNER_UNKNOWN;
>      #endif
>     +	lock_release(&sem->rw_sem.dep_map, 1, ip);
>      }
>      
>      static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
>
>
> Jan

The rwsem is still locked. The above code releases the current task from
being the lock holder of the rwsem from lockdep's perspective. Later on
some other task will take over the ownership of the rwsem without
actually releasing the lock in the interim. So I don't think this code
is part of the problem.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni
  2019-03-31 18:14 ` Waiman Long
@ 2019-03-31 19:13   ` Jan Harkes
  2019-04-02 19:17     ` Jan Harkes
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Harkes @ 2019-03-31 19:13 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Alexander Viro,
	Pedro Cuadra Chamorro, linux-kernel

On Sun, Mar 31, 2019 at 02:14:13PM -0400, Waiman Long wrote:
> On 03/31/2019 12:00 AM, Jan Harkes wrote:
> > On Fri, Mar 29, 2019 at 05:53:22PM +0000, Waiman Long wrote:
> >> On 03/29/2019 12:10 PM, Jan Harkes wrote:
> >>> I knew I definitely had never seen this problem with the stable kernel
> >>> on Ubuntu xenial (4.4) so I bisected between v4.4 and v5.1-rc2 and ended
> >>> up at
> >>>
> >>>     # first bad commit: [925b9cd1b89a94b7124d128c80dfc48f78a63098]
> >>>     # locking/rwsem: Make owner store task pointer of last owning reader
> >>>
> >>> When I revert this particular commit on 5.1-rc2, I am not able to
> >>> reproduce the problem anymore.
> >> Without CONFIG_DEBUG_RWSEMS, the only behavioral change of this patch is
> >> to do an unconditional write of task_structure pointer into sem->owner
> >> after acquiring the read lock in down_read(). Before this patch, it does
> >
> > I tried with just that change, but that is not at fault. It is also hard
> > to believe we have a use-after-free issue, because we are using a
> > spinlock on the inode that is held in place by the file we are releasing.
> 
> One possibility is that there is a previous reference to the memory
> currently occupied by the spinlock. If the memory location is previously
> part of a rwsem structure and someone is still using it, you may get
> memory corruption.

Ah, I hadn't even thought of that possibility. Good, it will open up
some new places to look at. Since I can open, read, write and close
files in Coda without problems and so far it only seems to be related to
closing the fd as a result of an executable process exiting.

> >     -	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
> >     -						 | RWSEM_ANONYMOUSLY_OWNED;
> >     +	unsigned long val = RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED;
>
> The change above clears all the upper bytes of owner to 0, while the
> original code will write some non-zero values to the upper bytes.

Yes, and write locks will still write to all bits of the owner field. So
the clobber must then be caused by something trying to grab a read lock
on a rw_semaphore that has been freed, and whose memory got reused by
the inode that was allocated when starting the executable. That seems
like a specific enough thing that I hopefully will be able to find it.

> > I'll continue digging if I can find a reason why. So far I've only found
> > one place where rwsem->owner is modified while not holding a lock, but
> > changing that doesn't make a difference for my particular case.
...
> The rwsem is still locked. The above code releases the current task from
> being the lock holder of the rwsem from lockdep's perspective. Later on

Ah that explains why that change made no difference. I'm unfamiliar with
the details of the locking code and assumed it was releasing some sort
of write lock there, my bad.

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni
  2019-03-31 19:13   ` Jan Harkes
@ 2019-04-02 19:17     ` Jan Harkes
  2019-04-02 20:24       ` Waiman Long
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Harkes @ 2019-04-02 19:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Alexander Viro,
	Pedro Cuadra Chamorro, linux-kernel

On Sun, Mar 31, 2019 at 03:13:47PM -0400, Jan Harkes wrote:
> On Sun, Mar 31, 2019 at 02:14:13PM -0400, Waiman Long wrote:
> > One possibility is that there is a previous reference to the memory
> > currently occupied by the spinlock. If the memory location is previously
> > part of a rwsem structure and someone is still using it, you may get
> > memory corruption.
> 
> Ah, I hadn't even thought of that possibility. Good, it will open up

First of all, I have to thank you for your original patch because
otherwise I probably would never have discovered that something was
seriously wrong. Your patch made the problem visible.

I ended up changing 'owner' to '_RET_IP_' and dumping the value of the
clobbered coda inode spinlock and surrounding memory and found that the
'culprit' is in ext4_filemap_fault and despite it being in ext4, it is
still a Coda specific problem.

Effectively Coda overlays other filesystems' inodes for mmap, but
the vma->vm_file still points at Coda's file. So when we use
file_inode() in ext4_filemap_fault we end up with the Coda inode instead
of the ext4 inode and when trying to grab ext4's mmap_sem we really just
scribble over the memory region that happens to contain the Coda inode
spinlock. A fix is to use vm_file->f_mapping->host instead of
file_inode(vm_file).

Of course everyone looks at ext4 as a canonical example so this problem
has spread pretty much everywhere and I'm wondering how to best resolve
this.

- change file_inode() to follow file->f_mapping->host

  would fix most places, but maybe f_mapping is not always guaranteed to
  point at a usable place?

- change Coda's mmap to replace vma->vm_file with the host file

  we'd probably no longer get notified when the last reference to the
  host file goes away, so we'd call coda_release and notify userspace on
  close() even when there are still active mmap regions.

- fix every in-tree file system to use vma->vm_file->f_mapping->host.

Jan


diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..122d691d3eda 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -284,7 +284,7 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf,
 	vm_fault_t result;
 	int retries = 0;
 	handle_t *handle = NULL;
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct inode *inode = vmf->vma->vm_file->f_mapping->host;
 	struct super_block *sb = inode->i_sb;
 
 	/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b54b261ded36..62a0025ce7f8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6211,7 +6211,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 	int err;
 	vm_fault_t ret;
 	struct file *file = vma->vm_file;
-	struct inode *inode = file_inode(file);
+	struct inode *inode = file->f_mapping->host;
 	struct address_space *mapping = inode->i_mapping;
 	handle_t *handle;
 	get_block_t *get_block;
@@ -6302,7 +6302,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
 
 vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
 {
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct inode *inode = vmf->vma->vm_file->f_mapping->host;
 	vm_fault_t ret;
 
 	down_read(&EXT4_I(inode)->i_mmap_sem);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni
  2019-04-02 19:17     ` Jan Harkes
@ 2019-04-02 20:24       ` Waiman Long
  0 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2019-04-02 20:24 UTC (permalink / raw)
  To: Jan Harkes
  Cc: Ingo Molnar, Peter Zijlstra, Alexander Viro,
	Pedro Cuadra Chamorro, linux-kernel

On 04/02/2019 03:17 PM, Jan Harkes wrote:
> On Sun, Mar 31, 2019 at 03:13:47PM -0400, Jan Harkes wrote:
>> On Sun, Mar 31, 2019 at 02:14:13PM -0400, Waiman Long wrote:
>>> One possibility is that there is a previous reference to the memory
>>> currently occupied by the spinlock. If the memory location is previously
>>> part of a rwsem structure and someone is still using it, you may get
>>> memory corruption.
>> Ah, I hadn't even thought of that possibility. Good, it will open up
> First of all, I have to thank you for your original patch because
> otherwise I probably would never have discovered that something was
> seriously wrong. Your patch made the problem visible.
>
> I ended up changing 'owner' to '_RET_IP_' and dumping the value of the
> clobbered coda inode spinlock and surrounding memory and found that the
> 'culprit' is in ext4_filemap_fault and despite it being in ext4, it is
> still a Coda specific problem.

It is good news that you have found the bug. However, I don't have
sufficient expertise in the filesystem and mm areas to give you
recommendation of what to do next.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-04-02 20:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-31  4:00 fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owni Jan Harkes
2019-03-31 18:14 ` Waiman Long
2019-03-31 19:13   ` Jan Harkes
2019-04-02 19:17     ` Jan Harkes
2019-04-02 20:24       ` Waiman Long

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.