* [PATCH] xfs: fix use-after-free race in xfs_buf_rele
From: Dave Chinner @ 2018-10-09 22:00 UTC
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When looking at a 4.18 based KASAN use-after-free report, I noticed
that concurrent xfs_buf_rele() calls may race on dropping the last
reference to the buffer and taking the buffer lock. This was the symptom
displayed by the KASAN report, but the actual issue that was
reported had already been fixed in 4.19-rc1 by commit e339dd8d8b04
("xfs: use sync buffer I/O for sync delwri queue submission").

Despite this, I think there is still an issue with xfs_buf_rele()
in this code:


        release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
        spin_lock(&bp->b_lock);
        if (!release) {
.....

If two threads race on the b_lock after both dropping a reference,
with one of them dropping the last reference so that release = true,
we end up with:


CPU 0				CPU 1
atomic_dec_and_lock()
				atomic_dec_and_lock()
				spin_lock(&bp->b_lock)
spin_lock(&bp->b_lock)
<spins>
				<release = true bp->b_lru_ref = 0>
				<remove from lists>
				freebuf = true
				spin_unlock(&bp->b_lock)
				xfs_buf_free(bp)
<gets lock, reading and writing freed memory>
<accesses freed memory>
spin_unlock(&bp->b_lock) <reads/writes freed memory>

IOWs, we can't safely take bp->b_lock after dropping the hold
reference because the buffer may go away at any time after we
drop that reference. However, this can be fixed simply by taking the
bp->b_lock before we drop the reference.
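
To make the ordering concrete, here is a minimal userspace sketch of
the fixed rele path. This is an illustration only, not the XFS code:
C11 atomics and pthread mutexes stand in for b_hold, b_lock and
pag_buf_lock, and atomic_dec_and_lock() is simplified to a bare
decrement under the lock.

/* build: cc -pthread rele_sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

static pthread_mutex_t find_lock = PTHREAD_MUTEX_INITIALIZER;
					/* models pag->pag_buf_lock */
struct obj {
	pthread_mutex_t	lock;		/* models bp->b_lock */
	atomic_int	hold;		/* models bp->b_hold */
};

static void obj_rele(struct obj *o)
{
	bool freeobj = false;

	/*
	 * Take o->lock *before* dropping the reference. A racing
	 * final release must now wait here until we are done with
	 * the object, so it cannot free *o under us.
	 */
	pthread_mutex_lock(&o->lock);
	if (atomic_fetch_sub(&o->hold, 1) == 1) {
		/* we dropped the last reference */
		pthread_mutex_lock(&find_lock);
		/* remove *o from the lookup structures here */
		pthread_mutex_unlock(&find_lock);
		freeobj = true;
	}
	pthread_mutex_unlock(&o->lock);
	if (freeobj) {
		pthread_mutex_destroy(&o->lock);
		free(o);
	}
}

static void *drop(void *arg)
{
	obj_rele(arg);
	return NULL;
}

int main(void)
{
	struct obj *o = malloc(sizeof(*o));
	pthread_t a, b;

	pthread_mutex_init(&o->lock, NULL);
	atomic_init(&o->hold, 2);

	/* two threads concurrently drop the last two references */
	pthread_create(&a, NULL, drop, o);
	pthread_create(&b, NULL, drop, o);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

With the buggy ordering (decrement first, lock second), the thread
dropping the intermediate reference can reach o->lock after the final
release has already freed the object, which is the shape of the
use-after-free shown above.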

It is safe to nest the pag_buf_lock inside bp->b_lock as the
pag_buf_lock is only used to serialise against lookup in
xfs_buf_find() and no other locks are held over or under the
pag_buf_lock there. Make this clear by documenting the buffer lock
orders at the top of the file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 57d28dde5a78..d76116760ef6 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -37,6 +37,32 @@ static kmem_zone_t *xfs_buf_zone;
 #define xb_to_gfp(flags) \
 	((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : GFP_NOFS) | __GFP_NOWARN)
 
+/*
+ * Locking orders
+ *
+ * xfs_buf_ioacct_inc:
+ * xfs_buf_ioacct_dec:
+ *	b_sema (caller holds)
+ *	  b_lock
+ *
+ * xfs_buf_stale:
+ *	b_sema (caller holds)
+ *	  b_lock
+ *	    lru_lock
+ *
+ * xfs_buf_rele:
+ *	b_lock
+ *	  pag_buf_lock
+ *	    lru_lock
+ *
+ * xfs_buftarg_wait_rele
+ *	lru_lock
+ *	  b_lock (trylock due to inversion)
+ *
+ * xfs_buftarg_isolate
+ *	lru_lock
+ *	  b_lock (trylock due to inversion)
+ */
 
 static inline int
 xfs_buf_is_vmapped(
@@ -1036,8 +1062,18 @@ xfs_buf_rele(
 
 	ASSERT(atomic_read(&bp->b_hold) > 0);
 
-	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
+	/*
+	 * We grab the b_lock here first to serialise racing xfs_buf_rele()
+	 * calls. The pag_buf_lock being taken on the last reference only
+	 * serialises against racing lookups in xfs_buf_find(). IOWs, the second
+	 * to last reference we drop here is not serialised against the last
+	 * reference until we take bp->b_lock. Hence if we don't grab b_lock
+	 * first, the last "release" reference can win the race to the lock and
+	 * free the buffer before the second-to-last reference is processed,
+	 * leading to a use-after-free scenario.
+	 */
 	spin_lock(&bp->b_lock);
+	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
 	if (!release) {
 		/*
 		 * Drop the in-flight state if the buffer is already on the LRU
-- 
2.17.0

* Re: [PATCH] xfs: fix use-after-free race in xfs_buf_rele
From: Brian Foster @ 2018-10-10 13:19 UTC
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 10, 2018 at 09:00:44AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When looking at a 4.18 based KASAN use-after-free report, I noticed
> that concurrent xfs_buf_rele() calls may race on dropping the last
> reference to the buffer and taking the buffer lock. This was the symptom
> displayed by the KASAN report, but the actual issue that was
> reported had already been fixed in 4.19-rc1 by commit e339dd8d8b04
> ("xfs: use sync buffer I/O for sync delwri queue submission").
> 

Are you saying that the KASAN report reflected the delwri queue race and
the rele() issue was discovered by inspection, or that the KASAN report
occurred with the delwri queue fix already in place? Or IOW, has this
been reproduced?

> Despite this, I think there is still an issue with xfs_buf_rele()
> in this code:
> 
> 
>         release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
>         spin_lock(&bp->b_lock);
>         if (!release) {
> .....
> 
> If two threads race on the b_lock after both dropping a reference,
> with one of them dropping the last reference so that release = true,
> we end up with:
> 
> 
> CPU 0				CPU 1
> atomic_dec_and_lock()
> 				atomic_dec_and_lock()
> 				spin_lock(&bp->b_lock)
> spin_lock(&bp->b_lock)
> <spins>

Ok, so CPU0 drops one ref and returns, CPU1 drops the last ref, acquires
->b_pag_lock, returns and wins the race to ->b_lock. The latter bit
basically means the final reference cleanup executes before the previous
reference drop gets to acquire ->b_lock. CPU1 releases ->b_lock and the
race is on between freeing the buffer and processing the intermediate
release.

I suppose this makes sense because ->b_pag_lock is only acquired on the
final release. Therefore, ->b_lock is the real serialization point as
far as xfs_buf_rele() is concerned.

> 				<release = true bp->b_lru_ref = 0>
> 				<remove from lists>
> 				freebuf = true
> 				spin_unlock(&bp->b_lock)
> 				xfs_buf_free(bp)
> <gets lock, reading and writing freed memory>
> <accesses freed memory>
> spin_unlock(&bp->b_lock) <reads/writes freed memory>
> 
> IOWs, we can't safely take bp->b_lock after dropping the hold
> reference because the buffer may go away at any time after we
> drop that reference. However, this can be fixed simply by taking the
> bp->b_lock before we drop the reference.
> 
> It is safe to nest the pag_buf_lock inside bp->b_lock as the
> pag_buf_lock is only used to serialise against lookup in
> xfs_buf_find() and no other locks are held over or under the
> pag_buf_lock there. Make this clear by documenting the buffer lock
> orders at the top of the file.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks reasonable enough to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_buf.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 57d28dde5a78..d76116760ef6 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -37,6 +37,32 @@ static kmem_zone_t *xfs_buf_zone;
>  #define xb_to_gfp(flags) \
>  	((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : GFP_NOFS) | __GFP_NOWARN)
>  
> +/*
> + * Locking orders
> + *
> + * xfs_buf_ioacct_inc:
> + * xfs_buf_ioacct_dec:
> + *	b_sema (caller holds)
> + *	  b_lock
> + *
> + * xfs_buf_stale:
> + *	b_sema (caller holds)
> + *	  b_lock
> + *	    lru_lock
> + *
> + * xfs_buf_rele:
> + *	b_lock
> + *	  pag_buf_lock
> + *	    lru_lock
> + *
> + * xfs_buftarg_wait_rele
> + *	lru_lock
> + *	  b_lock (trylock due to inversion)
> + *
> + * xfs_buftarg_isolate
> + *	lru_lock
> + *	  b_lock (trylock due to inversion)
> + */
>  
>  static inline int
>  xfs_buf_is_vmapped(
> @@ -1036,8 +1062,18 @@ xfs_buf_rele(
>  
>  	ASSERT(atomic_read(&bp->b_hold) > 0);
>  
> -	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
> +	/*
> +	 * We grab the b_lock here first to serialise racing xfs_buf_rele()
> +	 * calls. The pag_buf_lock being taken on the last reference only
> +	 * serialises against racing lookups in xfs_buf_find(). IOWs, the second
> +	 * to last reference we drop here is not serialised against the last
> +	 * reference until we take bp->b_lock. Hence if we don't grab b_lock
> +	 * first, the last "release" reference can win the race to the lock and
> +	 * free the buffer before the second-to-last reference is processed,
> +	 * leading to a use-after-free scenario.
> +	 */
>  	spin_lock(&bp->b_lock);
> +	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
>  	if (!release) {
>  		/*
>  		 * Drop the in-flight state if the buffer is already on the LRU
> -- 
> 2.17.0
> 

* Re: [PATCH] xfs: fix use-after-free race in xfs_buf_rele
From: Carlos Maiolino @ 2018-10-10 13:36 UTC
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Oct 10, 2018 at 09:00:44AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When looking at a 4.18 based KASAN use-after-free report, I noticed
> that concurrent xfs_buf_rele() calls may race on dropping the last
> reference to the buffer and taking the buffer lock. This was the symptom
> displayed by the KASAN report, but the actual issue that was
> reported had already been fixed in 4.19-rc1 by commit e339dd8d8b04
> ("xfs: use sync buffer I/O for sync delwri queue submission").
> 
> Despite this, I think there is still an issue with xfs_buf_rele()
> in this code:
> 
> 
>         release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
>         spin_lock(&bp->b_lock);
>         if (!release) {
> .....
> 
> If two threads race on the b_lock after both dropping a reference,
> with one of them dropping the last reference so that release = true,
> we end up with:
> 
> 
> CPU 0				CPU 1
> atomic_dec_and_lock()
> 				atomic_dec_and_lock()
> 				spin_lock(&bp->b_lock)
> spin_lock(&bp->b_lock)
> <spins>
> 				<release = true bp->b_lru_ref = 0>
> 				<remove from lists>
> 				freebuf = true
> 				spin_unlock(&bp->b_lock)
> 				xfs_buf_free(bp)
> <gets lock, reading and writing freed memory>
> <accesses freed memory>
> spin_unlock(&bp->b_lock) <reads/writes freed memory>
> 
> IOWs, we can't safely take bp->b_lock after dropping the hold
> reference because the buffer may go away at any time after we
> drop that reference. However, this can be fixed simply by taking the
> bp->b_lock before we drop the reference.
> 
> It is safe to nest the pag_buf_lock inside bp->b_lock as the
> pag_buf_lock is only used to serialise against lookup in
> xfs_buf_find() and no other locks are held over or under the
> pag_buf_lock there. Make this clear by documenting the buffer lock
> orders at the top of the file.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>


Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>



> ---
>  fs/xfs/xfs_buf.c | 38 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 57d28dde5a78..d76116760ef6 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -37,6 +37,32 @@ static kmem_zone_t *xfs_buf_zone;
>  #define xb_to_gfp(flags) \
>  	((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : GFP_NOFS) | __GFP_NOWARN)
>  
> +/*
> + * Locking orders
> + *
> + * xfs_buf_ioacct_inc:
> + * xfs_buf_ioacct_dec:
> + *	b_sema (caller holds)
> + *	  b_lock
> + *
> + * xfs_buf_stale:
> + *	b_sema (caller holds)
> + *	  b_lock
> + *	    lru_lock
> + *
> + * xfs_buf_rele:
> + *	b_lock
> + *	  pag_buf_lock
> + *	    lru_lock
> + *
> + * xfs_buftarg_wait_rele
> + *	lru_lock
> + *	  b_lock (trylock due to inversion)
> + *
> + * xfs_buftarg_isolate
> + *	lru_lock
> + *	  b_lock (trylock due to inversion)
> + */
>  
>  static inline int
>  xfs_buf_is_vmapped(
> @@ -1036,8 +1062,18 @@ xfs_buf_rele(
>  
>  	ASSERT(atomic_read(&bp->b_hold) > 0);
>  
> -	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
> +	/*
> +	 * We grab the b_lock here first to serialise racing xfs_buf_rele()
> +	 * calls. The pag_buf_lock being taken on the last reference only
> +	 * serialises against racing lookups in xfs_buf_find(). IOWs, the second
> +	 * to last reference we drop here is not serialised against the last
> +	 * reference until we take bp->b_lock. Hence if we don't grab b_lock
> +	 * first, the last "release" reference can win the race to the lock and
> +	 * free the buffer before the second-to-last reference is processed,
> +	 * leading to a use-after-free scenario.
> +	 */
>  	spin_lock(&bp->b_lock);
> +	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
>  	if (!release) {
>  		/*
>  		 * Drop the in-flight state if the buffer is already on the LRU
> -- 
> 2.17.0
> 

-- 
Carlos

* Re: [PATCH] xfs: fix use-after-free race in xfs_buf_rele
From: Dave Chinner @ 2018-10-10 21:42 UTC
  To: Brian Foster; +Cc: linux-xfs

On Wed, Oct 10, 2018 at 09:19:51AM -0400, Brian Foster wrote:
> On Wed, Oct 10, 2018 at 09:00:44AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When looking at a 4.18 based KASAN use-after-free report, I noticed
> > that concurrent xfs_buf_rele() calls may race on dropping the last
> > reference to the buffer and taking the buffer lock. This was the symptom
> > displayed by the KASAN report, but the actual issue that was
> > reported had already been fixed in 4.19-rc1 by commit e339dd8d8b04
> > ("xfs: use sync buffer I/O for sync delwri queue submission").
> > 
> 
> Are you saying that the KASAN report reflected the delwri queue race and
> the rele() issue was discovered by inspection, or that the KASAN report
> occurred with the delwri queue fix already in place? Or IOW, has this
> been reproduced?

Neither. The KASAN report exposed the rele() race on 4.18, but the
code that exposed the race was modified in 4.19 so it was not
susceptible to the race condition.

i.e. the old delwri code had the condition that the async IO
completion could drop the IO reference at the same time that the
delwri queue waiting code would drop its reference. Basically, it was

IO completion			delwri wait for completion
				lock
xfs_buf_relse()
  unlock
    wakeup waiter
				xfs_buf_relse
				  unlock
  xfs_buf_rele()		  xfs_buf_rele()

And the two of them then race for the last reference and bp->b_lock.

This can happen anywhere there are two concurrent xfs_buf_rele()
calls. That's not common because most buffer accesses are serialised
by the bp->b_sema, but the above is an example of how waiting on
IO completion by just locking the buffer can trigger it...
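
As a standalone model of that interaction (again just an illustration
with POSIX primitives: a semaphore stands in for b_sema because buffer
lock ownership migrates between threads, and the bare decrements stand
for the two racing xfs_buf_rele() calls):

/* build: cc -pthread delwri_sketch.c */
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

struct buf {
	sem_t		sema;		/* models bp->b_sema */
	atomic_int	hold;		/* models bp->b_hold */
};

/* async IO completion: unlock the buffer, then drop the IO ref */
static void *io_completion(void *arg)
{
	struct buf *bp = arg;

	sem_post(&bp->sema);		/* unlock, wakes the waiter */
	atomic_fetch_sub(&bp->hold, 1);	/* rele #1 ... */
	return NULL;
}

/* delwri wait: "wait for IO" is lock, unlock, then drop our ref */
static void *delwri_wait(void *arg)
{
	struct buf *bp = arg;

	sem_wait(&bp->sema);		/* blocks until IO completes */
	sem_post(&bp->sema);
	atomic_fetch_sub(&bp->hold, 1);	/* ... races with rele #2 */
	return NULL;
}

int main(void)
{
	struct buf bp;
	pthread_t a, b;

	sem_init(&bp.sema, 0, 0);	/* buffer starts locked for IO */
	atomic_init(&bp.hold, 2);	/* IO ref + delwri queue ref */

	pthread_create(&a, NULL, delwri_wait, &bp);
	pthread_create(&b, NULL, io_completion, &bp);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

In the real code each of those two decrements is a full
xfs_buf_rele() call, and it's the lock-versus-free ordering inside
that path that the patch fixes.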

> > Despite this, I think there is still an issue with xfs_buf_rele()
> > in this code:
> > 
> > 
> >         release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
> >         spin_lock(&bp->b_lock);
> >         if (!release) {
> > .....
> > 
> > If two threads race on the b_lock after both dropping a reference,
> > with one of them dropping the last reference so that release = true,
> > we end up with:
> > 
> > 
> > CPU 0				CPU 1
> > atomic_dec_and_lock()
> > 				atomic_dec_and_lock()
> > 				spin_lock(&bp->b_lock)
> > spin_lock(&bp->b_lock)
> > <spins>
> 
> Ok, so CPU0 drops one ref and returns, CPU1 drops the last ref, acquires
> ->b_pag_lock, returns and wins the race to ->b_lock. The latter bit
> basically means the final reference cleanup executes before the previous
> reference drop gets to acquire ->b_lock. CPU1 releases ->b_lock and the
> race is on between freeing the buffer and processing the intermediate
> release.
> 
> I suppose this makes sense because ->b_pag_lock is only acquired on the
> final release. Therefore, ->b_lock is the real serialization point as
> far as xfs_buf_rele() is concerned.

Yes, that's pretty much the issue here.

> > 				<release = true bp->b_lru_ref = 0>
> > 				<remove from lists>
> > 				freebuf = true
> > 				spin_unlock(&bp->b_lock)
> > 				xfs_buf_free(bp)
> > <gets lock, reading and writing freed memory>
> > <accesses freed memory>
> > spin_unlock(&bp->b_lock) <reads/writes freed memory>
> > 
> > IOWs, we can't safely take bp->b_lock after dropping the hold
> > reference because the buffer may go away at any time after we
> > drop that reference. However, this can be fixed simply by taking the
> > bp->b_lock before we drop the reference.
> > 
> > It is safe to nest the pag_buf_lock inside bp->b_lock as the
> > pag_buf_lock is only used to serialise against lookup in
> > xfs_buf_find() and no other locks are held over or under the
> > pag_buf_lock there. Make this clear by documenting the buffer lock
> > orders at the top of the file.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Looks reasonable enough to me:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com
