linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
@ 2022-02-09 16:57 Qian Cai
  2022-02-09 18:10 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Qian Cai @ 2022-02-09 16:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes,
	Boqun Feng, linux-ext4, rcu, linux-kernel, Qian Cai

Since the linux-next commit 120aa5e57479 (mm: Check for
SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
boot warning. Avoid it by calling synchronize_rcu() before the zeroing.

Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
---
 fs/jbd2/journal.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index c2cf74b01ddb..323112de5921 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2861,15 +2861,18 @@ static struct journal_head *journal_alloc_journal_head(void)
 #ifdef CONFIG_JBD2_DEBUG
 	atomic_inc(&nr_journal_heads);
 #endif
-	ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
+	ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
 	if (!ret) {
 		jbd_debug(1, "out of memory for journal_head\n");
 		pr_notice_ratelimited("ENOMEM in %s, retrying.\n", __func__);
-		ret = kmem_cache_zalloc(jbd2_journal_head_cache,
+		ret = kmem_cache_alloc(jbd2_journal_head_cache,
 				GFP_NOFS | __GFP_NOFAIL);
 	}
-	if (ret)
+	if (ret) {
+		synchronize_rcu();
+		memset(ret, 0, sizeof(*ret));
 		spin_lock_init(&ret->b_state_lock);
+	}
 	return ret;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-09 16:57 [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU Qian Cai
@ 2022-02-09 18:10 ` Jan Kara
  2022-02-09 18:46   ` Qian Cai
  2022-02-09 20:11   ` Paul E. McKenney
  0 siblings, 2 replies; 12+ messages in thread
From: Jan Kara @ 2022-02-09 18:10 UTC (permalink / raw)
  To: Qian Cai
  Cc: Theodore Ts'o, Jan Kara, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Wed 09-02-22 11:57:42, Qian Cai wrote:
> Since the linux-next commit 120aa5e57479 (mm: Check for
> SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> 
> Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>

No, the performance impact of this would be just horrible. Can you
ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
synchronize_rcu() would be needed here before the memset() please? I mean
how is zeroing here any different from the memory just being used?

								Honza

> ---
>  fs/jbd2/journal.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index c2cf74b01ddb..323112de5921 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2861,15 +2861,18 @@ static struct journal_head *journal_alloc_journal_head(void)
>  #ifdef CONFIG_JBD2_DEBUG
>  	atomic_inc(&nr_journal_heads);
>  #endif
> -	ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
> +	ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
>  	if (!ret) {
>  		jbd_debug(1, "out of memory for journal_head\n");
>  		pr_notice_ratelimited("ENOMEM in %s, retrying.\n", __func__);
> -		ret = kmem_cache_zalloc(jbd2_journal_head_cache,
> +		ret = kmem_cache_alloc(jbd2_journal_head_cache,
>  				GFP_NOFS | __GFP_NOFAIL);
>  	}
> -	if (ret)
> +	if (ret) {
> +		synchronize_rcu();
> +		memset(ret, 0, sizeof(*ret));
>  		spin_lock_init(&ret->b_state_lock);
> +	}
>  	return ret;
>  }
>  
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-09 18:10 ` Jan Kara
@ 2022-02-09 18:46   ` Qian Cai
  2022-02-09 20:11   ` Paul E. McKenney
  1 sibling, 0 replies; 12+ messages in thread
From: Qian Cai @ 2022-02-09 18:46 UTC (permalink / raw)
  To: Jan Kara
  Cc: Theodore Ts'o, Jan Kara, Paul E. McKenney, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > Since the linux-next commit 120aa5e57479 (mm: Check for
> > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > 
> > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> 
> No, the performance impact of this would be just horrible. Can you
> ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> synchronize_rcu() would be needed here before the memset() please? I mean
> how is zeroing here any different from the memory just being used?

I'll defer to Paul and other RCU developers for more indepth explanations of
the issue with the combo. The above mentioned commit has a bit information:

    Code using a SLAB_TYPESAFE_BY_RCU kmem_cache can have readers accessing
    blocks of memory passed to kmem_cache_free(), and those readers might
    still be accessing those blocks after kmem_cache_alloc() reallocates
    those blocks.  These readers are not going to take kindly to that memory
    being zeroed along the way.  Therefore, add a WARN_ON_ONCE() complaining
    about __GFP_ZERO being passed to an allocation from a SLAB_TYPESAFE_BY_RCU
    kmem_cache.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-09 18:10 ` Jan Kara
  2022-02-09 18:46   ` Qian Cai
@ 2022-02-09 20:11   ` Paul E. McKenney
  2022-02-10  5:07     ` Theodore Ts'o
  2022-02-10  9:16     ` Jan Kara
  1 sibling, 2 replies; 12+ messages in thread
From: Paul E. McKenney @ 2022-02-09 20:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Qian Cai, Theodore Ts'o, Jan Kara, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > Since the linux-next commit 120aa5e57479 (mm: Check for
> > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > 
> > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> 
> No, the performance impact of this would be just horrible. Can you
> ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> synchronize_rcu() would be needed here before the memset() please? I mean
> how is zeroing here any different from the memory just being used?

Suppose a reader picks up a pointer to a memory block, then that memory
is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
so the memory won't be freed while the reader is accessing it.  But while
the reader is in the process of validating the block, it is zeroed.

How does the validation step handle this in all cases?

If you have a way of handling this, I will of course drop the patch.
And learn something new, which is always a good thing.  ;-)

							Thanx, Paul

> > ---
> >  fs/jbd2/journal.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > index c2cf74b01ddb..323112de5921 100644
> > --- a/fs/jbd2/journal.c
> > +++ b/fs/jbd2/journal.c
> > @@ -2861,15 +2861,18 @@ static struct journal_head *journal_alloc_journal_head(void)
> >  #ifdef CONFIG_JBD2_DEBUG
> >  	atomic_inc(&nr_journal_heads);
> >  #endif
> > -	ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
> > +	ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
> >  	if (!ret) {
> >  		jbd_debug(1, "out of memory for journal_head\n");
> >  		pr_notice_ratelimited("ENOMEM in %s, retrying.\n", __func__);
> > -		ret = kmem_cache_zalloc(jbd2_journal_head_cache,
> > +		ret = kmem_cache_alloc(jbd2_journal_head_cache,
> >  				GFP_NOFS | __GFP_NOFAIL);
> >  	}
> > -	if (ret)
> > +	if (ret) {
> > +		synchronize_rcu();
> > +		memset(ret, 0, sizeof(*ret));
> >  		spin_lock_init(&ret->b_state_lock);
> > +	}
> >  	return ret;
> >  }
> >  
> > -- 
> > 2.30.2
> > 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-09 20:11   ` Paul E. McKenney
@ 2022-02-10  5:07     ` Theodore Ts'o
  2022-02-10  5:43       ` Paul E. McKenney
  2022-02-10  9:16     ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2022-02-10  5:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jan Kara, Qian Cai, Jan Kara, Neeraj Upadhyay, Joel Fernandes,
	Boqun Feng, linux-ext4, rcu, linux-kernel

On Wed, Feb 09, 2022 at 12:11:37PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > 
> > No, the performance impact of this would be just horrible. Can you
> > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > synchronize_rcu() would be needed here before the memset() please? I mean
> > how is zeroing here any different from the memory just being used?
> 
> Suppose a reader picks up a pointer to a memory block, then that memory
> is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> so the memory won't be freed while the reader is accessing it.  But while
> the reader is in the process of validating the block, it is zeroed.
> 
> How does the validation step handle this in all cases?
> 
> If you have a way of handling this, I will of course drop the patch.
> And learn something new, which is always a good thing.  ;-)

I must be missing something.  The change is on the allocation path,
and why would kmem_cache_[z]alloc() return a memory chunk which could
still be in use by a reader?  Shouldn't the allocator _not_ return a
particular chunk until it is sure there aren't any readers left that
would be discombobulated by that memory being used for some new use
case?

Otherwise we would have to add synchronize_rcu(); after every single
kmem_cache allocation which might be using RCU, and that would be
terrible, no?

					- Ted


> > > ---
> > >  fs/jbd2/journal.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > > index c2cf74b01ddb..323112de5921 100644
> > > --- a/fs/jbd2/journal.c
> > > +++ b/fs/jbd2/journal.c
> > > @@ -2861,15 +2861,18 @@ static struct journal_head *journal_alloc_journal_head(void)
> > >  #ifdef CONFIG_JBD2_DEBUG
> > >  	atomic_inc(&nr_journal_heads);
> > >  #endif
> > > -	ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
> > > +	ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
> > >  	if (!ret) {
> > >  		jbd_debug(1, "out of memory for journal_head\n");
> > >  		pr_notice_ratelimited("ENOMEM in %s, retrying.\n", __func__);
> > > -		ret = kmem_cache_zalloc(jbd2_journal_head_cache,
> > > +		ret = kmem_cache_alloc(jbd2_journal_head_cache,
> > >  				GFP_NOFS | __GFP_NOFAIL);
> > >  	}
> > > -	if (ret)
> > > +	if (ret) {
> > > +		synchronize_rcu();
> > > +		memset(ret, 0, sizeof(*ret));
> > >  		spin_lock_init(&ret->b_state_lock);
> > > +	}
> > >  	return ret;
> > >  }
> > >  
> > > -- 
> > > 2.30.2
> > > 
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-10  5:07     ` Theodore Ts'o
@ 2022-02-10  5:43       ` Paul E. McKenney
  2022-02-10 15:54         ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-02-10  5:43 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, Qian Cai, Jan Kara, Neeraj Upadhyay, Joel Fernandes,
	Boqun Feng, linux-ext4, rcu, linux-kernel

On Thu, Feb 10, 2022 at 12:07:33AM -0500, Theodore Ts'o wrote:
> On Wed, Feb 09, 2022 at 12:11:37PM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > > 
> > > No, the performance impact of this would be just horrible. Can you
> > > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > > synchronize_rcu() would be needed here before the memset() please? I mean
> > > how is zeroing here any different from the memory just being used?
> > 
> > Suppose a reader picks up a pointer to a memory block, then that memory
> > is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> > so the memory won't be freed while the reader is accessing it.  But while
> > the reader is in the process of validating the block, it is zeroed.
> > 
> > How does the validation step handle this in all cases?
> > 
> > If you have a way of handling this, I will of course drop the patch.
> > And learn something new, which is always a good thing.  ;-)
> 
> I must be missing something.  The change is on the allocation path,
> and why would kmem_cache_[z]alloc() return a memory chunk which could
> still be in use by a reader?  Shouldn't the allocator _not_ return a
> particular chunk until it is sure there aren't any readers left that
> would be discombobulated by that memory being used for some new use
> case?

From the allocator's viewpoint yes, but ...

> Otherwise we would have to add synchronize_rcu(); after every single
> kmem_cache allocation which might be using RCU, and that would be
> terrible, no?

... if ext4 is not freeing memory blocks that might still be referenced
by RCU readers, then the SLAB_TYPESAFE_BY_RCU should be removed.
This "might still be referenced" is from the viewpoint of the code using
the allocator, not from that of the allocator itself.

So the typical RCU approach (not involving SLAB_TYPESAFE_BY_RCU)
is to take the grace period at the time of the free.  This can be
done synchronously using synchronize_rcu(), but is often instead done
asynchronously using call_rcu() or kfree_rcu().  So in this case,
you don't need synchronize_rcu() on allocation because the required
grace period already happened at *free() time.

But there are a few situations where it makes sense to free blocks that
readers might still be referencing.  Readers must then add validity
checks to detect this case, and also prevent freeing, for example,
using a per-block spinlock for synchronization.  For example, a reader
might acquire a spinlock in the block to prevent changes, recheck the
lookup key, and if the key does not match, release the lock and pretend
not to have found the block.  If the key does match, anything attempting
to delete and free the block will be spinning on that same spinlock.

And so if you specify SLAB_TYPESAFE_BY_RCU, the slab allocator is
guaranteeing type safety to RCU readers instead of the usual existence
guarantee.  A memory block might be freed out from under an RCU reader,
but its type will remain the same.  This means that the grace period
happens internally to the slab allocator when a slab is returned to
the system.

So either the validation checks are quite novel, the kmem_cache_zalloc()
calls should be replaced by kmem_cache_alloc() plus validation checks,
or the SLAB_TYPESAFE_BY_RCU should be removed.

Just out of curiosity, what is your mental model of SLAB_TYPESAFE_BY_RCU?

And yes, I did just up the visibility of this topic in my upcoming
presentation...

							Thanx, Paul

> > > > ---
> > > >  fs/jbd2/journal.c | 9 ++++++---
> > > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> > > > index c2cf74b01ddb..323112de5921 100644
> > > > --- a/fs/jbd2/journal.c
> > > > +++ b/fs/jbd2/journal.c
> > > > @@ -2861,15 +2861,18 @@ static struct journal_head *journal_alloc_journal_head(void)
> > > >  #ifdef CONFIG_JBD2_DEBUG
> > > >  	atomic_inc(&nr_journal_heads);
> > > >  #endif
> > > > -	ret = kmem_cache_zalloc(jbd2_journal_head_cache, GFP_NOFS);
> > > > +	ret = kmem_cache_alloc(jbd2_journal_head_cache, GFP_NOFS);
> > > >  	if (!ret) {
> > > >  		jbd_debug(1, "out of memory for journal_head\n");
> > > >  		pr_notice_ratelimited("ENOMEM in %s, retrying.\n", __func__);
> > > > -		ret = kmem_cache_zalloc(jbd2_journal_head_cache,
> > > > +		ret = kmem_cache_alloc(jbd2_journal_head_cache,
> > > >  				GFP_NOFS | __GFP_NOFAIL);
> > > >  	}
> > > > -	if (ret)
> > > > +	if (ret) {
> > > > +		synchronize_rcu();
> > > > +		memset(ret, 0, sizeof(*ret));
> > > >  		spin_lock_init(&ret->b_state_lock);
> > > > +	}
> > > >  	return ret;
> > > >  }
> > > >  
> > > > -- 
> > > > 2.30.2
> > > > 
> > > -- 
> > > Jan Kara <jack@suse.com>
> > > SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-09 20:11   ` Paul E. McKenney
  2022-02-10  5:07     ` Theodore Ts'o
@ 2022-02-10  9:16     ` Jan Kara
  2022-02-10 14:57       ` Paul E. McKenney
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Kara @ 2022-02-10  9:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jan Kara, Qian Cai, Theodore Ts'o, Jan Kara, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > > 
> > > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> > 
> > No, the performance impact of this would be just horrible. Can you
> > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > synchronize_rcu() would be needed here before the memset() please? I mean
> > how is zeroing here any different from the memory just being used?
> 
> Suppose a reader picks up a pointer to a memory block, then that memory
> is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> so the memory won't be freed while the reader is accessing it.  But while
> the reader is in the process of validating the block, it is zeroed.
> 
> How does the validation step handle this in all cases?
> 
> If you have a way of handling this, I will of course drop the patch.
> And learn something new, which is always a good thing.  ;-)

So I maybe missed something when implementing the usage of journal_heads
under RCU but let's have a look. An example of RCU user of journal heads
is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:

        rcu_read_lock();

	// This part fetches journal_head from buffer_head - not related to
	// our slab RCU discussion

        if (!buffer_jbd(bh))
                goto out;
        /* This should be bh2jh() but that doesn't work with inline functions */
        jh = READ_ONCE(bh->b_private);
        if (!jh)
                goto out;

	// The validation comes here

        /* For undo access buffer must have data copied */
        if (undo && !jh->b_committed_data)
                goto out;
        if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
            READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
                goto out;
	
	// Then some more checks unrelated to the slab itself.

        /*
         * There are two reasons for the barrier here:
         * 1) Make sure to fetch b_bh after we did previous checks so that we
         * detect when jh went through free, realloc, attach to transaction
         * while we were checking. Paired with implicit barrier in that path.
         * 2) So that access to bh done after jbd2_write_access_granted()
         * doesn't get reordered and see inconsistent state of concurrent
         * do_get_write_access().
         */
        smp_mb();
        if (unlikely(jh->b_bh != bh))
                goto out;

	// If all passed

	rcu_read_unlock();
	return true;

So if we are going to return true from the function, we know that 'jh' was
attached to handle->h_transaction at some point. And when 'jh' was attached
to handle->h_transaction, the transaction was holding reference to the 'jh'
and our 'handle' holds reference to the transaction so 'jh' could not be
freed since that moment. I.e., we are sure our reference to the handle keeps
'jh' alive and we can safely use it.

I don't see how any amount of scribbling over 'jh' could break this
validation. But maybe it is just a lack of my imagination :).

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-10  9:16     ` Jan Kara
@ 2022-02-10 14:57       ` Paul E. McKenney
  2022-02-10 19:12         ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-02-10 14:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Qian Cai, Theodore Ts'o, Jan Kara, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Thu, Feb 10, 2022 at 10:16:48AM +0100, Jan Kara wrote:
> On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> > On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > > > 
> > > > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> > > 
> > > No, the performance impact of this would be just horrible. Can you
> > > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > > synchronize_rcu() would be needed here before the memset() please? I mean
> > > how is zeroing here any different from the memory just being used?
> > 
> > Suppose a reader picks up a pointer to a memory block, then that memory
> > is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> > so the memory won't be freed while the reader is accessing it.  But while
> > the reader is in the process of validating the block, it is zeroed.
> > 
> > How does the validation step handle this in all cases?
> > 
> > If you have a way of handling this, I will of course drop the patch.
> > And learn something new, which is always a good thing.  ;-)
> 
> So I maybe missed something when implementing the usage of journal_heads
> under RCU but let's have a look. An example of RCU user of journal heads
> is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:
> 
>         rcu_read_lock();
> 
> 	// This part fetches journal_head from buffer_head - not related to
> 	// our slab RCU discussion
> 
>         if (!buffer_jbd(bh))
>                 goto out;
>         /* This should be bh2jh() but that doesn't work with inline functions */
>         jh = READ_ONCE(bh->b_private);
>         if (!jh)
>                 goto out;
> 
> 	// The validation comes here
> 
>         /* For undo access buffer must have data copied */
>         if (undo && !jh->b_committed_data)
>                 goto out;

OK, so if *jh was freed and re-zallocated in the meantime, this test
should fail.  One concern would be if the zeroing was not at least eight
bytes at a time, maybe due to overly eager use of fancy SIMD hardware.
Though perhaps you also do something about ->b_committed_data on
the free path, the commit-done path, or whatever?  (I do see a
"jh->b_committed_data = NULL" on what might well be the commit-done path.)

>         if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
>             READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
>                 goto out;

And same with these guys.

> 	// Then some more checks unrelated to the slab itself.
> 
>         /*
>          * There are two reasons for the barrier here:
>          * 1) Make sure to fetch b_bh after we did previous checks so that we
>          * detect when jh went through free, realloc, attach to transaction
>          * while we were checking. Paired with implicit barrier in that path.
>          * 2) So that access to bh done after jbd2_write_access_granted()
>          * doesn't get reordered and see inconsistent state of concurrent
>          * do_get_write_access().
>          */
>         smp_mb();
>         if (unlikely(jh->b_bh != bh))
>                 goto out;
> 
> 	// If all passed
> 
> 	rcu_read_unlock();
> 	return true;
> 
> So if we are going to return true from the function, we know that 'jh' was
> attached to handle->h_transaction at some point. And when 'jh' was attached
> to handle->h_transaction, the transaction was holding reference to the 'jh'
> and our 'handle' holds reference to the transaction so 'jh' could not be
> freed since that moment. I.e., we are sure our reference to the handle keeps
> 'jh' alive and we can safely use it.
> 
> I don't see how any amount of scribbling over 'jh' could break this
> validation. But maybe it is just a lack of my imagination :).

Regardless of whether you are suffering a lack of imagination, you
have clearly demonstrated that it is possible to correctly use the
SLAB_TYPESAFE_BY_RCU flag in conjunction with kmem_cache_alloc(), thus
demonstrating that I was suffering from a lack of imagination.  ;-)

I have therefore reverted my commit.  Please accept my apologies for
the hassle!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-10  5:43       ` Paul E. McKenney
@ 2022-02-10 15:54         ` Theodore Ts'o
  0 siblings, 0 replies; 12+ messages in thread
From: Theodore Ts'o @ 2022-02-10 15:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jan Kara, Qian Cai, Jan Kara, Neeraj Upadhyay, Joel Fernandes,
	Boqun Feng, linux-ext4, rcu, linux-kernel

On Wed, Feb 09, 2022 at 09:43:57PM -0800, Paul E. McKenney wrote:
> So the typical RCU approach (not involving SLAB_TYPESAFE_BY_RCU)
> is to take the grace period at the time of the free.  This can be
> done synchronously using synchronize_rcu(), but is often instead done
> asynchronously using call_rcu() or kfree_rcu().  So in this case,
> you don't need synchronize_rcu() on allocation because the required
> grace period already happened at *free() time.
> 
> But there are a few situations where it makes sense to free blocks that
> readers might still be referencing.  Readers must then add validity
> checks to detect this case, and also prevent freeing, for example,
> using a per-block spinlock for synchronization.  For example, a reader
> might acquire a spinlock in the block to prevent changes, recheck the
> lookup key, and if the key does not match, release the lock and pretend
> not to have found the block.  If the key does match, anything attempting
> to delete and free the block will be spinning on that same spinlock.
> 
> And so if you specify SLAB_TYPESAFE_BY_RCU, the slab allocator is
> guaranteeing type safety to RCU readers instead of the usual existence
> guarantee.  A memory block might be freed out from under an RCU reader,
> but its type will remain the same.  This means that the grace period
> happens internally to the slab allocator when a slab is returned to
> the system.
> 
> So either the validation checks are quite novel, the kmem_cache_zalloc()
> calls should be replaced by kmem_cache_alloc() plus validation checks,
> or the SLAB_TYPESAFE_BY_RCU should be removed.
> 
> Just out of curiosity, what is your mental model of SLAB_TYPESAFE_BY_RCU?

Hmm, so the code in question the flag was called SLAB_DESTROY_BY_RCU
in June 205 by commit de92c8caf16c ("jbd2: speedup
jbd2_journal_get_[write|undo]_access()"), and it was written by Jan.
I don't see anything to make sure the jh doesn't get freed until after
the grace period, and so that looks like a problem unless I'm missing
something.   Jan, what do you think?

					- Ted

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-10 14:57       ` Paul E. McKenney
@ 2022-02-10 19:12         ` Jan Kara
  2022-02-10 20:06           ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2022-02-10 19:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jan Kara, Qian Cai, Theodore Ts'o, Jan Kara, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Thu 10-02-22 06:57:13, Paul E. McKenney wrote:
> On Thu, Feb 10, 2022 at 10:16:48AM +0100, Jan Kara wrote:
> > On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> > > On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > > > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > > > > 
> > > > > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> > > > 
> > > > No, the performance impact of this would be just horrible. Can you
> > > > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > > > synchronize_rcu() would be needed here before the memset() please? I mean
> > > > how is zeroing here any different from the memory just being used?
> > > 
> > > Suppose a reader picks up a pointer to a memory block, then that memory
> > > is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> > > so the memory won't be freed while the reader is accessing it.  But while
> > > the reader is in the process of validating the block, it is zeroed.
> > > 
> > > How does the validation step handle this in all cases?
> > > 
> > > If you have a way of handling this, I will of course drop the patch.
> > > And learn something new, which is always a good thing.  ;-)
> > 
> > So I maybe missed something when implementing the usage of journal_heads
> > under RCU but let's have a look. An example of RCU user of journal heads
> > is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:
> > 
> >         rcu_read_lock();
> > 
> > 	// This part fetches journal_head from buffer_head - not related to
> > 	// our slab RCU discussion
> > 
> >         if (!buffer_jbd(bh))
> >                 goto out;
> >         /* This should be bh2jh() but that doesn't work with inline functions */
> >         jh = READ_ONCE(bh->b_private);
> >         if (!jh)
> >                 goto out;
> > 
> > 	// The validation comes here
> > 
> >         /* For undo access buffer must have data copied */
> >         if (undo && !jh->b_committed_data)
> >                 goto out;
> 
> OK, so if *jh was freed and re-zallocated in the meantime, this test
> should fail.  One concern would be if the zeroing was not at least eight
> bytes at a time, maybe due to overly eager use of fancy SIMD hardware.
> Though perhaps you also do something about ->b_committed_data on
> the free path, the commit-done path, or whatever?  (I do see a
> "jh->b_committed_data = NULL" on what might well be the commit-done path.)
>
> >         if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
> >             READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
> >                 goto out;
> 
> And same with these guys.

Yes, on commit-done path we zero out jh->b_transaction (or set
jh->b_transaction = jh->b_next_transaction; jh->b_next_transaction = NULL).
So these fields are actually guaranteed to be zero on free.

> > 	// Then some more checks unrelated to the slab itself.
> > 
> >         /*
> >          * There are two reasons for the barrier here:
> >          * 1) Make sure to fetch b_bh after we did previous checks so that we
> >          * detect when jh went through free, realloc, attach to transaction
> >          * while we were checking. Paired with implicit barrier in that path.
> >          * 2) So that access to bh done after jbd2_write_access_granted()
> >          * doesn't get reordered and see inconsistent state of concurrent
> >          * do_get_write_access().
> >          */
> >         smp_mb();
> >         if (unlikely(jh->b_bh != bh))
> >                 goto out;
> > 
> > 	// If all passed
> > 
> > 	rcu_read_unlock();
> > 	return true;
> > 
> > So if we are going to return true from the function, we know that 'jh' was
> > attached to handle->h_transaction at some point. And when 'jh' was attached
> > to handle->h_transaction, the transaction was holding reference to the 'jh'
> > and our 'handle' holds reference to the transaction so 'jh' could not be
> > freed since that moment. I.e., we are sure our reference to the handle keeps
> > 'jh' alive and we can safely use it.
> > 
> > I don't see how any amount of scribbling over 'jh' could break this
> > validation. But maybe it is just a lack of my imagination :).
> 
> Regardless of whether you are suffering a lack of imagination, you
> have clearly demonstrated that it is possible to correctly use the
> SLAB_TYPESAFE_BY_RCU flag in conjunction with kmem_cache_alloc(), thus
> demonstrating that I was suffering from a lack of imagination.  ;-)
> 
> I have therefore reverted my commit.  Please accept my apologies for
> the hassle!

No problem. Thanks for reverting the patch. I can imagine that jbd2's use
of SLAB_TYPESAFE_BY_RCU is an unusual one...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-10 19:12         ` Jan Kara
@ 2022-02-10 20:06           ` Paul E. McKenney
  2022-02-10 20:08             ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-02-10 20:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: Qian Cai, Theodore Ts'o, Jan Kara, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Thu, Feb 10, 2022 at 08:12:52PM +0100, Jan Kara wrote:
> On Thu 10-02-22 06:57:13, Paul E. McKenney wrote:
> > On Thu, Feb 10, 2022 at 10:16:48AM +0100, Jan Kara wrote:
> > > On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> > > > On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > > > > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > > > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > > > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > > > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > > > > > 
> > > > > > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> > > > > 
> > > > > No, the performance impact of this would be just horrible. Can you
> > > > > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > > > > synchronize_rcu() would be needed here before the memset() please? I mean
> > > > > how is zeroing here any different from the memory just being used?
> > > > 
> > > > Suppose a reader picks up a pointer to a memory block, then that memory
> > > > is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> > > > so the memory won't be freed while the reader is accessing it.  But while
> > > > the reader is in the process of validating the block, it is zeroed.
> > > > 
> > > > How does the validation step handle this in all cases?
> > > > 
> > > > If you have a way of handling this, I will of course drop the patch.
> > > > And learn something new, which is always a good thing.  ;-)
> > > 
> > > So I maybe missed something when implementing the usage of journal_heads
> > > under RCU but let's have a look. An example of RCU user of journal heads
> > > is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:
> > > 
> > >         rcu_read_lock();
> > > 
> > > 	// This part fetches journal_head from buffer_head - not related to
> > > 	// our slab RCU discussion
> > > 
> > >         if (!buffer_jbd(bh))
> > >                 goto out;
> > >         /* This should be bh2jh() but that doesn't work with inline functions */
> > >         jh = READ_ONCE(bh->b_private);
> > >         if (!jh)
> > >                 goto out;
> > > 
> > > 	// The validation comes here
> > > 
> > >         /* For undo access buffer must have data copied */
> > >         if (undo && !jh->b_committed_data)
> > >                 goto out;
> > 
> > OK, so if *jh was freed and re-zallocated in the meantime, this test
> > should fail.  One concern would be if the zeroing was not at least eight
> > bytes at a time, maybe due to overly eager use of fancy SIMD hardware.
> > Though perhaps you also do something about ->b_committed_data on
> > the free path, the commit-done path, or whatever?  (I do see a
> > "jh->b_committed_data = NULL" on what might well be the commit-done path.)
> >
> > >         if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
> > >             READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
> > >                 goto out;
> > 
> > And same with these guys.
> 
> Yes, on commit-done path we zero out jh->b_transaction (or set
> jh->b_transaction = jh->b_next_transaction; jh->b_next_transaction = NULL).
> So these fields are actually guaranteed to be zero on free.

Very good, thank you!

One more thing...

This assumes that when the slab allocator gets a fresh slab from mm,
that slab has been zeroed, right?  Or is there some other trick that
you are using to somehow accommodate randomly initialized memory?

Or am I just blind today and missing where the zeroing always happens
(other than for pages destined for userspace)?

> > > 	// Then some more checks unrelated to the slab itself.
> > > 
> > >         /*
> > >          * There are two reasons for the barrier here:
> > >          * 1) Make sure to fetch b_bh after we did previous checks so that we
> > >          * detect when jh went through free, realloc, attach to transaction
> > >          * while we were checking. Paired with implicit barrier in that path.
> > >          * 2) So that access to bh done after jbd2_write_access_granted()
> > >          * doesn't get reordered and see inconsistent state of concurrent
> > >          * do_get_write_access().
> > >          */
> > >         smp_mb();
> > >         if (unlikely(jh->b_bh != bh))
> > >                 goto out;
> > > 
> > > 	// If all passed
> > > 
> > > 	rcu_read_unlock();
> > > 	return true;
> > > 
> > > So if we are going to return true from the function, we know that 'jh' was
> > > attached to handle->h_transaction at some point. And when 'jh' was attached
> > > to handle->h_transaction, the transaction was holding reference to the 'jh'
> > > and our 'handle' holds reference to the transaction so 'jh' could not be
> > > freed since that moment. I.e., we are sure our reference to the handle keeps
> > > 'jh' alive and we can safely use it.
> > > 
> > > I don't see how any amount of scribbling over 'jh' could break this
> > > validation. But maybe it is just a lack of my imagination :).
> > 
> > Regardless of whether you are suffering a lack of imagination, you
> > have clearly demonstrated that it is possible to correctly use the
> > SLAB_TYPESAFE_BY_RCU flag in conjunction with kmem_cache_alloc(), thus
> > demonstrating that I was suffering from a lack of imagination.  ;-)
> > 
> > I have therefore reverted my commit.  Please accept my apologies for
> > the hassle!
> 
> No problem. Thanks for reverting the patch. I can imagine that jbd2's use
> of SLAB_TYPESAFE_BY_RCU is an unusual one...

I do like the very lightweight validation checks!

My mental model of this sort of validation always involved expensive
atomic operations.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU
  2022-02-10 20:06           ` Paul E. McKenney
@ 2022-02-10 20:08             ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2022-02-10 20:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Qian Cai, Theodore Ts'o, Jan Kara, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, linux-ext4, rcu, linux-kernel

On Thu, Feb 10, 2022 at 12:06:34PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 10, 2022 at 08:12:52PM +0100, Jan Kara wrote:
> > On Thu 10-02-22 06:57:13, Paul E. McKenney wrote:
> > > On Thu, Feb 10, 2022 at 10:16:48AM +0100, Jan Kara wrote:
> > > > On Wed 09-02-22 12:11:37, Paul E. McKenney wrote:
> > > > > On Wed, Feb 09, 2022 at 07:10:10PM +0100, Jan Kara wrote:
> > > > > > On Wed 09-02-22 11:57:42, Qian Cai wrote:
> > > > > > > Since the linux-next commit 120aa5e57479 (mm: Check for
> > > > > > > SLAB_TYPESAFE_BY_RCU and __GFP_ZERO slab allocation), we will get a
> > > > > > > boot warning. Avoid it by calling synchronize_rcu() before the zeroing.
> > > > > > > 
> > > > > > > Signed-off-by: Qian Cai <quic_qiancai@quicinc.com>
> > > > > > 
> > > > > > No, the performance impact of this would be just horrible. Can you
> > > > > > ellaborate a bit why SLAB_TYPESAFE_BY_RCU + __GFP_ZERO is a problem and why
> > > > > > synchronize_rcu() would be needed here before the memset() please? I mean
> > > > > > how is zeroing here any different from the memory just being used?
> > > > > 
> > > > > Suppose a reader picks up a pointer to a memory block, then that memory
> > > > > is freed.  No problem, given that this is a SLAB_TYPESAFE_BY_RCU slab,
> > > > > so the memory won't be freed while the reader is accessing it.  But while
> > > > > the reader is in the process of validating the block, it is zeroed.
> > > > > 
> > > > > How does the validation step handle this in all cases?
> > > > > 
> > > > > If you have a way of handling this, I will of course drop the patch.
> > > > > And learn something new, which is always a good thing.  ;-)
> > > > 
> > > > So I maybe missed something when implementing the usage of journal_heads
> > > > under RCU but let's have a look. An example of RCU user of journal heads
> > > > is fs/jbd2/transaction.c:jbd2_write_access_granted(). It does:
> > > > 
> > > >         rcu_read_lock();
> > > > 
> > > > 	// This part fetches journal_head from buffer_head - not related to
> > > > 	// our slab RCU discussion
> > > > 
> > > >         if (!buffer_jbd(bh))
> > > >                 goto out;
> > > >         /* This should be bh2jh() but that doesn't work with inline functions */
> > > >         jh = READ_ONCE(bh->b_private);
> > > >         if (!jh)
> > > >                 goto out;
> > > > 
> > > > 	// The validation comes here
> > > > 
> > > >         /* For undo access buffer must have data copied */
> > > >         if (undo && !jh->b_committed_data)
> > > >                 goto out;
> > > 
> > > OK, so if *jh was freed and re-zallocated in the meantime, this test
> > > should fail.  One concern would be if the zeroing was not at least eight
> > > bytes at a time, maybe due to overly eager use of fancy SIMD hardware.
> > > Though perhaps you also do something about ->b_committed_data on
> > > the free path, the commit-done path, or whatever?  (I do see a
> > > "jh->b_committed_data = NULL" on what might well be the commit-done path.)
> > >
> > > >         if (READ_ONCE(jh->b_transaction) != handle->h_transaction &&
> > > >             READ_ONCE(jh->b_next_transaction) != handle->h_transaction)
> > > >                 goto out;
> > > 
> > > And same with these guys.
> > 
> > Yes, on commit-done path we zero out jh->b_transaction (or set
> > jh->b_transaction = jh->b_next_transaction; jh->b_next_transaction = NULL).
> > So these fields are actually guaranteed to be zero on free.
> 
> Very good, thank you!
> 
> One more thing...
> 
> This assumes that when the slab allocator gets a fresh slab from mm,
> that slab has been zeroed, right?  Or is there some other trick that
> you are using to somehow accommodate randomly initialized memory?
> 
> Or am I just blind today and missing where the zeroing always happens
> (other than for pages destined for userspace)?

Never mind!!!

This is of course exactly why you are using zalloc().

I will go grab that brown paper bag...

							Thanx, Paul

> > > > 	// Then some more checks unrelated to the slab itself.
> > > > 
> > > >         /*
> > > >          * There are two reasons for the barrier here:
> > > >          * 1) Make sure to fetch b_bh after we did previous checks so that we
> > > >          * detect when jh went through free, realloc, attach to transaction
> > > >          * while we were checking. Paired with implicit barrier in that path.
> > > >          * 2) So that access to bh done after jbd2_write_access_granted()
> > > >          * doesn't get reordered and see inconsistent state of concurrent
> > > >          * do_get_write_access().
> > > >          */
> > > >         smp_mb();
> > > >         if (unlikely(jh->b_bh != bh))
> > > >                 goto out;
> > > > 
> > > > 	// If all passed
> > > > 
> > > > 	rcu_read_unlock();
> > > > 	return true;
> > > > 
> > > > So if we are going to return true from the function, we know that 'jh' was
> > > > attached to handle->h_transaction at some point. And when 'jh' was attached
> > > > to handle->h_transaction, the transaction was holding reference to the 'jh'
> > > > and our 'handle' holds reference to the transaction so 'jh' could not be
> > > > freed since that moment. I.e., we are sure our reference to the handle keeps
> > > > 'jh' alive and we can safely use it.
> > > > 
> > > > I don't see how any amount of scribbling over 'jh' could break this
> > > > validation. But maybe it is just a lack of my imagination :).
> > > 
> > > Regardless of whether you are suffering a lack of imagination, you
> > > have clearly demonstrated that it is possible to correctly use the
> > > SLAB_TYPESAFE_BY_RCU flag in conjunction with kmem_cache_alloc(), thus
> > > demonstrating that I was suffering from a lack of imagination.  ;-)
> > > 
> > > I have therefore reverted my commit.  Please accept my apologies for
> > > the hassle!
> > 
> > No problem. Thanks for reverting the patch. I can imagine that jbd2's use
> > of SLAB_TYPESAFE_BY_RCU is an unusual one...
> 
> I do like the very lightweight validation checks!
> 
> My mental model of this sort of validation always involved expensive
> atomic operations.  ;-)
> 
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-02-10 20:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09 16:57 [RFC PATCH] jbd2: avoid __GFP_ZERO with SLAB_TYPESAFE_BY_RCU Qian Cai
2022-02-09 18:10 ` Jan Kara
2022-02-09 18:46   ` Qian Cai
2022-02-09 20:11   ` Paul E. McKenney
2022-02-10  5:07     ` Theodore Ts'o
2022-02-10  5:43       ` Paul E. McKenney
2022-02-10 15:54         ` Theodore Ts'o
2022-02-10  9:16     ` Jan Kara
2022-02-10 14:57       ` Paul E. McKenney
2022-02-10 19:12         ` Jan Kara
2022-02-10 20:06           ` Paul E. McKenney
2022-02-10 20:08             ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).