From: Vlastimil Babka <vbabka@suse.cz>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	Christoph Lameter <cl@linux.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	linux-mm@kvack.org, Feng Tang <feng.tang@intel.com>
Subject: Re: [RFC PATCH] mm, slub: restrict sysfs validation to debug caches and make it safe
Date: Mon, 1 Aug 2022 15:51:49 +0200
Message-ID: <224265c3-f3fe-aecd-039a-0c9ba3817305@suse.cz>
In-Reply-To: <YuZA3vHXETJ8IRin@hyeyoo>

On 7/31/22 10:44, Hyeonggon Yoo wrote:
> On Wed, Jul 27, 2022 at 08:29:09PM +0200, Vlastimil Babka wrote:
> Also all issues (as far as I know) related to the validate attribute
> are gone after this patch.

As you (and Rongwei) were able to trigger/reproduce the issues, does your
testing also confirm they no longer reproduce?

> Silly question:
> 	Do we want to apply this on stable trees?

I'd prefer not to, it's too intrusive for stable.

> 	I doubt someone would use the validate attribute when not debugging.

I doubt it as well. Also, it requires root, and even if somebody hits the issue,
it's just spurious warnings, nothing fatal. So that doesn't warrant an
intrusive stable backport, IMHO.

>> [1] https://lore.kernel.org/all/69462916-2d1c-dd50-2e64-b31c2b61690e@suse.cz/
>> 
>>  mm/slub.c | 322 +++++++++++++++++++++++++++++++++++++++---------------
>>  1 file changed, 231 insertions(+), 91 deletions(-)
>> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index b1281b8654bd..01e5228809d7 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1324,17 +1324,14 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
>>  }
> 
> [...]
> 
>> +/*
>> + * Called only for kmem_cache_debug() caches instead of acquire_slab(), with a
>> + * slab from the n->partial list. Removes only a single object from the slab
>> + * under slab_lock(), does the alloc_debug_processing() checks and leaves the
>> + * slab on the list, or moves it to full list if it was the last object.
>> + */
>> +static void *alloc_single_from_partial(struct kmem_cache *s,
>> +		struct kmem_cache_node *n, struct slab *slab)
>> +{
>> +	void *object;
>> +	unsigned long flags;
>> +
>> +	lockdep_assert_held(&n->list_lock);
>> +
>> +	slab_lock(slab, &flags);
>> +
>> +	object = slab->freelist;
>> +	slab->freelist = get_freepointer(s, object);
>> +	slab->inuse++;
>> +
>> +	if (!alloc_debug_processing(s, slab, object)) {
>> +		remove_partial(n, slab);
>> +		slab_unlock(slab, &flags);
>> +		return NULL;
>> +	}
>> +
>> +	if (slab->inuse == slab->objects) {
>> +		remove_partial(n, slab);
>> +		add_full(s, n, slab);
>> +	}
>> +
>> +	slab_unlock(slab, &flags);
> 
> AFAIK add_full/remove_full/add_partial/remove_partial
> can be called outside slab_lock but inside list_lock.

Right, I will adjust, thanks.
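
For illustration, the adjusted function could look roughly like this (just a
sketch based on the hunk quoted above, not the final patch; the list
manipulation moves out from under slab_lock() but stays under the
already-held n->list_lock):

static void *alloc_single_from_partial(struct kmem_cache *s,
		struct kmem_cache_node *n, struct slab *slab)
{
	void *object;
	unsigned long flags;

	/* caller already holds n->list_lock */
	lockdep_assert_held(&n->list_lock);

	slab_lock(slab, &flags);

	object = slab->freelist;
	slab->freelist = get_freepointer(s, object);
	slab->inuse++;

	if (!alloc_debug_processing(s, slab, object)) {
		slab_unlock(slab, &flags);
		/* list manipulation needs only n->list_lock */
		remove_partial(n, slab);
		return NULL;
	}

	slab_unlock(slab, &flags);

	if (slab->inuse == slab->objects) {
		remove_partial(n, slab);
		add_full(s, n, slab);
	}

	return object;
}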

>> +	return object;
>> +}
>> +
>> +/*
>> + * Called only for kmem_cache_debug() caches to allocate from a freshly
>> + * allocated slab. Allocates a single object instead of whole freelist
>> + * and puts the slab to the partial (or full) list.
>> + */
>> +static void *alloc_single_from_new_slab(struct kmem_cache *s,
>> +					struct slab *slab)
>> +{
>> +	int nid = slab_nid(slab);
>> +	struct kmem_cache_node *n = get_node(s, nid);
>> +	unsigned long flags, flags2;
>> +	void *object;
>> +
>> +	spin_lock_irqsave(&n->list_lock, flags);
>> +	slab_lock(slab, &flags2);
>> +
>> +	object = slab->freelist;
>> +	slab->freelist = get_freepointer(s, object);
>> +	/* Undo what allocate_slab() did */
>> +	slab->frozen = 0;
>> +	slab->inuse = 1;
> 
> Maybe do it in allocate_slab()?

Hmm yeah, I guess we could stop doing the pre-freezing and the inuse = objects
setup in allocate_slab(), and do it in __slab_alloc() instead, which thus won't
add any overhead. Then we won't have to unfreeze in
early_kmem_cache_node_alloc() either.
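
Roughly (a sketch only, exact placement still to be decided): allocate_slab()
would drop its pre-freezing / inuse = objects assignments, and the non-debug
path in ___slab_alloc() would do them right where it grabs the whole freelist:

	/*
	 * No other reference to the slab yet so we can
	 * muck around with it freely without cmpxchg
	 */
	freelist = slab->freelist;
	slab->freelist = NULL;
	slab->inuse = slab->objects;
	slab->frozen = 1;

and alloc_single_from_new_slab() would then not need the "Undo what
allocate_slab() did" part at all.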

>> +	if (!alloc_debug_processing(s, slab, object)) {
>> +		/*
>> +		 * It's not really expected that this would fail on a
>> +		 * freshly allocated slab, but a concurrent memory
>> +		 * corruption in theory could cause that.
>> +		 */
>> +		slab_unlock(slab, &flags2);
>> +		spin_unlock_irqrestore(&n->list_lock, flags);
>> +		return NULL;
>> +	}
>> +
>> +	if (slab->inuse == slab->objects)
>> +		add_full(s, n, slab);
>> +	else
>> +		add_partial(n, slab, DEACTIVATE_TO_HEAD);
>> +
>> +	slab_unlock(slab, &flags2);
>> +	inc_slabs_node(s, nid, slab->objects);
>> +	spin_unlock_irqrestore(&n->list_lock, flags);
>> +
>> +	return object;
>> +}
> 
> [...]
> 
>>  #endif /* CONFIG_SLUB_DEBUG */
>>  
>>  #if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SYSFS)
>> @@ -3036,6 +3165,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  		return NULL;
>>  	}
>>  
>> +	stat(s, ALLOC_SLAB);
>> +
>> +	if (kmem_cache_debug(s)) {
>> +		freelist = alloc_single_from_new_slab(s, slab);
>> +
>> +		if (unlikely(!freelist))
>> +			goto new_objects;
>> +
>> +		if (s->flags & SLAB_STORE_USER)
>> +			set_track(s, freelist, TRACK_ALLOC, addr);
>> +
>> +		return freelist;
>> +	}
>> +
>>  	/*
>>  	 * No other reference to the slab yet so we can
>>  	 * muck around with it freely without cmpxchg
>> @@ -3043,29 +3186,29 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	freelist = slab->freelist;
>>  	slab->freelist = NULL;
>>  
>> -	stat(s, ALLOC_SLAB);
>> +	inc_slabs_node(s, slab_nid(slab), slab->objects);
>>  
>>  check_new_slab:
>>  
>>  	if (kmem_cache_debug(s)) {
>> -		if (!alloc_debug_processing(s, slab, freelist, addr)) {
>> -			/* Slab failed checks. Next slab needed */
>> -			goto new_slab;
>> -		} else {
>> -			/*
>> -			 * For debug case, we don't load freelist so that all
>> -			 * allocations go through alloc_debug_processing()
>> -			 */
>> -			goto return_single;
>> -		}
>> +		/*
>> +		 * For debug caches here we had to go through
>> +		 * alloc_single_from_partial() so just store the tracking info
>> +		 * and return the object
>> +		 */
>> +		if (s->flags & SLAB_STORE_USER)
>> +			set_track(s, freelist, TRACK_ALLOC, addr);
>> +		return freelist;
>>  	}
>>  
>> -	if (unlikely(!pfmemalloc_match(slab, gfpflags)))
>> +	if (unlikely(!pfmemalloc_match(slab, gfpflags))) {
>>  		/*
>>  		 * For !pfmemalloc_match() case we don't load freelist so that
>>  		 * we don't make further mismatched allocations easier.
>>  		 */
>> -		goto return_single;
>> +		deactivate_slab(s, slab, get_freepointer(s, freelist));
>> +		return freelist;
>> +	}
> 
> 
> 
>>  
>>  retry_load_slab:
>>  
>> @@ -3089,11 +3232,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	c->slab = slab;
>>  
>>  	goto load_freelist;
>> -
>> -return_single:
>> -
>> -	deactivate_slab(s, slab, get_freepointer(s, freelist));
>> -	return freelist;
>>  }
>>  
>>  /*
>> @@ -3341,9 +3479,10 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>>  	if (kfence_free(head))
>>  		return;
>>  
>> -	if (kmem_cache_debug(s) &&
>> -	    !free_debug_processing(s, slab, head, tail, cnt, addr))
>> +	if (kmem_cache_debug(s)) {
>> +		free_debug_processing(s, slab, head, tail, cnt, addr);
>>  		return;
>> +	}
> 
> Oh, now debugging caches no longer share the free path with non-debugging
> caches.
> 
> Now free_debug_processing's return type can be void?

Right.
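
For illustration (prototype approximated from the call site quoted above, a
sketch only), the conversion would be roughly:

	static void free_debug_processing(struct kmem_cache *s,
					  struct slab *slab, void *head,
					  void *tail, int cnt,
					  unsigned long addr);

with the return 0/1 statements inside turning into plain returns; the call
site above already ignores the result.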

>>  
>>  	do {
>>  		if (unlikely(n)) {
>> @@ -3958,6 +4097,7 @@ static void early_kmem_cache_node_alloc(int node)
>>  	slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
>>  
>>  	BUG_ON(!slab);
>> +	inc_slabs_node(kmem_cache_node, slab_nid(slab), slab->objects);
>>  	if (slab_nid(slab) != node) {
>>  		pr_err("SLUB: Unable to allocate memory from node %d\n", node);
>>  		pr_err("SLUB: Allocating a useless per node structure in order to be able to continue\n");
>> @@ -5625,7 +5765,7 @@ static ssize_t validate_store(struct kmem_cache *s,
>>  {
>>  	int ret = -EINVAL;
>>  
>> -	if (buf[0] == '1') {
>> +	if (buf[0] == '1' && kmem_cache_debug(s)) {
>>  		ret = validate_slab_cache(s);
>>  		if (ret >= 0)
>>  			ret = length;
> 
> Yeah, definitely this is what it should be, instead of serializing
> inc_slabs_node()/dec_slabs_node() for non-debugging caches.
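> 
> (For context, the attribute is exercised from userspace roughly like this;
> the cache name and debug flags below are just an example:
> 
>   # boot with e.g. slub_debug=FZPU,dentry so the cache is a debug cache
>   echo 1 > /sys/kernel/slab/dentry/validate
> 
> and with the change above the write returns -EINVAL for non-debug caches
> instead of running the now-unsafe validation.)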
> 
>> -- 
>> 2.37.1
>> 
> 


