* How to increase rcu_head size without breaking -mm ?
From: Joel Fernandes @ 2022-08-25  2:56 UTC
  To: vbabka, willy, paulmck, rcu

Hi,
I am trying to add some debug information to rcu_head for some patches, and need
your help! This used to work at some point (more than a year ago).

I get various SLAB_MATCH() errors with a change like this:

diff --git a/include/linux/types.h b/include/linux/types.h
index ea8cf60a8a79..daf7682a7a3b 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -220,6 +220,7 @@ struct ustat {
 struct callback_head {
        struct callback_head *next;
        void (*func)(struct callback_head *head);
+       char buf[4];
 } __attribute__((aligned(sizeof(void *))));
 #define rcu_head callback_head
---

Errors:
mm/slab.h:67:9: note: in expansion of macro ‘static_assert’
   67 |         static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
      |         ^~~~~~~~~~~~~
mm/slab.h:73:1: note: in expansion of macro ‘SLAB_MATCH’
   73 | SLAB_MATCH(_refcount, __page_refcount);
      | ^~~~~~~~~~

----------

I tried various things like increasing the struct slab or struct page size using
dummy variables, but nothing works.

I tried to print the offsets of the struct using the compiler, but that requires
compilation to actually succeed. The same exercise with the preprocessor does
not help either, because nested structures could also include rcu_heads.
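One partial workaround I know of is to make the compiler itself report an
offset in a diagnostic, so the full build does not have to succeed (the
probe variable name below is made up, and struct slab's rcu_head is just an
example target):

	/* the diagnostic embeds the array size, e.g.
	 * "initialization of 'char (*)[16]' from 'int'" => offset is 16 */
	char (*__offset_probe)[offsetof(struct slab, rcu_head)] = 1;

but that only reveals one offset per probe, and I'd have to guess where to
place them.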

If I at least knew how the offsets were off, I could hack it for my needs (of
course upstreaming such a change is a different story, but I do want to upstream
some variant of rcu_head debug info).

Any thoughts on how we can accomplish this?

Thanks!

 - Joel


* Re: How to increase rcu_head size without breaking -mm ?
From: Vlastimil Babka @ 2022-08-25  7:13 UTC
  To: Joel Fernandes, willy, paulmck, rcu

On 8/25/22 04:56, Joel Fernandes wrote:
> Hi,
> I am trying to add some debug information to rcu_head for some patches, and need
> your help! This used to work at some point (more than a year ago).

Hmm, got a patch showing how it worked back then? Because what I see in the
v5.12 struct page definition is:

	struct rcu_head rcu_head;

is in union with the whole of

	struct {        /* slab, slob and slub */ ... }

which starts with

	union { struct list_head slab_list;  ... }
	struct kmem_cache *slab_cache; /* not slob */

and there's e.g. kmem_rcu_free() in mm/slab.c that does

        page = container_of(head, struct page, rcu_head);
        cachep = page->slab_cache;

and very similar thing in mm/slub.c rcu_free_slab() - we assume that the
rcu_head and slab_cache fields can coexist.

So this looks silently fragile to me? In case rcu_head becomes larger than
list_head, e.g. with your buf[4], it would start overlapping the
page->slab_cache field, no?
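If so, the fragility could at least be made explicit with a guard along
these lines (just a sketch against the v5.12-style layout above, not
something in the tree):

	/* the RCU callback reads page->slab_cache after container_of(),
	 * so rcu_head must not grow past the list_head words it overlays */
	static_assert(sizeof(struct rcu_head) <= sizeof(struct list_head),
		      "enlarged rcu_head would overlap page->slab_cache");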

AFAICS the way struct slab is structured now in mm/slab.h makes the overlap
impossible, so struct slab will grow as rcu_head grows, and the offsets of
the later fields stop matching their counterparts in struct page, which
doesn't grow. Hmm.

> I get various SLAB_MATCH() errors with a change like this:
> 
> diff --git a/include/linux/types.h b/include/linux/types.h
> index ea8cf60a8a79..daf7682a7a3b 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -220,6 +220,7 @@ struct ustat {
>  struct callback_head {
>         struct callback_head *next;
>         void (*func)(struct callback_head *head);
> +       char buf[4];
>  } __attribute__((aligned(sizeof(void *))));
>  #define rcu_head callback_head
> ---
> 
> Errors:
> mm/slab.h:67:9: note: in expansion of macro ‘static_assert’
>    67 |         static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>       |         ^~~~~~~~~~~~~
> mm/slab.h:73:1: note: in expansion of macro ‘SLAB_MATCH’
>    73 | SLAB_MATCH(_refcount, __page_refcount);
>       | ^~~~~~~~~~
> 
> ----------
> 
> I tried various things like increasing the struct slab or struct page size using
> dummy variables, but nothing works.
> 
> I tried to print the offsets of the struct using the compiler, but that requires
> compilation to actually succeed. The same exercise with the preprocessor does
> not help either, because nested structures could also include rcu_heads.
> 
> If I at least knew how the offsets were off, I could hack it for my needs (of
> course upstreaming such a change is a different story, but I do want to upstream
> some variant of rcu_head debug info).
> 
> Any thoughts on how we can accomplish this?
> 
> Thanks!
> 
>  - Joel



* Re: How to increase rcu_head size without breaking -mm ?
From: Vlastimil Babka @ 2022-08-25  7:50 UTC
  To: Joel Fernandes, willy, paulmck, rcu


On 8/25/22 09:13, Vlastimil Babka wrote:
> On 8/25/22 04:56, Joel Fernandes wrote:
>> Hi,
>> I am trying to add some debug information to rcu_head for some patches, and need
>> your help! This used to work at some point (more than a year ago).
> 
> Hmm, got a patch showing how it worked back then? Because what I see in the
> v5.12 struct page definition is:
> 
> 	struct rcu_head rcu_head;
> 
> is in union with the whole of
> 
> 	struct {        /* slab, slob and slub */ ... }
> 
> which starts with
> 
> 	union { struct list_head slab_list;  ... }
> 	struct kmem_cache *slab_cache; /* not slob */
> 
> and there's e.g. kmem_rcu_free() in mm/slab.c that does
> 
>         page = container_of(head, struct page, rcu_head);
>         cachep = page->slab_cache;
> 
> and very similar thing in mm/slub.c rcu_free_slab() - we assume that the
> rcu_head and slab_cache fields can coexist.
> 
> So this looks silently fragile to me? In case rcu_head becomes larger than
> list_head, e.g. with your buf[4], it would start overlapping the
> page->slab_cache field, no?
>
> AFAICS the way struct slab is structured now in mm/slab.h makes the overlap
> impossible, so struct slab will grow as rcu_head grows, and the offsets of
> the later fields stop matching their counterparts in struct page, which
> doesn't grow. Hmm.

The following has made things compile for both SLUB and SLAB (didn't adjust
SLOB); the idea is to define more explicitly that everything can overlap
with rcu_head except slab_cache. So the structures don't even grow in the
end. This requires moving the slab_cache field around, which would be fine
except for SLUB and its cmpxchg_double usage: that requires the
freelist+counters fields to be 128-bit aligned, which means we can't fit it
all with these constraints. So that's broken with the patch and I don't see
an easy way to fix this. But maybe it can unblock your current debugging
efforts if you use SLAB, or disable the cmpxchg_double in SLUB (at some perf
cost).
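For context, the constraint comes from SLUB's lockless fastpath, which
updates freelist and counters as one double-word atomic, roughly like this
(a simplified sketch, not the exact mm/slub.c code):

	/* 'freelist' and 'counters' must be adjacent and aligned to
	 * 2 * sizeof(void *), so that cmpxchg_double() (cmpxchg16b on
	 * x86_64) can replace both words atomically */
	if (!cmpxchg_double(&slab->freelist, &slab->counters,
			    old_freelist, old_counters,
			    new_freelist, new_counters))
		goto retry;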

----8<----
diff --git a/include/linux/types.h b/include/linux/types.h
index ea8cf60a8a79..daf7682a7a3b 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -220,6 +220,7 @@ struct ustat {
 struct callback_head {
 	struct callback_head *next;
 	void (*func)(struct callback_head *head);
+	char buf[4];
 } __attribute__((aligned(sizeof(void *))));
 #define rcu_head callback_head
 
diff --git a/mm/slab.h b/mm/slab.h
index 4ec82bec15ec..9fdbfcc401c0 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -12,37 +12,42 @@ struct slab {
 #if defined(CONFIG_SLAB)
 
 	union {
-		struct list_head slab_list;
+		struct {
+			struct list_head slab_list;
+			void *freelist;	/* array of free object indexes */
+			void *s_mem;	/* first object */
+		};
 		struct rcu_head rcu_head;
 	};
 	struct kmem_cache *slab_cache;
-	void *freelist;	/* array of free object indexes */
-	void *s_mem;	/* first object */
 	unsigned int active;
 
 #elif defined(CONFIG_SLUB)
 
 	union {
-		struct list_head slab_list;
-		struct rcu_head rcu_head;
-#ifdef CONFIG_SLUB_CPU_PARTIAL
 		struct {
-			struct slab *next;
-			int slabs;	/* Nr of slabs left */
-		};
+			union {
+				struct list_head slab_list;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+				struct {
+					struct slab *next;
+					int slabs;	/* Nr of slabs left */
+				};
 #endif
-	};
-	struct kmem_cache *slab_cache;
-	/* Double-word boundary */
-	void *freelist;		/* first free object */
-	union {
-		unsigned long counters;
-		struct {
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
+			};
+			void *freelist;		/* first free object */
+			union {
+				unsigned long counters;
+				struct {
+					unsigned inuse:16;
+					unsigned objects:15;
+					unsigned frozen:1;
+				};
+			};
 		};
+		struct rcu_head rcu_head;
 	};
+	struct kmem_cache *slab_cache;
 	unsigned int __unused;
 
 #elif defined(CONFIG_SLOB)



* Re: How to increase rcu_head size without breaking -mm ?
From: Vlastimil Babka @ 2022-08-25  8:14 UTC
  To: Joel Fernandes, willy, paulmck, rcu


On 8/25/22 09:50, Vlastimil Babka wrote:
> 
> On 8/25/22 09:13, Vlastimil Babka wrote:
>> On 8/25/22 04:56, Joel Fernandes wrote:
>>> Hi,
>>> I am trying to add some debug information to rcu_head for some patches, and need
>>> your help! This used to work at some point (more than a year ago).
>> 
>> Hmm, got a patch showing how it worked back then? Because what I see in the
>> v5.12 struct page definition is:
>> 
>> 	struct rcu_head rcu_head;
>> 
>> is in union with the whole of
>> 
>> 	struct {        /* slab, slob and slub */ ... }
>> 
>> which starts with
>> 
>> 	union { struct list_head slab_list;  ... }
>> 	struct kmem_cache *slab_cache; /* not slob */
>> 
>> and there's e.g. kmem_rcu_free() in mm/slab.c that does
>> 
>>         page = container_of(head, struct page, rcu_head);
>>         cachep = page->slab_cache;
>> 
>> and very similar thing in mm/slub.c rcu_free_slab() - we assume that the
>> rcu_head and slab_cache fields can coexist.
>> 
>> So this looks silently fragile to me? In case rcu_head becomes larger than
>> list_head, e.g. with your buf[4], it would start overlapping the
>> page->slab_cache field, no?
>>
>> AFAICS the way struct slab is structured now in mm/slab.h makes the overlap
>> impossible, so struct slab will grow as rcu_head grows, and the offsets of
>> the later fields stop matching their counterparts in struct page, which
>> doesn't grow. Hmm.
> 
> The following has made things compile for both SLUB and SLAB (didn't adjust
> SLOB); the idea is to define more explicitly that everything can overlap
> with rcu_head except slab_cache. So the structures don't even grow in the
> end. This requires moving the slab_cache field around, which would be fine
> except for SLUB and its cmpxchg_double usage: that requires the
> freelist+counters fields to be 128-bit aligned, which means we can't fit it
> all with these constraints. So that's broken with the patch and I don't see
> an easy way to fix this.

Ah, maybe I found a way. It requires that rcu_head in struct slab be at a
different offset than in struct page, and that we drop the particular
SLAB_MATCH check. I think that should be fine: we don't cast struct slab to
struct page in the rcu calls/callbacks, so the check was probably there just
for intermediate steps of the struct slab series and can be dropped now.

So this should allow you to use up to 32 bytes of rcu_head.
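(Where the 32 comes from, on 64-bit: everything now sharing the union with
rcu_head adds up to

	slab_list / next+slabs union : 16 bytes
	freelist                     :  8 bytes
	counters union               :  8 bytes
	                               --------
	                               32 bytes

and the SLAB side works out to the same 16 + 8 + 8.)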

----8<----
diff --git a/include/linux/types.h b/include/linux/types.h
index ea8cf60a8a79..daf7682a7a3b 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -220,6 +220,7 @@ struct ustat {
 struct callback_head {
 	struct callback_head *next;
 	void (*func)(struct callback_head *head);
+	char buf[4];
 } __attribute__((aligned(sizeof(void *))));
 #define rcu_head callback_head
 
diff --git a/mm/slab.h b/mm/slab.h
index 4ec82bec15ec..dd49851f9814 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -12,37 +12,45 @@ struct slab {
 #if defined(CONFIG_SLAB)
 
 	union {
-		struct list_head slab_list;
+		struct {
+			struct list_head slab_list;
+			void *freelist;	/* array of free object indexes */
+			void *s_mem;	/* first object */
+		};
 		struct rcu_head rcu_head;
 	};
 	struct kmem_cache *slab_cache;
-	void *freelist;	/* array of free object indexes */
-	void *s_mem;	/* first object */
 	unsigned int active;
 
 #elif defined(CONFIG_SLUB)
 
-	union {
-		struct list_head slab_list;
-		struct rcu_head rcu_head;
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		struct {
-			struct slab *next;
-			int slabs;	/* Nr of slabs left */
-		};
-#endif
-	};
 	struct kmem_cache *slab_cache;
-	/* Double-word boundary */
-	void *freelist;		/* first free object */
+
 	union {
-		unsigned long counters;
 		struct {
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
+			union {
+				struct list_head slab_list;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+				struct {
+					struct slab *next;
+					int slabs;	/* Nr of slabs left */
+				};
+#endif
+			};
+			/* Double-word boundary */
+			void *freelist;		/* first free object */
+			union {
+				unsigned long counters;
+				struct {
+					unsigned inuse:16;
+					unsigned objects:15;
+					unsigned frozen:1;
+				};
+			};
 		};
+		struct rcu_head rcu_head;
 	};
+
 	unsigned int __unused;
 
 #elif defined(CONFIG_SLOB)
@@ -66,9 +74,13 @@ struct slab {
 #define SLAB_MATCH(pg, sl)						\
 	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
 SLAB_MATCH(flags, __page_flags);
+#ifdef CONFIG_SLUB
+SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
+#else
 SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
+#endif
 #ifndef CONFIG_SLOB
-SLAB_MATCH(rcu_head, rcu_head);
+// SLAB_MATCH(rcu_head, rcu_head); /* not necessary, hopefully? */
 #endif
 SLAB_MATCH(_refcount, __page_refcount);
 #ifdef CONFIG_MEMCG




* Re: How to increase rcu_head size without breaking -mm ?
From: Joel Fernandes @ 2022-08-25 15:00 UTC
  To: Vlastimil Babka; +Cc: willy, paulmck, rcu

On Thu, Aug 25, 2022 at 10:14:16AM +0200, Vlastimil Babka wrote:
> 
> On 8/25/22 09:50, Vlastimil Babka wrote:
> > 
> > On 8/25/22 09:13, Vlastimil Babka wrote:
> >> On 8/25/22 04:56, Joel Fernandes wrote:
> >>> Hi,
> >>> I am trying to add some debug information to rcu_head for some patches, and need
> >>> your help! This used to work at some point (more than a year ago).
> >> 
> >> Hmm, got a patch showing how it worked back then? Because what I see in the
> >> v5.12 struct page definition is:
> >> 
> >> 	struct rcu_head rcu_head;
> >> 
> >> is in union with the whole of
> >> 
> >> 	struct {        /* slab, slob and slub */ ... }
> >> 
> >> which starts with
> >> 
> >> 	union { struct list_head slab_list;  ... }
> >> 	struct kmem_cache *slab_cache; /* not slob */
> >> 
> >> and there's e.g. kmem_rcu_free() in mm/slab.c that does
> >> 
> >>         page = container_of(head, struct page, rcu_head);
> >>         cachep = page->slab_cache;
> >> 
> >> and very similar thing in mm/slub.c rcu_free_slab() - we assume that the
> >> rcu_head and slab_cache fields can coexist.
> >> 
> >> So this looks silently fragile to me? In case rcu_head becomes larger than
> >> list_head, e.g. with your buf[4], it would start overlapping the
> >> page->slab_cache field, no?
> >>
> >> AFAICS the way struct slab is structured now in mm/slab.h makes the overlap
> >> impossible, so struct slab will grow as rcu_head grows, and the offsets of
> >> the later fields stop matching their counterparts in struct page, which
> >> doesn't grow. Hmm.
> > 
> > The following has made things compile for both SLUB and SLAB (didn't adjust
> > SLOB); the idea is to define more explicitly that everything can overlap
> > with rcu_head except slab_cache. So the structures don't even grow in the
> > end. This requires moving the slab_cache field around, which would be fine
> > except for SLUB and its cmpxchg_double usage: that requires the
> > freelist+counters fields to be 128-bit aligned, which means we can't fit it
> > all with these constraints. So that's broken with the patch and I don't see
> > an easy way to fix this.
> 
> Ah, maybe I found a way. It requires that rcu_head in struct slab be at a
> different offset than in struct page, and that we drop the particular
> SLAB_MATCH check. I think that should be fine: we don't cast struct slab to
> struct page in the rcu calls/callbacks, so the check was probably there just
> for intermediate steps of the struct slab series and can be dropped now.
> 
> So this should allow you to use up to 32 bytes of rcu_head.

Thanks a lot, Vlastimil! I can confirm it builds; some comments below:

> ----8<----
> diff --git a/include/linux/types.h b/include/linux/types.h
> index ea8cf60a8a79..daf7682a7a3b 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -220,6 +220,7 @@ struct ustat {
>  struct callback_head {
>  	struct callback_head *next;
>  	void (*func)(struct callback_head *head);
> +	char buf[4];
>  } __attribute__((aligned(sizeof(void *))));
>  #define rcu_head callback_head
>  
> diff --git a/mm/slab.h b/mm/slab.h
> index 4ec82bec15ec..dd49851f9814 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -12,37 +12,45 @@ struct slab {
>  #if defined(CONFIG_SLAB)
>  
>  	union {
> -		struct list_head slab_list;
> +		struct {
> +			struct list_head slab_list;
> +			void *freelist;	/* array of free object indexes */
> +			void *s_mem;	/* first object */
> +		};
>  		struct rcu_head rcu_head;

Does this present any problems for SLAB_TYPESAFE_BY_RCU, such as the slab
being in use before the end of a grace period? I was concerned about the
freelist being corrupted by the rcu_head storage, since it's now in a union
with it. If this does not matter, then that's all good.
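To spell the worry out, the usual SLAB_TYPESAFE_BY_RCU reader pattern is
something like:

	rcu_read_lock();
	obj = rcu_dereference(ptr);
	/* obj's storage stays type-stable until rcu_read_unlock(), even
	 * if its slab was already handed to call_rcu() for freeing; the
	 * question is whether anything in this window still relies on
	 * slab->freelist, which now shares storage with the rcu_head */
	rcu_read_unlock();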

> @@ -66,9 +74,13 @@ struct slab {
>  #define SLAB_MATCH(pg, sl)						\
>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>  SLAB_MATCH(flags, __page_flags);
> +#ifdef CONFIG_SLUB
> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
> +#else
>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
> +#endif
>  #ifndef CONFIG_SLOB
> -SLAB_MATCH(rcu_head, rcu_head);
> +// SLAB_MATCH(rcu_head, rcu_head); /* not necessary, hopefully? */

I think if it helps, I can make this check depend on a default-off debug
config option.

Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel



* Re: How to increase rcu_head size without breaking -mm ?
From: Vlastimil Babka @ 2022-08-25 15:33 UTC
  To: Joel Fernandes; +Cc: willy, paulmck, rcu

On 8/25/22 17:00, Joel Fernandes wrote:
> On Thu, Aug 25, 2022 at 10:14:16AM +0200, Vlastimil Babka wrote:
>> 
>> On 8/25/22 09:50, Vlastimil Babka wrote:
>> > 
>> > On 8/25/22 09:13, Vlastimil Babka wrote:
>> >> On 8/25/22 04:56, Joel Fernandes wrote:
>> >>> Hi,
>> >>> I am trying to add some debug information to rcu_head for some patches, and need
>> >>> your help! This used to work at some point (more than a year ago).
>> >> 
>> >> Hmm, got a patch showing how it worked back then? Because what I see in the
>> >> v5.12 struct page definition is:
>> >> 
>> >> 	struct rcu_head rcu_head;
>> >> 
>> >> is in union with the whole of
>> >> 
>> >> 	struct {        /* slab, slob and slub */ ... }
>> >> 
>> >> which starts with
>> >> 
>> >> 	union { struct list_head slab_list;  ... }
>> >> 	struct kmem_cache *slab_cache; /* not slob */
>> >> 
>> >> and there's e.g. kmem_rcu_free() in mm/slab.c that does
>> >> 
>> >>         page = container_of(head, struct page, rcu_head);
>> >>         cachep = page->slab_cache;
>> >> 
>> >> and very similar thing in mm/slub.c rcu_free_slab() - we assume that the
>> >> rcu_head and slab_cache fields can coexist.
>> >> 
>> >> So this looks silently fragile to me? In case rcu_head becomes larger than
>> >> list_head, e.g. with your buf[4], it would start overlapping the
>> >> page->slab_cache field, no?
>> >>
>> >> AFAICS the way struct slab is structured now in mm/slab.h makes the overlap
>> >> impossible, so struct slab will grow as rcu_head grows, and the offsets of
>> >> the later fields stop matching their counterparts in struct page, which
>> >> doesn't grow. Hmm.
>> > 
>> > The following has made things compile for both SLUB and SLAB (didn't adjust
>> > SLOB); the idea is to define more explicitly that everything can overlap
>> > with rcu_head except slab_cache. So the structures don't even grow in the
>> > end. This requires moving the slab_cache field around, which would be fine
>> > except for SLUB and its cmpxchg_double usage: that requires the
>> > freelist+counters fields to be 128-bit aligned, which means we can't fit it
>> > all with these constraints. So that's broken with the patch and I don't see
>> > an easy way to fix this.
>> 
>> Ah, maybe I found a way. It requires that rcu_head in struct slab be at a
>> different offset than in struct page, and that we drop the particular
>> SLAB_MATCH check. I think that should be fine: we don't cast struct slab to
>> struct page in the rcu calls/callbacks, so the check was probably there just
>> for intermediate steps of the struct slab series and can be dropped now.
>> 
>> So this should allow you to use up to 32 bytes of rcu_head.
> 
> Thanks a lot, Vlastimil! I can confirm it builds; some comments below:
> 
>> ----8<----
>> diff --git a/include/linux/types.h b/include/linux/types.h
>> index ea8cf60a8a79..daf7682a7a3b 100644
>> --- a/include/linux/types.h
>> +++ b/include/linux/types.h
>> @@ -220,6 +220,7 @@ struct ustat {
>>  struct callback_head {
>>  	struct callback_head *next;
>>  	void (*func)(struct callback_head *head);
>> +	char buf[4];
>>  } __attribute__((aligned(sizeof(void *))));
>>  #define rcu_head callback_head
>>  
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 4ec82bec15ec..dd49851f9814 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -12,37 +12,45 @@ struct slab {
>>  #if defined(CONFIG_SLAB)
>>  
>>  	union {
>> -		struct list_head slab_list;
>> +		struct {
>> +			struct list_head slab_list;
>> +			void *freelist;	/* array of free object indexes */
>> +			void *s_mem;	/* first object */
>> +		};
>>  		struct rcu_head rcu_head;
> 
> Does this present any problems for SLAB_TYPESAFE_BY_RCU, such as the slab
> being in use before the end of a grace period? I was concerned about the
> freelist being corrupted by the rcu_head storage, since it's now in a union
> with it. If this does not matter, then that's all good.

Good point, I didn't check that. If we're freeing the slab then nothing
should be using it at that point, including the freelist pointer. Only the
actual object storage is guaranteed not to go away or become overwritten for
SLAB_TYPESAFE_BY_RCU caches.

But we need to check the slab freeing functions themselves.
It looks like SLAB is fine: slab_destroy() there already takes care to copy
the freelist pointer away before call_rcu() and then only works with the
copy. Also, nothing should need s_mem either, in case you need more bytes
for the rcu_head debug.

SLUB also seems fine, as the freeing doesn't try to look at the freelist
even with the debugging consistency checks enabled. It could be a problem
in case of a double-free, but that's already a problem anyway.
In case you need more than the 8 bytes of the freelist pointer, overwriting
the next 'counters' union would be a problem for the consistency checks, as
they read slab->objects from there.
But I guess we could do those checks before call_rcu(), so the callback
would then only free the slab page. The state expected by those checks
doesn't depend on the grace period expiring.
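Roughly this shape, where check_slab_consistency() is a hypothetical helper
standing in for the existing debug checks:

	static void free_slab(struct kmem_cache *s, struct slab *slab)
	{
		if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) {
			/* do the checks that read slab->objects now,
			 * before rcu_head can overwrite counters */
			check_slab_consistency(s, slab);
			call_rcu(&slab->rcu_head, rcu_free_slab);
		} else {
			__free_slab(s, slab);
		}
	}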

>> @@ -66,9 +74,13 @@ struct slab {
>>  #define SLAB_MATCH(pg, sl)						\
>>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>>  SLAB_MATCH(flags, __page_flags);
>> +#ifdef CONFIG_SLUB
>> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
>> +#else
>>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>> +#endif
>>  #ifndef CONFIG_SLOB
>> -SLAB_MATCH(rcu_head, rcu_head);
>> +// SLAB_MATCH(rcu_head, rcu_head); /* not necessary, hopefully? */
> 
> I think if it helps, I can make this check depend on a default-off debug
> config option.

It wouldn't help: with this patch, for SLUB the rcu_head would start at a
different offset regardless of whether you enable the option to increase its
size.

> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> thanks,
> 
>  - Joel
> 



* Re: How to increase rcu_head size without breaking -mm ?
From: Joel Fernandes @ 2022-08-25 19:51 UTC
  To: Vlastimil Babka; +Cc: willy, paulmck, rcu

On Thu, Aug 25, 2022 at 05:33:49PM +0200, Vlastimil Babka wrote:
> On 8/25/22 17:00, Joel Fernandes wrote:
> > On Thu, Aug 25, 2022 at 10:14:16AM +0200, Vlastimil Babka wrote:
> >> 
> >> On 8/25/22 09:50, Vlastimil Babka wrote:
> >> > 
> >> > On 8/25/22 09:13, Vlastimil Babka wrote:
> >> >> On 8/25/22 04:56, Joel Fernandes wrote:
> >> >>> Hi,
> >> >>> I am trying to add some debug information to rcu_head for some patches, and need
> >> >>> your help! This used to work at some point (more than a year ago).
> >> >> 
> >> >> Hmm, got a patch showing how it worked back then? Because what I see in the
> >> >> v5.12 struct page definition is:
> >> >> 
> >> >> 	struct rcu_head rcu_head;
> >> >> 
> >> >> is in union with the whole of
> >> >> 
> >> >> 	struct {        /* slab, slob and slub */ ... }
> >> >> 
> >> >> which starts with
> >> >> 
> >> >> 	union { struct list_head slab_list;  ... }
> >> >> 	struct kmem_cache *slab_cache; /* not slob */
> >> >> 
> >> >> and there's e.g. kmem_rcu_free() in mm/slab.c that does
> >> >> 
> >> >>         page = container_of(head, struct page, rcu_head);
> >> >>         cachep = page->slab_cache;
> >> >> 
> >> >> and very similar thing in mm/slub.c rcu_free_slab() - we assume that the
> >> >> rcu_head and slab_cache fields can coexist.
> >> >> 
> >> >> So this looks silently fragile to me? In case rcu_head becomes larger than
> >> >> list_head, e.g. with your buf[4], it would start overlapping the
> >> >> page->slab_cache field, no?
> >> >>
> >> >> AFAICS the way struct slab is structured now in mm/slab.h makes the overlap
> >> >> impossible, so struct slab will grow as rcu_head grows, and the offsets of
> >> >> the later fields stop matching their counterparts in struct page, which
> >> >> doesn't grow. Hmm.
> >> > 
> >> > The following has made things compile for both SLUB and SLAB (didn't adjust
> >> > SLOB); the idea is to define more explicitly that everything can overlap
> >> > with rcu_head except slab_cache. So the structures don't even grow in the
> >> > end. This requires moving the slab_cache field around, which would be fine
> >> > except for SLUB and its cmpxchg_double usage: that requires the
> >> > freelist+counters fields to be 128-bit aligned, which means we can't fit it
> >> > all with these constraints. So that's broken with the patch and I don't see
> >> > an easy way to fix this.
> >> 
> >> Ah, maybe I found a way. It requires that rcu_head in struct slab be at a
> >> different offset than in struct page, and that we drop the particular
> >> SLAB_MATCH check. I think that should be fine: we don't cast struct slab to
> >> struct page in the rcu calls/callbacks, so the check was probably there just
> >> for intermediate steps of the struct slab series and can be dropped now.
> >> 
> >> So this should allow you to use up to 32 bytes of rcu_head.
> > 
> > Thanks a lot, Vlastimil! I can confirm it builds; some comments below:
> > 
> >> ----8<----
> >> diff --git a/include/linux/types.h b/include/linux/types.h
> >> index ea8cf60a8a79..daf7682a7a3b 100644
> >> --- a/include/linux/types.h
> >> +++ b/include/linux/types.h
> >> @@ -220,6 +220,7 @@ struct ustat {
> >>  struct callback_head {
> >>  	struct callback_head *next;
> >>  	void (*func)(struct callback_head *head);
> >> +	char buf[4];
> >>  } __attribute__((aligned(sizeof(void *))));
> >>  #define rcu_head callback_head
> >>  
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index 4ec82bec15ec..dd49851f9814 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -12,37 +12,45 @@ struct slab {
> >>  #if defined(CONFIG_SLAB)
> >>  
> >>  	union {
> >> -		struct list_head slab_list;
> >> +		struct {
> >> +			struct list_head slab_list;
> >> +			void *freelist;	/* array of free object indexes */
> >> +			void *s_mem;	/* first object */
> >> +		};
> >>  		struct rcu_head rcu_head;
> > 
> > Does this present any problems for SLAB_TYPESAFE_BY_RCU, such as the slab
> > being in use before the end of a grace period? I was concerned about the
> > freelist being corrupted by the rcu_head storage, since it's now in a union
> > with it. If this does not matter, then that's all good.
> 
> Good point, I didn't check that. If we're freeing the slab then nothing
> should be using it at that point, including the freelist pointer. Only the
> actual object storage is guaranteed not to go away or become overwritten for
> SLAB_TYPESAFE_BY_RCU caches.

Cool, sounds good.

> But we need to check the slab freeing functions themselves.
> It looks like SLAB is fine: slab_destroy() there already takes care to copy
> the freelist pointer away before call_rcu() and then only works with the
> copy. Also, nothing should need s_mem either, in case you need more bytes
> for the rcu_head debug.
>
> SLUB also seems fine, as the freeing doesn't try to look at the freelist
> even with the debugging consistency checks enabled. It could be a problem
> in case of a double-free, but that's already a problem anyway.
> In case you need more than the 8 bytes of the freelist pointer, overwriting
> the next 'counters' union would be a problem for the consistency checks, as
> they read slab->objects from there.
> But I guess we could do those checks before call_rcu(), so the callback
> would then only free the slab page. The state expected by those checks
> doesn't depend on the grace period expiring.

Ok, sounds good. I can run it for a few days on real hardware and see if
something breaks or such. The most I can get with your patch is 12 bytes
before the FOLIO checks start complaining. I can stick to 8 bytes, and
perhaps we can allocate a separate debug object if we need more or something.
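The separate-object idea would be something along these lines, with all
names hypothetical:

	/* keep rcu_head itself small and hang extra debug state off a
	 * side allocation instead of growing every callback_head */
	struct rcu_head_debug {
		struct rcu_head *head;		/* head being tracked */
		unsigned long enqueue_jiffies;	/* when call_rcu() ran */
		rcu_callback_t func;		/* callback, for reports */
	};

at the cost of an extra allocation and lookup on the debug path.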

> >> @@ -66,9 +74,13 @@ struct slab {
> >>  #define SLAB_MATCH(pg, sl)						\
> >>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
> >>  SLAB_MATCH(flags, __page_flags);
> >> +#ifdef CONFIG_SLUB
> >> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
> >> +#else
> >>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
> >> +#endif
> >>  #ifndef CONFIG_SLOB
> >> -SLAB_MATCH(rcu_head, rcu_head);
> >> +// SLAB_MATCH(rcu_head, rcu_head); /* not necessary, hopefully? */
> > 
> > I think if it helps, I can make this check depend on a default-off debug
> > config option.
> 
> It wouldn't help: with this patch, for SLUB the rcu_head would start at a
> different offset regardless of whether you enable the option to increase its
> size.

True, unless we maintained two versions of the structs based on CONFIGs, but
that's really ugly.

Thanks so much for this patch, it's surely helpful for debugging purposes,
and I look forward to testing it more and providing you feedback!

thanks,

 - Joel



* Re: How to increase rcu_head size without breaking -mm ?
From: Vlastimil Babka @ 2022-08-26  8:35 UTC
  To: Joel Fernandes; +Cc: willy, paulmck, rcu

On 8/25/22 21:51, Joel Fernandes wrote:
>> But we need to check the slab freeing functions themselves.
>> It looks like SLAB is fine: slab_destroy() there already takes care to copy
>> the freelist pointer away before call_rcu() and then only works with the
>> copy. Also, nothing should need s_mem either, in case you need more bytes
>> for the rcu_head debug.
>>
>> SLUB also seems fine, as the freeing doesn't try to look at the freelist
>> even with the debugging consistency checks enabled. It could be a problem
>> in case of a double-free, but that's already a problem anyway.
>> In case you need more than the 8 bytes of the freelist pointer, overwriting
>> the next 'counters' union would be a problem for the consistency checks, as
>> they read slab->objects from there.
>> But I guess we could do those checks before call_rcu(), so the callback
>> would then only free the slab page. The state expected by those checks
>> doesn't depend on the grace period expiring.
> 
> Ok, sounds good. I can run it for a few days on real hardware and see if
> something breaks or such. The most I can get with your patch is 12 bytes
> before the FOLIO checks start complaining. I can stick to 8 bytes, and
> perhaps we can allocate a separate debug object if we need more or something.

Weird, it works for me with 16 bytes, both SLAB and SLUB.

>> >> @@ -66,9 +74,13 @@ struct slab {
>> >>  #define SLAB_MATCH(pg, sl)						\
>> >>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>> >>  SLAB_MATCH(flags, __page_flags);
>> >> +#ifdef CONFIG_SLUB
>> >> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
>> >> +#else
>> >>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>> >> +#endif
>> >>  #ifndef CONFIG_SLOB
>> >> -SLAB_MATCH(rcu_head, rcu_head);
>> >> +// SLAB_MATCH(rcu_head, rcu_head); /* not necessary, hopefully? */
>> > 
>> > I think if it helps, I can make this check depend on a default-off debug
>> > config option.
>> 
>> It wouldn't help: with this patch, for SLUB the rcu_head would start at a
>> different offset regardless of whether you enable the option to increase its
>> size.
> 
> True, unless we maintained two versions of the structs based on CONFIGs, but
> that's really ugly.
> 
> Thanks so much for this patch, it's surely helpful for debugging purposes,
> and I look forward to testing it more and providing you feedback!

I'll send a formal RFC soon.



* Re: How to increase rcu_head size without breaking -mm ?
From: Vlastimil Babka @ 2022-08-26  8:50 UTC
  To: Joel Fernandes; +Cc: willy, paulmck, rcu

On 8/26/22 10:35, Vlastimil Babka wrote:
> On 8/25/22 21:51, Joel Fernandes wrote:
>>> But we need to check the slab freeing functions themselves.
>>> It looks like SLAB is fine: slab_destroy() there already takes care to copy
>>> the freelist pointer away before call_rcu() and then only works with the
>>> copy. Also, nothing should need s_mem either, in case you need more bytes
>>> for the rcu_head debug.
>>>
>>> SLUB also seems fine, as the freeing doesn't try to look at the freelist
>>> even with the debugging consistency checks enabled. It could be a problem
>>> in case of a double-free, but that's already a problem anyway.
>>> In case you need more than the 8 bytes of the freelist pointer, overwriting
>>> the next 'counters' union would be a problem for the consistency checks, as
>>> they read slab->objects from there.
>>> But I guess we could do those checks before call_rcu(), so the callback
>>> would then only free the slab page. The state expected by those checks
>>> doesn't depend on the grace period expiring.
>> 
>> Ok, sounds good. I can run it for a few days on real hardware and see if
>> something breaks or such. The most I can get with your patch is 12 bytes
>> before the FOLIO checks start complaining. I can stick to 8 bytes, and
>> perhaps we can allocate a separate debug object if we need more or something.
> 
> Weird, it works for me with 16 bytes, both SLAB and SLUB.

Oh, I see, it breaks in some 32-bit vdso object, in struct page's asserts,
not slab's. Likely some false positive due to header creep; I doubt the vdso
really works with 32-bit struct pages, as those don't exist on a 64-bit
kernel. Maybe we should move the asserts from the header to e.g.
page_alloc.c, willy?
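I.e. keep the layout asserts in a file only the kernel proper compiles,
e.g. (sketch):

	/* in mm/page_alloc.c these would never be seen by a 32-bit vdso
	 * build, unlike asserts living in a widely-included header */
	static_assert(offsetof(struct page, flags) ==
		      offsetof(struct folio, flags));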

> 
>>> >> @@ -66,9 +74,13 @@ struct slab {
>>> >>  #define SLAB_MATCH(pg, sl)						\
>>> >>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>>> >>  SLAB_MATCH(flags, __page_flags);
>>> >> +#ifdef CONFIG_SLUB
>>> >> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
>>> >> +#else
>>> >>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>>> >> +#endif
>>> >>  #ifndef CONFIG_SLOB
>>> >> -SLAB_MATCH(rcu_head, rcu_head);
>>> >> +// SLAB_MATCH(rcu_head, rcu_head); /* not necessary, hopefully? */
>>> > 
>>> > I think if it helps, I can make this check depend on a default-off debug
>>> > config option.
>>> 
>>> It wouldn't help: with this patch, for SLUB the rcu_head would start at a
>>> different offset regardless of whether you enable the option to increase its
>>> size.
>> 
>> True, unless we maintained two versions of the structs based on CONFIGs, but
>> that's really ugly.
>> 
>> Thanks so much for this patch, it's surely helpful for debugging purposes,
>> and I look forward to testing it more and providing you feedback!
> 
> I'll send a formal RFC soon.
> 


