linux-hardening.vger.kernel.org archive mirror
* replacing memcpy() calls with direct assignment
@ 2022-06-21 18:37 Kees Cook
  2022-06-21 19:05 ` Greg KH
  2022-06-21 19:50 ` Julia Lawall
  0 siblings, 2 replies; 5+ messages in thread
From: Kees Cook @ 2022-06-21 18:37 UTC
  To: Coccinelle; +Cc: linux-hardening, Julia Lawall

Hello Coccinelle gurus! :)

I recently spent way too long looking at a weird bug in Clang that I
eventually worked around by just replacing a memcpy() with a direct
assignment. It really was very mechanical, and seems like it might be a
common code pattern in the kernel. Swapping these would make the code
much more readable, I think. Here's the example:


https://lore.kernel.org/linux-hardening/20220616052312.292861-1-keescook@chromium.org/

-		memcpy(&host_image->image_section_info[i],
-		       &fw_image->fw_section_info[i],
-		       sizeof(struct fw_section_info_st));
+		host_image->image_section_info[i] = fw_image->fw_section_info[i];
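For readers less familiar with the pattern, here is a minimal userspace sketch of the two forms in the diff. The fields of struct fw_section_info_st are invented for illustration (the real kernel struct differs), and the helper names copy_with_memcpy() and copy_with_assignment() are hypothetical.

```c
#include <string.h>

/* Hypothetical stand-in; the real struct fw_section_info_st has other fields. */
struct fw_section_info_st {
	unsigned int offset;
	unsigned int size;
	unsigned char flags[8];
};

/* Original form: an untyped byte copy whose length is spelled out by hand. */
void copy_with_memcpy(struct fw_section_info_st *dst,
		      const struct fw_section_info_st *src)
{
	memcpy(dst, src, sizeof(struct fw_section_info_st));
}

/* Rewritten form: the compiler checks that both sides have the same type
 * and derives the copy length from that type. */
void copy_with_assignment(struct fw_section_info_st *dst,
			  const struct fw_section_info_st *src)
{
	*dst = *src;
}
```

Both helpers leave every member of *dst equal to the corresponding member of *src. Comparing the copies member by member, rather than with memcmp() over the whole struct, is deliberate: after a struct assignment, any padding bytes take unspecified values.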

Is there a way to reduce the size of this cocci rule? I had to
explicitly spell out each "address of" combination separately. I'd have
expected Coccinelle to treat them as internal aliases, but otherwise I'd
get output like:

 *&dst = src;

etc

@direct_assignment@
type TYPE;
TYPE DST, SRC;
TYPE *DPTR;
TYPE *SPTR;
@@

(
- memcpy(&DST, &SRC, sizeof(TYPE))
+ DST = SRC
|
- memcpy(&DST, &SRC, sizeof(DST))
+ DST = SRC
|
- memcpy(&DST, &SRC, sizeof(SRC))
+ DST = SRC
|

- memcpy(&DST, SPTR, sizeof(TYPE))
+ DST = *SPTR
|
- memcpy(&DST, SPTR, sizeof(DST))
+ DST = *SPTR
|
- memcpy(&DST, SPTR, sizeof(*SPTR))
+ DST = *SPTR
|

- memcpy(DPTR, &SRC, sizeof(TYPE))
+ *DPTR = SRC
|
- memcpy(DPTR, &SRC, sizeof(*DPTR))
+ *DPTR = SRC
|
- memcpy(DPTR, &SRC, sizeof(SRC))
+ *DPTR = SRC
|

- memcpy(DPTR, SPTR, sizeof(TYPE))
+ *DPTR = *SPTR
|
- memcpy(DPTR, SPTR, sizeof(*DPTR))
+ *DPTR = *SPTR
|
- memcpy(DPTR, SPTR, sizeof(*SPTR))
+ *DPTR = *SPTR
)
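A small C file along these lines could serve as a test input for the rule, e.g. with `spatch --sp-file memcpy.cocci test.c` (both file names are placeholders). struct cfg and exercise_rule() are hypothetical; each memcpy() is meant to hit one group of the disjunction, and the trailing comment shows the assignment the rule is intended to produce. Whether Coccinelle infers every type here is an assumption worth checking against real output.

```c
#include <string.h>

/* Hypothetical type used only to exercise the rule. */
struct cfg {
	int mode;
	int flags;
};

/* One memcpy() per group of the disjunction; the trailing comment shows
 * the intended rewrite. */
void exercise_rule(struct cfg *dptr, struct cfg *sptr)
{
	struct cfg dst;
	struct cfg src = { 1, 2 };

	memcpy(&dst, &src, sizeof(struct cfg));	/* dst = src;     */
	memcpy(&dst, sptr, sizeof(*sptr));	/* dst = *sptr;   */
	memcpy(dptr, &src, sizeof(src));	/* *dptr = src;   */
	memcpy(dptr, sptr, sizeof(struct cfg));	/* *dptr = *sptr; */

	(void)dst;	/* dst only exists to exercise the &DST cases */
}
```

Because the last copy wins, *dptr ends up equal to *sptr both before and after the transformation, which gives a quick behavioral check that the rewrite preserved semantics.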

-- 
Kees Cook


* Re: replacing memcpy() calls with direct assignment
  2022-06-21 18:37 replacing memcpy() calls with direct assignment Kees Cook
@ 2022-06-21 19:05 ` Greg KH
  2022-06-21 20:31   ` Kees Cook
  2022-06-21 19:50 ` Julia Lawall
  1 sibling, 1 reply; 5+ messages in thread
From: Greg KH @ 2022-06-21 19:05 UTC
  To: Kees Cook; +Cc: Coccinelle, linux-hardening, Julia Lawall

On Tue, Jun 21, 2022 at 11:37:10AM -0700, Kees Cook wrote:
> Hello Coccinelle gurus! :)
> 
> I recently spent way too long looking at a weird bug in Clang that I
> eventually worked around by just replacing a memcpy() with a direct
> assignment. It really was very mechanical, and seems like it might be a
> common code pattern in the kernel. Swapping these would make the code
> much more readable, I think. Here's the example:
> 
> 
> https://lore.kernel.org/linux-hardening/20220616052312.292861-1-keescook@chromium.org/
> 
> -		memcpy(&host_image->image_section_info[i],
> -		       &fw_image->fw_section_info[i],
> -		       sizeof(struct fw_section_info_st));
> +		host_image->image_section_info[i] = fw_image->fw_section_info[i];

Ick, that hides the fact that you are doing a potentially huge memory
copy here.

And would it also prevent the compiler from using our optimized memcpy()
function and replacing it with whatever it wanted to use instead?

What clang bug does this fix such that it warrants us hiding this
information away from the developers?

thanks,

greg k-h


* Re: replacing memcpy() calls with direct assignment
  2022-06-21 18:37 replacing memcpy() calls with direct assignment Kees Cook
  2022-06-21 19:05 ` Greg KH
@ 2022-06-21 19:50 ` Julia Lawall
  1 sibling, 0 replies; 5+ messages in thread
From: Julia Lawall @ 2022-06-21 19:50 UTC
  To: Kees Cook; +Cc: Coccinelle, linux-hardening



On Tue, 21 Jun 2022, Kees Cook wrote:

> Hello Coccinelle gurus! :)
>
> I recently spent way too long looking at a weird bug in Clang that I
> eventually worked around by just replacing a memcpy() with a direct
> assignment. It really was very mechanical, and seems like it might be a
> common code pattern in the kernel. Swapping these would make the code
> much more readable, I think. Here's the example:
>
>
> https://lore.kernel.org/linux-hardening/20220616052312.292861-1-keescook@chromium.org/
>
> -		memcpy(&host_image->image_section_info[i],
> -		       &fw_image->fw_section_info[i],
> -		       sizeof(struct fw_section_info_st));
> +		host_image->image_section_info[i] = fw_image->fw_section_info[i];
>
> Is there a way to reduce the size of this cocci rule? I had to
> explicitly spell out each "address of" combination separately. I'd have
> expected Coccinelle to treat them as internal aliases, but otherwise I'd
> get output like:
>
>  *&dst = src;
>
> etc

I don't disagree with Greg, but I will still answer the question :)

>
> @direct_assignment@
> type TYPE;
> TYPE DST, SRC;
> TYPE *DPTR;
> TYPE *SPTR;
> @@
>
> (
> - memcpy(&DST, &SRC, sizeof(TYPE))
> + DST = SRC
> |
> - memcpy(&DST, &SRC, sizeof(DST))
> + DST = SRC
> |
> - memcpy(&DST, &SRC, sizeof(SRC))
> + DST = SRC
> |
>
> - memcpy(&DST, SPTR, sizeof(TYPE))
> + DST = *SPTR
> |
> - memcpy(&DST, SPTR, sizeof(DST))
> + DST = *SPTR
> |
> - memcpy(&DST, SPTR, sizeof(*SPTR))
> + DST = *SPTR
> |
>
> - memcpy(DPTR, &SRC, sizeof(TYPE))
> + *DPTR = SRC
> |
> - memcpy(DPTR, &SRC, sizeof(*DPTR))
> + *DPTR = SRC
> |
> - memcpy(DPTR, &SRC, sizeof(SRC))
> + *DPTR = SRC
> |
>
> - memcpy(DPTR, SPTR, sizeof(TYPE))
> + *DPTR = *SPTR
> |
> - memcpy(DPTR, SPTR, sizeof(*DPTR))
> + *DPTR = *SPTR
> |
> - memcpy(DPTR, SPTR, sizeof(*SPTR))
> + *DPTR = *SPTR
> )

You can make a disjunction for the sizeof, e.g. in the last case:

\(sizeof(TYPE)\|sizeof(*DPTR)\|sizeof(*SPTR)\)

That would reduce the number of lines by 2/3.

Note that it would not be good to put

sizeof( \(TYPE\|*DPTR\|*SPTR\) )

because the C rules for parentheses with sizeof differ between the type
case and the expression case: in sizeof(TYPE) the parentheses are part of
the sizeof syntax, while in sizeof(*DPTR) they just wrap an ordinary
expression and could be omitted.
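The C distinction behind this advice can be seen in plain C: when sizeof is applied to a type name, the parentheses are mandatory syntax, but when it is applied to an expression they merely wrap that expression. struct item and the two helpers are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical type, used only for illustration. */
struct item {
	int id;
	char name[16];
};

/* Expression form: sizeof applies to a unary expression, so the
 * parentheses in sizeof(*p) are optional; the operand is not evaluated. */
size_t size_of_expr(const struct item *p)
{
	return sizeof *p;
}

/* Type form: the parentheses are mandatory; "sizeof struct item"
 * would not compile. */
size_t size_of_type(void)
{
	return sizeof(struct item);
}
```

This is why a pattern that puts the type and expression alternatives inside one shared pair of parentheses mixes two different grammatical constructs.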

On the other hand, I believe that the above rule will require SRC and DST
to have known types, while such a type is only necessary for the
sizeof(TYPE) case.  So it would be better to have one rule for the
sizeof(TYPE) case, and another rule for the other sizeof cases.
In the second rule, SRC and DST can just be expressions.

julia


* Re: replacing memcpy() calls with direct assignment
  2022-06-21 19:05 ` Greg KH
@ 2022-06-21 20:31   ` Kees Cook
  2022-06-21 20:43     ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Kees Cook @ 2022-06-21 20:31 UTC
  To: Greg KH; +Cc: Coccinelle, linux-hardening, Julia Lawall

On Tue, Jun 21, 2022 at 09:05:36PM +0200, Greg KH wrote:
> On Tue, Jun 21, 2022 at 11:37:10AM -0700, Kees Cook wrote:
> > Hello Coccinelle gurus! :)
> > 
> > I recently spent way too long looking at a weird bug in Clang that I
> > eventually worked around by just replacing a memcpy() with a direct
> > assignment. It really was very mechanical, and seems like it might be a
> > common code pattern in the kernel. Swapping these would make the code
> > much more readable, I think. Here's the example:
> > 
> > 
> > https://lore.kernel.org/linux-hardening/20220616052312.292861-1-keescook@chromium.org/
> > 
> > -		memcpy(&host_image->image_section_info[i],
> > -		       &fw_image->fw_section_info[i],
> > -		       sizeof(struct fw_section_info_st));
> > +		host_image->image_section_info[i] = fw_image->fw_section_info[i];
> 
> Ick, that hides the fact that you are doing a potentially huge memory
> copy here.
> 
> And would it also prevent the compiler from using our optimized memcpy()
> function and replacing it with whatever it wanted to use instead?

What? Uh, quite the reverse, in fact. The compiler is MUCH better about
doing those kinds of optimizations. The commit log notes that this change
produces no binary difference at all.

> What clang bug does this fix such that it warrants us hiding this
> information away from the developers?

Hiding? This makes the code significantly more clear. Doing an assignment
makes it clear they're the same type, etc, etc. Obscuring all that with
a memcpy() makes no sense.
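A minimal sketch of the type-checking argument, assuming two hypothetical header structs that share a prefix: when the size argument names the wrong type, memcpy() compiles silently and copies the wrong number of bytes, whereas an assignment takes its length from the operands' type.

```c
#include <string.h>

/* Hypothetical structs; hdr_v2 extends hdr_v1 with a checksum. */
struct hdr_v1 {
	unsigned int magic;
	unsigned int len;
};

struct hdr_v2 {
	unsigned int magic;
	unsigned int len;
	unsigned int crc;
};

/* Hypothetical bug: the size names the wrong struct. memcpy() accepts it
 * and copies only sizeof(struct hdr_v1) bytes, so crc is never copied. */
void copy_hdr_memcpy(struct hdr_v2 *dst, const struct hdr_v2 *src)
{
	memcpy(dst, src, sizeof(struct hdr_v1));
}

/* Assignment: the length comes from struct hdr_v2 itself. */
void copy_hdr_assign(struct hdr_v2 *dst, const struct hdr_v2 *src)
{
	*dst = *src;
}
```

Changing copy_hdr_assign() to assign a struct hdr_v1 to a struct hdr_v2 would be rejected by the compiler as an incompatible-types error, rather than turning into a silent short copy.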

As for the bug in Clang, it's triggered by a UBSAN_BOUNDS bug that is
still being investigated.

-- 
Kees Cook


* Re: replacing memcpy() calls with direct assignment
  2022-06-21 20:31   ` Kees Cook
@ 2022-06-21 20:43     ` Greg KH
  0 siblings, 0 replies; 5+ messages in thread
From: Greg KH @ 2022-06-21 20:43 UTC
  To: Kees Cook; +Cc: Coccinelle, linux-hardening, Julia Lawall

On Tue, Jun 21, 2022 at 01:31:13PM -0700, Kees Cook wrote:
> On Tue, Jun 21, 2022 at 09:05:36PM +0200, Greg KH wrote:
> > On Tue, Jun 21, 2022 at 11:37:10AM -0700, Kees Cook wrote:
> > > Hello Coccinelle gurus! :)
> > > 
> > > I recently spent way too long looking at a weird bug in Clang that I
> > > eventually worked around by just replacing a memcpy() with a direct
> > > assignment. It really was very mechanical, and seems like it might be a
> > > common code pattern in the kernel. Swapping these would make the code
> > > much more readable, I think. Here's the example:
> > > 
> > > 
> > > https://lore.kernel.org/linux-hardening/20220616052312.292861-1-keescook@chromium.org/
> > > 
> > > -		memcpy(&host_image->image_section_info[i],
> > > -		       &fw_image->fw_section_info[i],
> > > -		       sizeof(struct fw_section_info_st));
> > > +		host_image->image_section_info[i] = fw_image->fw_section_info[i];
> > 
> > Ick, that hides the fact that you are doing a potentially huge memory
> > copy here.
> > 
> > And would it also prevent the compiler from using our optimized memcpy()
> > function and replacing it with whatever it wanted to use instead?
> 
> What? Uh, quite the reverse, in fact. The compiler is MUCH better about
> doing those kinds of optimizations. The commit log notes that this change
> produces no binary difference at all.

Ah, so we must be telling gcc to use our memcpy() implementations,
then; otherwise it could use floating point for built-ins like this
without us knowing it.

So it's not an optimization either way.

> > What clang bug does this fix such that it warrants us hiding this
> > information away from the developers?
> 
> Hiding? This makes the code significantly more clear. Doing an assignment
> makes it clear they're the same type, etc, etc. Obscuring all that with
> a memcpy() makes no sense.

Doing a huge memory copy with a simple '=' assignment does have the
potential to hide things.  Yes, memory copies are so fast it's not even
funny these days, but it's like our rule on typedef: we don't use it
because it makes it easier to hide what is really happening.
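To make this concern concrete: in the sketch below (struct fw_blob and the helpers are hypothetical), the assignment is one short statement, yet it copies the entire struct, including a 4 KiB array, the same member data memcpy(dst, src, sizeof(*dst)) would copy. Nothing at the call site shows the size.

```c
#include <stddef.h>

/* Hypothetical large struct; its size is visible only in its definition. */
struct fw_blob {
	unsigned int len;
	unsigned char data[4096];
};

/* One short line, but it copies every member, all 4 KiB of data included. */
void blob_copy(struct fw_blob *dst, const struct fw_blob *src)
{
	*dst = *src;
}

/* How much a single blob_copy() call moves; a memcpy() spelling would
 * show this expression at the call site instead. */
size_t blob_copy_size(void)
{
	return sizeof(struct fw_blob);
}
```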

So do we want to hide this type of thing?  I vote no, but hey, this
isn't the part of the kernel that I maintain :)

thanks,

greg k-h
