From: Matteo Croce <mcroce@linux.microsoft.com>
To: Nick Kossifidis <mick@ics.forth.gr>
Cc: linux-riscv <linux-riscv@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Atish Patra <atish.patra@wdc.com>,
	Emil Renner Berthing <kernel@esmil.dk>,
	Akira Tsukamoto <akira.tsukamoto@gmail.com>,
	Drew Fustini <drew@beagleboard.org>,
	Bin Meng <bmeng.cn@gmail.com>,
	David Laight <David.Laight@aculab.com>,
	Guo Ren <guoren@kernel.org>
Subject: Re: [PATCH v3 1/3] riscv: optimized memcpy
Date: Wed, 23 Jun 2021 01:35:58 +0200
Message-ID: <CAFnufp0JuAvrOA89KDbcbhMeMvovoS96STVV+r53PLGJV4r0aw@mail.gmail.com>
In-Reply-To: <87f2cf0e98c5c5560cfb591b4f4b29c8@mailhost.ics.forth.gr>

On Tue, Jun 22, 2021 at 2:15 AM Nick Kossifidis <mick@ics.forth.gr> wrote:
>
> Hello Matteo and thanks for the patch,
>
> On 2021-06-17 18:27, Matteo Croce wrote:
> >
> > @@ -0,0 +1,91 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * String functions optimized for hardware which doesn't
> > + * handle unaligned memory accesses efficiently.
> > + *
> > + * Copyright (C) 2021 Matteo Croce
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/module.h>
> > +
> > +/* Minimum size for a word copy to be convenient */
> > +#define MIN_THRESHOLD (BITS_PER_LONG / 8 * 2)
> > +
> > +/* convenience union to avoid cast between different pointer types */
> > +union types {
> > +     u8 *u8;
>
> You are using a type as a name; I'd go with as_bytes/as_ulong/as_uptr,
> which makes it easier for the reader to understand what you are trying
> to do.
>

Makes sense

> > +     unsigned long *ulong;
> > +     uintptr_t uptr;
> > +};
> > +
> > +union const_types {
> > +     const u8 *u8;
> > +     unsigned long *ulong;
> > +};
> > +
>
> I suggest you define those unions inside the function body, since no one
> else is using them.
>

They will be used in memset(), in patch 3/3

> > +void *__memcpy(void *dest, const void *src, size_t count)
> > +{
> > +     const int bytes_long = BITS_PER_LONG / 8;
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > +     const int mask = bytes_long - 1;
> > +     const int distance = (src - dest) & mask;
>
> Why not unsigned ints?
>

Ok.

> > +#endif
> > +     union const_types s = { .u8 = src };
> > +     union types d = { .u8 = dest };
> > +
> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>
> If you want to be compliant with memcpy you should check for overlapping
> regions here since "The memory areas must not overlap", and do nothing
> about it because according to POSIX this leads to undefined behavior.
> That's why recent libc implementations use memmove in any case (memcpy
> is an alias to memmove), which is the suggested approach.
>

Hmm, which arch memcpy implementation does this check?
I guess that no one is currently doing it.
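
For reference, a minimal userspace sketch of what such an overlap check
could look like. This is only an illustration, not something taken from any
arch implementation: regions_overlap() and checked_memcpy() are hypothetical
names, and the assert() stands in for whatever debug mechanism would be used
in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if [dest, dest+count) and [src, src+count) share a byte. */
static bool regions_overlap(const void *dest, const void *src, size_t count)
{
	/* Compare as integers: relational comparison of unrelated pointers is undefined in C. */
	uintptr_t d = (uintptr_t)dest;
	uintptr_t s = (uintptr_t)src;

	if (count == 0)
		return false;

	/* The regions overlap when the gap between the start addresses is below count. */
	return d < s ? s - d < count : d - s < count;
}

/* Hypothetical debug wrapper: trap on overlap, then copy as usual. */
static void *checked_memcpy(void *dest, const void *src, size_t count)
{
	assert(!regions_overlap(dest, src, count));
	return memcpy(dest, src, count);
}
```

The check costs two comparisons per call, which matters for short copies.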

> > +     if (count < MIN_THRESHOLD)
> > +             goto copy_remainder;
> > +
> > +     /* copy a byte at a time until destination is aligned */
> > +     for (; count && d.uptr & mask; count--)
> > +             *d.u8++ = *s.u8++;
> > +
>
> You should check for !IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) here.
>

I thought that only Little Endian RISC-V machines were supported in Linux.
Should I add a BUILD_BUG_ON()?
Anyway, if this is going in generic lib/, I'll take care of the endianness.
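
As an illustration of what taking care of the endianness could involve, here
is a userspace sketch of the word merge for both byte orders. It is a sketch
under assumptions, not the patch code: merge_words() is a hypothetical
helper, and the __BYTE_ORDER__ macros are GCC/Clang predefines used here in
place of the kernel's CONFIG_CPU_BIG_ENDIAN.

```c
#include <assert.h>
#include <limits.h>

#if !defined(__BYTE_ORDER__)
#error "this sketch relies on the GCC/Clang __BYTE_ORDER__ predefine"
#endif

/*
 * Build one destination word from two consecutive aligned source words,
 * where the wanted data starts `distance` bytes into `last`.
 */
static unsigned long merge_words(unsigned long last, unsigned long next,
				 unsigned int distance)
{
	const unsigned int word_bits = sizeof(unsigned long) * CHAR_BIT;
	const unsigned int bits = distance * CHAR_BIT;

	/* distance == 0 would shift by the full word width, which is undefined */
	assert(distance > 0 && distance < sizeof(unsigned long));

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* lowest address is the least significant byte: drop low bytes of last */
	return last >> bits | next << (word_bits - bits);
#else
	/* lowest address is the most significant byte: the shifts swap direction */
	return last << bits | next >> (word_bits - bits);
#endif
}
```

Only the shift directions differ between the two byte orders; the loop
around the merge stays the same.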

> > +     if (distance) {
> > +             unsigned long last, next;
> > +
> > +             /* move s backward to the previous alignment boundary */
> > +             s.u8 -= distance;
>
> It'd help here to explain that since s is distance bytes ahead relative
> to d, and d reached the alignment boundary above, s is now aligned but
> the data needs to be shifted to compensate for distance, in order to do
> word-by-word copy.
>
> > +
> > +             /* 32/64 bit wide copy from s to d.
> > +              * d is aligned now but s is not, so read s alignment wise,
> > +              * and do proper shift to get the right value.
> > +              * Works only on Little Endian machines.
> > +              */
>
> This comment is misleading because s is aligned, or else s.ulong[0]/[1]
> below would result in an unaligned access.
>

Yes, those two comments should be rephrased, merged and put above.
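
To make the alignment argument concrete, a small userspace sketch of the
address arithmetic follows. rewound_src() and WORD_MASK are names made up
for this illustration; the computation mirrors distance, mask and the
byte-at-a-time loop from the patch, applied to plain integers instead of
pointers.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_MASK (sizeof(unsigned long) - 1)

/*
 * Hypothetical helper: given the original src/dest addresses, return the
 * source address the word loop starts reading from.
 */
static uintptr_t rewound_src(uintptr_t src, uintptr_t dest)
{
	const uintptr_t distance = (src - dest) & WORD_MASK;
	/* number of bytes copied one at a time until dest is aligned */
	const uintptr_t head = (0 - dest) & WORD_MASK;

	dest += head;
	src += head;
	assert((dest & WORD_MASK) == 0);

	/*
	 * src - distance is congruent to dest modulo the word size,
	 * so moving s back by distance lands on an aligned address.
	 */
	return src - distance;
}
```

Since the returned address is always word aligned, the loads through
s.ulong[] are aligned and only the shifts account for distance.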

> > +             for (next = s.ulong[0]; count >= bytes_long + mask; count -= bytes_long) {
> > +                     last = next;
> > +                     next = s.ulong[1];
> > +
> > +                     d.ulong[0] = last >> (distance * 8) |
> > +                                  next << ((bytes_long - distance) * 8);
> > +
> > +                     d.ulong++;
> > +                     s.ulong++;
> > +             }
> > +
> > +             /* restore s with the original offset */
> > +             s.u8 += distance;
> > +     } else
> > +#endif
> > +     {
> > +             /* if the source and dest lower bits are the same, do a simple
> > +              * 32/64 bit wide copy.
> > +              */
>
> A while() loop would make more sense here.
>

Ok.

> > +             for (; count >= bytes_long; count -= bytes_long)
> > +                     *d.ulong++ = *s.ulong++;
> > +     }
> > +
> > +     /* suppress warning when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y */
> > +     goto copy_remainder;
> > +
> > +copy_remainder:
> > +     while (count--)
> > +             *d.u8++ = *s.u8++;
> > +
> > +     return dest;
> > +}
> > +EXPORT_SYMBOL(__memcpy);
> > +
> > +void *memcpy(void *dest, const void *src, size_t count) __weak __alias(__memcpy);
> > +EXPORT_SYMBOL(memcpy);

Regards,
-- 
per aspera ad upstream
