From: Alexander Potapenko <glider@google.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: catalin.marinas@arm.com, will@kernel.org, pcc@google.com,
	andreyknvl@gmail.com, andriy.shevchenko@linux.intel.com,
	aleksander.lobakin@intel.com, linux@rasmusvillemoes.dk,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, eugenis@google.com,
	syednwaris@gmail.com, william.gray@linaro.org,
	Arnd Bergmann <arnd@arndb.de>
Subject: Re: [PATCH v6 1/5] lib/bitmap: add bitmap_{read,write}()
Date: Tue, 10 Oct 2023 11:17:39 +0200	[thread overview]
Message-ID: <CAG_fn=XHiVyRO-JiOSFREgJjXWjU9Zc1CCMPYV2Xx4LA8P8tkA@mail.gmail.com> (raw)
In-Reply-To: <ZSCLuCu9yMyDdHni@yury-ThinkPad>

> > + *
> > + *   for (bit = 0; bit < nbits; bit++)
> > + *           assign_bit(start + bit, bitmap, val & BIT(bit));
>
> __assign_bit()

Ack
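
(For context, a note on the difference: __assign_bit() is the
non-atomic variant of assign_bit(). Both live in include/linux/bitops.h,
roughly as follows:

static __always_inline void
assign_bit(long nr, volatile unsigned long *addr, bool value)
{
        if (value)
                set_bit(nr, addr);      /* atomic */
        else
                clear_bit(nr, addr);
}

static __always_inline void
__assign_bit(long nr, volatile unsigned long *addr, bool value)
{
        if (value)
                __set_bit(nr, addr);    /* non-atomic */
        else
                __clear_bit(nr, addr);
}

Since the doc-comment loop is pseudocode for a non-atomic operation,
the non-atomic variant is the more accurate reference.)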

>
> > + */
>
> 'behaves similarly' sounds like an understatement. I think it behaves
> much faster, because it can assign up to 64 bits at once, not to mention
> the reduced pressure on cache line traffic.

My intent was to describe the visible behavior; of course the
generated code is better, and the number of memory accesses is lower.

How about the following description:

 * The result of bitmap_write() is similar to @nbits calls of assign_bit(), i.e.
 * bits beyond @nbits are ignored:
 *
 *   for (bit = 0; bit < nbits; bit++)
 *           assign_bit(start + bit, bitmap, val & BIT(bit));

?
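
(To make these semantics concrete, here is a standalone userspace
sketch, not kernel code: assign_bit_demo() below only mimics
assign_bit() with plain bit operations, and the example assumes 64-bit
unsigned long so that the write straddles a word boundary:

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Userspace stand-in for the kernel's assign_bit(). */
static void assign_bit_demo(unsigned long nr, unsigned long *bitmap, int set)
{
        if (set)
                bitmap[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
        else
                bitmap[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

int main(void)
{
        unsigned long bitmap[2] = {0};
        unsigned long val = 0xabc, start = 60, nbits = 8, bit;

        /* Equivalent of bitmap_write(bitmap, 0xabc, 60, 8): only the
         * low 8 bits (0xbc) are written, split 4/4 across the words. */
        for (bit = 0; bit < nbits; bit++)
                assign_bit_demo(start + bit, bitmap, (val >> bit) & 1);

        /* Prints "b c000000000000000" on a 64-bit machine. */
        printf("%lx %lx\n", bitmap[1], bitmap[0]);
        return 0;
}

The high nibble of 0xabc is dropped because it lies beyond @nbits,
matching the "bits beyond @nbits are ignored" wording above.)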

>
> How much faster - that's a good question. I'd be really pleased if you add
> a performance test for bitmap_write/read. Or I can do it myself later.
> You can find examples in the same lib/test_bitmap.c.

I can add two separate functions that do a number of bitmap_read() and
bitmap_write() calls in a loop to measure their performance
independently, along the lines of what you did here:
https://lore.kernel.org/lkml/ZL9X0TZb%2FQhCbEiC@yury-ThinkPad/
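
(A rough sketch of what I have in mind, loosely modeled on the
ktime-based loops in lib/find_bit_benchmark.c; the function name, the
TEST_BIT_LEN constant and the iteration counts are made up for
illustration and are not existing test_bitmap.c code:

static void __init test_bitmap_write_perf(void)
{
        DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
        unsigned int start, nbits, i;
        unsigned long val = 0xdeadbeefUL;
        ktime_t time;

        bitmap_zero(bitmap, TEST_BIT_LEN);

        time = ktime_get();
        /* Sweep all widths and offsets to cover the aligned,
         * unaligned and cross-word cases. */
        for (i = 0; i < 10; i++)
                for (nbits = 1; nbits <= BITS_PER_LONG; nbits++)
                        for (start = 0; start <= TEST_BIT_LEN - nbits; start++)
                                bitmap_write(bitmap, val, start, nbits);
        time = ktime_get() - time;
        pr_info("bitmap_write: %llu ns\n", (unsigned long long)time);
}

bitmap_read() would get a symmetric loop that accumulates the returned
values into a sink variable so the compiler cannot optimize the calls
away.)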


> > +     if (unlikely(!nbits))
> > +             return;
>
> can you please add more empty lines to separate blocks visually?

Sure, will do.

>
> > +     mask = BITMAP_LAST_WORD_MASK(nbits);
> > +     value &= mask;
> > +     if (space >= nbits) {
> > +             map[index] &= ~(mask << offset);
> > +             map[index] |= value << offset;
> > +             return;
> > +     }
> > +     map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> > +     map[index] |= value << offset;
> > +     map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> > +     map[index + 1] |= (value >> space);
> > +}
>
> I compiled the below fix on a sparc64 BE machine:
>
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -608,7 +608,7 @@ static inline unsigned long bitmap_read(const unsigned long *map,
>         if (unlikely(!nbits))
>                 return 0;
>         if (space >= nbits)
> -               return (map[index] >> offset) & GENMASK(nbits - 1, 0);
> +               return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits);
>         value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
>         value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
>         return (value_low >> offset) | (value_high << space);
> @@ -661,9 +661,9 @@ static inline void bitmap_write(unsigned long *map,
>                 map[index] |= value << offset;
>                 return;
>         }
> -       map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> +       map[index] &= BITMAP_LAST_WORD_MASK(start);
>         map[index] |= value << offset;
> -       map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> +       map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
>         map[index + 1] |= (value >> space);
>  }
>
> All the tests pass just as before, and there's no difference
> reported by bloat-o-meter. Can you please use the non-negated versions,
> as they are more straightforward?

I am terribly sorry for this, but this code version is completely
different from what we've discussed here:
https://lore.kernel.org/lkml/CAG_fn=VrPJj6YowHNki5RGAAs8qvwZpUVN4K9qw=cf4aW7Qw9A@mail.gmail.com/

The correct version roughly looks as follows:

void bitmap_write(unsigned long *map, unsigned long value,
                      unsigned long start, unsigned long nbits)
{
        unsigned long offset, space, mask;
        size_t index;
        bool fit;

        if (unlikely(!nbits))
                return;

        mask = BITMAP_LAST_WORD_MASK(nbits);
        value &= mask;
        offset = start % BITS_PER_LONG;
        space = BITS_PER_LONG - offset;
        index = BIT_WORD(start);
        fit = space >= nbits;

        map[index] &= (fit ? ~(mask << offset) :
                       ~BITMAP_FIRST_WORD_MASK(start));
        map[index] |= value << offset;
        if (fit)
                return;

        map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
        map[index + 1] |= (value >> space);
}

Note that replacing ~BITMAP_FIRST_WORD_MASK() with
BITMAP_LAST_WORD_MASK() here costs us 25 bytes (116 bytes with
~BITMAP_FIRST_WORD_MASK() vs. 141 bytes with BITMAP_LAST_WORD_MASK()).
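
(For reference, the two masks being traded off are defined in
include/linux/bitmap.h as:

#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
#define BITMAP_LAST_WORD_MASK(nbits)  (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

If I read them right, ~BITMAP_FIRST_WORD_MASK(start) and
BITMAP_LAST_WORD_MASK(start) both keep the low start % BITS_PER_LONG
bits and differ only when start is word-aligned, where the former is 0
and the latter is all ones. The non-fit path only runs with a nonzero
offset, so both spellings pass the tests while generating different
code.)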
