From: Alexander Potapenko <glider@google.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: catalin.marinas@arm.com, will@kernel.org, pcc@google.com,
	andreyknvl@gmail.com, andriy.shevchenko@linux.intel.com,
	aleksander.lobakin@intel.com, linux@rasmusvillemoes.dk,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, eugenis@google.com,
	syednwaris@gmail.com, william.gray@linaro.org,
	Arnd Bergmann <arnd@arndb.de>
Subject: Re: [PATCH v6 1/5] lib/bitmap: add bitmap_{read,write}()
Date: Tue, 10 Oct 2023 11:17:39 +0200	[thread overview]
Message-ID: <CAG_fn=XHiVyRO-JiOSFREgJjXWjU9Zc1CCMPYV2Xx4LA8P8tkA@mail.gmail.com> (raw)
In-Reply-To: <ZSCLuCu9yMyDdHni@yury-ThinkPad>

> > + *
> > + *   for (bit = 0; bit < nbits; bit++)
> > + *           assign_bit(start + bit, bitmap, val & BIT(bit));
>
> __assign_bit()

Ack
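
(For context, a note on the difference: __assign_bit() is the
non-atomic variant of assign_bit(). Both live in include/linux/bitops.h,
roughly as follows:

static __always_inline void
assign_bit(long nr, volatile unsigned long *addr, bool value)
{
        if (value)
                set_bit(nr, addr);      /* atomic */
        else
                clear_bit(nr, addr);
}

static __always_inline void
__assign_bit(long nr, volatile unsigned long *addr, bool value)
{
        if (value)
                __set_bit(nr, addr);    /* non-atomic */
        else
                __clear_bit(nr, addr);
}

Since the doc-comment loop is pseudocode for a non-atomic operation,
the non-atomic variant is the more accurate reference.)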

>
> > + */
>
> 'behaves similarly' sounds like an understatement. I think it behaves
> much faster, because it can assign up to 64 bits at once, not to mention
> the reduced pressure on cache line traffic.

My intent was to describe the visible behavior; of course the
generated code is better, and the number of memory accesses is lower.

How about the following description:

 * The result of bitmap_write() is similar to @nbits calls of assign_bit(), i.e.
 * bits beyond @nbits are ignored:
 *
 *   for (bit = 0; bit < nbits; bit++)
 *           assign_bit(start + bit, bitmap, val & BIT(bit));

?
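
(To make these semantics concrete, here is a standalone userspace
sketch, not kernel code: assign_bit_demo() below only mimics
assign_bit() with plain bit operations, and the example assumes 64-bit
unsigned long so that the write straddles a word boundary:

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Userspace stand-in for the kernel's assign_bit(). */
static void assign_bit_demo(unsigned long nr, unsigned long *bitmap, int set)
{
        if (set)
                bitmap[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
        else
                bitmap[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

int main(void)
{
        unsigned long bitmap[2] = {0};
        unsigned long val = 0xabc, start = 60, nbits = 8, bit;

        /* Equivalent of bitmap_write(bitmap, 0xabc, 60, 8): only the
         * low 8 bits (0xbc) are written, split 4/4 across the words. */
        for (bit = 0; bit < nbits; bit++)
                assign_bit_demo(start + bit, bitmap, (val >> bit) & 1);

        /* Prints "b c000000000000000" on a 64-bit machine. */
        printf("%lx %lx\n", bitmap[1], bitmap[0]);
        return 0;
}

The high nibble of 0xabc is dropped because it lies beyond @nbits,
matching the "bits beyond @nbits are ignored" wording above.)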

>
> How much faster - that's a good question. I'd be really pleased if you add
> a performance test for bitmap_write/read. Or I can do it myself later.
> You can find examples in the same lib/test_bitmap.c.

I can add two separate functions that do a number of bitmap_read() and
bitmap_write() calls in a loop to measure their performance
independently, along the lines of what you did here:
https://lore.kernel.org/lkml/ZL9X0TZb%2FQhCbEiC@yury-ThinkPad/
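
(A rough sketch of what I have in mind, loosely modeled on the
ktime-based loops in lib/find_bit_benchmark.c; the function name, the
TEST_BIT_LEN constant and the iteration counts are made up for
illustration and are not existing test_bitmap.c code:

static void __init test_bitmap_write_perf(void)
{
        DECLARE_BITMAP(bitmap, TEST_BIT_LEN);
        unsigned int start, nbits, i;
        unsigned long val = 0xdeadbeefUL;
        ktime_t time;

        bitmap_zero(bitmap, TEST_BIT_LEN);

        time = ktime_get();
        /* Sweep all widths and offsets to cover the aligned,
         * unaligned and cross-word cases. */
        for (i = 0; i < 10; i++)
                for (nbits = 1; nbits <= BITS_PER_LONG; nbits++)
                        for (start = 0; start <= TEST_BIT_LEN - nbits; start++)
                                bitmap_write(bitmap, val, start, nbits);
        time = ktime_get() - time;
        pr_info("bitmap_write: %llu ns\n", (unsigned long long)time);
}

bitmap_read() would get a symmetric loop that accumulates the returned
values into a sink variable so the compiler cannot optimize the calls
away.)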


> > +     if (unlikely(!nbits))
> > +             return;
>
> can you please add more empty lines to separate blocks visually?

Sure, will do.

>
> > +     mask = BITMAP_LAST_WORD_MASK(nbits);
> > +     value &= mask;
> > +     if (space >= nbits) {
> > +             map[index] &= ~(mask << offset);
> > +             map[index] |= value << offset;
> > +             return;
> > +     }
> > +     map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> > +     map[index] |= value << offset;
> > +     map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> > +     map[index + 1] |= (value >> space);
> > +}
>
> I compiled the below fix on a sparc64 BE machine:
>
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -608,7 +608,7 @@ static inline unsigned long bitmap_read(const unsigned long *map,
>         if (unlikely(!nbits))
>                 return 0;
>         if (space >= nbits)
> -               return (map[index] >> offset) & GENMASK(nbits - 1, 0);
> +               return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits);
>         value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
>         value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
>         return (value_low >> offset) | (value_high << space);
> @@ -661,9 +661,9 @@ static inline void bitmap_write(unsigned long *map,
>                 map[index] |= value << offset;
>                 return;
>         }
> -       map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> +       map[index] &= BITMAP_LAST_WORD_MASK(start);
>         map[index] |= value << offset;
> -       map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> +       map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
>         map[index + 1] |= (value >> space);
>  }
>
> All the tests pass just as before, and there's no difference
> reported by bloat-o-meter. Can you please use the non-negated versions,
> as they are more straightforward?

I am terribly sorry for this, but this code version is completely
different from what we've discussed here:
https://lore.kernel.org/lkml/CAG_fn=VrPJj6YowHNki5RGAAs8qvwZpUVN4K9qw=cf4aW7Qw9A@mail.gmail.com/

The correct version roughly looks as follows:

void bitmap_write(unsigned long *map, unsigned long value,
                      unsigned long start, unsigned long nbits)
{
        unsigned long offset, space, mask;
        size_t index;
        bool fit;

        if (unlikely(!nbits))
                return;

        mask = BITMAP_LAST_WORD_MASK(nbits);
        value &= mask;
        offset = start % BITS_PER_LONG;
        space = BITS_PER_LONG - offset;
        index = BIT_WORD(start);
        fit = space >= nbits;

        map[index] &= (fit ? ~(mask << offset) :
                       ~BITMAP_FIRST_WORD_MASK(start));
        map[index] |= value << offset;
        if (fit)
                return;

        map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
        map[index + 1] |= (value >> space);
}

Note that replacing ~BITMAP_FIRST_WORD_MASK() with
BITMAP_LAST_WORD_MASK() here costs us 25 bytes (116 bytes with
~BITMAP_FIRST_WORD_MASK() vs. 141 bytes with BITMAP_LAST_WORD_MASK()).
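
(For reference, the two masks being traded off are defined in
include/linux/bitmap.h as:

#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
#define BITMAP_LAST_WORD_MASK(nbits)  (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

If I read them right, ~BITMAP_FIRST_WORD_MASK(start) and
BITMAP_LAST_WORD_MASK(start) both keep the low start % BITS_PER_LONG
bits and differ only when start is word-aligned, where the former is 0
and the latter is all ones. The non-fit path only runs with a nonzero
offset, so both spellings pass the tests while generating different
code.)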
