From: Alexander Potapenko <glider@google.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: catalin.marinas@arm.com, will@kernel.org, pcc@google.com,
	andreyknvl@gmail.com, andriy.shevchenko@linux.intel.com,
	aleksander.lobakin@intel.com, linux@rasmusvillemoes.dk,
	linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	eugenis@google.com, syednwaris@gmail.com, william.gray@linaro.org,
	Arnd Bergmann <arnd@arndb.de>
Subject: Re: [PATCH v6 1/5] lib/bitmap: add bitmap_{read,write}()
Date: Tue, 10 Oct 2023 11:17:39 +0200
Message-ID: <CAG_fn=XHiVyRO-JiOSFREgJjXWjU9Zc1CCMPYV2Xx4LA8P8tkA@mail.gmail.com>
In-Reply-To: <ZSCLuCu9yMyDdHni@yury-ThinkPad>

> > + *
> > + *      for (bit = 0; bit < nbits; bit++)
> > + *              assign_bit(start + bit, bitmap, val & BIT(bit));
>
> __assign_bit()

Ack

> > + */
>
> 'behaves similarly' sounds like an understatement. I think, it behaves
> much faster because it can assign up to 64 bits at once, not mentioning
> the pressure on cache lines traffic.

My intent was to describe the visible behavior; of course the generated
code is better, and the number of memory accesses lower. How about the
following description:

 * The result of bitmap_write() is similar to @nbits calls of assign_bit(),
 * i.e. bits beyond @nbits are ignored:
 *
 *      for (bit = 0; bit < nbits; bit++)
 *              assign_bit(start + bit, bitmap, val & BIT(bit));

?

> How faster - that's a good question. I'd be really pleased if you add
> a performance test for bitmap_write/read. Or I can do it myself later.
> You can find examples in the same lib/test_bitmap.c.

I can add two separate functions doing some bitmap_read and bitmap_write
calls in a loop to measure their performance independently - along the
lines of what you did here:
https://lore.kernel.org/lkml/ZL9X0TZb%2FQhCbEiC@yury-ThinkPad/

> > +       if (unlikely(!nbits))
> > +               return;
>
> can you please add more empty lines to separate blocks visually?

Sure, will do.
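For reference, the loop semantics from the proposed kerneldoc comment can be sketched in userspace. Everything below is a re-implementation for illustration only (a hypothetical `__assign_bit()` over an `unsigned long` array, not the kernel helpers); it just demonstrates that bits of `val` beyond `nbits` are ignored:

```c
#include <stdbool.h>

#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
#define BIT(nr) (1UL << (nr))

/* Userspace stand-in for the kernel's __assign_bit(). */
static void __assign_bit(unsigned long nr, unsigned long *addr, bool value)
{
	unsigned long mask = BIT(nr % BITS_PER_LONG);

	if (value)
		addr[nr / BITS_PER_LONG] |= mask;
	else
		addr[nr / BITS_PER_LONG] &= ~mask;
}

/* The reference loop from the proposed comment: @nbits assignments,
 * so bits of @val at positions >= @nbits never reach the bitmap. */
static void bitmap_write_ref(unsigned long *bitmap, unsigned long val,
			     unsigned long start, unsigned long nbits)
{
	for (unsigned long bit = 0; bit < nbits; bit++)
		__assign_bit(start + bit, bitmap, val & BIT(bit));
}
```

For example, writing `0xff` with `nbits == 4` only sets four bits of the bitmap.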
> > +       mask = BITMAP_LAST_WORD_MASK(nbits);
> > +       value &= mask;
> > +       if (space >= nbits) {
> > +               map[index] &= ~(mask << offset);
> > +               map[index] |= value << offset;
> > +               return;
> > +       }
> > +       map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> > +       map[index] |= value << offset;
> > +       map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> > +       map[index + 1] |= (value >> space);
> > +}
>
> I compiled the below fix on spark64 BE machine:
>
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -608,7 +608,7 @@ static inline unsigned long bitmap_read(const unsigned long *map,
>         if (unlikely(!nbits))
>                 return 0;
>         if (space >= nbits)
> -               return (map[index] >> offset) & GENMASK(nbits - 1, 0);
> +               return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits);
>         value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
>         value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
>         return (value_low >> offset) | (value_high << space);
> @@ -661,9 +661,9 @@ static inline void bitmap_write(unsigned long *map,
>                 map[index] |= value << offset;
>                 return;
>         }
> -       map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> +       map[index] &= BITMAP_LAST_WORD_MASK(start);
>         map[index] |= value << offset;
> -       map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> +       map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
>         map[index + 1] |= (value >> space);
> }
>
> All the tests are passed just as before, and there's no any difference
> reported by bloat-o-meter. Can you please use non-negation versions as
> they are more straightforward?
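As an aside on the proposed substitution: a small userspace sketch (the two mask macros copied here under the assumption that they match their definitions in include/linux/bitmap.h) shows that the negated and non-negated forms are not interchangeable in general — they coincide only when the in-word offset is non-zero:

```c
#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
/* Keeps bits from (start % BITS_PER_LONG) upwards within a word. */
#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
/* Keeps the low (nbits % BITS_PER_LONG) bits; all bits when nbits is a
 * multiple of BITS_PER_LONG. */
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/* ~BITMAP_FIRST_WORD_MASK(n) == BITMAP_LAST_WORD_MASK(n) holds iff
 * n % BITS_PER_LONG != 0: at an aligned n, the former is 0 while the
 * latter is all ones. */
static int masks_agree(unsigned long n)
{
	return ~BITMAP_FIRST_WORD_MASK(n) == BITMAP_LAST_WORD_MASK(n);
}
```

In bitmap_write() the non-fit branch is only reached with a non-zero offset, which is why the tests still pass with either spelling there.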
I am terribly sorry for this, but this code version is completely
different from what we've discussed here:
https://lore.kernel.org/lkml/CAG_fn=VrPJj6YowHNki5RGAAs8qvwZpUVN4K9qw=cf4aW7Qw9A@mail.gmail.com/

The correct version roughly looks as follows:

void bitmap_write(unsigned long *map, unsigned long value,
		  unsigned long start, unsigned long nbits)
{
	unsigned long offset, space, mask;
	size_t index;
	bool fit;

	if (unlikely(!nbits))
		return;

	mask = BITMAP_LAST_WORD_MASK(nbits);
	value &= mask;
	offset = start % BITS_PER_LONG;
	space = BITS_PER_LONG - offset;
	index = BIT_WORD(start);
	fit = space >= nbits;

	map[index] &= (fit ? (~(mask << offset)) :
		       ~BITMAP_FIRST_WORD_MASK(start));
	map[index] |= value << offset;
	if (fit)
		return;

	map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
	map[index + 1] |= (value >> space);
}

Note that here replacing ~BITMAP_FIRST_WORD_MASK() with
BITMAP_LAST_WORD_MASK() costs us 25 bytes (116 bytes with
~BITMAP_FIRST_WORD_MASK() vs. 141 bytes without).
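A quick userspace cross-check of that version against the bit-by-bit reference loop can be sketched as follows. This is an illustration, not the kernel test: the kernel macros are re-defined locally (assumed to match include/linux/bitmap.h), `unlikely()` is dropped, and `nbits` is assumed to be at most BITS_PER_LONG:

```c
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
#define BIT(nr) (1UL << (nr))
#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/* The version quoted above, minus unlikely(). */
static void bitmap_write(unsigned long *map, unsigned long value,
			 unsigned long start, unsigned long nbits)
{
	unsigned long offset, space, mask;
	size_t index;
	bool fit;

	if (!nbits)
		return;

	mask = BITMAP_LAST_WORD_MASK(nbits);
	value &= mask;
	offset = start % BITS_PER_LONG;
	space = BITS_PER_LONG - offset;
	index = BIT_WORD(start);
	fit = space >= nbits;

	map[index] &= (fit ? ~(mask << offset) : ~BITMAP_FIRST_WORD_MASK(start));
	map[index] |= value << offset;
	if (fit)
		return;

	map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
	map[index + 1] |= (value >> space);
}

/* Reference: @nbits single-bit assignments. */
static void bitmap_write_ref(unsigned long *map, unsigned long value,
			     unsigned long start, unsigned long nbits)
{
	for (unsigned long bit = 0; bit < nbits; bit++) {
		unsigned long m = BIT((start + bit) % BITS_PER_LONG);

		if (value & BIT(bit))
			map[BIT_WORD(start + bit)] |= m;
		else
			map[BIT_WORD(start + bit)] &= ~m;
	}
}

/* Compare the two over all offsets and widths up to one word. */
static int bitmap_write_matches_ref(void)
{
	const unsigned long pat = ~0UL / 3;	/* 0b...010101 */

	for (unsigned long start = 0; start < 2 * BITS_PER_LONG; start++)
		for (unsigned long nbits = 1; nbits <= BITS_PER_LONG; nbits++) {
			unsigned long a[4] = { ~0UL, 0, ~0UL, 0 };
			unsigned long b[4] = { ~0UL, 0, ~0UL, 0 };

			bitmap_write(a, pat, start, nbits);
			bitmap_write_ref(b, pat, start, nbits);
			if (a[0] != b[0] || a[1] != b[1] ||
			    a[2] != b[2] || a[3] != b[3])
				return 0;
		}
	return 1;
}
```

The word-straddling case can also be checked by hand: writing 8 set bits at `BITS_PER_LONG - 4` leaves the top nibble of word 0 and the bottom nibble of word 1 set.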