linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
@ 2020-04-24 12:25 Syed Nayyar Waris
  2020-04-24 14:10 ` Lukas Wunner
  0 siblings, 1 reply; 8+ messages in thread
From: Syed Nayyar Waris @ 2020-04-24 12:25 UTC (permalink / raw)
  To: akpm
  Cc: andriy.shevchenko, vilhelm.gray, arnd, linus.walleij, linux-arch,
	linux-kernel

This macro iterates over each group of bits (clump) that has at least
one set bit within a bitmap memory region. For each iteration, "start"
is set to the bit offset of the found clump, while the respective clump
value is stored to the location pointed to by "clump". Additionally, the
bitmap_get_value and bitmap_set_value functions are introduced to
respectively get and set an n-bit value in a bitmap memory region. The
value can have any width less than or equal to BITS_PER_LONG. Moreover,
when setting an n-bit value in the bitmap, if the value would extend
past a word boundary, it is split so that one portion is stored in the
current word and the remaining portion is stored in the next higher
word. The same split applies when retrieving an n-bit value from the
bitmap.

Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Syed Nayyar Waris <syednwaris@gmail.com>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
---
 include/asm-generic/bitops/find.h | 19 ++++++++++++
 include/linux/bitmap.h            | 61 +++++++++++++++++++++++++++++++++++++++
 include/linux/bitops.h            | 13 +++++++++
 lib/find_bit.c                    | 14 +++++++++
 4 files changed, 107 insertions(+)
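
The word-boundary split described in the commit message can be tried
outside the kernel with a minimal userspace sketch such as the one below.
It is not the code from this patch: the demo_* names and
DEMO_BITS_PER_LONG are hypothetical stand-ins for the kernel helpers, and
the sketch assumes 1 <= nbits <= DEMO_BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's BITS_PER_LONG. */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask with the low n bits set; valid for 1 <= n <= DEMO_BITS_PER_LONG. */
static unsigned long demo_low_mask(unsigned long n)
{
	return ~0UL >> (DEMO_BITS_PER_LONG - n);
}

/* Read an nbits-wide value that starts at bit offset 'start'. */
unsigned long demo_get_value(const unsigned long *map,
			     unsigned long start, unsigned long nbits)
{
	const size_t index = start / DEMO_BITS_PER_LONG;
	const unsigned long offset = start % DEMO_BITS_PER_LONG;
	const unsigned long space = DEMO_BITS_PER_LONG - offset;
	unsigned long value = map[index] >> offset;

	/* The value straddles the boundary: append bits of the next word. */
	if (space < nbits)
		value |= map[index + 1] << space;

	return value & demo_low_mask(nbits);
}

/* Store an nbits-wide value at bit offset 'start', keeping other bits. */
void demo_set_value(unsigned long *map, unsigned long value,
		    unsigned long start, unsigned long nbits)
{
	const size_t index = start / DEMO_BITS_PER_LONG;
	const unsigned long offset = start % DEMO_BITS_PER_LONG;
	const unsigned long space = DEMO_BITS_PER_LONG - offset;
	const unsigned long mask = demo_low_mask(nbits);

	value &= mask;

	/* Low part: clear the target bits in the first word, then store. */
	map[index] &= ~(mask << offset);
	map[index] |= value << offset;

	/* High part, only when the value crosses the word boundary. */
	if (space < nbits) {
		map[index + 1] &= ~(mask >> space);
		map[index + 1] |= value >> space;
	}
}
```

For instance, storing the 12-bit value 0xABC at bit offset
DEMO_BITS_PER_LONG - 4 leaves 0xC in the top four bits of the first word
and 0xAB in the low byte of the second word.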

diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
index 9fdf213..4e66007 100644
--- a/include/asm-generic/bitops/find.h
+++ b/include/asm-generic/bitops/find.h
@@ -97,4 +97,23 @@ extern unsigned long find_next_clump8(unsigned long *clump,
 #define find_first_clump8(clump, bits, size) \
 	find_next_clump8((clump), (bits), (size), 0)
 
+/**
+ * find_next_clump - find next clump with set bits in a memory region
+ * @clump: location to store copy of found clump
+ * @addr: address to base the search on
+ * @size: bitmap size in number of bits
+ * @offset: bit offset at which to start searching
+ * @clump_size: clump size in bits
+ *
+ * Returns the bit offset for the next set clump; the found clump value is
+ * copied to the location pointed by @clump. If no bits are set, returns @size.
+ */
+extern unsigned long find_next_clump(unsigned long *clump,
+				      const unsigned long *addr,
+				      unsigned long size, unsigned long offset,
+				      unsigned long clump_size);
+
+#define find_first_clump(clump, bits, size, clump_size) \
+	find_next_clump((clump), (bits), (size), 0, (clump_size))
+
 #endif /*_ASM_GENERIC_BITOPS_FIND_H_ */
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 99058eb..7ab2c65 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -75,7 +75,11 @@
  *  bitmap_from_arr32(dst, buf, nbits)          Copy nbits from u32[] buf to dst
  *  bitmap_to_arr32(buf, src, nbits)            Copy nbits from buf to u32[] dst
  *  bitmap_get_value8(map, start)               Get 8bit value from map at start
+ *  bitmap_get_value(map, start, nbits)		Get bit value of size
+ *						'nbits' from map at start
  *  bitmap_set_value8(map, value, start)        Set 8bit value to map at start
+ *  bitmap_set_value(map, value, start, nbits)	Set bit value of size 'nbits'
+ *						of map at start
  *
  * Note, bitmap_zero() and bitmap_fill() operate over the region of
  * unsigned longs, that is, bits behind bitmap till the unsigned long
@@ -564,6 +568,34 @@ static inline unsigned long bitmap_get_value8(const unsigned long *map,
 }
 
 /**
+ * bitmap_get_value - get a value of n-bits from the memory region
+ * @map: address to the bitmap memory region
+ * @start: bit offset of the n-bit value
+ * @nbits: size of value in bits
+ *
+ * Returns value of nbits located at the @start bit offset within the @map
+ * memory region.
+ */
+static inline unsigned long bitmap_get_value(const unsigned long *map,
+					      unsigned long start,
+					      unsigned long nbits)
+{
+	const size_t index = BIT_WORD(start);
+	const unsigned long offset = start % BITS_PER_LONG;
+	const unsigned long ceiling = roundup(start + 1, BITS_PER_LONG);
+	const unsigned long space = ceiling - start;
+	unsigned long value_low, value_high;
+
+	if (space >= nbits)
+		return (map[index] >> offset) & GENMASK(nbits - 1, 0);
+	else {
+		value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
+		value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
+		return (value_low >> offset) | (value_high << space);
+	}
+}
+
+/**
  * bitmap_set_value8 - set an 8-bit value within a memory region
  * @map: address to the bitmap memory region
  * @value: the 8-bit value; values wider than 8 bits may clobber bitmap
@@ -579,6 +611,35 @@ static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
 	map[index] |= value << offset;
 }
 
+/**
+ * bitmap_set_value - set n-bit value within a memory region
+ * @map: address to the bitmap memory region
+ * @value: value of nbits
+ * @start: bit offset of the n-bit value
+ * @nbits: size of value in bits
+ */
+static inline void bitmap_set_value(unsigned long *map,
+				    unsigned long value,
+				    unsigned long start, unsigned long nbits)
+{
+	const size_t index = BIT_WORD(start);
+	const unsigned long offset = start % BITS_PER_LONG;
+	const unsigned long ceiling = roundup(start + 1, BITS_PER_LONG);
+	const unsigned long space = ceiling - start;
+
+	value &= GENMASK(nbits - 1, 0);
+
+	if (space >= nbits) {
+		map[index] &= ~(GENMASK(nbits + offset - 1, offset));
+		map[index] |= value << offset;
+	} else {
+		map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
+		map[index] |= value << offset;
+		map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
+		map[index + 1] |= (value >> space);
+	}
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __LINUX_BITMAP_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 9acf654..41c2d9c 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -62,6 +62,19 @@ extern unsigned long __sw_hweight64(__u64 w);
 	     (start) < (size); \
 	     (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8))
 
+/**
+ * for_each_set_clump - iterate over bitmap for each clump with set bits
+ * @start: bit offset to start search and to store the current iteration offset
+ * @clump: location to store copy of current clump
+ * @bits: bitmap address to base the search on
+ * @size: bitmap size in number of bits
+ * @clump_size: clump size in bits
+ */
+#define for_each_set_clump(start, clump, bits, size, clump_size) \
+	for ((start) = find_first_clump(&(clump), (bits), (size), (clump_size)); \
+	     (start) < (size); \
+	     (start) = find_next_clump(&(clump), (bits), (size), (start) + (clump_size), (clump_size)))
+
 static inline int get_bitmask_order(unsigned int count)
 {
 	int order;
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 49f875f..1341bd3 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -190,3 +190,17 @@ unsigned long find_next_clump8(unsigned long *clump, const unsigned long *addr,
 	return offset;
 }
 EXPORT_SYMBOL(find_next_clump8);
+
+unsigned long find_next_clump(unsigned long *clump, const unsigned long *addr,
+			       unsigned long size, unsigned long offset,
+			       unsigned long clump_size)
+{
+	offset = find_next_bit(addr, size, offset);
+	if (offset == size)
+		return size;
+
+	offset = rounddown(offset, clump_size);
+	*clump = bitmap_get_value(addr, offset, clump_size);
+	return offset;
+}
+EXPORT_SYMBOL(find_next_clump);
-- 
2.7.4
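
How the iteration in this patch advances can be followed with a small
userspace model. The sketch below is an approximation under stated
assumptions, not the kernel implementation: every sk_* name is
hypothetical, sk_find_next_bit() is a naive bit-by-bit replacement for
find_next_bit(), and "size" is assumed to be a multiple of "clump_size"
so that no clump extends past the end of the bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's BITS_PER_LONG. */
#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Naive replacement for find_next_bit(); returns size if no bit is set. */
static unsigned long sk_find_next_bit(const unsigned long *addr,
				      unsigned long size, unsigned long offset)
{
	for (; offset < size; offset++)
		if (addr[offset / SK_BITS_PER_LONG] &
		    (1UL << (offset % SK_BITS_PER_LONG)))
			return offset;
	return size;
}

/* Read nbits (1..SK_BITS_PER_LONG) starting at bit offset 'start'. */
static unsigned long sk_get_value(const unsigned long *map,
				  unsigned long start, unsigned long nbits)
{
	const size_t index = start / SK_BITS_PER_LONG;
	const unsigned long offset = start % SK_BITS_PER_LONG;
	const unsigned long space = SK_BITS_PER_LONG - offset;
	unsigned long value = map[index] >> offset;

	if (space < nbits)	/* clump straddles a word boundary */
		value |= map[index + 1] << space;

	return value & (~0UL >> (SK_BITS_PER_LONG - nbits));
}

/* Same steps as find_next_clump(): find a set bit, round down, copy. */
static unsigned long sk_find_next_clump(unsigned long *clump,
					const unsigned long *addr,
					unsigned long size,
					unsigned long offset,
					unsigned long clump_size)
{
	offset = sk_find_next_bit(addr, size, offset);
	if (offset == size)
		return size;

	offset -= offset % clump_size;	/* rounddown(offset, clump_size) */
	*clump = sk_get_value(addr, offset, clump_size);
	return offset;
}

#define sk_for_each_set_clump(start, clump, bits, size, clump_size) \
	for ((start) = sk_find_next_clump(&(clump), (bits), (size), 0, \
					  (clump_size)); \
	     (start) < (size); \
	     (start) = sk_find_next_clump(&(clump), (bits), (size), \
					  (start) + (clump_size), (clump_size)))

/* Record up to 'max' (offset, value) pairs; return the clumps visited. */
unsigned int sk_collect_clumps(const unsigned long *bits, unsigned long size,
			       unsigned long clump_size, unsigned long *starts,
			       unsigned long *values, unsigned int max)
{
	unsigned long start, clump = 0;
	unsigned int n = 0;

	sk_for_each_set_clump(start, clump, bits, size, clump_size) {
		if (n < max) {
			starts[n] = start;
			values[n] = clump;
		}
		n++;
	}
	return n;
}
```

With a 24-bit bitmap holding 0x00F00001 and a clump size of 12, the loop
visits offset 0 with value 0x001 and offset 12 with value 0xF00. A clump
with no set bits is never visited.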



* Re: [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
  2020-04-24 12:25 [PATCH 1/6] bitops: Introduce the for_each_set_clump macro Syed Nayyar Waris
@ 2020-04-24 14:10 ` Lukas Wunner
  2020-04-24 14:52   ` Syed Nayyar Waris
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2020-04-24 14:10 UTC (permalink / raw)
  To: Syed Nayyar Waris
  Cc: akpm, andriy.shevchenko, vilhelm.gray, arnd, linus.walleij,
	linux-arch, linux-kernel

On Fri, Apr 24, 2020 at 05:55:21PM +0530, Syed Nayyar Waris wrote:
> +static inline void bitmap_set_value(unsigned long *map,
> +				    unsigned long value,
> +				    unsigned long start, unsigned long nbits)
> +{
> +	const size_t index = BIT_WORD(start);
> +	const unsigned long offset = start % BITS_PER_LONG;
> +	const unsigned long ceiling = roundup(start + 1, BITS_PER_LONG);
> +	const unsigned long space = ceiling - start;
> +
> +	value &= GENMASK(nbits - 1, 0);
> +
> +	if (space >= nbits) {
> +		map[index] &= ~(GENMASK(nbits + offset - 1, offset));
> +		map[index] |= value << offset;
> +	} else {
> +		map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> +		map[index] |= value << offset;
> +		map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> +		map[index + 1] |= (value >> space);
> +	}
> +}

Sorry but what's the advantage of using this complicated function
as a replacement for the much simpler bitmap_set_value8()?

The drivers calling bitmap_set_value8() *know* that 8-bit accesses
are possible and take advantage of that knowledge by using a small,
speed-optimized function.  Replacing that with a more complicated
(potentially less performant) function doesn't seem to be a step
forward.

Thanks,

Lukas


* Re: [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
  2020-04-24 14:10 ` Lukas Wunner
@ 2020-04-24 14:52   ` Syed Nayyar Waris
  2020-04-24 15:00     ` Lukas Wunner
  0 siblings, 1 reply; 8+ messages in thread
From: Syed Nayyar Waris @ 2020-04-24 14:52 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: akpm, andriy.shevchenko, William Breathitt Gray, arnd,
	Linus Walleij, linux-arch, linux-kernel

On Fri, Apr 24, 2020 at 7:40 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Fri, Apr 24, 2020 at 05:55:21PM +0530, Syed Nayyar Waris wrote:
> > +static inline void bitmap_set_value(unsigned long *map,
> > +                                 unsigned long value,
> > +                                 unsigned long start, unsigned long nbits)
> > +{
> > +     const size_t index = BIT_WORD(start);
> > +     const unsigned long offset = start % BITS_PER_LONG;
> > +     const unsigned long ceiling = roundup(start + 1, BITS_PER_LONG);
> > +     const unsigned long space = ceiling - start;
> > +
> > +     value &= GENMASK(nbits - 1, 0);
> > +
> > +     if (space >= nbits) {
> > +             map[index] &= ~(GENMASK(nbits + offset - 1, offset));
> > +             map[index] |= value << offset;
> > +     } else {
> > +             map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> > +             map[index] |= value << offset;
> > +             map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> > +             map[index + 1] |= (value >> space);
> > +     }
> > +}
>
> Sorry but what's the advantage of using this complicated function
> as a replacement for the much simpler bitmap_set_value8()?
>
> The drivers calling bitmap_set_value8() *know* that 8-bit accesses
> are possible and take advantage of that knowledge by using a small,
> speed-optimized function.  Replacing that with a more complicated
> (potentially less performant) function doesn't seem to be a step
> forward.
>
> Thanks,
>
> Lukas

Actually this generic function can work with n-bit values of any size
(less than or equal to BITS_PER_LONG), while the earlier
bitmap_set_value8() worked with 8-bit values only.

When n is 8, this new bitmap_set_value() function behaves very
similarly to the earlier bitmap_set_value8() function. For example, an
8-bit value at a byte-aligned offset never crosses a word boundary, so
the function always executes the 'if' branch and not the 'else' branch.
It therefore executes similar statements and most probably offers the
same performance as the earlier bitmap_set_value8() function.

There is an additional advantage (which applies when n is not 8): when
setting an n-bit value in the bitmap, if the value would extend past the
word boundary, it is split so that one portion is stored in the current
word and the remaining portion is stored in the next higher word.

So, this function preserves the behaviour of the earlier
bitmap_set_value8() function and also adds extra functionality to it.
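
The branch-selection claim above can be checked with a short userspace
sketch. It only models the "space >= nbits" test from the patch; the
chk_* names are hypothetical, it assumes start offsets that are multiples
of nbits, and it says nothing about the actual run-time cost of either
function.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's BITS_PER_LONG. */
#define CHK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * True when an nbits-wide value at bit offset 'start' fits in one word,
 * i.e. when the patch's "space >= nbits" test selects the 'if' branch.
 */
bool chk_fits_in_word(unsigned long start, unsigned long nbits)
{
	const unsigned long space =
		CHK_BITS_PER_LONG - start % CHK_BITS_PER_LONG;

	return space >= nbits;
}

/*
 * Count the start offsets in [0, size) that are multiples of nbits and
 * would take the 'else' (word-crossing) branch.
 */
unsigned long chk_count_straddles(unsigned long size, unsigned long nbits)
{
	unsigned long start, n = 0;

	for (start = 0; start + nbits <= size; start += nbits)
		if (!chk_fits_in_word(start, nbits))
			n++;

	return n;
}
```

Over four words, chk_count_straddles() finds no word-crossing offsets for
nbits = 8 but at least one for nbits = 12, which matches the description
of when the 'else' branch is reached.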

Thanks
Syed Nayyar Waris


* Re: [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
  2020-04-24 14:52   ` Syed Nayyar Waris
@ 2020-04-24 15:00     ` Lukas Wunner
  2020-04-24 15:09       ` William Breathitt Gray
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2020-04-24 15:00 UTC (permalink / raw)
  To: Syed Nayyar Waris
  Cc: akpm, andriy.shevchenko, William Breathitt Gray, arnd,
	Linus Walleij, linux-arch, linux-kernel

On Fri, Apr 24, 2020 at 08:22:38PM +0530, Syed Nayyar Waris wrote:
> On Fri, Apr 24, 2020 at 7:40 PM Lukas Wunner <lukas@wunner.de> wrote:
> >
> > On Fri, Apr 24, 2020 at 05:55:21PM +0530, Syed Nayyar Waris wrote:
> > > +static inline void bitmap_set_value(unsigned long *map,
> > > +                                 unsigned long value,
> > > +                                 unsigned long start, unsigned long nbits)
> > > +{
> > > +     const size_t index = BIT_WORD(start);
> > > +     const unsigned long offset = start % BITS_PER_LONG;
> > > +     const unsigned long ceiling = roundup(start + 1, BITS_PER_LONG);
> > > +     const unsigned long space = ceiling - start;
> > > +
> > > +     value &= GENMASK(nbits - 1, 0);
> > > +
> > > +     if (space >= nbits) {
> > > +             map[index] &= ~(GENMASK(nbits + offset - 1, offset));
> > > +             map[index] |= value << offset;
> > > +     } else {
> > > +             map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> > > +             map[index] |= value << offset;
> > > +             map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> > > +             map[index + 1] |= (value >> space);
> > > +     }
> > > +}
> >
> > Sorry but what's the advantage of using this complicated function
> > as a replacement for the much simpler bitmap_set_value8()?
> >
> > The drivers calling bitmap_set_value8() *know* that 8-bit accesses
> > are possible and take advantage of that knowledge by using a small,
> > speed-optimized function.  Replacing that with a more complicated
> > (potentially less performant) function doesn't seem to be a step
> > forward.
> 
> Actually this generic function can work with n-bits of any size (less
> than equal to BITS_PER_LONG), while the earlier bitmap_set_value8
> worked with n-bits having size of 8 bits only.
> 
> In the case when n-bits is 8-bits, this new bitmap_set_value()
> function would behave very similar to the earlier bitmap_set_value8()
> function. For example,  in case of n-bits being 8-bits it will always
> execute the 'if' condition and not the 'else' condition, hence
> offering the same performance (because of encountering similar code
> statements) as earlier bitmap_set_value8() function, most probably.
> 
> There is an additional advantage (this can happen when n-bits is not 8
> bits): during setting value of n-bit in bitmap, if a situation arise
> that the width of next n-bit is exceeding the word boundary, then it
> will divide itself such that some portion of it is stored in that
> word, while the remaining portion is stored in the next higher word.
> 
> So, this function preserves the behaviour of earlier
> bitmap_set_value8() function and also adds extra functionality to
> that.

Please leave drivers which exclusively use 8-bit accesses as is,
e.g. gpio-max3191x.c and gpio-74x164.c.  I fear a performance
regression if your new generic variant is used.  They work perfectly
fine the way they are and I don't see any benefit this series may have
for them.

If there are other drivers which benefit from the flexibility of your
generic variant then I'm not opposed to changing those.

Thanks,

Lukas


* Re: [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
  2020-04-24 15:00     ` Lukas Wunner
@ 2020-04-24 15:09       ` William Breathitt Gray
  2020-04-24 16:34         ` Andy Shevchenko
  0 siblings, 1 reply; 8+ messages in thread
From: William Breathitt Gray @ 2020-04-24 15:09 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Syed Nayyar Waris, akpm, andriy.shevchenko, arnd, Linus Walleij,
	linux-arch, linux-kernel

On Fri, Apr 24, 2020 at 05:00:58PM +0200, Lukas Wunner wrote:
> On Fri, Apr 24, 2020 at 08:22:38PM +0530, Syed Nayyar Waris wrote:
> > On Fri, Apr 24, 2020 at 7:40 PM Lukas Wunner <lukas@wunner.de> wrote:
> > >
> > > On Fri, Apr 24, 2020 at 05:55:21PM +0530, Syed Nayyar Waris wrote:
> > > > +static inline void bitmap_set_value(unsigned long *map,
> > > > +                                 unsigned long value,
> > > > +                                 unsigned long start, unsigned long nbits)
> > > > +{
> > > > +     const size_t index = BIT_WORD(start);
> > > > +     const unsigned long offset = start % BITS_PER_LONG;
> > > > +     const unsigned long ceiling = roundup(start + 1, BITS_PER_LONG);
> > > > +     const unsigned long space = ceiling - start;
> > > > +
> > > > +     value &= GENMASK(nbits - 1, 0);
> > > > +
> > > > +     if (space >= nbits) {
> > > > +             map[index] &= ~(GENMASK(nbits + offset - 1, offset));
> > > > +             map[index] |= value << offset;
> > > > +     } else {
> > > > +             map[index] &= ~BITMAP_FIRST_WORD_MASK(start);
> > > > +             map[index] |= value << offset;
> > > > +             map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
> > > > +             map[index + 1] |= (value >> space);
> > > > +     }
> > > > +}
> > >
> > > Sorry but what's the advantage of using this complicated function
> > > as a replacement for the much simpler bitmap_set_value8()?
> > >
> > > The drivers calling bitmap_set_value8() *know* that 8-bit accesses
> > > are possible and take advantage of that knowledge by using a small,
> > > speed-optimized function.  Replacing that with a more complicated
> > > (potentially less performant) function doesn't seem to be a step
> > > forward.
> > 
> > Actually this generic function can work with n-bits of any size (less
> > than equal to BITS_PER_LONG), while the earlier bitmap_set_value8
> > worked with n-bits having size of 8 bits only.
> > 
> > In the case when n-bits is 8-bits, this new bitmap_set_value()
> > function would behave very similar to the earlier bitmap_set_value8()
> > function. For example,  in case of n-bits being 8-bits it will always
> > execute the 'if' condition and not the 'else' condition, hence
> > offering the same performance (because of encountering similar code
> > statements) as earlier bitmap_set_value8() function, most probably.
> > 
> > There is an additional advantage (this can happen when n-bits is not 8
> > bits): during setting value of n-bit in bitmap, if a situation arise
> > that the width of next n-bit is exceeding the word boundary, then it
> > will divide itself such that some portion of it is stored in that
> > word, while the remaining portion is stored in the next higher word.
> > 
> > So, this function preserves the behaviour of earlier
> > bitmap_set_value8() function and also adds extra functionality to
> > that.
> 
> Please leave drivers as is which use exclusively 8-bit accesses,
> e.g. gpio-max3191x.c and gpio-74x164.c.  I'm fearing a performance
> regression if your new generic variant is used.  They work perfectly
> fine the way they are and I don't see any benefit this series may have
> for them.
> 
> If there are other drivers which benefit from the flexibility of your
> generic variant then I'm not opposed to changing those.
> 
> Thanks,
> 
> Lukas

We can of course leave bitmap_set_value8 alone, but for 8-bit values I
suspect the difference in latency is primarily due to the conditional
test for the word boundary. This latency is surely overshadowed by the
I/O latency of the GPIO drivers, so I don't think there's much harm in
changing those to use the generic functions; the bottleneck will not be
the bitmap_set_value/bitmap_get_value operations.

William Breathitt Gray


* Re: [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
  2020-04-24 15:09       ` William Breathitt Gray
@ 2020-04-24 16:34         ` Andy Shevchenko
  2020-04-24 16:42           ` William Breathitt Gray
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Shevchenko @ 2020-04-24 16:34 UTC (permalink / raw)
  To: William Breathitt Gray
  Cc: Lukas Wunner, Syed Nayyar Waris, akpm, arnd, Linus Walleij,
	linux-arch, linux-kernel

On Fri, Apr 24, 2020 at 11:09:26AM -0400, William Breathitt Gray wrote:
> On Fri, Apr 24, 2020 at 05:00:58PM +0200, Lukas Wunner wrote:
> > On Fri, Apr 24, 2020 at 08:22:38PM +0530, Syed Nayyar Waris wrote:
> > > On Fri, Apr 24, 2020 at 7:40 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > > On Fri, Apr 24, 2020 at 05:55:21PM +0530, Syed Nayyar Waris wrote:

...

> > > So, this function preserves the behaviour of earlier
> > > bitmap_set_value8() function and also adds extra functionality to
> > > that.
> > 
> > Please leave drivers as is which use exclusively 8-bit accesses,
> > e.g. gpio-max3191x.c and gpio-74x164.c.  I'm fearing a performance
> > regression if your new generic variant is used.  They work perfectly
> > fine the way they are and I don't see any benefit this series may have
> > for them.
> > 
> > If there are other drivers which benefit from the flexibility of your
> > generic variant then I'm not opposed to changing those.

> We can leave of course bitmap_set_value8 alone, but for 8-bit values the
> difference in latency I suspect is primarily due to the conditional test
> for the word boundaries. This latency is surely overshadowed by the I/O
> latency of the GPIO drivers, so I don't think there's much harm in
> changing those to use the generic function when the bottleneck will not
> be due to the bitmap_set_value/bitmap_get_value operations.

Okay, how many new (non-8-bit) users will this target?

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
  2020-04-24 16:34         ` Andy Shevchenko
@ 2020-04-24 16:42           ` William Breathitt Gray
  2020-04-24 17:59             ` Lukas Wunner
  0 siblings, 1 reply; 8+ messages in thread
From: William Breathitt Gray @ 2020-04-24 16:42 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Lukas Wunner, Syed Nayyar Waris, akpm, arnd, Linus Walleij,
	linux-arch, linux-kernel

On Fri, Apr 24, 2020 at 07:34:10PM +0300, Andy Shevchenko wrote:
> On Fri, Apr 24, 2020 at 11:09:26AM -0400, William Breathitt Gray wrote:
> > On Fri, Apr 24, 2020 at 05:00:58PM +0200, Lukas Wunner wrote:
> > > On Fri, Apr 24, 2020 at 08:22:38PM +0530, Syed Nayyar Waris wrote:
> > > > On Fri, Apr 24, 2020 at 7:40 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > > > On Fri, Apr 24, 2020 at 05:55:21PM +0530, Syed Nayyar Waris wrote:
> 
> ...
> 
> > > > So, this function preserves the behaviour of earlier
> > > > bitmap_set_value8() function and also adds extra functionality to
> > > > that.
> > > 
> > > Please leave drivers as is which use exclusively 8-bit accesses,
> > > e.g. gpio-max3191x.c and gpio-74x164.c.  I'm fearing a performance
> > > regression if your new generic variant is used.  They work perfectly
> > > fine the way they are and I don't see any benefit this series may have
> > > for them.
> > > 
> > > If there are other drivers which benefit from the flexibility of your
> > > generic variant then I'm not opposed to changing those.
> 
> > We can leave of course bitmap_set_value8 alone, but for 8-bit values the
> > difference in latency I suspect is primarily due to the conditional test
> > for the word boundaries. This latency is surely overshadowed by the I/O
> > latency of the GPIO drivers, so I don't think there's much harm in
> > changing those to use the generic function when the bottleneck will not
> > be due to the bitmap_set_value/bitmap_get_value operations.
> 
> Okay, how many new (non-8-bit) users this will target?
> 
> -- 
> With Best Regards,
> Andy Shevchenko

Within this patchset the only non-8-bit users are gpio-thunderx and
gpio-xilinx. gpio-xilinx has configurable port widths, so in some
instances it can behave like the 8-bit users, but not always.

If you want to keep the existing for_each_set_clump8 and related
functions, ignore [PATCH 3/6] and [PATCH 4/6]. That should allow this
patchset to be just an introduction of the new generic functions without
affecting the existing 8-bit users.

William Breathitt Gray


* Re: [PATCH 1/6] bitops: Introduce the for_each_set_clump macro
  2020-04-24 16:42           ` William Breathitt Gray
@ 2020-04-24 17:59             ` Lukas Wunner
  0 siblings, 0 replies; 8+ messages in thread
From: Lukas Wunner @ 2020-04-24 17:59 UTC (permalink / raw)
  To: William Breathitt Gray
  Cc: Andy Shevchenko, Syed Nayyar Waris, akpm, arnd, Linus Walleij,
	linux-arch, linux-kernel

On Fri, Apr 24, 2020 at 12:42:00PM -0400, William Breathitt Gray wrote:
> Within this patchset the only non-8-bit users are gpio-thunderx and
> gpio-xilinx. The gpio-xilinx has configurable port widths so in some
> instances it can behave like the 8-bit users, but not always.
> 
> If you want to keep the existing for_each_set_clump8 and related
> functions, ignore [PATCH 3/6] and [PATCH 4/6]. That should allow this
> patchset to be just an introduction of the new generic functions without
> affecting the existing 8-bit users.

Yes, I don't mind the changes to gpio-thunderx and gpio-xilinx at all,
but please leave the 8-bit users as they are wherever possible.
Actually my concern is not just performance: the existing 8-bit
variant is also simpler to understand than the generic variant,
making it easier to follow the code in the drivers.

Thanks,

Lukas
