* [PATCH 0/2] ioatdma: ring buffer management updates
@ 2010-05-11 18:51 ` Dan Williams
  2010-05-11 18:51   ` [PATCH 1/2] ioat: convert to circ_buf Dan Williams
  2010-05-12  8:36   ` [PATCH 2/2] ioat2,3: convert to producer/consumer locking David Howells
  0 siblings, 2 replies; 4+ messages in thread
From: Dan Williams @ 2010-05-11 18:51 UTC
  To: linux-kernel, linux-raid, netdev

Two patches targeted at the next merge window, affecting the ioatdma
driver (used when NET_DMA and/or ASYNC_TX_DMA+MD_RAID456 are enabled).
According to perf, the split locking update improves CPU utilization by a
few percentage points.

---

Dan Williams (2):
      ioat2,3: convert to producer/consumer locking
      ioat: convert to circ_buf


 drivers/dma/ioat/dma.h    |    1 
 drivers/dma/ioat/dma_v2.c |  184 +++++++++++++++++++++++----------------------
 drivers/dma/ioat/dma_v2.h |   33 +++-----
 drivers/dma/ioat/dma_v3.c |  117 +++++++++--------------------
 4 files changed, 142 insertions(+), 193 deletions(-)


* [PATCH 1/2] ioat: convert to circ_buf
  2010-05-11 18:51 ` [PATCH 0/2] ioatdma: ring buffer management updates Dan Williams
@ 2010-05-11 18:51   ` Dan Williams
  2010-05-12  8:36   ` [PATCH 2/2] ioat2,3: convert to producer/consumer locking David Howells
  1 sibling, 0 replies; 4+ messages in thread
From: Dan Williams @ 2010-05-11 18:51 UTC
  To: linux-kernel, linux-raid, netdev; +Cc: Dan Williams

Use the common power-of-2 circular buffer macros.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/dma/ioat/dma_v2.c |    2 +-
 drivers/dma/ioat/dma_v2.h |   18 +++++++-----------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index b5ae56c..b6699a3 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -553,7 +553,7 @@ bool reshape_ring(struct ioat2_dma_chan *ioat, int order)
 	 */
 	struct ioat_chan_common *chan = &ioat->base;
 	struct dma_chan *c = &chan->common;
-	const u16 curr_size = ioat2_ring_mask(ioat) + 1;
+	const u16 curr_size = ioat2_ring_size(ioat);
 	const u16 active = ioat2_ring_active(ioat);
 	const u16 new_size = 1 << order;
 	struct ioat_ring_ent **ring;
diff --git a/drivers/dma/ioat/dma_v2.h b/drivers/dma/ioat/dma_v2.h
index ef2871f..d7b64f1 100644
--- a/drivers/dma/ioat/dma_v2.h
+++ b/drivers/dma/ioat/dma_v2.h
@@ -22,6 +22,7 @@
 #define IOATDMA_V2_H
 
 #include <linux/dmaengine.h>
+#include <linux/circ_buf.h>
 #include "dma.h"
 #include "hw.h"
 
@@ -71,31 +72,26 @@ static inline struct ioat2_dma_chan *to_ioat2_chan(struct dma_chan *c)
 	return container_of(chan, struct ioat2_dma_chan, base);
 }
 
-static inline u16 ioat2_ring_mask(struct ioat2_dma_chan *ioat)
+static inline u16 ioat2_ring_size(struct ioat2_dma_chan *ioat)
 {
-	return (1 << ioat->alloc_order) - 1;
+	return 1 << ioat->alloc_order;
 }
 
 /* count of descriptors in flight with the engine */
 static inline u16 ioat2_ring_active(struct ioat2_dma_chan *ioat)
 {
-	return (ioat->head - ioat->tail) & ioat2_ring_mask(ioat);
+	return CIRC_CNT(ioat->head, ioat->tail, ioat2_ring_size(ioat));
 }
 
 /* count of descriptors pending submission to hardware */
 static inline u16 ioat2_ring_pending(struct ioat2_dma_chan *ioat)
 {
-	return (ioat->head - ioat->issued) & ioat2_ring_mask(ioat);
+	return CIRC_CNT(ioat->head, ioat->issued, ioat2_ring_size(ioat));
 }
 
 static inline u16 ioat2_ring_space(struct ioat2_dma_chan *ioat)
 {
-	u16 num_descs = ioat2_ring_mask(ioat) + 1;
-	u16 active = ioat2_ring_active(ioat);
-
-	BUG_ON(active > num_descs);
-
-	return num_descs - active;
+	return ioat2_ring_size(ioat) - ioat2_ring_active(ioat);
 }
 
 /* assumes caller already checked space */
@@ -151,7 +147,7 @@ struct ioat_ring_ent {
 static inline struct ioat_ring_ent *
 ioat2_get_ring_ent(struct ioat2_dma_chan *ioat, u16 idx)
 {
-	return ioat->ring[idx & ioat2_ring_mask(ioat)];
+	return ioat->ring[idx & (ioat2_ring_size(ioat) - 1)];
 }
 
 static inline void ioat2_set_chainaddr(struct ioat2_dma_chan *ioat, u64 addr)
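
For readers unfamiliar with the circ_buf helpers, here is a minimal user-space sketch of what the conversion above relies on. It is not part of the patch. CIRC_CNT is copied from <linux/circ_buf.h> so the example builds outside the kernel; struct demo_ring and the demo_* helpers are hypothetical stand-ins for the ioat2 channel state. For a power-of-2 ring, the macro form should give the same result as the open-coded mask it replaces.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as CIRC_CNT in <linux/circ_buf.h>, copied here so the
 * sketch builds in user space. Only valid when size is a power of 2. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))

/* Hypothetical stand-in for the ring fields of struct ioat2_dma_chan. */
struct demo_ring {
	uint16_t head;        /* next slot the producer will fill */
	uint16_t tail;        /* oldest slot not yet cleaned up */
	uint16_t alloc_order; /* ring holds 1 << alloc_order entries */
};

static inline uint16_t demo_ring_size(const struct demo_ring *r)
{
	return (uint16_t)(1u << r->alloc_order);
}

/* Post-patch form: entry count via CIRC_CNT. */
static inline uint16_t demo_ring_active(const struct demo_ring *r)
{
	uint16_t size = demo_ring_size(r);

	/* The masking trick depends on a power-of-2 size. */
	assert(size != 0 && (size & (size - 1)) == 0);
	return (uint16_t)CIRC_CNT(r->head, r->tail, size);
}

/* Pre-patch form: the same count with an explicit mask. */
static inline uint16_t demo_ring_active_masked(const struct demo_ring *r)
{
	uint16_t mask = (uint16_t)((1u << r->alloc_order) - 1);

	return (uint16_t)((r->head - r->tail) & mask);
}
```

Because head and tail are free-running 16-bit indices, the subtraction stays correct when head has wrapped past zero and tail has not.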



* Re: [PATCH 2/2] ioat2,3: convert to producer/consumer locking
  2010-05-11 18:51 ` [PATCH 0/2] ioatdma: ring buffer management updates Dan Williams
  2010-05-11 18:51   ` [PATCH 1/2] ioat: convert to circ_buf Dan Williams
@ 2010-05-12  8:36   ` David Howells
  2010-05-13 23:42     ` Dan Williams
  1 sibling, 1 reply; 4+ messages in thread
From: David Howells @ 2010-05-12  8:36 UTC
  To: Dan Williams
  Cc: dhowells, linux-kernel, linux-raid, netdev, Paul E. McKenney,
	Maciej Sosnowski


Out of interest, does it make the code smaller if you mark
ioat2_get_ring_ent() and ioat2_ring_mask() with __attribute_const__?

I'm not sure whether it'll affect how long gcc is willing to cache these, but
once computed, I would guess they won't change within the calling function.

Also, is the device you're driving watching the ring and its indices?  If so,
does it modify the indices?  If that is the case, you might need to use
read_barrier_depends() rather than smp_read_barrier_depends().

> +		prefetch(ioat2_get_ring_ent(ioat, idx + i + 1));
> +		desc = ioat2_get_ring_ent(ioat, idx + i);
>  		dump_desc_dbg(ioat, desc);
>  		tx = &desc->txd;
>  		if (tx->cookie) {

Is this right, I wonder?  You're prefetching [i+1] before reading [i]?  Doesn't
this mean that you might have to wait for [i+1] to be retrieved from RAM before
[i] can be read?  Should you instead read tx->cookie before issuing the
prefetch?  Admittedly, this is only likely to affect the reading of the head of
the queue - subsequent reads in the same loop will, of course, have been
prefetched.

David


* Re: [PATCH 2/2] ioat2,3: convert to producer/consumer locking
  2010-05-12  8:36   ` [PATCH 2/2] ioat2,3: convert to producer/consumer locking David Howells
@ 2010-05-13 23:42     ` Dan Williams
  0 siblings, 0 replies; 4+ messages in thread
From: Dan Williams @ 2010-05-13 23:42 UTC
  To: David Howells
  Cc: linux-kernel, linux-raid, netdev, Paul E. McKenney, Maciej Sosnowski

On Wed, May 12, 2010 at 1:36 AM, David Howells <dhowells@redhat.com> wrote:
>
> Out of interest, does it make the code smaller if you mark
> ioat2_get_ring_ent() and ioat2_ring_mask() with __attribute_const__?
>
> I'm not sure whether it'll affect how long gcc is willing to cache these, but
> once computed, I would guess they won't change within the calling function.

Unfortunately, it does not make a difference, but I'll keep this in
mind if ioat2_get_ring_ent() ever gets more complicated (which it
might in the future).
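
To illustrate the annotation being discussed, here is a small user-space sketch (not driver code). The kernel's __attribute_const__ corresponds to GCC's `const` function attribute, which is spelled out below as the hypothetical macro demo_attribute_const. demo_ring_mask() and demo_sum_masked() are invented for the example. GCC documents `const` for functions whose result depends only on their argument values, so the sketch passes the order by value; a helper that dereferences a pointer, as ioat2_get_ring_ent() does, reads memory, and GCC's documentation describes `pure` as the fitting attribute for that case.

```c
#include <stdint.h>

/* User-space spelling of what the kernel calls __attribute_const__. */
#define demo_attribute_const __attribute__((__const__))

/* Result depends only on the argument value, so the compiler is allowed
 * to reuse one computed result across repeated calls. */
static inline demo_attribute_const uint16_t demo_ring_mask(unsigned int order)
{
	return (uint16_t)((1u << order) - 1);
}

/* Hypothetical caller: the mask is loop-invariant, and the attribute
 * tells the compiler so even if the helper is not inlined. */
static uint32_t demo_sum_masked(const uint16_t *idx, unsigned int n,
				unsigned int order)
{
	uint32_t sum = 0;

	for (unsigned int i = 0; i < n; i++)
		sum += idx[i] & demo_ring_mask(order);
	return sum;
}
```

For a trivial static inline helper like this one, the compiler can usually see the invariance after inlining anyway, which would be consistent with the annotation making no difference to code size here.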

> Also, is the device you're driving watching the ring and its indices?  If so,
> does it modify the indices?  If that is the case, you might need to use
> read_barrier_depends() rather than smp_read_barrier_depends().

The device does not observe the indices directly.  Instead we
increment a free running 'count' register by the distance between
ioat->pending and ioat->head.
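
A rough user-space sketch of that scheme, for illustration only. struct demo_chan, its fields, and demo_issue_pending() are hypothetical, and a plain variable stands in for the device register. The sketch uses `issued` as the "pending" marker because that is the field ioat2_ring_pending() subtracts from head in patch 1; treating the two as the same thing is an assumption.

```c
#include <stdint.h>

/* Hypothetical channel state; hw_count_reg stands in for the device's
 * free-running count register. */
struct demo_chan {
	uint16_t head;         /* next descriptor slot to be prepared */
	uint16_t issued;       /* first descriptor not yet handed to hardware */
	uint16_t dmacount;     /* software copy of the free-running count */
	uint16_t hw_count_reg; /* simulated register, not real MMIO */
	uint16_t alloc_order;  /* ring holds 1 << alloc_order entries */
};

/* Advance the count by the number of descriptors prepared since the last
 * submission. The device never sees head or issued, only the count. */
static void demo_issue_pending(struct demo_chan *c)
{
	uint16_t size = (uint16_t)(1u << c->alloc_order);
	uint16_t pending = (uint16_t)((c->head - c->issued) & (size - 1));

	c->dmacount = (uint16_t)(c->dmacount + pending); /* wraps at 16 bits */
	c->issued = c->head;
	c->hw_count_reg = c->dmacount; /* a real driver would do an MMIO write */
}
```

Since the device only consumes the running total, the ring indices stay private to the driver, which is why the question about the device modifying them does not arise.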

>
>> +             prefetch(ioat2_get_ring_ent(ioat, idx + i + 1));
>> +             desc = ioat2_get_ring_ent(ioat, idx + i);
>>               dump_desc_dbg(ioat, desc);
>>               tx = &desc->txd;
>>               if (tx->cookie) {
>
> Is this right, I wonder?  You're prefetching [i+1] before reading [i]?  Doesn't
> this mean that you might have to wait for [i+1] to be retrieved from RAM before
> [i] can be read?  Should you instead read tx->cookie before issuing the
> prefetch?  Admittedly, this is only likely to affect the reading of the head of
> the queue - subsequent reads in the same loop will, of course, have been
> prefetched.

Yes, it should be the other way around.

Thanks!

--
Dan
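
For reference, a user-space sketch of the ordering agreed on above: load the field from entry [i] first, then issue the prefetch for [i+1]. This is not the driver's cleanup loop. struct demo_ent and demo_cleanup() are hypothetical, and GCC's __builtin_prefetch() stands in for the kernel's prefetch(). The compiler and CPU may still reorder these operations, so the sketch only shows the intended source order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring entry; cookie is nonzero when the entry completed. */
struct demo_ent {
	uint32_t cookie;
};

/* Walk `active` entries starting at idx and count those with a cookie.
 * size must be a power of 2. */
static unsigned int demo_cleanup(struct demo_ent **ring, uint16_t size,
				 uint16_t idx, uint16_t active)
{
	unsigned int completed = 0;

	for (uint16_t i = 0; i < active; i++) {
		struct demo_ent *desc = ring[(uint16_t)(idx + i) & (size - 1)];
		/* Demand load of entry [i] is issued first ... */
		uint32_t cookie = desc->cookie;

		/* ... then the hint for [i+1], which can overlap with the
		 * work done on [i] instead of queuing ahead of its load. */
		__builtin_prefetch(ring[(uint16_t)(idx + i + 1) & (size - 1)]);

		if (cookie)
			completed++;
	}
	return completed;
}
```

As noted in the review, the ordering mostly matters for the first entry; later entries will already have been prefetched by the previous iteration.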
