* [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
@ 2008-03-25 20:47 Jarod Wilson
  2008-03-25 22:29 ` Stefan Richter
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jarod Wilson @ 2008-03-25 20:47 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel

There's a nasty memory leak in firewire-ohci's ar_context_tasklet(): we're not
freeing all of the memory we use for each ar_buffer, because the pointer we
pass to dma_free_coherent() has already been advanced while parsing packets.
The problem has been there for a while, but it wasn't noticed until we
switched to a coherent allocation for the ar_buffer, which draws from a
smaller pool of memory, so the problem crops up sooner. It manifests after
doing a bunch of I/O to a FireWire disk: the I/O eventually stalls, and this
starts spewing to the console:

PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0

The device there is one of my FireWire controllers trying to do I/O. The host
is a fairly recent revision of the Opteron.

The fix is simply to make sure we free the correct memory range on each pass
through ar_context_tasklet(). This is probably something we ought to sneak
into 2.6.25 if it's still doable...

Signed-off-by: Jarod Wilson <jwilson@redhat.com>
---

 drivers/firewire/fw-ohci.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c
index 8ff9059..e1d50f7 100644
--- a/drivers/firewire/fw-ohci.c
+++ b/drivers/firewire/fw-ohci.c
@@ -579,7 +579,8 @@ static void ar_context_tasklet(unsigned long data)
 
 	if (d->res_count == 0) {
 		size_t size, rest, offset;
-		dma_addr_t buffer_bus;
+		dma_addr_t start_bus;
+		void *start;
 
 		/*
 		 * This descriptor is finished and we may have a
@@ -588,9 +589,9 @@ static void ar_context_tasklet(unsigned long data)
 		 */
 
 		offset = offsetof(struct ar_buffer, data);
-		buffer_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
+		start = buffer = ab;
+		start_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
 
-		buffer = ab;
 		ab = ab->next;
 		d = &ab->descriptor;
 		size = buffer + PAGE_SIZE - ctx->pointer;
@@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
 			buffer = handle_ar_packet(ctx, buffer);
 
 		dma_free_coherent(ohci->card.device, PAGE_SIZE,
-				  buffer, buffer_bus);
+				  start, start_bus);
 		ar_context_add_page(ctx);
 	} else {
 		buffer = ctx->pointer;

-- 
Jarod Wilson
jwilson@redhat.com
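A minimal userspace sketch of the leak pattern the patch fixes: a cursor advanced by a parsing loop is later handed to the free routine instead of the page's start address. Assumptions: `pool_alloc()`/`pool_free()` are hypothetical stand-ins for dma_alloc_coherent()/dma_free_coherent() over a tiny four-page pool, and `parse_record()` is a hypothetical stand-in for handle_ar_packet(). This is an illustration, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a small DMA pool. Like dma_free_coherent(),
 * it can only release a page when given the address the allocator
 * originally returned. */
#define POOL_PAGES 4
#define SKETCH_PAGE_SIZE 4096

static void *pool[POOL_PAGES];

static void *pool_alloc(void)
{
	for (int i = 0; i < POOL_PAGES; i++)
		if (pool[i] == NULL)
			return pool[i] = malloc(SKETCH_PAGE_SIZE);
	return NULL;	/* exhausted, cf. "Out of IOMMU space" */
}

static void pool_free(void *addr)
{
	for (int i = 0; i < POOL_PAGES; i++)
		if (pool[i] != NULL && pool[i] == addr) {
			free(pool[i]);
			pool[i] = NULL;
			return;
		}
	/* Not a start address: nothing is released, so the page leaks. */
}

static int pool_in_use(void)
{
	int n = 0;

	for (int i = 0; i < POOL_PAGES; i++)
		n += pool[i] != NULL;
	return n;
}

/* Hypothetical parser: consumes one fixed-size record and returns the
 * advanced cursor, the way handle_ar_packet() advances `buffer`. */
static char *parse_record(char *p)
{
	return p + 64;
}

/* Process one page; returns false once the pool has run dry. */
static bool process_page(bool keep_start)
{
	char *start = pool_alloc();

	if (start == NULL)
		return false;

	char *buffer = start;
	char *end = start + 256;	/* bytes of valid data in this sketch */

	assert(end <= start + SKETCH_PAGE_SIZE);
	while (buffer < end)
		buffer = parse_record(buffer);

	/* Buggy variant frees the moved cursor; fixed variant frees `start`. */
	pool_free(keep_start ? start : buffer);
	return true;
}
```

Calling `process_page(true)` (keeping `start`, as the patch does) can repeat indefinitely, while `process_page(false)` strands one page per call until `pool_alloc()` fails, which is how a leak of this kind eventually exhausts a limited mapping pool.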


* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-25 20:47 [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler Jarod Wilson
@ 2008-03-25 22:29 ` Stefan Richter
  2008-03-26  7:09 ` Stefan Richter
  2008-03-26 21:37 ` Jarod Wilson
  2 siblings, 0 replies; 9+ messages in thread
From: Stefan Richter @ 2008-03-25 22:29 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel

Jarod Wilson wrote:
> Just need to make sure we're freeing the correct memory

That would be a plus.  :-)

> Probably something we ought to sneak into 2.6.25 if it's still doable...

Looks good and initial testing here is fine.  I don't have a board with 
IOMMU though.  Will look over it once more tomorrow, then submit it.
-- 
Stefan Richter
-=====-==--- --== ==--=
http://arcgraph.de/sr/


* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-25 20:47 [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler Jarod Wilson
  2008-03-25 22:29 ` Stefan Richter
@ 2008-03-26  7:09 ` Stefan Richter
  2008-03-26 13:09   ` Jarod Wilson
  2008-03-26 23:50   ` Stefan Richter
  2008-03-26 21:37 ` Jarod Wilson
  2 siblings, 2 replies; 9+ messages in thread
From: Stefan Richter @ 2008-03-26  7:09 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel

Jarod Wilson wrote:
> @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
>  			buffer = handle_ar_packet(ctx, buffer);
>  
>  		dma_free_coherent(ohci->card.device, PAGE_SIZE,
> -				  buffer, buffer_bus);
> +				  start, start_bus);
>  		ar_context_add_page(ctx);

On the other hand, why do we free a page + allocate a page?
Why don't we re-initialize and re-add the old page?
-- 
Stefan Richter
-=====-==--- --== ==-=-
http://arcgraph.de/sr/
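The reuse idea can be illustrated with a minimal userspace model of the AR buffer chain. Assumptions: `sketch_ctx`, `add_page()` and `retire_current()` are hypothetical simplifications of ar_context, ar_context_add_page() and the tasklet's "buffer is full" path; there is no hardware, DMA mapping, or context wake-up here, and `SKETCH_DATA_BYTES` is an arbitrary placeholder. It only shows that recycling keeps the number of allocations constant; it does not model the driver's real behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_DATA_BYTES 4000	/* arbitrary placeholder buffer size */

/* Simplified descriptor: res_count == 0 means "buffer full". */
struct sketch_desc {
	uint16_t req_count;
	uint16_t res_count;
};

struct sketch_buf {
	struct sketch_desc descriptor;
	struct sketch_buf *next;
};

struct sketch_ctx {
	struct sketch_buf *current;	/* buffer being filled */
	struct sketch_buf *last;	/* tail of the chain */
	int allocations;
};

/* Append a buffer at the tail; allocate only when none is passed in. */
static int add_page(struct sketch_ctx *ctx, struct sketch_buf *ab)
{
	if (ab == NULL) {
		ab = calloc(1, sizeof(*ab));
		if (ab == NULL)
			return -1;
		ctx->allocations++;
	}

	/* Re-initialize the descriptor as if the buffer were brand new. */
	ab->descriptor.req_count = SKETCH_DATA_BYTES;
	ab->descriptor.res_count = SKETCH_DATA_BYTES;
	ab->next = NULL;	/* a recycled buffer must not keep its old link */

	if (ctx->last != NULL && ctx->last != ab)
		ctx->last->next = ab;
	if (ctx->current == NULL)
		ctx->current = ab;
	ctx->last = ab;
	return 0;
}

/* The current buffer is full: advance, then recycle it at the tail. */
static void retire_current(struct sketch_ctx *ctx)
{
	struct sketch_buf *old = ctx->current;

	assert(old != NULL && old->descriptor.res_count == 0);
	ctx->current = old->next;
	add_page(ctx, old);	/* re-add instead of free + allocate */
}
```

With two buffers set up front (as ar_context_init() does), any number of `retire_current()` calls leaves `allocations` at 2.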


* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-26  7:09 ` Stefan Richter
@ 2008-03-26 13:09   ` Jarod Wilson
  2008-03-26 23:50   ` Stefan Richter
  1 sibling, 0 replies; 9+ messages in thread
From: Jarod Wilson @ 2008-03-26 13:09 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

On Wednesday 26 March 2008 03:09:47 am Stefan Richter wrote:
> Jarod Wilson wrote:
> > @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
> >  			buffer = handle_ar_packet(ctx, buffer);
> >
> >  		dma_free_coherent(ohci->card.device, PAGE_SIZE,
> > -				  buffer, buffer_bus);
> > +				  start, start_bus);
> >  		ar_context_add_page(ctx);
>
> On the other hand, why do we free a page + allocate a page?
> Why don't we re-initialize and re-add the old page?

Oh good, I'm not crazy (outside of having firewire on the brain way too much 
right now). I had that same thought tossing and turning in bed late last 
night. :)

-- 
Jarod Wilson
jwilson@redhat.com


* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-25 20:47 [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler Jarod Wilson
  2008-03-25 22:29 ` Stefan Richter
  2008-03-26  7:09 ` Stefan Richter
@ 2008-03-26 21:37 ` Jarod Wilson
  2008-03-27  0:12   ` Stefan Richter
  2 siblings, 1 reply; 9+ messages in thread
From: Jarod Wilson @ 2008-03-26 21:37 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel

On Tuesday 25 March 2008 04:47:16 pm Jarod Wilson wrote:
> There's a nasty memory leak in firewire-ohci's ar_context_tasklet(), in
> that we're not freeing up some of the memory we use for each ar_buffer, due
> to a moving pointer. The problem has been there for a while, but didn't
> start to be noticed until we were doing a coherent allocation for the
> ar_buffer -- meaning we have a smaller pool of memory to work with now, so
> the problem crops up sooner. The manifestation of this comes after doing a
> bunch of I/O to a firewire disk, which eventually stalls, and this starts
> spewing to the console:
>
> PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0
>
> The device there is one of my FireWire controllers trying to do I/O. The
> host is a fairly new rev. opteron.
>
> The fix is simply to make sure we free the correct memory range on each
> pass through ar_context_tasklet(). This is probably something we ought to
> sneak into 2.6.25 if it's still doable...

So as it turns out, while this is indeed a leak that needs to be plugged, it 
does NOT remedy the 'out of iommu space' issue, it just delays it a while 
longer. Still working on tracing the root cause of the memory exhaustion.


-- 
Jarod Wilson
jwilson@redhat.com


* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-26  7:09 ` Stefan Richter
  2008-03-26 13:09   ` Jarod Wilson
@ 2008-03-26 23:50   ` Stefan Richter
  2008-03-27  7:56     ` Stefan Richter
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Richter @ 2008-03-26 23:50 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel

I wrote:
> Jarod Wilson wrote:
>> @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
>>  			buffer = handle_ar_packet(ctx, buffer);
>>  
>>  		dma_free_coherent(ohci->card.device, PAGE_SIZE,
>> -				  buffer, buffer_bus);
>> +				  start, start_bus);
>>  		ar_context_add_page(ctx);
> 
> On the other hand, why do we free a page + allocate a page?
> Why don't we re-initialize and re-add the old page?


Meanwhile I tried a simple modification to ar_context_add_page and its
callers which results in _add_page simply re-adding the old page. I must
do something fundamentally wrong though.

After plugging in a FW disk and starting hdparm -tT, I get the modified
_add_page called for the ar_request_ctx, then for the ar_response_ctx,
then for the ar_request_ctx again, then everything stalls in one of
these modes:
  - No status write request reception is logged anymore, or
  - status write request reception with evt_no_status is logged.
The number of _add_page calls for ar_request_ctx until failure
corresponds to the number of pages added in ar_context_init
(normally two; I also tried three and four).

Just FYI, here is basically what I tested, with a debug printk in it.
---
 drivers/firewire/fw-ohci.c |   34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

Index: linux/drivers/firewire/fw-ohci.c
===================================================================
--- linux.orig/drivers/firewire/fw-ohci.c
+++ linux/drivers/firewire/fw-ohci.c
@@ -451,14 +451,19 @@ ohci_update_phy_reg(struct fw_card *card
 	return 0;
 }
 
-static int ar_context_add_page(struct ar_context *ctx)
+static int ar_context_add_page(struct ar_context *ctx, struct ar_buffer *ab)
 {
 	struct device *dev = ctx->ohci->card.device;
-	struct ar_buffer *ab;
 	dma_addr_t uninitialized_var(ab_bus);
-	size_t offset;
+	size_t offset = offsetof(struct ar_buffer, data);
 
-	ab = dma_alloc_coherent(dev, PAGE_SIZE, &ab_bus, GFP_ATOMIC);
+	if (ab == NULL)
+		ab = dma_alloc_coherent(dev, PAGE_SIZE, &ab_bus, GFP_KERNEL);
+	else {
+		ab_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
+		fw_notify("=== %s ===\n",
+			  ctx == &ctx->ohci->ar_request_ctx  ? "Req " : "Resp");
+	}
 	if (ab == NULL)
 		return -ENOMEM;
 
@@ -466,7 +471,6 @@ static int ar_context_add_page(struct ar
 	ab->descriptor.control        = cpu_to_le16(DESCRIPTOR_INPUT_MORE |
 						    DESCRIPTOR_STATUS |
 						    DESCRIPTOR_BRANCH_ALWAYS);
-	offset = offsetof(struct ar_buffer, data);
 	ab->descriptor.req_count      = cpu_to_le16(PAGE_SIZE - offset);
 	ab->descriptor.data_address   = cpu_to_le32(ab_bus + offset);
 	ab->descriptor.res_count      = cpu_to_le16(PAGE_SIZE - offset);
@@ -569,8 +573,7 @@ static __le32 *handle_ar_packet(struct a
 static void ar_context_tasklet(unsigned long data)
 {
 	struct ar_context *ctx = (struct ar_context *)data;
-	struct fw_ohci *ohci = ctx->ohci;
-	struct ar_buffer *ab;
+	struct ar_buffer *ab, *old_ab;
 	struct descriptor *d;
 	void *buffer, *end;
 
@@ -578,9 +581,7 @@ static void ar_context_tasklet(unsigned 
 	d = &ab->descriptor;
 
 	if (d->res_count == 0) {
-		size_t size, rest, offset;
-		dma_addr_t start_bus;
-		void *start;
+		size_t size, rest;
 
 		/*
 		 * This descriptor is finished and we may have a
@@ -588,10 +589,7 @@ static void ar_context_tasklet(unsigned 
 		 * reuse the page for reassembling the split packet.
 		 */
 
-		offset = offsetof(struct ar_buffer, data);
-		start = buffer = ab;
-		start_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
-
+		buffer = old_ab = ab;
 		ab = ab->next;
 		d = &ab->descriptor;
 		size = buffer + PAGE_SIZE - ctx->pointer;
@@ -605,9 +603,7 @@ static void ar_context_tasklet(unsigned 
 		while (buffer < end)
 			buffer = handle_ar_packet(ctx, buffer);
 
-		dma_free_coherent(ohci->card.device, PAGE_SIZE,
-				  start, start_bus);
-		ar_context_add_page(ctx);
+		ar_context_add_page(ctx, old_ab);
 	} else {
 		buffer = ctx->pointer;
 		ctx->pointer = end =
@@ -628,8 +624,8 @@ ar_context_init(struct ar_context *ctx, 
 	ctx->last_buffer = &ab;
 	tasklet_init(&ctx->tasklet, ar_context_tasklet, (unsigned long)ctx);
 
-	ar_context_add_page(ctx);
-	ar_context_add_page(ctx);
+	ar_context_add_page(ctx, NULL);
+	ar_context_add_page(ctx, NULL);
 	ctx->current_buffer = ab.next;
 	ctx->pointer = ctx->current_buffer->data;
 

-- 
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/



* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-26 21:37 ` Jarod Wilson
@ 2008-03-27  0:12   ` Stefan Richter
  2008-03-27  2:17     ` Jarod Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Richter @ 2008-03-27  0:12 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel

Jarod Wilson wrote:
> So as it turns out, while this is indeed a leak that needs to be plugged, it 
> does NOT remedy the 'out of iommu space' issue, it just delays it a while 
> longer. Still working on tracing the root cause of the memory exhaustion.

Do you want to change the wording of the patch description before I 
submit it upstream?
-- 
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/


* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-27  0:12   ` Stefan Richter
@ 2008-03-27  2:17     ` Jarod Wilson
  0 siblings, 0 replies; 9+ messages in thread
From: Jarod Wilson @ 2008-03-27  2:17 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

On Wednesday 26 March 2008 08:12:51 pm Stefan Richter wrote:
> Jarod Wilson wrote:
> > So as it turns out, while this is indeed a leak that needs to be plugged,
> > it does NOT remedy the 'out of iommu space' issue, it just delays it a
> > while longer. Still working on tracing the root cause of the memory
> > exhaustion.
>
> Do you want to change the wording of the patch description before I
> submit it upstream?

Yeah, I'll whip something up in just a sec and get it out the door...

-- 
Jarod Wilson
jwilson@redhat.com


* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
  2008-03-26 23:50   ` Stefan Richter
@ 2008-03-27  7:56     ` Stefan Richter
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Richter @ 2008-03-27  7:56 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel

I wrote:
> I wrote:
>> On the other hand, why do we free a page + allocate a page?
>> Why don't we re-initialize and re-add the old page?
> 
> Meanwhile I tried a simple modification to ar_context_add_page and its
> callers which results in _add_page simply re-adding the old page. I must
> do something fundamentally wrong though.

Besides, the current code which reassembles packets that reach into the 
next buffer is broken for packets whose total size approaches PAGE_SIZE. 
(Remember, async packets can be sized 4kB + 1394 headers + OHCI 
trailer.)  Reminds me of ohci1394 somehow.  :-(

I will attempt to fix this for post 2.6.25, unless you aspire to do so.
-- 
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/
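To put numbers on the size limit mentioned above, here is a minimal sketch of the arithmetic. Assumptions: 4 KiB pages; a 16-byte (four-quadlet) block write request header, with CRC quadlets not stored in the AR buffer; a 4-byte OHCI AR trailer quadlet; and a struct ar_buffer laid out as a 16-byte descriptor followed by a next pointer and the data area, mirrored here in userspace rather than taken from the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the assumed descriptor / ar_buffer layout. */
struct descriptor {
	uint16_t req_count;
	uint16_t control;
	uint32_t data_address;
	uint32_t branch_address;
	uint16_t res_count;
	uint16_t transfer_status;
};
static_assert(sizeof(struct descriptor) == 16, "descriptor is four quadlets");

struct ar_buffer {
	struct descriptor descriptor;
	struct ar_buffer *next;
	uint32_t data[];
};

#define SKETCH_PAGE_SIZE 4096u		/* assumption: 4 KiB pages */
#define ASYNC_HEADER_BYTES 16u		/* assumption: block write header */
#define OHCI_AR_TRAILER_BYTES 4u	/* assumption: xferStatus + timeStamp */

/* Bytes an async packet occupies in an AR buffer for a given payload. */
static size_t ar_packet_bytes(size_t payload)
{
	size_t padded = (payload + 3u) & ~(size_t)3u;	/* quadlet padding */

	return ASYNC_HEADER_BYTES + padded + OHCI_AR_TRAILER_BYTES;
}

/* Bytes of packet data one page can hold after the ar_buffer header. */
static size_t ar_page_capacity(void)
{
	return SKETCH_PAGE_SIZE - offsetof(struct ar_buffer, data);
}
```

Under these assumptions a 4096-byte payload occupies 4116 bytes, more than the whole 4096-byte page, so any reassembly that copies a split packet of that size into a single page cannot hold it.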
