* [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Jarod Wilson @ 2008-03-25 20:47 UTC (permalink / raw)
To: linux1394-devel; +Cc: linux-kernel
There's a nasty memory leak in firewire-ohci's ar_context_tasklet(): we're not
freeing some of the memory we use for each ar_buffer, because the pointer we
pass to dma_free_coherent() has been moved along by the packet-handling loop.
The problem has been there for a while, but it only got noticed once we started
doing a coherent allocation for the ar_buffer, which leaves us a smaller pool
of memory to work with, so the problem crops up sooner. It manifests after
doing a bunch of I/O to a FireWire disk: the I/O eventually stalls, and this
starts spewing to the console:
PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0
The device there is one of my FireWire controllers trying to do I/O. The host
is a fairly recent Opteron revision.
Just need to make sure we're freeing the correct memory range on each pass
through ar_context_tasklet() to fix it. Probably something we ought to sneak
into 2.6.25 if it's still doable...
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
---
drivers/firewire/fw-ohci.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c
index 8ff9059..e1d50f7 100644
--- a/drivers/firewire/fw-ohci.c
+++ b/drivers/firewire/fw-ohci.c
@@ -579,7 +579,8 @@ static void ar_context_tasklet(unsigned long data)
if (d->res_count == 0) {
size_t size, rest, offset;
- dma_addr_t buffer_bus;
+ dma_addr_t start_bus;
+ void *start;
/*
* This descriptor is finished and we may have a
@@ -588,9 +589,9 @@ static void ar_context_tasklet(unsigned long data)
*/
offset = offsetof(struct ar_buffer, data);
- buffer_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
+ start = buffer = ab;
+ start_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
- buffer = ab;
ab = ab->next;
d = &ab->descriptor;
size = buffer + PAGE_SIZE - ctx->pointer;
@@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
buffer = handle_ar_packet(ctx, buffer);
dma_free_coherent(ohci->card.device, PAGE_SIZE,
- buffer, buffer_bus);
+ start, start_bus);
ar_context_add_page(ctx);
} else {
buffer = ctx->pointer;
--
Jarod Wilson
jwilson@redhat.com
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Stefan Richter @ 2008-03-25 22:29 UTC (permalink / raw)
To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel
Jarod Wilson wrote:
> Just need to make sure we're freeing the correct memory
That would be a plus. :-)
> Probably something we ought to sneak into 2.6.25 if it's still doable...
Looks good and initial testing here is fine. I don't have a board with an
IOMMU though. Will look over it once more tomorrow, then submit it.
--
Stefan Richter
-=====-==--- --== ==--=
http://arcgraph.de/sr/
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Stefan Richter @ 2008-03-26 7:09 UTC (permalink / raw)
To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel
Jarod Wilson wrote:
> @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
> buffer = handle_ar_packet(ctx, buffer);
>
> dma_free_coherent(ohci->card.device, PAGE_SIZE,
> - buffer, buffer_bus);
> + start, start_bus);
> ar_context_add_page(ctx);
On the other hand, why do we free a page + allocate a page?
Why don't we re-initialize and re-add the old page?
--
Stefan Richter
-=====-==--- --== ==-=-
http://arcgraph.de/sr/
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Jarod Wilson @ 2008-03-26 13:09 UTC (permalink / raw)
To: Stefan Richter; +Cc: linux1394-devel, linux-kernel
On Wednesday 26 March 2008 03:09:47 am Stefan Richter wrote:
> Jarod Wilson wrote:
> > @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
> > buffer = handle_ar_packet(ctx, buffer);
> >
> > dma_free_coherent(ohci->card.device, PAGE_SIZE,
> > - buffer, buffer_bus);
> > + start, start_bus);
> > ar_context_add_page(ctx);
>
> On the other hand, why do we free a page + allocate a page?
> Why don't we re-initialize and re-add the old page?
Oh good, I'm not crazy (outside of having firewire on the brain way too much
right now). I had that same thought tossing and turning in bed late last
night. :)
--
Jarod Wilson
jwilson@redhat.com
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Jarod Wilson @ 2008-03-26 21:37 UTC (permalink / raw)
To: linux1394-devel; +Cc: linux-kernel
On Tuesday 25 March 2008 04:47:16 pm Jarod Wilson wrote:
> There's a nasty memory leak in firewire-ohci's ar_context_tasklet(): we're
> not freeing some of the memory we use for each ar_buffer, because the
> pointer we pass to dma_free_coherent() has been moved along by the
> packet-handling loop. The problem has been there for a while, but it only
> got noticed once we started doing a coherent allocation for the ar_buffer,
> which leaves us a smaller pool of memory to work with, so the problem crops
> up sooner. It manifests after doing a bunch of I/O to a FireWire disk: the
> I/O eventually stalls, and this starts spewing to the console:
>
> PCI-DMA: Out of IOMMU space for 53248 bytes at device 0000:04:09.0
>
> The device there is one of my FireWire controllers trying to do I/O. The
> host is a fairly recent Opteron revision.
>
> Just need to make sure we're freeing the correct memory range on each pass
> through ar_context_tasklet() to fix it. Probably something we ought to sneak
> into 2.6.25 if it's still doable...
So as it turns out, while this is indeed a leak that needs to be plugged, it
does NOT remedy the 'out of IOMMU space' issue; it just delays it a while
longer. Still working on tracing the root cause of the memory exhaustion.
--
Jarod Wilson
jwilson@redhat.com
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Stefan Richter @ 2008-03-26 23:50 UTC (permalink / raw)
To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel
I wrote:
> Jarod Wilson wrote:
>> @@ -605,7 +606,7 @@ static void ar_context_tasklet(unsigned long data)
>> buffer = handle_ar_packet(ctx, buffer);
>>
>> dma_free_coherent(ohci->card.device, PAGE_SIZE,
>> - buffer, buffer_bus);
>> + start, start_bus);
>> ar_context_add_page(ctx);
>
> On the other hand, why do we free a page + allocate a page?
> Why don't we re-initialize and re-add the old page?
Meanwhile I tried a simple modification to ar_context_add_page and its
callers which results in _add_page simply re-adding the old page. I must
be doing something fundamentally wrong, though.
After plugging in a FW disk and starting hdparm -tT, I see the modified
_add_page called for the ar_request_ctx, then for the ar_response_ctx,
then for the ar_request_ctx again; then everything stalls in one of
these modes:
- No status write request reception is logged anymore, or
- status write request reception with evt_no_status is logged.
The number of _add_page calls for ar_request_ctx until failure
corresponds to the number of pages added in ar_context_init
(normally two; I also tried three and four).
Just FYI, here is basically what I tested, with a debug printk in it.
---
drivers/firewire/fw-ohci.c | 34 +++++++++++++++-------------------
1 file changed, 15 insertions(+), 19 deletions(-)
Index: linux/drivers/firewire/fw-ohci.c
===================================================================
--- linux.orig/drivers/firewire/fw-ohci.c
+++ linux/drivers/firewire/fw-ohci.c
@@ -451,14 +451,19 @@ ohci_update_phy_reg(struct fw_card *card
return 0;
}
-static int ar_context_add_page(struct ar_context *ctx)
+static int ar_context_add_page(struct ar_context *ctx, struct ar_buffer *ab)
{
struct device *dev = ctx->ohci->card.device;
- struct ar_buffer *ab;
dma_addr_t uninitialized_var(ab_bus);
- size_t offset;
+ size_t offset = offsetof(struct ar_buffer, data);
- ab = dma_alloc_coherent(dev, PAGE_SIZE, &ab_bus, GFP_ATOMIC);
+ if (ab == NULL)
+ ab = dma_alloc_coherent(dev, PAGE_SIZE, &ab_bus, GFP_KERNEL);
+ else {
+ ab_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
+ fw_notify("=== %s ===\n",
+ ctx == &ctx->ohci->ar_request_ctx ? "Req " : "Resp");
+ }
if (ab == NULL)
return -ENOMEM;
@@ -466,7 +471,6 @@ static int ar_context_add_page(struct ar
ab->descriptor.control = cpu_to_le16(DESCRIPTOR_INPUT_MORE |
DESCRIPTOR_STATUS |
DESCRIPTOR_BRANCH_ALWAYS);
- offset = offsetof(struct ar_buffer, data);
ab->descriptor.req_count = cpu_to_le16(PAGE_SIZE - offset);
ab->descriptor.data_address = cpu_to_le32(ab_bus + offset);
ab->descriptor.res_count = cpu_to_le16(PAGE_SIZE - offset);
@@ -569,8 +573,7 @@ static __le32 *handle_ar_packet(struct a
static void ar_context_tasklet(unsigned long data)
{
struct ar_context *ctx = (struct ar_context *)data;
- struct fw_ohci *ohci = ctx->ohci;
- struct ar_buffer *ab;
+ struct ar_buffer *ab, *old_ab;
struct descriptor *d;
void *buffer, *end;
@@ -578,9 +581,7 @@ static void ar_context_tasklet(unsigned
d = &ab->descriptor;
if (d->res_count == 0) {
- size_t size, rest, offset;
- dma_addr_t start_bus;
- void *start;
+ size_t size, rest;
/*
* This descriptor is finished and we may have a
@@ -588,10 +589,7 @@ static void ar_context_tasklet(unsigned
* reuse the page for reassembling the split packet.
*/
- offset = offsetof(struct ar_buffer, data);
- start = buffer = ab;
- start_bus = le32_to_cpu(ab->descriptor.data_address) - offset;
-
+ buffer = old_ab = ab;
ab = ab->next;
d = &ab->descriptor;
size = buffer + PAGE_SIZE - ctx->pointer;
@@ -605,9 +603,7 @@ static void ar_context_tasklet(unsigned
while (buffer < end)
buffer = handle_ar_packet(ctx, buffer);
- dma_free_coherent(ohci->card.device, PAGE_SIZE,
- start, start_bus);
- ar_context_add_page(ctx);
+ ar_context_add_page(ctx, old_ab);
} else {
buffer = ctx->pointer;
ctx->pointer = end =
@@ -628,8 +624,8 @@ ar_context_init(struct ar_context *ctx,
ctx->last_buffer = &ab;
tasklet_init(&ctx->tasklet, ar_context_tasklet, (unsigned long)ctx);
- ar_context_add_page(ctx);
- ar_context_add_page(ctx);
+ ar_context_add_page(ctx, NULL);
+ ar_context_add_page(ctx, NULL);
ctx->current_buffer = ab.next;
ctx->pointer = ctx->current_buffer->data;
--
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Stefan Richter @ 2008-03-27 0:12 UTC (permalink / raw)
To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel
Jarod Wilson wrote:
> So as it turns out, while this is indeed a leak that needs to be plugged, it
> does NOT remedy the 'out of IOMMU space' issue; it just delays it a while
> longer. Still working on tracing the root cause of the memory exhaustion.
Do you want to change the wording of the patch description before I
submit it upstream?
--
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Jarod Wilson @ 2008-03-27 2:17 UTC (permalink / raw)
To: Stefan Richter; +Cc: linux1394-devel, linux-kernel
On Wednesday 26 March 2008 08:12:51 pm Stefan Richter wrote:
> Jarod Wilson wrote:
> > So as it turns out, while this is indeed a leak that needs to be plugged,
> > it does NOT remedy the 'out of IOMMU space' issue; it just delays it a
> > while longer. Still working on tracing the root cause of the memory
> > exhaustion.
>
> Do you want to change the wording of the patch description before I
> submit it upstream?
Yeah, I'll whip something up in just a sec and get it out the door...
--
Jarod Wilson
jwilson@redhat.com
* Re: [PATCH] firewire: fw-ohci: plug dma memory leak in AR handler
From: Stefan Richter @ 2008-03-27 7:56 UTC (permalink / raw)
To: Jarod Wilson; +Cc: linux1394-devel, linux-kernel
I wrote:
> I wrote:
>> On the other hand, why do we free a page + allocate a page?
>> Why don't we re-initialize and re-add the old page?
>
> Meanwhile I tried a simple modification to ar_context_add_page and its
> callers which results in _add_page simply re-adding the old page. I must
> be doing something fundamentally wrong, though.
Besides, the current code which reassembles packets that reach into the
next buffer is broken for packets whose total size approaches PAGE_SIZE.
(Remember, async packets can be sized 4kB + 1394 headers + OHCI
trailer.) Reminds me of ohci1394 somehow. :-(
I will attempt to fix this for post 2.6.25, unless you aspire to do so.
--
Stefan Richter
-=====-==--- --== ==-==
http://arcgraph.de/sr/