* where is async_tx_clear_ack() function called.
From: Tirumala Reddy Marri @ 2009-07-09 23:36 UTC
  To: Dan Williams; +Cc: linux-raid

Hi Dan,
    While debugging a file system crash with the AMCC ADMA driver, I
noticed that async_tx_clear_ack() is never called. Where is
async_tx->flags cleared on completion of a DMA transfer? It looks like
it is not cleared, and I am hitting the BUG_ON() in async_tx_submit(). If
I call async_tx_clear_ack() in the ppc4xx_adma_free_slots() function,
everything seems to work fine.

This doesn't seem normal; something else is going on. Somehow descriptors
are being allocated even though they are not free.

Thanks and Regards,
Marri



* Re: where is async_tx_clear_ack() function called.
From: Dan Williams @ 2009-07-10 17:15 UTC
  To: Tirumala Reddy Marri; +Cc: linux-raid

On Thu, Jul 9, 2009 at 4:36 PM, Tirumala Reddy Marri<tmarri@amcc.com> wrote:
> Hi Dan,
>    While debugging a file system crash with the AMCC ADMA driver, I
> noticed that async_tx_clear_ack() is never called. Where is
> async_tx->flags cleared on completion of a DMA transfer? It looks like
> it is not cleared, and I am hitting the BUG_ON() in async_tx_submit(). If
> I call async_tx_clear_ack() in the ppc4xx_adma_free_slots() function,
> everything seems to work fine.
>
> This doesn't seem normal; something else is going on. Somehow descriptors
> are being allocated even though they are not free.
>

If you take a look at iop-adma.c you will see lines like:

     sw_desc->async_tx.flags = flags;

So we inherit the DMA_CTRL_ACK flag from when async_tx calls the
->prep routine.  It is not important to clear ack on completion; it is
only important to reset it at prep time.  It is merely a flag that
holds off recycling descriptors while the API may still want to attach
a dependency.
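
To make the flag's lifecycle concrete, here is a minimal user-space
sketch. It is not the iop-adma code: struct toy_desc, TOY_CTRL_ACK and
the toy_*() helpers are all hypothetical names invented for this
illustration. It only models the two points above: prep overwrites the
flags, and cleanup recycles a slot only once it is both completed and
acked, without ever clearing the flag itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ACK bit; not the kernel definition. */
enum toy_ctrl_flags { TOY_CTRL_ACK = 1 << 1 };

/* Hypothetical software descriptor, loosely playing the role of sw_desc. */
struct toy_desc {
    unsigned long flags;   /* plays the role of async_tx.flags */
    bool completed;        /* hardware has finished this descriptor */
    bool in_use;           /* slot is currently handed out */
};

/* prep: hand out a free slot and overwrite its flags with the caller's
 * value, as in "sw_desc->async_tx.flags = flags;". Any stale ACK left
 * over from the slot's previous use is discarded here. */
static struct toy_desc *toy_prep(struct toy_desc *pool, size_t n,
                                 unsigned long flags)
{
    for (size_t i = 0; i < n; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].completed = false;
            pool[i].flags = flags;
            return &pool[i];
        }
    }
    return NULL; /* no free slot */
}

/* The client acks once it no longer wants to attach a dependency. */
static void toy_ack(struct toy_desc *d)
{
    assert(d != NULL);
    d->flags |= TOY_CTRL_ACK;
}

static bool toy_test_ack(const struct toy_desc *d)
{
    return (d->flags & TOY_CTRL_ACK) != 0;
}

/* cleanup: recycle only slots that are completed AND acked. The flags
 * are deliberately left alone. Returns the number of slots freed. */
static size_t toy_cleanup(struct toy_desc *pool, size_t n)
{
    size_t freed = 0;

    for (size_t i = 0; i < n; i++) {
        if (pool[i].in_use && pool[i].completed && toy_test_ack(&pool[i])) {
            pool[i].in_use = false;
            freed++;
        }
    }
    return freed;
}
```

In this model a completed but un-acked descriptor is never handed out
again, so if a driver's prep path returns a descriptor that still tests
as acked, the likelier culprits are the flags assignment at prep time
or the recycling condition, not a missing clear on completion.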

If memory serves correctly, we have tried to debug this driver before.
Please refresh my memory on how this driver differs from the one
produced by Emcraft[1], which does not seem to exhibit these problems.

Thanks,
Dan


[1]: http://patchwork.ozlabs.org/patch/18116/
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: where is async_tx_clear_ack() function called.
From: Tirumala Reddy Marri @ 2009-07-10 17:44 UTC
  To: Dan Williams; +Cc: linux-raid



Thanks Dan.
  I see that the flags are set in xxxx_prep_dma_xor. I am hitting
BUG_ON(async_tx_test_ack(depend_tx) || depend_tx->next || tx->parent).
It looks like a dependency-related issue. When I suppress this by
calling async_tx_clear_ack() in the xxx_free_slots() function, it leads
to file-system corruption during recovery or rebuild.

The driver I am working on is for a different SoC. It is a subset of
what Emcraft posted (DMA+XOR are a little different), so that driver
can't be used as it is. There is another SoC ADMA driver for hardware
that is completely different from the Emcraft driver's.


Thanks and Regards,
Marri



* RE: where is async_tx_clear_ack() function called.
From: Dan Williams @ 2009-07-10 17:53 UTC
  To: Tirumala Reddy Marri; +Cc: linux-raid

On Fri, 2009-07-10 at 10:44 -0700, Tirumala Reddy Marri wrote:
> 
> Thanks Dan.
>   I see that the flags are set in xxxx_prep_dma_xor. I am hitting
> BUG_ON(async_tx_test_ack(depend_tx) || depend_tx->next || tx->parent).
> It looks like a dependency-related issue.  When I suppress this by
> calling async_tx_clear_ack() in the xxx_free_slots() function, it
> leads to file-system corruption during recovery or rebuild.
>
> The driver I am working on is for a different SoC. It is a subset of
> what Emcraft posted (DMA+XOR are a little different), so that driver
> can't be used as it is. There is another SoC ADMA driver for hardware
> that is completely different from the Emcraft driver's.

Channel switching can be tricky to debug.  One thing you might try is
limiting offload to one channel.  I.e. if memcpy and xor are provided by
different channels then turn off (don't register) the memcpy channels
and see if that resolves your issue.  If it does then you need to review
the iop-adma driver or the Emcraft ppc440spe driver to see what is
causing the mishandling of channel switching.
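
As a rough illustration of the experiment, here is a user-space sketch
of "register everything except the memcpy channels". The capability
bits, struct toy_chan and toy_register_channels() are hypothetical
names made up for this example; a real driver would make this choice
where it sets up its capability mask and registers its DMA device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits for this sketch only. */
enum toy_cap { TOY_CAP_MEMCPY = 1 << 0, TOY_CAP_XOR = 1 << 1 };

/* Hypothetical channel record. */
struct toy_chan {
    const char *name;
    unsigned int caps;     /* operations this channel can offload */
    int registered;        /* set to 1 if the channel was registered */
};

/* Register every channel whose capabilities do not intersect skip_mask.
 * Returns the number of channels registered. */
static size_t toy_register_channels(struct toy_chan *chans, size_t n,
                                    unsigned int skip_mask)
{
    size_t count = 0;

    assert(chans != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        chans[i].registered = (chans[i].caps & skip_mask) == 0;
        if (chans[i].registered)
            count++;
    }
    return count;
}
```

Calling toy_register_channels(chans, n, TOY_CAP_MEMCPY) leaves only the
XOR channel available for offload, so no operation chain can cross
from one channel to another. If the corruption disappears in that
configuration, the channel-switch path is the place to look.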

--
Dan




* RE: where is async_tx_clear_ack() function called.
From: Tirumala Reddy Marri @ 2009-07-10 20:40 UTC
  To: Dan Williams; +Cc: linux-raid


Dan,
   After disabling memcpy support in ADMA, the file system corruption
no longer seems to happen. I am going to take a look at the channel
switching code.
Regards,
Marri



