Re: at91-sama5d2_adc crash

From: <Eugen.Hristev@microchip.com>
To: <ardeleanalex@gmail.com>
Cc: <alexandru.ardelean@analog.com>, <jic23@kernel.org>,
	<linux-iio@vger.kernel.org>
Subject: Re: at91-sama5d2_adc crash
Date: Tue, 15 Sep 2020 16:30:23 +0000	[thread overview]
Message-ID: <a725eff5-c438-5d10-4090-15c49d52a91f@microchip.com> (raw)
In-Reply-To: <CA+U=DsqrFziYRjVfGnfBe0nKfxs0Le3pB0iYe=-7nDsLNZfXYw@mail.gmail.com>

On 14.09.2020 15:37, Alexandru Ardelean wrote:

> On Mon, Sep 14, 2020 at 2:33 PM <Eugen.Hristev@microchip.com> wrote:
>>
>> Hello Alex,
>>
>> Sorry to disturb but we have issues again with this patch :
>>
>>
> 
> firstly, many apologies for the breakage;
> 
>>
>> f3c034f61775 ("iio: at91-sama5d2_adc: adjust
>> iio_triggered_buffer_{predisable,postenable} positions")
>>
>> I recently discovered a crash when using buffered trigger with DMA with
>> this driver:
>>
>> # echo 100 > /sys/bus/iio/devices/iio\:device0/buffer/length
>> # echo 100 > /sys/bus/iio/devices/iio\:device0/buffer/watermark
>> # echo 1 > /sys/bus/iio/devices/iio\:device0/scan_elements/in_voltage4_en
>> # iio_generic_buffer -n fc030000.adc -t
>> fc030000.adc-dev0-external_rising -c 5
>> iio device number being used is 0
>> iio trigger number being used is 0
>> /sys/bus/iio/devicii/iio:device0 fc030000.adc-dev0-external_risingo
>> iio:device0: using dma0chan10 for rx DMA transfers
>>
>> Division by zero in kernel.
>> CPU: 0 PID: 243 Comm: irq/182-fc03000 Not tainted 5.8.0-rc1 #1
>> Hardware name: Atmel SAMA5
>> [<c010caf0>] (unwind_backtrace) from [<c010a034>] (show_stack+0x10/0x14)
>> [<c010a034>] (show_stack) from [<c03a892c>] (Ldiv0+0x8/0x10)
>> [<c03a892c>] (Ldiv0) from [<c03a88fc>] (__aeabi_uidivmod+0x8/0x18)
>> [<c03a88fc>] (__aeabi_uidivmod) from [<c03592d4>] (div_s64_rem+0x3c/0xc4)
>> [<c03592d4>] (div_s64_rem) from [<c05ed344>]
> 
> which of these lines is the issue?
> 
>          sample_count = div_s64(transferred_len, sample_size);
> 
>          /*
>           * interval between samples is total time since last transfer handling
>           * divided by the number of samples (total size divided by sample size)
>           */
>          interval = div_s64((ns - st->dma_st.dma_ts), sample_count);
> 
> would a simple return like below make sense?
> if (!sample_size)    // or  if (!sample_count)
>      return;
> 
> Is at91_adc_dma_size_done() returning 0?
> Or more closely, which variable is zero?
> 
>> (at91_adc_trigger_handler+0xcc/0x494)
>> [<c05ed344>] (at91_adc_trigger_handler) from [<c014944c>]
>> (irq_thread_fn+0x1c/0x78)
>> [<c014944c>] (irq_thread_fn) from [<c01496dc>] (irq_thread+0x124/0x1d0)
>> [<c01496dc>] (irq_thread) from [<c01325b4>] (kthread+0x138/0x140)
>> [<c01325b4>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
>> Exception stack(0xde49dfb0 to 0xde49dff8)
>> dfa0:                                     00000000 00000000 00000000
>> 00000000
>> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000
>> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> Division by zero in kernel.
>> CPU: 0 PID: 243 Comm: irq/182-fc03000 Not tainted 5.8.0-rc1 #1
>> Hardware name: Atmel SAMA5
>> [<c010caf0>] (unwind_backtrace) from [<c010a034>] (show_stack+0x10/0x14)
>> [<c010a034>] (show_stack) from [<c03a7e24>] (Ldiv0_64+0x8/0x18)
>> [<c03a7e24>] (Ldiv0_64) from [<c0359344>] (div_s64_rem+0xac/0xc4)
>> [<c0359344>] (div_s64_rem) from [<c05ed360>]
>> (at91_adc_trigger_handler+0xe8/0x494)
>> [<c05ed360>] (at91_adc_trigger_handler) from [<c014944c>]
>> (irq_thread_fn+0x1c/0x78)
>> [<c014944c>] (irq_thread_fn) from [<c01496dc>] (irq_thread+0x124/0x1d0)
>> [<c01496dc>] (irq_thread) from [<c01325b4>] (kthread+0x138/0x140)
>> [<c01325b4>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
>> Exception stack(0xde49dfb0 to 0xde49dff8)
>> dfa0:                                     00000000 00000000 00000000
>> 00000000
>> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000
>> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> sched: RT throttling activated
>>
>>
>> It looks like crash is there since 5.8-rc1 introduced by that patch
>>
>> I looked in the code and it looks something is zero, probably the
>> received buffer size. It is likely that the DMA starts too soon before
>> the buffer is properly setup ?
> 
> I am not sure about this. See [1]
> 
>>
>> Can you help with fixing this ? or you know how we can do it ?
>> Also could you remind me why we enable and start the DMA on pre-enable
>> of the buffer instead of post-enable of the buffer ?
> 
> [1]
> So, in iio_enable_buffers(), the order is:
> preenable()
> update_scan_mode()
> hwfifo_set_watermark()
> iio_buffer_enable()    <-  this can also enable the DMA buffer if
> using the IIO DMA buffer framework
> [2] IIO-core-now-attaches-pollfunc via iio_trigger_attach_poll_func()
> (if current mode is INDIO_BUFFER_TRIGGERED)
> postenable()
> 
> I could guess maybe that hwfifo_set_watermark() is called too late?
> Ideally, the driver would use the IIO DMA buffer hook to enable the
> buffer via iio_buffer_enable()
> That would be the ideal way to do it, and not enable DMA in preenable.

Hi,

you are correct. The set watermark is being set too late. And the 
dma_init is called when a watermark higher than 1 is being set. In 
consequence, DMA is not initialized at the moment of dma_start (which is 
now in preenable, which is before the set_watermark).

Do you have any suggestion about how to fix this ? Move the start dma to 
set_watermark maybe ?

Thanks,
Eugen
> 
> Regarding [2] it was discussed that attach/detach should be done in
> that order, and it should be done in IIO core.
> So, all drivers were cleaned up to respect that order.
> The idea is more in the lines that repetitive/duplicated things should
> be done by the IIO framework, and attach/detach-of-pollfunc is one of
> them.
> 
>> In pre-enable, do we have everything ready inside IIO to be able to
>> start the DMA? Or it's better to have it at post-enable time ?
> 
> This really depends on each driver.
> 
>>
>> I know you want to ditch the post-enable and pre-disable hooks, but it
>> looks this driver needs them, or we need to find a way to make it work
>> properly
> 
> These hooks have already been dropped in favor of the IIO core doing
> attach/detach.
> At this point, it may be a bit more difficult to go back, than to move forward.
> 
> I think this crash should be fixable to function with this IIO core change.
> 
> Again, apologies for the breakage.
> 
> Thanks
> Alex
> 
>>
>> Thanks !
>> Eugen