All of lore.kernel.org
 help / color / mirror / Atom feed
* adi-axi-adc issues and how to properly support this designs
@ 2023-03-31 14:40 Nuno Sá
  2023-04-10 12:21 ` Jonathan Cameron
  0 siblings, 1 reply; 7+ messages in thread
From: Nuno Sá @ 2023-03-31 14:40 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio

Hi Jonathan,

There are some major issues with the implementation we have upstream
for the adi-axi-adc driver. The only user is the ad9467 driver. In the
rest of the email I will refer to ad9467 as the converter device and
axi-adc as the IP core. 

Let me start to state that these designs, in a very a basic way, have a
converter connected to a IP core that typically lives in a FPGA. This
core connects to a DMA core (which is used by the DMA buffer
implementation) so we can keep up with the high rates of these
converters. So there's a link between these devices and we try to
implement that so far looking at them as one single IIO device.

Let's go to the major issues now:

1) There is a circular dependency between the two device. And when
compiled as modules for example, we cannot really rmmod the modules
anymore:

"root@analog:~# rmmod ad9467
rmmod: ERROR: Module ad9467 is in use

root@analog:~# rmmod adi-axi-adc.ko
rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
"

This easy to understand as the ad9467 driver as a direct symbol
dependency on the axi-adc one. And this one calls 'module_get()' as
soon as it attaches to a "client"

2) It's fairly easy to crash the kernel:

"
root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
buffer0/in_voltage0_en
root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
[  132.349133] 8<--- cut here ---
[  132.352193] Unable to handle kernel paging request at virtual
address e0940000 when read
[  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
[  132.366668] Internal error: Oops: 7 [#1] SMP ARM
[  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
[  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
[  132.382701] Hardware name: Xilinx Zynq Platform
[  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
[adi_axi_adc]
[  132.394020] LR is at arm_heavy_mb+0x1c/0x38
[  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
[  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
[  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
[  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
000003f0
[  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
00000000
"

3) This solution does not really scale. There are designs where we have
multiple instances of the IP core attached to the client (typically if
a converter like this has more than one channel, we have one instance
of the core per channel). The way we have it know, that would not me
feasable.

Issues number 2) and 3) would more or less be easy to solve moving from
of_parse_phandle() and from devm_ (using plain kmalloc() + krefs). The
big issue to me is 1) because, for example, in the driver we have out
of tree for ad9467, we do a tuning on the adc LVDS interface that we do
not do in the upstream version. And why is this problematic? Because
this means the converter driver has to do some configurations on the IP
core device and hence we have a bidirectional link between these 2
devices which, while doable, is far from ideal and can become complex
to keep it sane.

To be fair, what we have in upstream today is inspired a lot in what
ADI has out of tree (likely even better) but I truly think we need to
do it better. As I want to start working on this, I want to share some
ideas that I have on my head and hopefuly get some feedback on the
direction to go. So, here it goes:

1) The obvious one is to flip the logic of the "communication" link.
Right now callbacks are done from the IP core driver to the converter.
As stated, the converter might need to access the IP core to ask/do
some configurations. Well, from what I've seen in other projects we
have, I'm positive everything can be done by being the converter the
"entry point". This effectively means, that IP core devices would
register somewhere (I'm thinking in a tiny interface module that can
likely be reused for the axi-dac counterpart of the axi-adc) and the
converter device would "get" it during probe.

2) With respect to 1), while treating the devices as 1 might look
appealing, I'm not really sure it's the right thing to do. This,
because these devices are effectively two different devices: the
converter is typically a SPI device while the IP core is a MMIO device.
Treating them as 1, makes things like the IIO direct_reg_access not
doable (unless we duplicate some code and manually add the second
debugfs interface). Another thing not ideal with 1) is that the DMA
buffer interface really belongs on the IP core (looking on how the HW
is designed) and not on the converter driver. However, one thing that
I'm not sure it will be ideal with having two devices is synchronizing
accesses to the IP core. For example, the axi-dac driver (that I also
intend to bring upstream as part of this work) also has an userspace
interface so we need to be carefull. But I guess this is no different
for devices which are accessed by the in-kernel IIO interface.

I'm way more inclined to go with 2).

Another thing that I've been thinking is how should we implement the IP
core driver... As stated, there are usecases where we have mulitple
instances of these devices. Well, that is how we've been doing things
in ADI tree but essentially the core is just one big "block/device"
[1]. However, the way we've been doing things is actually ok because
every sub-block on these IPs have their own memory map allowing to
control everything needed by the sub-blocks.

Another option and now that we support output buffers and multiple
buffers per device would be to support these cores as one device
(likely in the addac folder). However, I think that keeping each block
as a standalone device might keep things more simple. But I assume it's
tempting to get a first user to the multi buffer support :)


Now for the big elephant of all the above... Everything that I've been
saying means breaking the current ABI. I'm fairly confident that we
should not have any upstream user for these drivers because it lacks a
lot of features that are only present in ADI tree. So, I think it's ok
to destroy the ABI but of course I cannot guarantee there are no users
:).

So, would it be fine for me to not care about the current ABI while
working on this? Because, it will be very difficult to come up with
something that is compatible with what we have. Well, there's always
the option of coming up with a -v2 (or something like that) for these
drivers keeping the ones we have now, untouched. Ideas?

So... feedback is very much appreciated :)

[1]: https://wiki.analog.com/resources/eval/user-guides/adrv9002/axi_adrv9002
https://wiki.analog.com/resources/eval/user-guides/adrv9002/axi_adrv9002
https://wiki.analog.com/resources/eval/user-guides/adrv9002/axi_adrv9002

- Nuno Sá

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: adi-axi-adc issues and how to properly support this designs
  2023-03-31 14:40 adi-axi-adc issues and how to properly support this designs Nuno Sá
@ 2023-04-10 12:21 ` Jonathan Cameron
  2023-04-14  9:11   ` Nuno Sá
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2023-04-10 12:21 UTC (permalink / raw)
  To: Nuno Sá; +Cc: linux-iio

On Fri, 31 Mar 2023 16:40:44 +0200
Nuno Sá <noname.nuno@gmail.com> wrote:

> Hi Jonathan,
> 

Hmm. Complex topic, so I'd definitely like others to weigh in with
opinions .

> There are some major issues with the implementation we have upstream
> for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> rest of the email I will refer to ad9467 as the converter device and
> axi-adc as the IP core. 
> 
> Let me start to state that these designs, in a very a basic way, have a
> converter connected to a IP core that typically lives in a FPGA. This
> core connects to a DMA core (which is used by the DMA buffer
> implementation) so we can keep up with the high rates of these
> converters. So there's a link between these devices and we try to
> implement that so far looking at them as one single IIO device.
> 
> Let's go to the major issues now:
> 
> 1) There is a circular dependency between the two device. And when
> compiled as modules for example, we cannot really rmmod the modules
> anymore:
> 
> "root@analog:~# rmmod ad9467
> rmmod: ERROR: Module ad9467 is in use
> 
> root@analog:~# rmmod adi-axi-adc.ko
> rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> "
> 
> This easy to understand as the ad9467 driver as a direct symbol
> dependency on the axi-adc one. And this one calls 'module_get()' as
> soon as it attaches to a "client"

Ouch. module_get() never works for this.  Long time ago I thought it
did :(  An unbind will bypass that (as well any real hot unplug paths).


> 
> 2) It's fairly easy to crash the kernel:
> 
> "
> root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> buffer0/in_voltage0_en
> root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> [  132.349133] 8<--- cut here ---
> [  132.352193] Unable to handle kernel paging request at virtual
> address e0940000 when read
> [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> [  132.382701] Hardware name: Xilinx Zynq Platform
> [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> [adi_axi_adc]
> [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> 000003f0
> [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> 00000000
> "

It's nasty but generally that sort of path can be prevented with some
careful locking and checking of indicators that a component has gone away.
So whenever you unbind one aspect it notifies the other one that it has gone
away. I'm not sure where to currently look for best practice in this.

There are a lot of similar situations - basically anywhere a set of 
drivers end up hanging off two buses.  v4l and media drivers in general
end up doing this a lot.

One I'm familiar with is how some of the CXL tear down happens and
that lead me to device_release_driver() which is also used by usb etc.
I've not looked at this for a while but it may provide the tear down needed
if the right dance is used.  I think that will work if your convertor
driver is using services off the IP core driver and someone unbinds
that ip-core driver.


> 
> 3) This solution does not really scale. There are designs where we have
> multiple instances of the IP core attached to the client (typically if
> a converter like this has more than one channel, we have one instance
> of the core per channel). The way we have it know, that would not me
> feasable.
> 
> Issues number 2) and 3) would more or less be easy to solve moving from
> of_parse_phandle() and from devm_ (using plain kmalloc() + krefs). The
> big issue to me is 1) because, for example, in the driver we have out
> of tree for ad9467, we do a tuning on the adc LVDS interface that we do
> not do in the upstream version. And why is this problematic? Because
> this means the converter driver has to do some configurations on the IP
> core device and hence we have a bidirectional link between these 2
> devices which, while doable, is far from ideal and can become complex
> to keep it sane.

That sort of thing becomes inevitable when we have two hardware buses
involved (e.g. SPI and AXI). 

> 
> To be fair, what we have in upstream today is inspired a lot in what
> ADI has out of tree (likely even better) but I truly think we need to
> do it better. As I want to start working on this, I want to share some
> ideas that I have on my head and hopefuly get some feedback on the
> direction to go. So, here it goes:
> 
> 1) The obvious one is to flip the logic of the "communication" link.
> Right now callbacks are done from the IP core driver to the converter.

That does seem potentially backwards.  My instinct would be to have
it the other way around. Controls tend to be at the convertor end and
it tends to be the only part that 'knows' what data is being captured.
The way we've done this in IIO in general has evolved over time, but
I think we are standardising on the 'convertor' front end consuming services
from other elements.  So if it's an analog output IIO device driver consuming
the services of an ADC, then the device you talk to represents the analog output
devices (e.g. an accelerometer) and you don't use the userspace interfaces
of the ADC in this at all. 

> As stated, the converter might need to access the IP core to ask/do
> some configurations. Well, from what I've seen in other projects we
> have, I'm positive everything can be done by being the converter the
> "entry point". This effectively means, that IP core devices would
> register somewhere (I'm thinking in a tiny interface module that can
> likely be reused for the axi-dac counterpart of the axi-adc) and the
> converter device would "get" it during probe.

Yes, like treating the IP core as a resource like a gpio / clock etc that a driver
consumes.

> 
> 2) With respect to 1), while treating the devices as 1 might look
> appealing, I'm not really sure it's the right thing to do. This,
> because these devices are effectively two different devices: the
> converter is typically a SPI device while the IP core is a MMIO device.

Yes.  Or look at it a different way, it's a device that sits on two
host buses (SPI and AXI) each of which has a controller.  One is the SPI
master, the other is the IP core with it's MMIO interface.

> Treating them as 1, makes things like the IIO direct_reg_access not
> doable (unless we duplicate some code and manually add the second
> debugfs interface). 

It should work for the SPI accessed front end. I guess you are talking
about the IP core part. I'm not sure we should treat that as anything
to do with a IIO. It's a generic 'bus controller'.

> Another thing not ideal with 1) is that the DMA
> buffer interface really belongs on the IP core (looking on how the HW
> is designed) and not on the converter driver. 

This seems not hugely different from an SPI controller doing DMA.  The DMA
part is in the SPI controller, but the data flow control is all in the
IIO driver for whatever is on the Analog end. It will need some functionality
/ callbacks etc and the stuff mentioned earlier to ensure that tearing down
the IP core driver tears down it's users.

> However, one thing that
> I'm not sure it will be ideal with having two devices is synchronizing
> accesses to the IP core. For example, the axi-dac driver (that I also
> intend to bring upstream as part of this work) also has an userspace
> interface so we need to be carefull.

I'm a bit lost. I think we'd only expect to see a userspace interface
on the 'front end' driver. Is there something on the driver handling the
IP core as well (beyond debug which you can do whatever you like with :)

> But I guess this is no different
> for devices which are accessed by the in-kernel IIO interface.
> 
> I'm way more inclined to go with 2).

I'm not sure I fully follow what option 2) is!  Perhaps a bit
of ascii art?

> 
> Another thing that I've been thinking is how should we implement the IP
> core driver... As stated, there are usecases where we have mulitple
> instances of these devices. Well, that is how we've been doing things
> in ADI tree but essentially the core is just one big "block/device"
> [1]. However, the way we've been doing things is actually ok because
> every sub-block on these IPs have their own memory map allowing to
> control everything needed by the sub-blocks.

Device model wise, that sounds like something that would be neater split
up. So one kernel device registered per sub-block.  That should make
it easier to handle referencing them from front end drivers.  You can 
also then unbind individual blocks etc rather than having to do them
in one go.  If there is need for a 'parent' block then this might be
a fit for the auxiliary bus framework - or something that looks like
that but suits your particular requirements.

> 
> Another option and now that we support output buffers and multiple
> buffers per device would be to support these cores as one device
> (likely in the addac folder).
> However, I think that keeping each block
> as a standalone device might keep things more simple. But I assume it's
> tempting to get a first user to the multi buffer support :)

Do we end up with with one front end, with a bunch of IP cores (or sub cores etc)
behind it?  If so I'd like the front end (where buffers are presented)
to have a bunch of output buffers.  I'd assume that there are controls
for some front ends that affect the data flowing into multiple IP
blocks? (gain etc).  That front end (which would be the IIO driver)
might well register for multiple backends (IP block driver).

> 
> Now for the big elephant of all the above... Everything that I've been
> saying means breaking the current ABI. I'm fairly confident that we
> should not have any upstream user for these drivers because it lacks a
> lot of features that are only present in ADI tree. So, I think it's ok
> to destroy the ABI but of course I cannot guarantee there are no users
> :).

I'm less worried about doing this for this driver than the vast majority
of others because at least for upstream I don't think we'll have huge
numbers of users.  + they'll shout at you not me ;)

> 
> So, would it be fine for me to not care about the current ABI while
> working on this? Because, it will be very difficult to come up with
> something that is compatible with what we have. Well, there's always
> the option of coming up with a -v2 (or something like that) for these
> drivers keeping the ones we have now, untouched. Ideas?

Probably good to layout the current ABI as exposed and then we can see
whether what we are changing is stuff not used in normal software flows
anyway or if we are going to break critical stuff.  IF we end up with
a /sys/bus/iio/device/iio:deviceX that has the same interfaces, then
the user won't care that the driver structure underneath is totally different.
I don't care about debug related interfaces remaining stable ABI

Going for a v2 may be the best option - marking the other one deprecated
and getting rid of it after a number of years.  If you (ADI in general)
reckon you can get away with it then I'm happy.  The usecases for this
stuff tend to be sufficiently high end that I'd imagine you only have to
deal with a fairly small number of customers and many of them won't
use upstream directly anyway.

> 
> So... feedback is very much appreciated :)

I think unfortunately this is going to be an area where experimentation
is needed, particularly as I suspect you have a lot of different devices all
with subtle differences in requirements. We need soemthing that supports
them all.

For extra fun. In the ideal sense any driver for a convertor should work
equally well with different IP cores (if any ever come along).

Jonathan

> 
> [1]: https://wiki.analog.com/resources/eval/user-guides/adrv9002/axi_adrv9002
> https://wiki.analog.com/resources/eval/user-guides/adrv9002/axi_adrv9002
> https://wiki.analog.com/resources/eval/user-guides/adrv9002/axi_adrv9002
> 
> - Nuno Sá


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: adi-axi-adc issues and how to properly support this designs
  2023-04-10 12:21 ` Jonathan Cameron
@ 2023-04-14  9:11   ` Nuno Sá
  2023-04-16 13:33     ` Jonathan Cameron
  0 siblings, 1 reply; 7+ messages in thread
From: Nuno Sá @ 2023-04-14  9:11 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio

Hi Jonathan,

Thanks for the feedback. Definitely helpful...

On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:
> On Fri, 31 Mar 2023 16:40:44 +0200
> Nuno Sá <noname.nuno@gmail.com> wrote:
> 
> > Hi Jonathan,
> > 
> 
> Hmm. Complex topic, so I'd definitely like others to weigh in with
> opinions .
> 
> > There are some major issues with the implementation we have upstream
> > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > rest of the email I will refer to ad9467 as the converter device and
> > axi-adc as the IP core. 
> > 
> > Let me start to state that these designs, in a very a basic way, have a
> > converter connected to a IP core that typically lives in a FPGA. This
> > core connects to a DMA core (which is used by the DMA buffer
> > implementation) so we can keep up with the high rates of these
> > converters. So there's a link between these devices and we try to
> > implement that so far looking at them as one single IIO device.
> > 
> > Let's go to the major issues now:
> > 
> > 1) There is a circular dependency between the two device. And when
> > compiled as modules for example, we cannot really rmmod the modules
> > anymore:
> > 
> > "root@analog:~# rmmod ad9467
> > rmmod: ERROR: Module ad9467 is in use
> > 
> > root@analog:~# rmmod adi-axi-adc.ko
> > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > "
> > 
> > This easy to understand as the ad9467 driver as a direct symbol
> > dependency on the axi-adc one. And this one calls 'module_get()' as
> > soon as it attaches to a "client"
> 
> Ouch. module_get() never works for this.  Long time ago I thought it
> did :(  An unbind will bypass that (as well any real hot unplug paths).
> 

Yeps, it bypasses it but I just wanted to point out the flaw in the current design :)

> 
> > 
> > 2) It's fairly easy to crash the kernel:
> > 
> > "
> > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> > buffer0/in_voltage0_en
> > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > [  132.349133] 8<--- cut here ---
> > [  132.352193] Unable to handle kernel paging request at virtual
> > address e0940000 when read
> > [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > [  132.382701] Hardware name: Xilinx Zynq Platform
> > [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> > [adi_axi_adc]
> > [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> > [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> > 000003f0
> > [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> > 00000000
> > "
> 
> It's nasty but generally that sort of path can be prevented with some
> careful locking and checking of indicators that a component has gone away.
> So whenever you unbind one aspect it notifies the other one that it has gone
> away. I'm not sure where to currently look for best practice in this.
> 
> There are a lot of similar situations - basically anywhere a set of 
> drivers end up hanging off two buses.  v4l and media drivers in general
> end up doing this a lot.
> 
> One I'm familiar with is how some of the CXL tear down happens and
> that lead me to device_release_driver() which is also used by usb etc.
> I've not looked at this for a while but it may provide the tear down needed
> if the right dance is used.  I think that will work if your convertor
> driver is using services off the IP core driver and someone unbinds
> that ip-core driver.
> 

Yes, CCF does it with refcounting and some dummy clock ops for the case a clock
provider is gone (with active consumers).

> 
> > 
> > 3) This solution does not really scale. There are designs where we have
> > multiple instances of the IP core attached to the client (typically if
> > a converter like this has more than one channel, we have one instance
> > of the core per channel). The way we have it know, that would not me
> > feasable.
> > 

...

> 
> 
> > However, one thing that
> > I'm not sure it will be ideal with having two devices is synchronizing
> > accesses to the IP core. For example, the axi-dac driver (that I also
> > intend to bring upstream as part of this work) also has an userspace
> > interface so we need to be carefull.
> 
> I'm a bit lost. I think we'd only expect to see a userspace interface
> on the 'front end' driver. Is there something on the driver handling the
> IP core as well (beyond debug which you can do whatever you like with :)
> 

See below... 

> > But I guess this is no different
> > for devices which are accessed by the in-kernel IIO interface.
> > 
> > I'm way more inclined to go with 2).
> 
> I'm not sure I fully follow what option 2) is!  Perhaps a bit
> of ascii art?
> 

So option 2 is pretty much treating the IP core also as a separate IIO device.
This is not so visible in axi-adc but on the dac counterpart, the settings
to generate tones (frequency/phase) are on the core. So, this could map to
IIO ABI.

However, I guess we can provide some callback/ops to read_raw()/write_raw() 
to extend the front end converter (This is what is done today upstream but
in the wrong direction). In this way, I think we could still only expose the
frond end device.

> > 
> > Another thing that I've been thinking is how should we implement the IP
> > core driver... As stated, there are usecases where we have mulitple
> > instances of these devices. Well, that is how we've been doing things
> > in ADI tree but essentially the core is just one big "block/device"
> > [1]. However, the way we've been doing things is actually ok because
> > every sub-block on these IPs have their own memory map allowing to
> > control everything needed by the sub-blocks.
> 
> Device model wise, that sounds like something that would be neater split
> up. So one kernel device registered per sub-block.  That should make
> it easier to handle referencing them from front end drivers.  You can
> also then unbind individual blocks etc rather than having to do them
> in one go.  If there is need for a 'parent' block then this might be
> a fit for the auxiliary bus framework - or something that looks like
> that but suits your particular requirements.
> 

Agreed. Instead of auxiliary bus, I've been thinking in two option:
 * Normal devicetree (FW) + kref based approach (as CCF, in kernel IIO, etc)
 * component API [1]. The component API looks to fit nicely in these designs
and I would not have to care about refcounting. It's also an all or nothing
approach. Either all devices are present or none is which honestly, I think
it makes sense. I also don't dislike the two staged binding approach... One thing
that worries me is that it looks to be very tailored for DRM (there's even some
DRM specific comments in the code :)) which might bring some unexpected, unpleasant
surprises.

> > 
> > Another option and now that we support output buffers and multiple
> > buffers per device would be to support these cores as one device
> > (likely in the addac folder).
> > However, I think that keeping each block
> > as a standalone device might keep things more simple. But I assume it's
> > tempting to get a first user to the multi buffer support :)
> 
> Do we end up with with one front end, with a bunch of IP cores (or sub cores etc)
> behind it?  If so I'd like the front end (where buffers are presented)
> to have a bunch of output buffers.  I'd assume that there are controls
> for some front ends that affect the data flowing into multiple IP
> blocks? (gain etc).  That front end (which would be the IIO driver)
> might well register for multiple backends (IP block driver).
> 

Yes. Think of high speed ADCs/DACs with more than one channel/
data paths. Typically each interface (LVDS/CMOS/JESD) of that path is connected
to the fpga mapping to it's own sub IP block. So yes, you're right. In this
case, as we are moving more for the "front end" approach, we will still have
users for multi buffer support :).

> > 
> > Now for the big elephant of all the above... Everything that I've been
> > saying means breaking the current ABI. I'm fairly confident that we
> > should not have any upstream user for these drivers because it lacks a
> > lot of features that are only present in ADI tree. So, I think it's ok
> > to destroy the ABI but of course I cannot guarantee there are no users
> > :).
> 
> I'm less worried about doing this for this driver than the vast majority
> of others because at least for upstream I don't think we'll have huge
> numbers of users.  + they'll shout at you not me ;)
> 
> > 
> > So, would it be fine for me to not care about the current ABI while
> > working on this? Because, it will be very difficult to come up with
> > something that is compatible with what we have. Well, there's always
> > the option of coming up with a -v2 (or something like that) for these
> > drivers keeping the ones we have now, untouched. Ideas?
> 
> Probably good to layout the current ABI as exposed and then we can see
> whether what we are changing is stuff not used in normal software flows
> anyway or if we are going to break critical stuff.  IF we end up with
> a /sys/bus/iio/device/iio:deviceX that has the same interfaces, then
> the user won't care that the driver structure underneath is totally different.
> I don't care about debug related interfaces remaining stable ABI

Maybe with this front end device logic where we are only still exposing one
IIO device, we we'll end up not breaking that much the ABI (if breaking at all).
> 
> Going for a v2 may be the best option - marking the other one deprecated
> and getting rid of it after a number of years.  If you (ADI in general)
> reckon you can get away with it then I'm happy.  The usecases for this
> stuff tend to be sufficiently high end that I'd imagine you only have to
> deal with a fairly small number of customers and many of them won't
> use upstream directly anyway.
> 
> > 
> > So... feedback is very much appreciated :)
> 
> I think unfortunately this is going to be an area where experimentation
> is needed, particularly as I suspect you have a lot of different devices all
> with subtle differences in requirements. We need soemthing that supports
> them all.
> 
> For extra fun. In the ideal sense any driver for a convertor should work
> equally well with different IP cores (if any ever come along).
> 

Yeah, that is actually my goal. In my head, I'm planning to have a tiny
middleware module where IP core drivers register with some ops() (that we
can always extend) and front end devices get() these cores and then call
the exposed API. Then, ideally, this module should not have any specific
information about the IP core specific internals (naturally some ops/API
might be more tailored to some drivers but we should try to keep them as
generic as possible)... 

[1]: https://elixir.bootlin.com/linux/latest/source/drivers/base/component.c
- Nuno Sá


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: adi-axi-adc issues and how to properly support this designs
  2023-04-14  9:11   ` Nuno Sá
@ 2023-04-16 13:33     ` Jonathan Cameron
  2023-04-18 13:36       ` Nuno Sá
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2023-04-16 13:33 UTC (permalink / raw)
  To: Nuno Sá; +Cc: linux-iio

On Fri, 14 Apr 2023 11:11:43 +0200
Nuno Sá <noname.nuno@gmail.com> wrote:

> Hi Jonathan,
> 
> Thanks for the feedback. Definitely helpful...
> 
> On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:
> > On Fri, 31 Mar 2023 16:40:44 +0200
> > Nuno Sá <noname.nuno@gmail.com> wrote:
> >   
> > > Hi Jonathan,
> > >   
> > 
> > Hmm. Complex topic, so I'd definitely like others to weigh in with
> > opinions .
> >   
> > > There are some major issues with the implementation we have upstream
> > > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > > rest of the email I will refer to ad9467 as the converter device and
> > > axi-adc as the IP core. 
> > > 
> > > Let me start to state that these designs, in a very a basic way, have a
> > > converter connected to a IP core that typically lives in a FPGA. This
> > > core connects to a DMA core (which is used by the DMA buffer
> > > implementation) so we can keep up with the high rates of these
> > > converters. So there's a link between these devices and we try to
> > > implement that so far looking at them as one single IIO device.
> > > 
> > > Let's go to the major issues now:
> > > 
> > > 1) There is a circular dependency between the two device. And when
> > > compiled as modules for example, we cannot really rmmod the modules
> > > anymore:
> > > 
> > > "root@analog:~# rmmod ad9467
> > > rmmod: ERROR: Module ad9467 is in use
> > > 
> > > root@analog:~# rmmod adi-axi-adc.ko
> > > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > > "
> > > 
> > > This easy to understand as the ad9467 driver as a direct symbol
> > > dependency on the axi-adc one. And this one calls 'module_get()' as
> > > soon as it attaches to a "client"  
> > 
> > Ouch. module_get() never works for this.  Long time ago I thought it
> > did :(  An unbind will bypass that (as well any real hot unplug paths).
> >   
> 
> Yeps, it bypasses it but I just wanted to point out the flaw in the current design :)

It was a surprise to me when Lars pointed this out years ago.
All we can do is treat it as a hint for something that makes
little sense for a user to deliberately do.

> 
> >   
> > > 
> > > 2) It's fairly easy to crash the kernel:
> > > 
> > > "
> > > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> > > buffer0/in_voltage0_en
> > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > > [  132.349133] 8<--- cut here ---
> > > [  132.352193] Unable to handle kernel paging request at virtual
> > > address e0940000 when read
> > > [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > > [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > > [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > > [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > > [  132.382701] Hardware name: Xilinx Zynq Platform
> > > [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> > > [adi_axi_adc]
> > > [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > > [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> > > [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > > [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > > [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> > > 000003f0
> > > [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> > > 00000000
> > > "  
> > 
> > It's nasty but generally that sort of path can be prevented with some
> > careful locking and checking of indicators that a component has gone away.
> > So whenever you unbind one aspect it notifies the other one that it has gone
> > away. I'm not sure where to currently look for best practice in this.
> > 
> > There are a lot of similar situations - basically anywhere a set of 
> > drivers end up hanging off two buses.  v4l and media drivers in general
> > end up doing this a lot.
> > 
> > One I'm familiar with is how some of the CXL tear down happens and
> > that lead me to device_release_driver() which is also used by usb etc.
> > I've not looked at this for a while but it may provide the tear down needed
> > if the right dance is used.  I think that will work if your convertor
> > driver is using services off the IP core driver and someone unbinds
> > that ip-core driver.
> >   
> 
> Yes, CCF does it with refcounting and some dummy clock ops for the case a clock
> provider is gone (with active consumers).

That must be 'exciting' on occasion as no way to know if the clock disappeared
or reset and hence the device interface might lock up. 

> 
> >   
> > > 
> > > 3) This solution does not really scale. There are designs where we have
> > > multiple instances of the IP core attached to the client (typically if
> > > a converter like this has more than one channel, we have one instance
> > > of the core per channel). The way we have it know, that would not me
> > > feasable.
> > >   
> 
> ...
> 
> > 
> >   
> > > However, one thing that
> > > I'm not sure it will be ideal with having two devices is synchronizing
> > > accesses to the IP core. For example, the axi-dac driver (that I also
> > > intend to bring upstream as part of this work) also has an userspace
> > > interface so we need to be carefull.  
> > 
> > I'm a bit lost. I think we'd only expect to see a userspace interface
> > on the 'front end' driver. Is there something on the driver handling the
> > IP core as well (beyond debug which you can do whatever you like with :)
> >   
> 
> See below... 
> 
> > > But I guess this is no different
> > > for devices which are accessed by the in-kernel IIO interface.
> > > 
> > > I'm way more inclined to go with 2).  
> > 
> > I'm not sure I fully follow what option 2) is!  Perhaps a bit
> > of ascii art?
> >   
> 
> So option 2 is pretty much treating the IP core also as a separate IIO device.
> This is not so visible in axi-adc but on the dac counterpart, the settings
> to generate tones (frequency/phase) are on the core. So, this could map to
> IIO ABI.

Those are of interest to which bit of analog circuitry?  The big beyond the
front end?  If so fine to provide that via the IIO interfaces, but the 
front end device should proxy them (using the IIO in kernel consumer interfaces
- which probably needs some work to make them safe to the provider going away.
From below, seems that a more focused interface for this particular device
representation may make sense.

> 
> However, I guess we can provide some callback/ops to read_raw()/write_raw() 
> to extend the front end converter (This is what is done today upstream but
> in the wrong direction). In this way, I think we could still only expose the
> frond end device.

Yup.  Either via the IIO consumer route, or something custom.  Depends a bit
on whether we want the front ends drivers to play nicely with other
backends (alternatives to adi-axi-adc part).  You touch on this later so
that all sounds good.

> 
> > > 
> > > Another thing that I've been thinking is how should we implement the IP
> > > core driver... As stated, there are usecases where we have mulitple
> > > instances of these devices. Well, that is how we've been doing things
> > > in ADI tree but essentially the core is just one big "block/device"
> > > [1]. However, the way we've been doing things is actually ok because
> > > every sub-block on these IPs have their own memory map allowing to
> > > control everything needed by the sub-blocks.  
> > 
> > Device model wise, that sounds like something that would be neater split
> > up. So one kernel device registered per sub-block.  That should make
> > it easier to handle referencing them from front end drivers.  You can
> > also then unbind individual blocks etc rather than having to do them
> > in one go.  If there is need for a 'parent' block then this might be
> > a fit for the auxiliary bus framework - or something that looks like
> > that but suits your particular requirements.
> >   
> 
> Agreed. Instead of auxiliary bus, I've been thinking in two option:
>  * Normal devicetree (FW) + kref based approach (as CCF, in kernel IIO, etc)
>  * component API [1]. The component API looks to fit nicely in these designs
> and I would not have to care about refcounting. It's also an all or nothing
> approach. Either all devices are present or none is which honestly, I think
> it makes sense. I also don't dislike the two staged binding approach... One thing
> that worries me is that it looks to be very tailored for DRM (there's even some
> DRM specific comments in the code :)) which might bring some unexpected, unpleasant
> surprises.
> 

Interesting.  I wasn't aware of that one.  If it looks good to you then
I'd be interested to see what resulting code looks like.

> > > 
> > > Another option and now that we support output buffers and multiple
> > > buffers per device would be to support these cores as one device
> > > (likely in the addac folder).
> > > However, I think that keeping each block
> > > as a standalone device might keep things more simple. But I assume it's
> > > tempting to get a first user to the multi buffer support :)  
> > 
> > Do we end up with with one front end, with a bunch of IP cores (or sub cores etc)
> > behind it?  If so I'd like the front end (where buffers are presented)
> > to have a bunch of output buffers.  I'd assume that there are controls
> > for some front ends that affect the data flowing into multiple IP
> > blocks? (gain etc).  That front end (which would be the IIO driver)
> > might well register for multiple backends (IP block driver).
> >   
> 
> Yes. Think of high speed ADCs/DACs with more than one channel/
> data paths. Typically each interface (LVDS/CMOS/JESD) of that path is connected
> to the fpga mapping to it's own sub IP block. So yes, you're right. In this
> case, as we are moving more for the "front end" approach, we will still have
> users for multi buffer support :).
> 
> > > 
> > > Now for the big elephant of all the above... Everything that I've been
> > > saying means breaking the current ABI. I'm fairly confident that we
> > > should not have any upstream user for these drivers because it lacks a
> > > lot of features that are only present in ADI tree. So, I think it's ok
> > > to destroy the ABI but of course I cannot guarantee there are no users
> > > :).  
> > 
> > I'm less worried about doing this for this driver than the vast majority
> > of others because at least for upstream I don't think we'll have huge
> > numbers of users.  + they'll shout at you not me ;)
> >   
> > > 
> > > So, would it be fine for me to not care about the current ABI while
> > > working on this? Because, it will be very difficult to come up with
> > > something that is compatible with what we have. Well, there's always
> > > the option of coming up with a -v2 (or something like that) for these
> > > drivers keeping the ones we have now, untouched. Ideas?  
> > 
> > Probably good to layout the current ABI as exposed and then we can see
> > whether what we are changing is stuff not used in normal software flows
> > anyway or if we are going to break critical stuff.  IF we end up with
> > a /sys/bus/iio/device/iio:deviceX that has the same interfaces, then
> > the user won't care that the driver structure underneath is totally different.
> > I don't care about debug related interfaces remaining stable ABI  
> 
> Maybe with this front end device logic where we are only still exposing one
> IIO device, we we'll end up not breaking that much the ABI (if breaking at all).

It'll probably change in subtle ways, but maybe in ways that userspace
doesn't notice :)  If we can get away with it that would be great.
Otherwise I'm going to rely on you ADI folk telling me we can get away
with the ABI breakage.

> > 
> > Going for a v2 may be the best option - marking the other one deprecated
> > and getting rid of it after a number of years.  If you (ADI in general)
> > reckon you can get away with it then I'm happy.  The usecases for this
> > stuff tend to be sufficiently high end that I'd imagine you only have to
> > deal with a fairly small number of customers and many of them won't
> > use upstream directly anyway.
> >   
> > > 
> > > So... feedback is very much appreciated :)  
> > 
> > I think unfortunately this is going to be an area where experimentation
> > is needed, particularly as I suspect you have a lot of different devices all
> > with subtle differences in requirements. We need soemthing that supports
> > them all.
> > 
> > For extra fun. In the ideal sense any driver for a convertor should work
> > equally well with different IP cores (if any ever come along).
> >   
> 
> Yeah, that is actually my goal. In my head, I'm planning to have a tiny
> middleware module where IP core drivers register with some ops() (that we
> can always extend) and front end devices get() these cores and then call
> the exposed API. Then, ideally, this module should not have any specific
> information about the IP core specific internals (naturally some ops/API
> might be more tailored to some drivers but we should try to keep them as
> generic as possible)... 

Nice and good luck :)

Jonathan

> 
> [1]: https://elixir.bootlin.com/linux/latest/source/drivers/base/component.c
> - Nuno Sá
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: adi-axi-adc issues and how to properly support this designs
  2023-04-16 13:33     ` Jonathan Cameron
@ 2023-04-18 13:36       ` Nuno Sá
  2023-05-01 16:12         ` Jonathan Cameron
  0 siblings, 1 reply; 7+ messages in thread
From: Nuno Sá @ 2023-04-18 13:36 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio

On Sun, 2023-04-16 at 14:33 +0100, Jonathan Cameron wrote:
> On Fri, 14 Apr 2023 11:11:43 +0200
> Nuno Sá <noname.nuno@gmail.com> wrote:
> 
> > Hi Jonathan,
> > 
> > Thanks for the feedback. Definitely helpful...
> > 
> > On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:
> > > On Fri, 31 Mar 2023 16:40:44 +0200
> > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > >   
> > > > Hi Jonathan,
> > > >   
> > > 
> > > Hmm. Complex topic, so I'd definitely like others to weigh in with
> > > opinions .
> > >   
> > > > There are some major issues with the implementation we have upstream
> > > > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > > > rest of the email I will refer to ad9467 as the converter device and
> > > > axi-adc as the IP core. 
> > > > 
> > > > Let me start to state that these designs, in a very a basic way, have a
> > > > converter connected to a IP core that typically lives in a FPGA. This
> > > > core connects to a DMA core (which is used by the DMA buffer
> > > > implementation) so we can keep up with the high rates of these
> > > > converters. So there's a link between these devices and we try to
> > > > implement that so far looking at them as one single IIO device.
> > > > 
> > > > Let's go to the major issues now:
> > > > 
> > > > 1) There is a circular dependency between the two device. And when
> > > > compiled as modules for example, we cannot really rmmod the modules
> > > > anymore:
> > > > 
> > > > "root@analog:~# rmmod ad9467
> > > > rmmod: ERROR: Module ad9467 is in use
> > > > 
> > > > root@analog:~# rmmod adi-axi-adc.ko
> > > > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > > > "
> > > > 
> > > > This easy to understand as the ad9467 driver as a direct symbol
> > > > dependency on the axi-adc one. And this one calls 'module_get()' as
> > > > soon as it attaches to a "client"  
> > > 
> > > Ouch. module_get() never works for this.  Long time ago I thought it
> > > did :(  An unbind will bypass that (as well any real hot unplug paths).
> > >   
> > 
> > Yeps, it bypasses it but I just wanted to point out the flaw in the current
> > design :)
> 
> It was a surprise to me when Lars pointed this out years ago.
> All we can do is treat it as a hint for something that makes
> little sense for a user to deliberately do.
> 
> > 
> > >   
> > > > 
> > > > 2) It's fairly easy to crash the kernel:
> > > > 
> > > > "
> > > > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> > > > buffer0/in_voltage0_en
> > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > > > [  132.349133] 8<--- cut here ---
> > > > [  132.352193] Unable to handle kernel paging request at virtual
> > > > address e0940000 when read
> > > > [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > > > [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > > > [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > > > [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > > > [  132.382701] Hardware name: Xilinx Zynq Platform
> > > > [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> > > > [adi_axi_adc]
> > > > [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > > > [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> > > > [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > > > [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > > > [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> > > > 000003f0
> > > > [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> > > > 00000000
> > > > "  
> > > 
> > > It's nasty but generally that sort of path can be prevented with some
> > > careful locking and checking of indicators that a component has gone away.
> > > So whenever you unbind one aspect it notifies the other one that it has
> > > gone
> > > away. I'm not sure where to currently look for best practice in this.
> > > 
> > > There are a lot of similar situations - basically anywhere a set of 
> > > drivers end up hanging off two buses.  v4l and media drivers in general
> > > end up doing this a lot.
> > > 
> > > One I'm familiar with is how some of the CXL tear down happens and
> > > that lead me to device_release_driver() which is also used by usb etc.
> > > I've not looked at this for a while but it may provide the tear down
> > > needed
> > > if the right dance is used.  I think that will work if your convertor
> > > driver is using services off the IP core driver and someone unbinds
> > > that ip-core driver.
> > >   
> > 
> > Yes, CCF does it with refcounting and some dummy clock ops for the case a
> > clock
> > provider is gone (with active consumers).
> 
> That must be 'exciting' on occasion as no way to know if the clock disappeared
> or reset and hence the device interface might lock up. 
> 
> > 
> > >   
> > > > 
> > > > 3) This solution does not really scale. There are designs where we have
> > > > multiple instances of the IP core attached to the client (typically if
> > > > a converter like this has more than one channel, we have one instance
> > > > of the core per channel). The way we have it know, that would not me
> > > > feasable.
> > > >   
> > 
> > ...
> > 
> > > 
> > >   
> > > > However, one thing that
> > > > I'm not sure it will be ideal with having two devices is synchronizing
> > > > accesses to the IP core. For example, the axi-dac driver (that I also
> > > > intend to bring upstream as part of this work) also has an userspace
> > > > interface so we need to be carefull.  
> > > 
> > > I'm a bit lost. I think we'd only expect to see a userspace interface
> > > on the 'front end' driver. Is there something on the driver handling the
> > > IP core as well (beyond debug which you can do whatever you like with :)
> > >   
> > 
> > See below... 
> > 
> > > > But I guess this is no different
> > > > for devices which are accessed by the in-kernel IIO interface.
> > > > 
> > > > I'm way more inclined to go with 2).  
> > > 
> > > I'm not sure I fully follow what option 2) is!  Perhaps a bit
> > > of ascii art?
> > >   
> > 
> > So option 2 is pretty much treating the IP core also as a separate IIO
> > device.
> > This is not so visible in axi-adc but on the dac counterpart, the settings
> > to generate tones (frequency/phase) are on the core. So, this could map to
> > IIO ABI.
> 
> Those are of interest to which bit of analog circuitry?  The big beyond the
> front end?  If so fine to provide that via the IIO interfaces, but the 
> front end device should proxy them (using the IIO in kernel consumer
> interfaces
> - which probably needs some work to make them safe to the provider going away.
> From below, seems that a more focused interface for this particular device
> representation may make sense.
> 

Not sure I'm following your question but I'll try... The settings are on the IP
core beyond the front end but I would say they are of interest of the front end
device. For instance, we might generate a tone on the IP core (setting phase and
frequency) but these bits are to be transmitted trough a dac/dds (our front end
device) analog, digital front end.

- Nuno Sá

> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: adi-axi-adc issues and how to properly support this designs
  2023-04-18 13:36       ` Nuno Sá
@ 2023-05-01 16:12         ` Jonathan Cameron
  2023-05-02  6:43           ` Nuno Sá
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2023-05-01 16:12 UTC (permalink / raw)
  To: Nuno Sá; +Cc: linux-iio

On Tue, 18 Apr 2023 15:36:35 +0200
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Sun, 2023-04-16 at 14:33 +0100, Jonathan Cameron wrote:
> > On Fri, 14 Apr 2023 11:11:43 +0200
> > Nuno Sá <noname.nuno@gmail.com> wrote:
> >   
> > > Hi Jonathan,
> > > 
> > > Thanks for the feedback. Definitely helpful...
> > > 
> > > On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:  
> > > > On Fri, 31 Mar 2023 16:40:44 +0200
> > > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > > >     
> > > > > Hi Jonathan,
> > > > >     
> > > > 
> > > > Hmm. Complex topic, so I'd definitely like others to weigh in with
> > > > opinions .
> > > >     
> > > > > There are some major issues with the implementation we have upstream
> > > > > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > > > > rest of the email I will refer to ad9467 as the converter device and
> > > > > axi-adc as the IP core. 
> > > > > 
> > > > > Let me start to state that these designs, in a very a basic way, have a
> > > > > converter connected to a IP core that typically lives in a FPGA. This
> > > > > core connects to a DMA core (which is used by the DMA buffer
> > > > > implementation) so we can keep up with the high rates of these
> > > > > converters. So there's a link between these devices and we try to
> > > > > implement that so far looking at them as one single IIO device.
> > > > > 
> > > > > Let's go to the major issues now:
> > > > > 
> > > > > 1) There is a circular dependency between the two device. And when
> > > > > compiled as modules for example, we cannot really rmmod the modules
> > > > > anymore:
> > > > > 
> > > > > "root@analog:~# rmmod ad9467
> > > > > rmmod: ERROR: Module ad9467 is in use
> > > > > 
> > > > > root@analog:~# rmmod adi-axi-adc.ko
> > > > > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > > > > "
> > > > > 
> > > > > This easy to understand as the ad9467 driver as a direct symbol
> > > > > dependency on the axi-adc one. And this one calls 'module_get()' as
> > > > > soon as it attaches to a "client"    
> > > > 
> > > > Ouch. module_get() never works for this.  Long time ago I thought it
> > > > did :(  An unbind will bypass that (as well any real hot unplug paths).
> > > >     
> > > 
> > > Yeps, it bypasses it but I just wanted to point out the flaw in the current
> > > design :)  
> > 
> > It was a surprise to me when Lars pointed this out years ago.
> > All we can do is treat it as a hint for something that makes
> > little sense for a user to deliberately do.
> >   
> > >   
> > > >     
> > > > > 
> > > > > 2) It's fairly easy to crash the kernel:
> > > > > 
> > > > > "
> > > > > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> > > > > buffer0/in_voltage0_en
> > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > > > > [  132.349133] 8<--- cut here ---
> > > > > [  132.352193] Unable to handle kernel paging request at virtual
> > > > > address e0940000 when read
> > > > > [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > > > > [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > > > > [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > > > > [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > > > > [  132.382701] Hardware name: Xilinx Zynq Platform
> > > > > [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> > > > > [adi_axi_adc]
> > > > > [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > > > > [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> > > > > [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > > > > [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > > > > [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> > > > > 000003f0
> > > > > [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> > > > > 00000000
> > > > > "    
> > > > 
> > > > It's nasty but generally that sort of path can be prevented with some
> > > > careful locking and checking of indicators that a component has gone away.
> > > > So whenever you unbind one aspect it notifies the other one that it has
> > > > gone
> > > > away. I'm not sure where to currently look for best practice in this.
> > > > 
> > > > There are a lot of similar situations - basically anywhere a set of 
> > > > drivers end up hanging off two buses.  v4l and media drivers in general
> > > > end up doing this a lot.
> > > > 
> > > > One I'm familiar with is how some of the CXL tear down happens and
> > > > that lead me to device_release_driver() which is also used by usb etc.
> > > > I've not looked at this for a while but it may provide the tear down
> > > > needed
> > > > if the right dance is used.  I think that will work if your convertor
> > > > driver is using services off the IP core driver and someone unbinds
> > > > that ip-core driver.
> > > >     
> > > 
> > > Yes, CCF does it with refcounting and some dummy clock ops for the case a
> > > clock
> > > provider is gone (with active consumers).  
> > 
> > That must be 'exciting' on occasion as no way to know if the clock disappeared
> > or reset and hence the device interface might lock up. 
> >   
> > >   
> > > >     
> > > > > 
> > > > > 3) This solution does not really scale. There are designs where we have
> > > > > multiple instances of the IP core attached to the client (typically if
> > > > > a converter like this has more than one channel, we have one instance
> > > > > of the core per channel). The way we have it know, that would not me
> > > > > feasable.
> > > > >     
> > > 
> > > ...
> > >   
> > > > 
> > > >     
> > > > > However, one thing that
> > > > > I'm not sure it will be ideal with having two devices is synchronizing
> > > > > accesses to the IP core. For example, the axi-dac driver (that I also
> > > > > intend to bring upstream as part of this work) also has an userspace
> > > > > interface so we need to be carefull.    
> > > > 
> > > > I'm a bit lost. I think we'd only expect to see a userspace interface
> > > > on the 'front end' driver. Is there something on the driver handling the
> > > > IP core as well (beyond debug which you can do whatever you like with :)
> > > >     
> > > 
> > > See below... 
> > >   
> > > > > But I guess this is no different
> > > > > for devices which are accessed by the in-kernel IIO interface.
> > > > > 
> > > > > I'm way more inclined to go with 2).    
> > > > 
> > > > I'm not sure I fully follow what option 2) is!  Perhaps a bit
> > > > of ascii art?
> > > >     
> > > 
> > > So option 2 is pretty much treating the IP core also as a separate IIO
> > > device.
> > > This is not so visible in axi-adc but on the dac counterpart, the settings
> > > to generate tones (frequency/phase) are on the core. So, this could map to
> > > IIO ABI.  
> > 
> > Those are of interest to which bit of analog circuitry?  The big beyond the
> > front end?  If so fine to provide that via the IIO interfaces, but the 
> > front end device should proxy them (using the IIO in kernel consumer
> > interfaces
> > - which probably needs some work to make them safe to the provider going away.
> > From below, seems that a more focused interface for this particular device
> > representation may make sense.
> >   
> 
> Not sure I'm following your question but I'll try... The settings are on the IP
> core beyond the front end but I would say they are of interest of the front end
> device. For instance, we might generate a tone on the IP core (setting phase and
> frequency) but these bits are to be transmitted trough a dac/dds (our front end
> device) analog, digital front end.

hehe. I avoided opening your email for a few weeks because I thought it might
drag me down a rat hole.  Then I find there is just a nice clean reply to
my question that would have taken me 2 mins to read.

That makes complete sense.  To my mind only the front end needs to be an IIO
device, but if it's convenient to represent the back end as one we can probably
make that work as well.

Jonathan

> 
> - Nuno Sá
> 
> >   


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: adi-axi-adc issues and how to properly support this designs
  2023-05-01 16:12         ` Jonathan Cameron
@ 2023-05-02  6:43           ` Nuno Sá
  0 siblings, 0 replies; 7+ messages in thread
From: Nuno Sá @ 2023-05-02  6:43 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio

On Mon, 2023-05-01 at 17:12 +0100, Jonathan Cameron wrote:
> On Tue, 18 Apr 2023 15:36:35 +0200
> Nuno Sá <noname.nuno@gmail.com> wrote:
> 
> > On Sun, 2023-04-16 at 14:33 +0100, Jonathan Cameron wrote:
> > > On Fri, 14 Apr 2023 11:11:43 +0200
> > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > >   
> > > > Hi Jonathan,
> > > > 
> > > > Thanks for the feedback. Definitely helpful...
> > > > 
> > > > On Mon, 2023-04-10 at 13:21 +0100, Jonathan Cameron wrote:  
> > > > > On Fri, 31 Mar 2023 16:40:44 +0200
> > > > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > > > >     
> > > > > > Hi Jonathan,
> > > > > >     
> > > > > 
> > > > > Hmm. Complex topic, so I'd definitely like others to weigh in with
> > > > > opinions .
> > > > >     
> > > > > > There are some major issues with the implementation we have upstream
> > > > > > for the adi-axi-adc driver. The only user is the ad9467 driver. In the
> > > > > > rest of the email I will refer to ad9467 as the converter device and
> > > > > > axi-adc as the IP core. 
> > > > > > 
> > > > > > Let me start to state that these designs, in a very a basic way, have a
> > > > > > converter connected to a IP core that typically lives in a FPGA. This
> > > > > > core connects to a DMA core (which is used by the DMA buffer
> > > > > > implementation) so we can keep up with the high rates of these
> > > > > > converters. So there's a link between these devices and we try to
> > > > > > implement that so far looking at them as one single IIO device.
> > > > > > 
> > > > > > Let's go to the major issues now:
> > > > > > 
> > > > > > 1) There is a circular dependency between the two device. And when
> > > > > > compiled as modules for example, we cannot really rmmod the modules
> > > > > > anymore:
> > > > > > 
> > > > > > "root@analog:~# rmmod ad9467
> > > > > > rmmod: ERROR: Module ad9467 is in use
> > > > > > 
> > > > > > root@analog:~# rmmod adi-axi-adc.ko
> > > > > > rmmod: ERROR: Module adi_axi_adc is in use by: ad9467
> > > > > > "
> > > > > > 
> > > > > > This easy to understand as the ad9467 driver as a direct symbol
> > > > > > dependency on the axi-adc one. And this one calls 'module_get()' as
> > > > > > soon as it attaches to a "client"    
> > > > > 
> > > > > Ouch. module_get() never works for this.  Long time ago I thought it
> > > > > did :(  An unbind will bypass that (as well any real hot unplug paths).
> > > > >     
> > > > 
> > > > Yeps, it bypasses it but I just wanted to point out the flaw in the current
> > > > design :)  
> > > 
> > > It was a surprise to me when Lars pointed this out years ago.
> > > All we can do is treat it as a hint for something that makes
> > > little sense for a user to deliberately do.
> > >   
> > > >   
> > > > >     
> > > > > > 
> > > > > > 2) It's fairly easy to crash the kernel:
> > > > > > 
> > > > > > "
> > > > > > root@analog:/sys/bus/spi/drivers/ad9467# echo spi0.0 > unbind
> > > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 >
> > > > > > buffer0/in_voltage0_en
> > > > > > root@analog:/sys/bus/iio/devices/iio:device2# echo 1 > buffer0/enable
> > > > > > [  132.349133] 8<--- cut here ---
> > > > > > [  132.352193] Unable to handle kernel paging request at virtual
> > > > > > address e0940000 when read
> > > > > > [  132.360333] [e0940000] *pgd=0208b811, *pte=00000000, *ppte=00000000
> > > > > > [  132.366668] Internal error: Oops: 7 [#1] SMP ARM
> > > > > > [  132.371289] Modules linked in: ad9467 adi_axi_adc ad9517
> > > > > > [  132.376609] CPU: 1 PID: 444 Comm: bash Not tainted 6.2.9-dirty #3
> > > > > > [  132.382701] Hardware name: Xilinx Zynq Platform
> > > > > > [  132.387223] PC is at adi_axi_adc_update_scan_mode+0x34/0x88
> > > > > > [adi_axi_adc]
> > > > > > [  132.394020] LR is at arm_heavy_mb+0x1c/0x38
> > > > > > [  132.398212] pc : [<bf0060c4>]    lr : [<c031820c>]    psr: a0000013
> > > > > > [  132.404392] sp : e0929e30  ip : deaddead  fp : c4430270
> > > > > > [  132.409678] r10: c8a0bc18  r9 : deaddeac  r8 : c89b1c00
> > > > > > [  132.414895] r7 : c4430340  r6 : c45db7c0  r5 : e093ffc0  r4 :
> > > > > > 000003f0
> > > > > > [  132.421413] r3 : 00010000  r2 : e0940000  r1 : 00000000  r0 :
> > > > > > 00000000
> > > > > > "    
> > > > > 
> > > > > It's nasty but generally that sort of path can be prevented with some
> > > > > careful locking and checking of indicators that a component has gone away.
> > > > > So whenever you unbind one aspect it notifies the other one that it has
> > > > > gone
> > > > > away. I'm not sure where to currently look for best practice in this.
> > > > > 
> > > > > There are a lot of similar situations - basically anywhere a set of 
> > > > > drivers end up hanging off two buses.  v4l and media drivers in general
> > > > > end up doing this a lot.
> > > > > 
> > > > > One I'm familiar with is how some of the CXL tear down happens and
> > > > > that lead me to device_release_driver() which is also used by usb etc.
> > > > > I've not looked at this for a while but it may provide the tear down
> > > > > needed
> > > > > if the right dance is used.  I think that will work if your convertor
> > > > > driver is using services off the IP core driver and someone unbinds
> > > > > that ip-core driver.
> > > > >     
> > > > 
> > > > Yes, CCF does it with refcounting and some dummy clock ops for the case a
> > > > clock
> > > > provider is gone (with active consumers).  
> > > 
> > > That must be 'exciting' on occasion as no way to know if the clock disappeared
> > > or reset and hence the device interface might lock up. 
> > >   
> > > >   
> > > > >     
> > > > > > 
> > > > > > 3) This solution does not really scale. There are designs where we have
> > > > > > multiple instances of the IP core attached to the client (typically if
> > > > > > a converter like this has more than one channel, we have one instance
> > > > > > of the core per channel). The way we have it know, that would not me
> > > > > > feasable.
> > > > > >     
> > > > 
> > > > ...
> > > >   
> > > > > 
> > > > >     
> > > > > > However, one thing that
> > > > > > I'm not sure it will be ideal with having two devices is synchronizing
> > > > > > accesses to the IP core. For example, the axi-dac driver (that I also
> > > > > > intend to bring upstream as part of this work) also has an userspace
> > > > > > interface so we need to be carefull.    
> > > > > 
> > > > > I'm a bit lost. I think we'd only expect to see a userspace interface
> > > > > on the 'front end' driver. Is there something on the driver handling the
> > > > > IP core as well (beyond debug which you can do whatever you like with :)
> > > > >     
> > > > 
> > > > See below... 
> > > >   
> > > > > > But I guess this is no different
> > > > > > for devices which are accessed by the in-kernel IIO interface.
> > > > > > 
> > > > > > I'm way more inclined to go with 2).    
> > > > > 
> > > > > I'm not sure I fully follow what option 2) is!  Perhaps a bit
> > > > > of ascii art?
> > > > >     
> > > > 
> > > > So option 2 is pretty much treating the IP core also as a separate IIO
> > > > device.
> > > > This is not so visible in axi-adc but on the dac counterpart, the settings
> > > > to generate tones (frequency/phase) are on the core. So, this could map to
> > > > IIO ABI.  
> > > 
> > > Those are of interest to which bit of analog circuitry?  The big beyond the
> > > front end?  If so fine to provide that via the IIO interfaces, but the 
> > > front end device should proxy them (using the IIO in kernel consumer
> > > interfaces
> > > - which probably needs some work to make them safe to the provider going away.
> > > From below, seems that a more focused interface for this particular device
> > > representation may make sense.
> > >   
> > 
> > Not sure I'm following your question but I'll try... The settings are on the IP
> > core beyond the front end but I would say they are of interest of the front end
> > device. For instance, we might generate a tone on the IP core (setting phase and
> > frequency) but these bits are to be transmitted trough a dac/dds (our front end
> > device) analog, digital front end.
> 
> hehe. I avoided opening your email for a few weeks because I thought it might
> drag me down a rat hole.  Then I find there is just a nice clean reply to
> my question that would have taken me 2 mins to read.
> 

Ehehe. nice surprise then :)

> That makes complete sense.  To my mind only the front end needs to be an IIO
> device, but if it's convenient to represent the back end as one we can probably
> make that work as well.
> > 

Alright, I'll go then the one IIO device route for v1 (at least start working with
that goal :)) so we can see how it looks like...

Thanks for your feedback on this!

- Nuno Sá

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-05-02  6:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-31 14:40 adi-axi-adc issues and how to properly support this designs Nuno Sá
2023-04-10 12:21 ` Jonathan Cameron
2023-04-14  9:11   ` Nuno Sá
2023-04-16 13:33     ` Jonathan Cameron
2023-04-18 13:36       ` Nuno Sá
2023-05-01 16:12         ` Jonathan Cameron
2023-05-02  6:43           ` Nuno Sá

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.