All of lore.kernel.org
 help / color / mirror / Atom feed
* Tegra20 parallel video capture driver
@ 2022-04-29 16:49 Luca Ceresoli
  2022-05-05  8:54 ` Luca Ceresoli
  2022-05-05  9:42 ` Mikko Perttunen
  0 siblings, 2 replies; 6+ messages in thread
From: Luca Ceresoli @ 2022-04-29 16:49 UTC (permalink / raw)
  To: linux-tegra
  Cc: thierry.reding, dmitry.osipenko, Sowjanya Komatineni, Jon Hunter

Hello,

I am working to implement a driver for the Tegra20 camera
interface (VI) and have it merged in mainline Linux. I have already
done part of the work, but now I'm facing some difficulties, especially
with respect to interaction with the syncpt feature of host1x. I have a
few questions that hopefully someone here can reply to, but first let
me lay some background.

First, the good news.

A good thing is I have an old reference image using a vendor kernel
based on 3.1 which works and I can capture video from a parallel smart
sensor in YUV. This is on a custom hardware.

Second, there is a (staging) video capture driver in mainline [0], but
the huge delta between 3.1 and mainline makes the two drivers not
trivial to compare.

Now to the bad news.

First, that the mainline driver only implements CSI input for
tegra210 and I only have a parallel camera and a tegra20. The
tegra20 documentation I have does not document parallel parallel video
at all.

Second, the VI is heavily based on the host1x peripheral, and
especially the Synchronization Point (syncpt) feature. Syncpts are also
barely documented and I don't have any previous experience with this
kind of device. And the syncpt (and perhaps host1x too) implementation
in the vendor kernel vs mainline is very different.

And here's my status.

The mainline staging driver is already structured to support different
SoCs, so I added tegra20 there, initially as a copy of tegra210. I did
the same to add VIP (parallel input) support in addition to CSI. Then I
filled the gaps as far as I could and now I have register access in the
mainline driver that produces pretty much the same writes as the vendor
kernel does, but it does so using the same driver structure that is in
mainline, quite cleanly.

Now I am stuck at trying to understand how to make syncpt code work. As
said the syncpt APIs are very different from what I see in the old
vendor kernel. Thus I cannot try to do make the mainline driver behave
like the old one. Moreover the mainline driver does many more calls to
syncpt code than the old one does.

And finally here are some questions.

1. Is there any good documentation on host1x, especially syncpt?

2. Is the VIP (parallel input) section of VI (video capture) for
   Tegra20 documented anywhere?

3. The old driver accesses the syncpt using fixed IDs from #defines (15
   for VI, 16 for CSI), while the mainline driver uses IDs obtained at
   runtime from host1x_syncpt_request(). Is it correct that one can use
   any of the 32 available syncpts, and the old driver using fixed
   values is just an implementation choice?

4. I am currently working on a version of tegra_channel_capture_frame()
   [1] for tegra20 VIP and other code involving syncpt. The mainline
   driver calls many different host1x_syncpt_*() functions, while the
   old driver does call very few of them. Can anybody clarify the
   meaning of the various syncpt calls?

5. In tegra_channel_capture_frame() the call to host1x_syncpt_wait()
   always returns -EAGAIN. Where would you start to investigate the
   reason of this timeout?

I think it is enough for today, thank you in advance for any feedback!

[0]
https://elixir.bootlin.com/linux/latest/source/drivers/staging/media/tegra-video

[1]
https://elixir.bootlin.com/linux/v5.17.5/source/drivers/staging/media/tegra-video/tegra210.c#L325

Best regards.
-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Tegra20 parallel video capture driver
  2022-04-29 16:49 Tegra20 parallel video capture driver Luca Ceresoli
@ 2022-05-05  8:54 ` Luca Ceresoli
  2022-05-05  9:42 ` Mikko Perttunen
  1 sibling, 0 replies; 6+ messages in thread
From: Luca Ceresoli @ 2022-05-05  8:54 UTC (permalink / raw)
  To: linux-tegra
  Cc: thierry.reding, dmitry.osipenko, Sowjanya Komatineni, Jon Hunter

Hi,

Il giorno Fri, 29 Apr 2022 18:49:31 +0200
Luca Ceresoli <luca.ceresoli@bootlin.com> ha scritto:

> Hello,
> 
> I am working to implement a driver for the Tegra20 camera
> interface (VI) and have it merged in mainline Linux. I have already
> done part of the work, but now I'm facing some difficulties,
> especially with respect to interaction with the syncpt feature of
> host1x. I have a few questions that hopefully someone here can reply
> to, but first let me lay some background.
> 
> First, the good news.
> 
> A good thing is I have an old reference image using a vendor kernel
> based on 3.1 which works and I can capture video from a parallel smart
> sensor in YUV. This is on a custom hardware.
> 
> Second, there is a (staging) video capture driver in mainline [0], but
> the huge delta between 3.1 and mainline makes the two drivers not
> trivial to compare.
> 
> Now to the bad news.
> 
> First, that the mainline driver only implements CSI input for
> tegra210 and I only have a parallel camera and a tegra20. The
> tegra20 documentation I have does not document parallel parallel video
> at all.
> 
> Second, the VI is heavily based on the host1x peripheral, and
> especially the Synchronization Point (syncpt) feature. Syncpts are
> also barely documented and I don't have any previous experience with
> this kind of device. And the syncpt (and perhaps host1x too)
> implementation in the vendor kernel vs mainline is very different.
> 
> And here's my status.
> 
> The mainline staging driver is already structured to support different
> SoCs, so I added tegra20 there, initially as a copy of tegra210. I did
> the same to add VIP (parallel input) support in addition to CSI. Then
> I filled the gaps as far as I could and now I have register access in
> the mainline driver that produces pretty much the same writes as the
> vendor kernel does, but it does so using the same driver structure
> that is in mainline, quite cleanly.
> 
> Now I am stuck at trying to understand how to make syncpt code work.
> As said the syncpt APIs are very different from what I see in the old
> vendor kernel. Thus I cannot try to do make the mainline driver behave
> like the old one. Moreover the mainline driver does many more calls to
> syncpt code than the old one does.
> 
> And finally here are some questions.
> 
> 1. Is there any good documentation on host1x, especially syncpt?
> 
> 2. Is the VIP (parallel input) section of VI (video capture) for
>    Tegra20 documented anywhere?
> 
> 3. The old driver accesses the syncpt using fixed IDs from #defines
> (15 for VI, 16 for CSI), while the mainline driver uses IDs obtained
> at runtime from host1x_syncpt_request(). Is it correct that one can
> use any of the 32 available syncpts, and the old driver using fixed
>    values is just an implementation choice?
> 
> 4. I am currently working on a version of
> tegra_channel_capture_frame() [1] for tegra20 VIP and other code
> involving syncpt. The mainline driver calls many different
> host1x_syncpt_*() functions, while the old driver does call very few
> of them. Can anybody clarify the meaning of the various syncpt calls?
> 
> 5. In tegra_channel_capture_frame() the call to host1x_syncpt_wait()
>    always returns -EAGAIN. Where would you start to investigate the
>    reason of this timeout?
> 
> I think it is enough for today, thank you in advance for any feedback!
> 
> [0]
> https://elixir.bootlin.com/linux/latest/source/drivers/staging/media/tegra-video
> 
> [1]
> https://elixir.bootlin.com/linux/v5.17.5/source/drivers/staging/media/tegra-video/tegra210.c#L325

Should it be any helpful, here is a reference to the old driver in the
vendor 3.1 branch that I am using as a reference:

https://github.com/SKIDATA/linux/blob/l4t-3.1.y/drivers/media/video/tegra_v4l2_camera.c

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Tegra20 parallel video capture driver
  2022-04-29 16:49 Tegra20 parallel video capture driver Luca Ceresoli
  2022-05-05  8:54 ` Luca Ceresoli
@ 2022-05-05  9:42 ` Mikko Perttunen
  2022-05-05 15:32   ` Luca Ceresoli
  2022-05-09 15:56   ` Luca Ceresoli
  1 sibling, 2 replies; 6+ messages in thread
From: Mikko Perttunen @ 2022-05-05  9:42 UTC (permalink / raw)
  To: Luca Ceresoli, linux-tegra
  Cc: thierry.reding, dmitry.osipenko, Sowjanya Komatineni, Jon Hunter

On 4/29/22 19:49, Luca Ceresoli wrote:
> Hello,

Hi!

> 
> I am working to implement a driver for the Tegra20 camera
> interface (VI) and have it merged in mainline Linux. I have already
> done part of the work, but now I'm facing some difficulties, especially
> with respect to interaction with the syncpt feature of host1x. I have a
> few questions that hopefully someone here can reply to, but first let
> me lay some background.
> 
> First, the good news.
> 
> A good thing is I have an old reference image using a vendor kernel
> based on 3.1 which works and I can capture video from a parallel smart
> sensor in YUV. This is on a custom hardware.
> 
> Second, there is a (staging) video capture driver in mainline [0], but
> the huge delta between 3.1 and mainline makes the two drivers not
> trivial to compare.
> 
> Now to the bad news.
> 
> First, that the mainline driver only implements CSI input for
> tegra210 and I only have a parallel camera and a tegra20. The
> tegra20 documentation I have does not document parallel parallel video
> at all.
> 
> Second, the VI is heavily based on the host1x peripheral, and
> especially the Synchronization Point (syncpt) feature. Syncpts are also
> barely documented and I don't have any previous experience with this
> kind of device. And the syncpt (and perhaps host1x too) implementation
> in the vendor kernel vs mainline is very different.
> 
> And here's my status.
> 
> The mainline staging driver is already structured to support different
> SoCs, so I added tegra20 there, initially as a copy of tegra210. I did
> the same to add VIP (parallel input) support in addition to CSI. Then I
> filled the gaps as far as I could and now I have register access in the
> mainline driver that produces pretty much the same writes as the vendor
> kernel does, but it does so using the same driver structure that is in
> mainline, quite cleanly.
> 
> Now I am stuck at trying to understand how to make syncpt code work. As
> said the syncpt APIs are very different from what I see in the old
> vendor kernel. Thus I cannot try to do make the mainline driver behave
> like the old one. Moreover the mainline driver does many more calls to
> syncpt code than the old one does.
> 
> And finally here are some questions.
> 
> 1. Is there any good documentation on host1x, especially syncpt?

There isn't much documentation, apart from what exists in the TRMs. But 
basically, syncpoints are an array of 32-bit unsigned integers in 
hardware, with each 32-bit value having the following fundamental 
operations:

* Increment by one, wrapping (i.e. mod 2^32)
* Wait for value to reach threshold, such that if you consider the 32 
bit unsigned integers a circle, then the half-circle preceding 
(integer-wise) the threshold value are considered to be in the past and 
the half-circle succeeding are considered to be in the future. For 
example, if we specify threshold 0xffffffff then 0x0 is considered to be 
one increment in the future. Waits can be done either by HW stalling 
host1x channels (similar to DMA) or CPU interrupts.

> 
> 2. Is the VIP (parallel input) section of VI (video capture) for
>     Tegra20 documented anywhere?

If not in the TRM, then no.

> 
> 3. The old driver accesses the syncpt using fixed IDs from #defines (15
>     for VI, 16 for CSI), while the mainline driver uses IDs obtained at
>     runtime from host1x_syncpt_request(). Is it correct that one can use
>     any of the 32 available syncpts, and the old driver using fixed
>     values is just an implementation choice?

The old code used to have hard-coded syncpoint IDs for each use, but now 
we allocate dynamically. This is a global namespace so you need to call 
host1x_syncpt_request to get an ID that is not colliding with anyone else.

> 
> 4. I am currently working on a version of tegra_channel_capture_frame()
>     [1] for tegra20 VIP and other code involving syncpt. The mainline
>     driver calls many different host1x_syncpt_*() functions, while the
>     old driver does call very few of them. Can anybody clarify the
>     meaning of the various syncpt calls?

- request/put -- allocate and free syncpoint
- incr/wait -- as described before
- incr_max -- when syncpoint is used with host1x channels, used to 
indicate to job tracking code how many times the job will increment the 
syncpoint. when not using channels, should not be used -- looks like the 
tegra210 code calls this a couple of times but it probably shouldn't be 
doing that.

> 
> 5. In tegra_channel_capture_frame() the call to host1x_syncpt_wait()
>     always returns -EAGAIN. Where would you start to investigate the
>     reason of this timeout?

This means that the syncpoint's value did not reach the specified 
threshold. Either the specified threshold is incorrect, or the engine 
did not increment the syncpoint (or did not increment sufficient number 
of times), either because the syncpoint increment was not programmed 
correctly, or the condition for the increment was not fulfilled (i.e. 
the engine got stuck for whatever reason).

Apart from the wait, you can use /sys/kernel/debug/tegra-host1x/status 
to track syncpoint value. Apart from that, if you're not using channels, 
you'll just have to look at engine registers to see why it's not processing.

Cheers,
Mikko

> 
> I think it is enough for today, thank you in advance for any feedback!
> 
> [0]
> https://elixir.bootlin.com/linux/latest/source/drivers/staging/media/tegra-video
> 
> [1]
> https://elixir.bootlin.com/linux/v5.17.5/source/drivers/staging/media/tegra-video/tegra210.c#L325
> 
> Best regards.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Tegra20 parallel video capture driver
  2022-05-05  9:42 ` Mikko Perttunen
@ 2022-05-05 15:32   ` Luca Ceresoli
  2022-05-09 15:56   ` Luca Ceresoli
  1 sibling, 0 replies; 6+ messages in thread
From: Luca Ceresoli @ 2022-05-05 15:32 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: linux-tegra, thierry.reding, dmitry.osipenko,
	Sowjanya Komatineni, Jon Hunter

Hi Mikko,

Il giorno Thu, 5 May 2022 12:42:08 +0300
Mikko Perttunen <cyndis@kapsi.fi> ha scritto:

> On 4/29/22 19:49, Luca Ceresoli wrote:
> > Hello,  
> 
> Hi!
> 
> > 
> > I am working to implement a driver for the Tegra20 camera
> > interface (VI) and have it merged in mainline Linux. I have already
> > done part of the work, but now I'm facing some difficulties,
> > especially with respect to interaction with the syncpt feature of
> > host1x. I have a few questions that hopefully someone here can
> > reply to, but first let me lay some background.
> > 
> > First, the good news.
> > 
> > A good thing is I have an old reference image using a vendor kernel
> > based on 3.1 which works and I can capture video from a parallel
> > smart sensor in YUV. This is on a custom hardware.
> > 
> > Second, there is a (staging) video capture driver in mainline [0],
> > but the huge delta between 3.1 and mainline makes the two drivers
> > not trivial to compare.
> > 
> > Now to the bad news.
> > 
> > First, that the mainline driver only implements CSI input for
> > tegra210 and I only have a parallel camera and a tegra20. The
> > tegra20 documentation I have does not document parallel parallel
> > video at all.
> > 
> > Second, the VI is heavily based on the host1x peripheral, and
> > especially the Synchronization Point (syncpt) feature. Syncpts are
> > also barely documented and I don't have any previous experience
> > with this kind of device. And the syncpt (and perhaps host1x too)
> > implementation in the vendor kernel vs mainline is very different.
> > 
> > And here's my status.
> > 
> > The mainline staging driver is already structured to support
> > different SoCs, so I added tegra20 there, initially as a copy of
> > tegra210. I did the same to add VIP (parallel input) support in
> > addition to CSI. Then I filled the gaps as far as I could and now I
> > have register access in the mainline driver that produces pretty
> > much the same writes as the vendor kernel does, but it does so
> > using the same driver structure that is in mainline, quite cleanly.
> > 
> > Now I am stuck at trying to understand how to make syncpt code
> > work. As said the syncpt APIs are very different from what I see in
> > the old vendor kernel. Thus I cannot try to do make the mainline
> > driver behave like the old one. Moreover the mainline driver does
> > many more calls to syncpt code than the old one does.
> > 
> > And finally here are some questions.
> > 
> > 1. Is there any good documentation on host1x, especially syncpt?  
> 
> There isn't much documentation, apart from what exists in the TRMs.
> But basically, syncpoints are an array of 32-bit unsigned integers in 
> hardware, with each 32-bit value having the following fundamental 
> operations:
> 
> * Increment by one, wrapping (i.e. mod 2^32)
> * Wait for value to reach threshold, such that if you consider the 32 
> bit unsigned integers a circle, then the half-circle preceding 
> (integer-wise) the threshold value are considered to be in the past
> and the half-circle succeeding are considered to be in the future.
> For example, if we specify threshold 0xffffffff then 0x0 is
> considered to be one increment in the future. Waits can be done
> either by HW stalling host1x channels (similar to DMA) or CPU
> interrupts.
> 
> > 
> > 2. Is the VIP (parallel input) section of VI (video capture) for
> >     Tegra20 documented anywhere?  
> 
> If not in the TRM, then no.
> 
> > 
> > 3. The old driver accesses the syncpt using fixed IDs from #defines
> > (15 for VI, 16 for CSI), while the mainline driver uses IDs
> > obtained at runtime from host1x_syncpt_request(). Is it correct
> > that one can use any of the 32 available syncpts, and the old
> > driver using fixed values is just an implementation choice?  
> 
> The old code used to have hard-coded syncpoint IDs for each use, but
> now we allocate dynamically. This is a global namespace so you need
> to call host1x_syncpt_request to get an ID that is not colliding with
> anyone else.
> 
> > 
> > 4. I am currently working on a version of
> > tegra_channel_capture_frame() [1] for tegra20 VIP and other code
> > involving syncpt. The mainline driver calls many different
> > host1x_syncpt_*() functions, while the old driver does call very
> > few of them. Can anybody clarify the meaning of the various syncpt
> > calls?  
> 
> - request/put -- allocate and free syncpoint
> - incr/wait -- as described before
> - incr_max -- when syncpoint is used with host1x channels, used to 
> indicate to job tracking code how many times the job will increment
> the syncpoint. when not using channels, should not be used -- looks
> like the tegra210 code calls this a couple of times but it probably
> shouldn't be doing that.
> 
> > 
> > 5. In tegra_channel_capture_frame() the call to host1x_syncpt_wait()
> >     always returns -EAGAIN. Where would you start to investigate the
> >     reason of this timeout?  
> 
> This means that the syncpoint's value did not reach the specified 
> threshold. Either the specified threshold is incorrect, or the engine 
> did not increment the syncpoint (or did not increment sufficient
> number of times), either because the syncpoint increment was not
> programmed correctly, or the condition for the increment was not
> fulfilled (i.e. the engine got stuck for whatever reason).
> 
> Apart from the wait, you can use
> /sys/kernel/debug/tegra-host1x/status to track syncpoint value. Apart
> from that, if you're not using channels, you'll just have to look at
> engine registers to see why it's not processing.

Thank you very much! I appreciate a lot your feedback and it will be
helpful for my advancement. I will probably come back with more
questions later on.

Best regards.
-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Tegra20 parallel video capture driver
  2022-05-05  9:42 ` Mikko Perttunen
  2022-05-05 15:32   ` Luca Ceresoli
@ 2022-05-09 15:56   ` Luca Ceresoli
  2022-05-10  7:40     ` Mikko Perttunen
  1 sibling, 1 reply; 6+ messages in thread
From: Luca Ceresoli @ 2022-05-09 15:56 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: linux-tegra, thierry.reding, dmitry.osipenko,
	Sowjanya Komatineni, Jon Hunter

Hi Mikko, All,

as anticipated, here is another batch of questions that arose after the
(very appreciated!) feedback from Mikko.

Il giorno Thu, 5 May 2022 12:42:08 +0300
Mikko Perttunen <cyndis@kapsi.fi> ha scritto:
[...]
> > 1. Is there any good documentation on host1x, especially syncpt?  
> 
> There isn't much documentation, apart from what exists in the TRMs.
> But basically, syncpoints are an array of 32-bit unsigned integers in 
> hardware, with each 32-bit value having the following fundamental 
> operations:
> 
> * Increment by one, wrapping (i.e. mod 2^32)
> * Wait for value to reach threshold, such that if you consider the 32 
> bit unsigned integers a circle, then the half-circle preceding 
> (integer-wise) the threshold value are considered to be in the past
> and the half-circle succeeding are considered to be in the future.
> For example, if we specify threshold 0xffffffff then 0x0 is
> considered to be one increment in the future. Waits can be done
> either by HW stalling host1x channels (similar to DMA) or CPU
> interrupts.

Do you think the mainline host1x implementation is 100% compatible with
tegra20? So far I have been assuming that it is, but since my code flow
is now pretty much equal to the old 3.1 driver and I am not receiving
any syncpt events, I started wondering whether the syncpt calls with an
equivalent name really do the same thing or not. Specifically the 
nvhost_syncpt_wait_timeout_ext() in 3.1 looks to have the same
structure as host1x_syncpt_wait() in mainline, but I still haven't dung
into the very details to see if there is any subtle difference.

> > 4. I am currently working on a version of
> > tegra_channel_capture_frame() [1] for tegra20 VIP and other code
> > involving syncpt. The mainline driver calls many different
> > host1x_syncpt_*() functions, while the old driver does call very
> > few of them. Can anybody clarify the meaning of the various syncpt
> > calls?  
> 
> - request/put -- allocate and free syncpoint
> - incr/wait -- as described before
> - incr_max -- when syncpoint is used with host1x channels, used to 
> indicate to job tracking code how many times the job will increment
> the syncpoint. when not using channels, should not be used -- looks
> like the tegra210 code calls this a couple of times but it probably
> shouldn't be doing that.

As I understand it, for my own use case I should never call _incr as it
is what the VI hardware is supposed to do, and rather i should call
_wait to wait for that increment to happen. Coherently, this is what
the 3.1 driver does: it reads the syncpt value into a plain variable,
increments the variable, then calls _wait to wait the syncpt to reach
the same value as the variable.

But the mainline code the the tegra210 VI does many more syncpt calls
and it does not use a variable in the same way. It rather calls
host1x_syncpt_incr() and host1x_syncpt_incr_max() (which it shouldn't
call as you said) but I am not sure these calls achieve the same result.

Do you think the calls that exist in
drivers/staging/media/tegra-video/tegra210.c do make sense for tegra20
too?

> > 5. In tegra_channel_capture_frame() the call to host1x_syncpt_wait()
> >     always returns -EAGAIN. Where would you start to investigate the
> >     reason of this timeout?  
> 
> This means that the syncpoint's value did not reach the specified 
> threshold. Either the specified threshold is incorrect, or the engine 
> did not increment the syncpoint (or did not increment sufficient
> number of times), either because the syncpoint increment was not
> programmed correctly, or the condition for the increment was not
> fulfilled (i.e. the engine got stuck for whatever reason).
> 
> Apart from the wait, you can use
> /sys/kernel/debug/tegra-host1x/status to track syncpoint value. Apart
> from that, if you're not using channels, you'll just have to look at
> engine registers to see why it's not processing.

About the latest suggestion, which status register are you referring
to? Unfortunately the TRM I have does not document the VIP registers at
all, and the 3.1 driver does not use of them so it's impossible to
infer the bit meanings. Can you suggest any specific register that I
should check?


Another question is about the clocks. I tried to set them up as
similarly as possible to the working 3.1 code, but couldn't have them
identical.

The 3.1 kernel configures this way: 

  pll_c @ 600 MHZ --->     VI @ 150 MHz (/ 4)
  pll_p @ 216 MHZ ---> HOST1X @ 144 MHz (x 2 / 3) 

Mainline does not seem to be able to use a fractional divider for pll_p
to get 144 MHz from 216 MHz (if I ask for 144 i get 108), thus I
currently configured this way:

  pll_c @ 600 MHZ -+->     VI @ 150 MHz (/ 4)
                   `-> HOST1X @ 150 MHz (/ 4)

Do you think there would be any issues in clocking host1x at 150 MHz
instead of 144? 


One last quetion. About the VI configuration, in an attempt to getting
it working with the simplest possible setup I tried using an
interleaved format instead of multiplanar. This results in writing
format 3 instead of 6 in the low bits of
TEGRA_VI_VI_FIRST_OUTPUT_CONTROL. Do you think this may be a problem?


Kind regards,
Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Tegra20 parallel video capture driver
  2022-05-09 15:56   ` Luca Ceresoli
@ 2022-05-10  7:40     ` Mikko Perttunen
  0 siblings, 0 replies; 6+ messages in thread
From: Mikko Perttunen @ 2022-05-10  7:40 UTC (permalink / raw)
  To: Luca Ceresoli
  Cc: linux-tegra, thierry.reding, dmitry.osipenko,
	Sowjanya Komatineni, Jon Hunter

On 5/9/22 18:56, Luca Ceresoli wrote:
> Hi Mikko, All,
> 
> as anticipated, here is another batch of questions that arose after the
> (very appreciated!) feedback from Mikko.
> 
> Il giorno Thu, 5 May 2022 12:42:08 +0300
> Mikko Perttunen <cyndis@kapsi.fi> ha scritto:
> [...]
>>> 1. Is there any good documentation on host1x, especially syncpt?
>>
>> There isn't much documentation, apart from what exists in the TRMs.
>> But basically, syncpoints are an array of 32-bit unsigned integers in
>> hardware, with each 32-bit value having the following fundamental
>> operations:
>>
>> * Increment by one, wrapping (i.e. mod 2^32)
>> * Wait for value to reach threshold, such that if you consider the 32
>> bit unsigned integers a circle, then the half-circle preceding
>> (integer-wise) the threshold value are considered to be in the past
>> and the half-circle succeeding are considered to be in the future.
>> For example, if we specify threshold 0xffffffff then 0x0 is
>> considered to be one increment in the future. Waits can be done
>> either by HW stalling host1x channels (similar to DMA) or CPU
>> interrupts.
> 
> Do you think the mainline host1x implementation is 100% compatible with
> tegra20? So far I have been assuming that it is, but since my code flow
> is now pretty much equal to the old 3.1 driver and I am not receiving
> any syncpt events, I started wondering whether the syncpt calls with an
> equivalent name really do the same thing or not. Specifically the
> nvhost_syncpt_wait_timeout_ext() in 3.1 looks to have the same
> structure as host1x_syncpt_wait() in mainline, but I still haven't dung
> into the very details to see if there is any subtle difference.
> 

It should be compatible. nvhost_syncpt_wait_timeout_ext is essentially 
the same as host1x_syncpt_wait.

>>> 4. I am currently working on a version of
>>> tegra_channel_capture_frame() [1] for tegra20 VIP and other code
>>> involving syncpt. The mainline driver calls many different
>>> host1x_syncpt_*() functions, while the old driver does call very
>>> few of them. Can anybody clarify the meaning of the various syncpt
>>> calls?
>>
>> - request/put -- allocate and free syncpoint
>> - incr/wait -- as described before
>> - incr_max -- when syncpoint is used with host1x channels, used to
>> indicate to job tracking code how many times the job will increment
>> the syncpoint. when not using channels, should not be used -- looks
>> like the tegra210 code calls this a couple of times but it probably
>> shouldn't be doing that.
> 
> As I understand it, for my own use case I should never call _incr as it
> is what the VI hardware is supposed to do, and rather i should call
> _wait to wait for that increment to happen. Coherently, this is what
> the 3.1 driver does: it reads the syncpt value into a plain variable,
> increments the variable, then calls _wait to wait the syncpt to reach
> the same value as the variable.

This is correct.

> 
> But the mainline code the the tegra210 VI does many more syncpt calls
> and it does not use a variable in the same way. It rather calls
> host1x_syncpt_incr() and host1x_syncpt_incr_max() (which it shouldn't
> call as you said) but I am not sure these calls achieve the same result.

Looks like it is only calling _incr in timeout situations. So if VI gets 
stuck, the driver does the increment. (I don't know if that's the best 
way to handle timeouts, but it's a way)

> 
> Do you think the calls that exist in
> drivers/staging/media/tegra-video/tegra210.c do make sense for tegra20
> too?

I would expect the general use of syncpoints to work for Tegra20 as 
well, but you might want to do something simpler as well.

> 
>>> 5. In tegra_channel_capture_frame() the call to host1x_syncpt_wait()
>>>      always returns -EAGAIN. Where would you start to investigate the
>>>      reason of this timeout?
>>
>> This means that the syncpoint's value did not reach the specified
>> threshold. Either the specified threshold is incorrect, or the engine
>> did not increment the syncpoint (or did not increment sufficient
>> number of times), either because the syncpoint increment was not
>> programmed correctly, or the condition for the increment was not
>> fulfilled (i.e. the engine got stuck for whatever reason).
>>
>> Apart from the wait, you can use
>> /sys/kernel/debug/tegra-host1x/status to track syncpoint value. Apart
>> from that, if you're not using channels, you'll just have to look at
>> engine registers to see why it's not processing.
> 
> About the latest suggestion, which status register are you referring
> to? Unfortunately the TRM I have does not document the VIP registers at
> all, and the 3.1 driver does not use of them so it's impossible to
> infer the bit meanings. Can you suggest any specific register that I
> should check?
> 

Sorry, it was just a general suggestion; I'm not very familiar with VI 
in general and Tegra20 VI not at all.

> 
> Another question is about the clocks. I tried to set them up as
> similarly as possible to the working 3.1 code, but couldn't have them
> identical.
> 
> The 3.1 kernel configures this way:
> 
>    pll_c @ 600 MHZ --->     VI @ 150 MHz (/ 4)
>    pll_p @ 216 MHZ ---> HOST1X @ 144 MHz (x 2 / 3)
> 
> Mainline does not seem to be able to use a fractional divider for pll_p
> to get 144 MHz from 216 MHz (if I ask for 144 i get 108), thus I
> currently configured this way:
> 
>    pll_c @ 600 MHZ -+->     VI @ 150 MHz (/ 4)
>                     `-> HOST1X @ 150 MHz (/ 4)
> 
> Do you think there would be any issues in clocking host1x at 150 MHz
> instead of 144?
> 

I would not expect it to be a problem although I don't know why it would 
be necessary to deviate from what upstream sets for host1x by default. 
VI clock (and possible other camera-related clocks) may be more precise 
though.

> 
> One last quetion. About the VI configuration, in an attempt to getting
> it working with the simplest possible setup I tried using an
> interleaved format instead of multiplanar. This results in writing
> format 3 instead of 6 in the low bits of
> TEGRA_VI_VI_FIRST_OUTPUT_CONTROL. Do you think this may be a problem?
> 

This field seems to correspond to various output frame formats. 3 seems 
to match YUV422 interleaved and 6 YUV420 planar, so I think that is 
working as expected.

Cheers,
Mikko

> 
> Kind regards,
> Luca
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-10  7:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-29 16:49 Tegra20 parallel video capture driver Luca Ceresoli
2022-05-05  8:54 ` Luca Ceresoli
2022-05-05  9:42 ` Mikko Perttunen
2022-05-05 15:32   ` Luca Ceresoli
2022-05-09 15:56   ` Luca Ceresoli
2022-05-10  7:40     ` Mikko Perttunen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.