Issues around packet capture when secondary process is doing rx/tx

* Issues around packet capture when secondary process is doing rx/tx
@ 2024-01-08  1:59 Stephen Hemminger
  2024-01-08 10:41 ` Morten Brørup
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Stephen Hemminger @ 2024-01-08  1:59 UTC (permalink / raw)
  To: dev; +Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

I have been looking at a problem reported by Sandesh 
where packet capture does not work if rx/tx burst is done in secondary process.

The root cause is that existing rx/tx callback model just doesn't work
unless the process doing the rx/tx burst calls is the same one that
registered the callbacks.

An example sequence would be:
	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
	2. secondary process calls rx_burst.
	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
	   at same location in primary and secondary process. 
	4. indirect function call in secondary to bad location likely causes crash.

Some possible workarounds.
	1. Keep callback list per-process: messy, but won't crash. Capture won't work
           without other changes. In this primary would register callback, but secondaries
           would not use them in rx/tx burst.

	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
           a capture flag. (i.e. don't use indirection).  Likely ABI problems.
           Basically, ignore the rx/tx callback mechanism. This is my preferred
	   solution.

	3. Some fix up mechanism (in EAL mp support?) to have each process fixup
           its callback mechanism.

	4. Do something in pdump_init to register the callback in same process context
	   (probably need callbacks to be per-process). Would mean callback is always
           on independent of capture being enabled.

        5. Get rid of indirect function call pointer, and replace it by index into
           a static table of callback functions. Every process would have same code
           (in this case pdump_rx) but at different address.  Requires all callbacks
           to be statically defined at build time.
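
Workaround 5 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names (rx_cb_table, RX_CB_CAPTURE, shared_queue_state, run_rx_callback) and the simplified callback signature are hypothetical and are not part of rte_ethdev. The point is that the shared state holds a small index, and each process resolves that index through its own copy of the table, so no code address ever crosses a process boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified callback signature (real ethdev callbacks take more arguments). */
typedef uint16_t (*rx_cb_fn)(uint16_t port, uint16_t queue, uint16_t nb_pkts);

static unsigned int captured_total; /* per-process counter, for illustration only */

static uint16_t cb_none(uint16_t port, uint16_t queue, uint16_t nb_pkts)
{
    (void)port; (void)queue;
    return nb_pkts;
}

static uint16_t cb_capture(uint16_t port, uint16_t queue, uint16_t nb_pkts)
{
    (void)port; (void)queue;
    captured_total += nb_pkts; /* a real hook would copy the packets somewhere */
    return nb_pkts;
}

/* Callback identifiers: these small integers are what would live in shared memory. */
enum rx_cb_id { RX_CB_NONE = 0, RX_CB_CAPTURE = 1, RX_CB_MAX };

/* Built into every process; the function addresses may differ per process. */
static const rx_cb_fn rx_cb_table[RX_CB_MAX] = {
    [RX_CB_NONE] = cb_none,
    [RX_CB_CAPTURE] = cb_capture,
};

/* Hypothetical shared per-queue state: an index, never a function pointer. */
struct shared_queue_state {
    uint8_t rx_cb_id;
};

static uint16_t run_rx_callback(const struct shared_queue_state *q,
                                uint16_t port, uint16_t queue, uint16_t nb_pkts)
{
    uint8_t id = q->rx_cb_id;

    if (id >= RX_CB_MAX) /* unknown index: skip instead of jumping to a bad address */
        return nb_pkts;
    return rx_cb_table[id](port, queue, nb_pkts);
}
```

The cost, as noted in the list item, is that every callback has to be known at build time.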

The existing rx/tx callback is not safe if rx/tx burst is called from a different process
than the one where the callback is registered.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08  1:59 Issues around packet capture when secondary process is doing rx/tx Stephen Hemminger
@ 2024-01-08 10:41 ` Morten Brørup
  2024-04-03 11:43   ` Ferruh Yigit
  2024-01-08 15:13 ` Konstantin Ananyev
  2024-01-09  1:30 ` Honnappa Nagarahalli
  2 siblings, 1 reply; 18+ messages in thread
From: Morten Brørup @ 2024-01-08 10:41 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Monday, 8 January 2024 02.59
> 
> I have been looking at a problem reported by Sandesh
> where packet capture does not work if rx/tx burst is done in secondary
> process.
> 
> The root cause is that existing rx/tx callback model just doesn't work
> unless the process doing the rx/tx burst calls is the same one that
> registered the callbacks.

So, callbacks don't work across processes, because code might differ across processes.

If process A is running, and RX'ing and TX'ing, and process B wants to install its own callbacks (e.g. packet capture) on RX and TX, we basically want process A to execute code residing in process B, which is impossible.

An alternative could be to pass the packets through a ring in shared memory. However, this method would add the ring processing latency of process B to the RX/TX latency of process A.

I think we can conclude that callbacks are one of the things that don't work with secondary processes.

With this decided, we can then consider how to best add packet capture. The concept of passing "data" (instead of calling functions) across processes obviously applies to this use case.
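
The idea of passing data rather than calling functions can be illustrated with a minimal single-producer/single-consumer ring. This is a sketch using C11 atomics, not DPDK's rte_ring; the names pkt_ring, ring_enqueue and ring_dequeue are hypothetical, and the slots hold opaque 64-bit packet handles (for example offsets into a shared pool) so that nothing in the ring depends on where code is loaded in either process.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* must be a power of two */

/* Hypothetical ring layout; it would be placed in memory shared by both processes. */
struct pkt_ring {
    _Atomic uint32_t head;     /* next slot to write, advanced by the producer only */
    _Atomic uint32_t tail;     /* next slot to read, advanced by the consumer only */
    uint64_t slots[RING_SIZE]; /* opaque packet handles, not function pointers */
};

/* Producer side (the process doing rx/tx burst). Returns false if the ring is full. */
static bool ring_enqueue(struct pkt_ring *r, uint64_t handle)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;
    r->slots[head & (RING_SIZE - 1)] = handle;
    /* release: the slot write becomes visible before the new head value */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (the capture process). Returns false if the ring is empty. */
static bool ring_dequeue(struct pkt_ring *r, uint64_t *handle)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *handle = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

When the ring is full the producer can simply drop the capture copy, which keeps the extra cost on the rx/tx path bounded.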

> 
> An example sequence would be:
> 	1. dumpcap (or pdump) as secondary tells pdump in primary to
> register callback
> 	2. secondary process calls rx_burst.
> 	3. rx_burst sees the callback but it has pointer pdump_rx which
> is not necessarily
> 	   at same location in primary and secondary process.
> 	4. indirect function call in secondary to bad location likely
> causes crash.
> 
> Some possible workarounds.
> 	1. Keep callback list per-process: messy, but won't crash.
> Capture won't work
>            without other changes. In this primary would register
> callback, but secondaries
>            would not use them in rx/tx burst.
> 
> 	2. Replace use of rx/tx callback in pdump with change to
> rte_ethdev to have
>            a capture flag. (i.e. don't use indirection).  Likely ABI
> problems.
>            Basically, ignore the rx/tx callback mechanism. This is my
> preferred
> 	   solution.
> 
> 	3. Some fix up mechanism (in EAL mp support?) to have each
> process fixup
>            its callback mechanism.
> 
> 	4. Do something in pdump_init to register the callback in same
> process context
> 	   (probably need callbacks to be per-process). Would mean
> callback is always
>            on independent of capture being enabled.
> 
>         5. Get rid of indirect function call pointer, and replace it by
> index into
>            a static table of callback functions. Every process would
> have same code
>            (in this case pdump_rx) but at different address.  Requires
> all callbacks
>            to be statically defined at build time.
> 
> The existing rx/tx callback is not safe if rx/tx burst is called from
> different process
> than where callback is registered.
> 



* RE: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08  1:59 Issues around packet capture when secondary process is doing rx/tx Stephen Hemminger
  2024-01-08 10:41 ` Morten Brørup
@ 2024-01-08 15:13 ` Konstantin Ananyev
  2024-01-08 17:02   ` Stephen Hemminger
                     ` (4 more replies)
  2024-01-09  1:30 ` Honnappa Nagarahalli
  2 siblings, 5 replies; 18+ messages in thread
From: Konstantin Ananyev @ 2024-01-08 15:13 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

> I have been looking at a problem reported by Sandesh
> where packet capture does not work if rx/tx burst is done in secondary process.
> 
> The root cause is that existing rx/tx callback model just doesn't work
> unless the process doing the rx/tx burst calls is the same one that
> registered the callbacks.
> 
> An example sequence would be:
> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> 	2. secondary process calls rx_burst.
> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> 	   at same location in primary and secondary process.
> 	4. indirect function call in secondary to bad location likely causes crash.

As I remember, RX/TX callbacks were never intended to work across multiple processes.
Right now RX/TX callbacks are private to the process; a different process simply should not
see/execute them.
I.e. the callback list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
between processes.
It should be normal when, for the same port/queue, you end up with a different list of callbacks
for different processes.
So, unless I am missing something, I don't see how we can end up with 3) and 4) from above:
from my understanding the secondary process will never see/call the primary's callbacks.
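
The split described above can be modelled roughly like this. It is a simplified sketch, not the real rte_ethdev layout: dev_local, dev_shared_data, cb_node and the helper functions are hypothetical names. Each process has its own device structure with a private callback list, and only the data it points to is shared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*rx_cb_fn)(uint16_t nb_pkts);

/* One entry in a per-process callback list (hypothetical). */
struct cb_node {
    rx_cb_fn fn;
    struct cb_node *next;
};

/* Stand-in for the part that lives in shared memory. */
struct dev_shared_data {
    uint16_t mtu;
};

/* Stand-in for the per-process device structure. */
struct dev_local {
    struct dev_shared_data *data; /* same shared object in every process */
    struct cb_node *rx_cbs;       /* private: code addresses valid only in this process */
};

/* Push a callback onto this process's own list. */
static void add_rx_cb(struct dev_local *dev, struct cb_node *node, rx_cb_fn fn)
{
    node->fn = fn;
    node->next = dev->rx_cbs;
    dev->rx_cbs = node;
}

static unsigned int count_rx_cbs(const struct dev_local *dev)
{
    unsigned int n = 0;

    for (const struct cb_node *c = dev->rx_cbs; c != NULL; c = c->next)
        n++;
    return n;
}

static uint16_t noop_cb(uint16_t nb_pkts)
{
    return nb_pkts;
}
```

With two dev_local instances pointing at the same dev_shared_data, a callback added through one is invisible through the other, while a change to the shared data is seen by both.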

About pdump itself: it has been a while since I last looked at it, but as I remember, to make it work
the server process has to call rte_pdump_init(), which in turn registers the PDUMP_MP handler.
I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
though I am not sure such an option is supported right now.

> 
> Some possible workarounds.
> 	1. Keep callback list per-process: messy, but won't crash. Capture won't work
>            without other changes. In this primary would register callback, but secondaries
>            would not use them in rx/tx burst.
> 
> 	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
>            a capture flag. (i.e. don't use indirection).  Likely ABI problems.
>            Basically, ignore the rx/tx callback mechanism. This is my preferred
> 	   solution.

It is not only the capture flag, it is also what to do with the captured packets
(copy? If yes, then where to? examine? drop?, do something else?).
It is probably not the best choice to add all these things into ethdev API.

> 	3. Some fix up mechanism (in EAL mp support?) to have each process fixup
>            its callback mechanism.

Probably the easiest way to fix that is to pass rte_pdump_enable() extra information
that would allow it to distinguish on which process (local or remote)
we want to enable pdump functionality. Then it could act accordingly.

> 
> 	4. Do something in pdump_init to register the callback in same process context
> 	   (probably need callbacks to be per-process). Would mean callback is always
>            on independent of capture being enabled.
> 
>         5. Get rid of indirect function call pointer, and replace it by index into
>            a static table of callback functions. Every process would have same code
>            (in this case pdump_rx) but at different address.  Requires all callbacks
>            to be statically defined at build time.

Doesn't look like a good approach - it will break many things. 

> The existing rx/tx callback is not safe if rx/tx burst is called from different process
> than where callback is registered.


* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08 15:13 ` Konstantin Ananyev
@ 2024-01-08 17:02   ` Stephen Hemminger
  2024-01-08 17:55   ` Stephen Hemminger
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2024-01-08 17:02 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On Mon, 8 Jan 2024 15:13:25 +0000
Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:

> > 
> > 	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
> >            a capture flag. (i.e. don't use indirection).  Likely ABI problems.
> >            Basically, ignore the rx/tx callback mechanism. This is my preferred
> > 	   solution.  
> 
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop?, do something else?).
> It is probably not the best choice to add all these things into ethdev API.

The part that pdump does is trivial: it just copies the packet and puts it in a ring.
This will work from any process.


* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08 15:13 ` Konstantin Ananyev
  2024-01-08 17:02   ` Stephen Hemminger
@ 2024-01-08 17:55   ` Stephen Hemminger
  2024-01-09 23:06   ` Stephen Hemminger
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2024-01-08 17:55 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On Mon, 8 Jan 2024 15:13:25 +0000
Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:

> > I have been looking at a problem reported by Sandesh
> > where packet capture does not work if rx/tx burst is done in secondary process.
> > 
> > The root cause is that existing rx/tx callback model just doesn't work
> > unless the process doing the rx/tx burst calls is the same one that
> > registered the callbacks.
> > 
> > An example sequence would be:
> > 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> > 	2. secondary process calls rx_burst.
> > 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> > 	   at same location in primary and secondary process.
> > 	4. indirect function call in secondary to bad location likely causes crash.  
> 
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private for the process, different process simply should not
> see/execute them.
> I.e. the callback list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal, when for the same port/queue you will end-up with different list of callbacks
> for different processes.  
> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> From my understanding secondary process will never see/call primary's callbacks.
> 
> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such option is supported right now. 

Maybe the simplest would be just to make sure that rte_pdump_init() is called
in the process that does rx/tx burst. That might be made to work.
It still won't work for the case where there are multiple secondary processes and some
of the ethdev ports are used differently in each one, but it would work better than now.



* RE: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08  1:59 Issues around packet capture when secondary process is doing rx/tx Stephen Hemminger
  2024-01-08 10:41 ` Morten Brørup
  2024-01-08 15:13 ` Konstantin Ananyev
@ 2024-01-09  1:30 ` Honnappa Nagarahalli
  2 siblings, 0 replies; 18+ messages in thread
From: Honnappa Nagarahalli @ 2024-01-09  1:30 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan, nd, nd



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Sunday, January 7, 2024 7:59 PM
> To: dev@dpdk.org
> Cc: arshdeep.kaur@intel.com; Gowda, Sandesh <sandesh.gowda@intel.com>;
> Reshma Pattan <reshma.pattan@intel.com>
> Subject: Issues around packet capture when secondary process is doing rx/tx
> 
> I have been looking at a problem reported by Sandesh where packet capture
> does not work if rx/tx burst is done in secondary process.
> 
> The root cause is that existing rx/tx callback model just doesn't work unless the
> process doing the rx/tx burst calls is the same one that registered the callbacks.
This is not specific to packet capture. This is a generic problem and we should look to solve it generically.

> 
> An example sequence would be:
> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register
> callback
> 	2. secondary process calls rx_burst.
> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not
> necessarily
> 	   at same location in primary and secondary process.
> 	4. indirect function call in secondary to bad location likely causes crash.
> 
> Some possible workarounds.
> 	1. Keep callback list per-process: messy, but won't crash. Capture won't
> work
>            without other changes. In this primary would register callback, but
> secondaries
>            would not use them in rx/tx burst.
> 
> 	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to
> have
>            a capture flag. (i.e. don't use indirection).  Likely ABI problems.
>            Basically, ignore the rx/tx callback mechanism. This is my preferred
> 	   solution.
> 
> 	3. Some fix up mechanism (in EAL mp support?) to have each process
> fixup
>            its callback mechanism.
Yes, would prefer this. Let the application call additional APIs to register the callbacks in the secondary process.

> 
> 	4. Do something in pdump_init to register the callback in same process
> context
> 	   (probably need callbacks to be per-process). Would mean callback is
> always
>            on independent of capture being enabled.
> 
>         5. Get rid of indirect function call pointer, and replace it by index into
>            a static table of callback functions. Every process would have same code
>            (in this case pdump_rx) but at different address.  Requires all callbacks
>            to be statically defined at build time.
> 
> The existing rx/tx callback is not safe if rx/tx burst is called from different
> process than where callback is registered.
> 



* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08 15:13 ` Konstantin Ananyev
  2024-01-08 17:02   ` Stephen Hemminger
  2024-01-08 17:55   ` Stephen Hemminger
@ 2024-01-09 23:06   ` Stephen Hemminger
  2024-01-09 23:07     ` Stephen Hemminger
  2024-01-10 20:11     ` Konstantin Ananyev
  2024-04-03  0:14   ` Stephen Hemminger
  2024-04-03 11:42   ` Ferruh Yigit
  4 siblings, 2 replies; 18+ messages in thread
From: Stephen Hemminger @ 2024-01-09 23:06 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On Mon, 8 Jan 2024 15:13:25 +0000
Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:

> > I have been looking at a problem reported by Sandesh
> > where packet capture does not work if rx/tx burst is done in secondary process.
> > 
> > The root cause is that existing rx/tx callback model just doesn't work
> > unless the process doing the rx/tx burst calls is the same one that
> > registered the callbacks.
> > 
> > An example sequence would be:
> > 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> > 	2. secondary process calls rx_burst.
> > 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> > 	   at same location in primary and secondary process.
> > 	4. indirect function call in secondary to bad location likely causes crash.  
> 
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private for the process, different process simply should not
> see/execute them.
> I.e. the callback list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal, when for the same port/queue you will end-up with different list of callbacks
> for different processes.  
> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> From my understanding secondary process will never see/call primary's callbacks.
> 
> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such option is supported right now. 
>  

Did some more tests with modified testpmd, and reached some conclusions:

The logical interface would be to allow rte_pdump_init() to be called by
   the process that would be using the rx/tx burst APIs.

  This doesn't work as it should because the multi-process socket API
  assumes that it only runs the server in primary.  The secondary
  can start its own MP thread, but it won't work:

  Primary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
  Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5

  The problem is when client (pdump or dumpcap) tries to run, it uses the mp_socket
  in the primary which causes: EAL: Cannot find action: mp_pdump

  Looks like the whole MP socket mechanism is just not up to this.

Maybe pdump needs to have its own socket and control thread?
Or MP socket needs to have some multicast fanout to all secondaries?
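
One way to picture the failure is a toy model in which each process owns a socket endpoint with its own table of named actions. This is an assumption-laden sketch, not the EAL implementation: mp_endpoint, endpoint_register and endpoint_dispatch are hypothetical names. If the action is registered only on the secondary's endpoint, a request delivered to the primary's endpoint finds nothing to run, which matches the shape of the "Cannot find action: mp_pdump" message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTIONS 8

typedef int (*mp_action_fn)(const char *payload);

struct mp_action {
    const char *name;
    mp_action_fn fn;
};

/* Hypothetical model of one process's MP socket and its registered actions. */
struct mp_endpoint {
    const char *socket_path;
    struct mp_action actions[MAX_ACTIONS];
    size_t nb_actions;
};

/* Register a named action on this endpoint only. Returns -1 if the table is full. */
static int endpoint_register(struct mp_endpoint *ep, const char *name, mp_action_fn fn)
{
    if (ep->nb_actions == MAX_ACTIONS)
        return -1;
    ep->actions[ep->nb_actions].name = name;
    ep->actions[ep->nb_actions].fn = fn;
    ep->nb_actions++;
    return 0;
}

/* Deliver a request to one endpoint; -1 means no such action is registered there. */
static int endpoint_dispatch(const struct mp_endpoint *ep, const char *name,
                             const char *payload)
{
    for (size_t i = 0; i < ep->nb_actions; i++) {
        if (strcmp(ep->actions[i].name, name) == 0)
            return ep->actions[i].fn(payload);
    }
    return -1;
}

/* Placeholder handler standing in for the pdump request handler. */
static int pdump_action(const char *payload)
{
    (void)payload;
    return 0;
}
```

In this model, either the client has to address the secondary's endpoint directly, or the request has to be fanned out to every endpoint, which corresponds to the two questions raised above.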







        2. Fut


* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-09 23:06   ` Stephen Hemminger
@ 2024-01-09 23:07     ` Stephen Hemminger
  2024-04-03 12:11       ` Ferruh Yigit
  2024-01-10 20:11     ` Konstantin Ananyev
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2024-01-09 23:07 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On Tue, 9 Jan 2024 15:06:47 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Mon, 8 Jan 2024 15:13:25 +0000
> Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
> 
> > > I have been looking at a problem reported by Sandesh
> > > where packet capture does not work if rx/tx burst is done in secondary process.
> > > 
> > > The root cause is that existing rx/tx callback model just doesn't work
> > > unless the process doing the rx/tx burst calls is the same one that
> > > registered the callbacks.
> > > 
> > > An example sequence would be:
> > > 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> > > 	2. secondary process calls rx_burst.
> > > 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> > > 	   at same location in primary and secondary process.
> > > 	4. indirect function call in secondary to bad location likely causes crash.    
> > 
> > As I remember, RX/TX callbacks were never intended to work over multiple processes.
> > Right now RX/TX callbacks are private for the process, different process simply should not
> > see/execute them.
> > I.e. the callback list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> > between processes.
> > It should be normal, when for the same port/queue you will end-up with different list of callbacks
> > for different processes.  
> > So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> > From my understanding secondary process will never see/call primary's callbacks.
> > 
> > About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> > server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
> > I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> > though I am not sure such option is supported right now. 
> >    
> 
> Did some more tests with modified testpmd, and reached some conclusions:
> 
> The logical interface would be to allow rte_pdump_init() to be called by
>    the process that would be using rx/tx burst API's.
> 
>   This doesn't work as it should because the multi-process socket API
>   assumes that it only runs the server in primary.  The secondary
>   can start its own MP thread, but it won't work:
> 
>   Primary EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>   Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5
> 
>   The problem is when client (pdump or dumpcap) tries to run, it uses the mp_socket
>   in the primary which causes: EAL: Cannot find action: mp_pdump
> 
>   Looks like the whole MP socket mechanism is just not up to this.
> 
> Maybe pdump needs to have its own socket and control thread?
> Or MP socket needs to have some multicast fanout to all secondaries?
> 
> 
> 
> 
> 
> 
> 
>         2. Fut



* RE: Issues around packet capture when secondary process is doing rx/tx
  2024-01-09 23:06   ` Stephen Hemminger
  2024-01-09 23:07     ` Stephen Hemminger
@ 2024-01-10 20:11     ` Konstantin Ananyev
  2024-04-03 12:20       ` Ferruh Yigit
  1 sibling, 1 reply; 18+ messages in thread
From: Konstantin Ananyev @ 2024-01-10 20:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, January 9, 2024 11:07 PM
> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> Cc: dev@dpdk.org; arshdeep.kaur@intel.com; Gowda, Sandesh <sandesh.gowda@intel.com>; Reshma Pattan
> <reshma.pattan@intel.com>
> Subject: Re: Issues around packet capture when secondary process is doing rx/tx
> 
> On Mon, 8 Jan 2024 15:13:25 +0000
> Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
> 
> > > I have been looking at a problem reported by Sandesh
> > > where packet capture does not work if rx/tx burst is done in secondary process.
> > >
> > > The root cause is that existing rx/tx callback model just doesn't work
> > > unless the process doing the rx/tx burst calls is the same one that
> > > registered the callbacks.
> > >
> > > An example sequence would be:
> > > 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> > > 	2. secondary process calls rx_burst.
> > > 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> > > 	   at same location in primary and secondary process.
> > > 	4. indirect function call in secondary to bad location likely causes crash.
> >
> > As I remember, RX/TX callbacks were never intended to work over multiple processes.
> > Right now RX/TX callbacks are private for the process, different process simply should not
> > see/execute them.
> > I.e. the callback list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> > between processes.
> > It should be normal, when for the same port/queue you will end-up with different list of callbacks
> > for different processes.
> > So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> > From my understanding secondary process will never see/call primary's callbacks.
> >
> > About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> > server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
> > I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> > though I am not sure such option is supported right now.
> >
> 
> Did some more tests with modified testpmd, and reached some conclusions:
> 
> The logical interface would be to allow rte_pdump_init() to be called by
>    the process that would be using rx/tx burst API's.
> 
>   This doesn't work as it should because the multi-process socket API
>   assumes that it only runs the server in primary.  The secondary
>   can start its own MP thread, but it won't work:
> 
>   Primary EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>   Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5
> 
>   The problem is when client (pdump or dumpcap) tries to run, it uses the mp_socket
>   in the primary which causes: EAL: Cannot find action: mp_pdump
> 
>   Looks like the whole MP socket mechanism is just not up to this.
> 
> Maybe pdump needs to have its own socket and control thread?
> Or MP socket needs to have some multicast fanout to all secondaries?

Maybe we can do something simpler: pass to pdump_enable() where we want to enable it:
on the primary (remote) process or the secondary (local) process?
Then for the primary, send a message over the MP socket (as we are doing now), and for the secondary (itself)
just do the actual pdump enablement on its own (install callbacks, etc.).
Yes, that way one secondary would not be able to enable/disable pdump on another secondary,
only on itself, but maybe that is not needed?
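
The local/remote choice suggested here could look roughly like the sketch below. Everything in it is hypothetical: the pdump_target enum, the pdump_backend hooks and pdump_enable_on() are illustration only and do not exist in the pdump library, and the two stub functions stand in for "send a request over the MP socket" and "install the callbacks in this process".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector: where should capture be enabled? */
enum pdump_target {
    PDUMP_TARGET_PRIMARY, /* remote: ask the primary over the MP socket */
    PDUMP_TARGET_LOCAL,   /* local: install callbacks in the calling process */
};

/* Hypothetical hooks so the dispatch logic can be shown without DPDK. */
struct pdump_backend {
    int (*request_primary)(uint16_t port, uint16_t queue);
    int (*install_local)(uint16_t port, uint16_t queue);
};

static int pdump_enable_on(enum pdump_target target, uint16_t port, uint16_t queue,
                           const struct pdump_backend *be)
{
    switch (target) {
    case PDUMP_TARGET_PRIMARY:
        return be->request_primary(port, queue);
    case PDUMP_TARGET_LOCAL:
        return be->install_local(port, queue);
    }
    return -1; /* unknown target */
}

/* Stub backends that only count how often each path was taken. */
static int nb_mp_requests;
static int nb_local_installs;

static int stub_request_primary(uint16_t port, uint16_t queue)
{
    (void)port; (void)queue;
    nb_mp_requests++;
    return 0;
}

static int stub_install_local(uint16_t port, uint16_t queue)
{
    (void)port; (void)queue;
    nb_local_installs++;
    return 0;
}
```

As the message says, this shape only lets a secondary enable capture on itself, not on another secondary.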

Konstantin
 



* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08 15:13 ` Konstantin Ananyev
                     ` (2 preceding siblings ...)
  2024-01-09 23:06   ` Stephen Hemminger
@ 2024-04-03  0:14   ` Stephen Hemminger
  2024-04-03 11:42   ` Ferruh Yigit
  4 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2024-04-03  0:14 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On Mon, 8 Jan 2024 15:13:25 +0000
Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:

> > I have been looking at a problem reported by Sandesh
> > where packet capture does not work if rx/tx burst is done in secondary process.
> > 
> > The root cause is that existing rx/tx callback model just doesn't work
> > unless the process doing the rx/tx burst calls is the same one that
> > registered the callbacks.
> > 
> > An example sequence would be:
> > 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> > 	2. secondary process calls rx_burst.
> > 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> > 	   at same location in primary and secondary process.
> > 	4. indirect function call in secondary to bad location likely causes crash.  
> 
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private for the process, different process simply should not
> see/execute them.
> I.e. the callback list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal, when for the same port/queue you will end-up with different list of callbacks
> for different processes.  
> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> From my understanding secondary process will never see/call primary's callbacks.
> 
> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such option is supported right now. 
>  
> > 
> > Some possible workarounds.
> > 	1. Keep callback list per-process: messy, but won't crash. Capture won't work
> >            without other changes. In this primary would register callback, but secondaries
> >            would not use them in rx/tx burst.
> > 
> > 	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
> >            a capture flag. (i.e. don't use indirection).  Likely ABI problems.
> >            Basically, ignore the rx/tx callback mechanism. This is my preferred
> > 	   solution.  
> 
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop?, do something else?).
> It is probably not the best choice to add all these things into ethdev API.
> 
> > 	3. Some fix up mechanism (in EAL mp support?) to have each process fixup
> >            its callback mechanism.  
>  
> Probably the easiest way to fix that - pass to rte_pdump_enable() extra information
> that  would allow it to distinguish on what exact process (local, remote)
> we want to enable pdump functionality. Then it could act accordingly.
> 
> > 
> > 	4. Do something in pdump_init to register the callback in same process context
> > 	   (probably need callbacks to be per-process). Would mean callback is always
> >            on independent of capture being enabled.
> > 
> >         5. Get rid of indirect function call pointer, and replace it by index into
> >            a static table of callback functions. Every process would have same code
> >            (in this case pdump_rx) but at different address.  Requires all callbacks
> >            to be statically defined at build time.  
> 
> Doesn't look like a good approach - it will break many things. 
>  
> > The existing rx/tx callback is not safe if rx/tx burst is called from different process
> > than where callback is registered.  
>  
> 

I have been looking into the best way to fix this, and the real answer is not to use
callbacks but to use a per-queue flag instead. The natural place to put these flags is
rte_ethdev_driver. BUT this will mean an ABI breakage, so it will have to wait for the 24.11
release. Sometimes fixing a design flaw means an ABI change.
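
A minimal sketch of the per-queue flag idea, in plain C without any DPDK
headers. Every name here (shared_port_state, capture_enabled,
flag_rx_burst_sketch) is hypothetical and not part of rte_ethdev. The only
point is that a flag kept in shared memory is plain data, so it carries no
process-specific address the way a function pointer does. A real version
would presumably need atomic access to the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QUEUES 4

/* Hypothetical per-port state placed in memory shared by all processes.
 * It holds only plain data, never function pointers. */
struct shared_port_state {
    bool capture_enabled[MAX_QUEUES];
    uint64_t captured[MAX_QUEUES];
};

/* Stand-in for the capture path. Each process links its own copy of this
 * function, so no code address ever crosses a process boundary. */
static void capture_packets(struct shared_port_state *st, uint16_t queue,
                            uint16_t nb_pkts)
{
    st->captured[queue] += nb_pkts;
}

/* Stand-in for rx_burst: it tests the shared flag instead of walking a
 * list of callback pointers. */
static uint16_t flag_rx_burst_sketch(struct shared_port_state *st,
                                     uint16_t queue, uint16_t nb_rx)
{
    assert(queue < MAX_QUEUES);
    if (st->capture_enabled[queue])
        capture_packets(st, queue, nb_rx);
    return nb_rx;
}
```

Whichever process sets capture_enabled, any process calling the burst
function on that queue sees it and runs its own local capture code.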


* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08 15:13 ` Konstantin Ananyev
                     ` (3 preceding siblings ...)
  2024-04-03  0:14   ` Stephen Hemminger
@ 2024-04-03 11:42   ` Ferruh Yigit
  4 siblings, 0 replies; 18+ messages in thread
From: Ferruh Yigit @ 2024-04-03 11:42 UTC (permalink / raw)
  To: Konstantin Ananyev, Stephen Hemminger, dev
  Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On 1/8/2024 3:13 PM, Konstantin Ananyev wrote:
> 
> 
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
>>
>> An example sequence would be:
>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
>> 	2. secondary process calls rx_burst.
>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
>> 	   at same location in primary and secondary process.
>> 	4. indirect function call in secondary to bad location likely causes crash.
> 
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private for the process, different process simply should not
> see/execute them.
> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal, when for the same port/queue you will end up with different list of callbacks
> for different processes.  
> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> From my understanding secondary process will never see/call primary's callbacks.
> 

Ack. There should be another reason for the crash.


> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such option is supported right now. 
>  

Currently testpmd calls 'rte_pdump_init()'. Both the primary and the
secondary testpmd processes call this API and both register the PDUMP_MP
handler; I think this is OK.

When the pdump secondary process sends an MP message, both the primary and
the secondary testpmd processes should register callbacks with the provided
ring and mempool information.

I don't know whether the primary and secondary process callbacks running
simultaneously cause this problem; otherwise I expect it to work.

>>
>> Some possible workarounds.
>> 	1. Keep callback list per-process: messy, but won't crash. Capture won't work
>>            without other changes. In this primary would register callback, but secondaries
>>            would not use them in rx/tx burst.
>>
>> 	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
>>            a capture flag. (i.e. don't use indirection).  Likely ABI problems.
>>            Basically, ignore the rx/tx callback mechanism. This is my preferred
>> 	   solution.
> 
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop?, do something else?).
> It is probably not the best choice to add all these things into ethdev API.
> 
>> 	3. Some fix up mechanism (in EAL mp support?) to have each process fixup
>>            its callback mechanism.
>  
> Probably the easiest way to fix that - pass to rte_pdump_enable() extra information
> that  would allow it to distinguish on what exact process (local, remote)
> we want to enable pdump functionality. Then it could act accordingly.
> 
>>
>> 	4. Do something in pdump_init to register the callback in same process context
>> 	   (probably need callbacks to be per-process). Would mean callback is always
>>            on independent of capture being enabled.
>>
>>         5. Get rid of indirect function call pointer, and replace it by index into
>>            a static table of callback functions. Every process would have same code
>>            (in this case pdump_rx) but at different address.  Requires all callbacks
>>            to be statically defined at build time.
> 
> Doesn't look like a good approach - it will break many things. 
>  
>> The existing rx/tx callback is not safe if rx/tx burst is called from different process
>> than where callback is registered.
>  
> 



* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-08 10:41 ` Morten Brørup
@ 2024-04-03 11:43   ` Ferruh Yigit
  0 siblings, 0 replies; 18+ messages in thread
From: Ferruh Yigit @ 2024-04-03 11:43 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger, dev
  Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan, Konstantin Ananyev

On 1/8/2024 10:41 AM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Monday, 8 January 2024 02.59
>>
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary
>> process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
> 
> So, callbacks don't work across processes, because code might differ across processes.
> 
> If process A is running, and RX'ing and TX'ing, and process B wants to install its own callbacks (e.g. packet capture) on RX and TX, we basically want process A to execute code residing in process B, which is impossible.
> 

Callbacks are stored in "struct rte_eth_dev", so they are per process, which
means the primary and the secondaries have their own copies of the callbacks,
as Konstantin explained.

So, how pdump works :), it uses MP support and a shared ring, similar to
what you mention below. In more detail:
- The primary registers an MP handler
- The pdump secondary process sends an MP message with a ring and mempool in
the message
- When the primary receives the MP message it registers its *own* callbacks,
which get the 'ring' as a parameter
- The callbacks clone packets to the 'ring'; that is how the pdump secondary
process gets access to the packets
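
The steps above can be sketched in plain C, without DPDK headers. This is
only an illustration under stated assumptions: the names (ring, local_dev,
handle_enable_request, cb_rx_burst_sketch) are hypothetical, packets are
modelled as plain integers, and the "clone" is a simple copy. It shows that
the callback pointer lives in per-process state and is installed by the
process that receives the request, while only the ring is shared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical stand-in for the shared ring named in the MP message. */
struct ring {
    uint32_t pkts[RING_SIZE];
    size_t count;
};

/* Callback type: called on each received burst with a user parameter. */
typedef void (*rx_cb_t)(const uint32_t *pkts, size_t n, void *param);

/* Per-process device state: the callback pointer is private to the
 * process that installed it. */
struct local_dev {
    rx_cb_t cb;
    void *cb_param;
};

/* Capture callback: copies packets to the ring, dropping any overflow. */
static void capture_cb(const uint32_t *pkts, size_t n, void *param)
{
    struct ring *r = param;

    for (size_t i = 0; i < n && r->count < RING_SIZE; i++)
        r->pkts[r->count++] = pkts[i];
}

/* Handler run in the process that received the enable request: it
 * installs its *own* callback, with the ring taken from the message. */
static void handle_enable_request(struct local_dev *dev, struct ring *r)
{
    assert(dev != NULL && r != NULL);
    dev->cb = capture_cb;
    dev->cb_param = r;
}

/* Stand-in for rx_burst in the process that owns 'dev'. */
static size_t cb_rx_burst_sketch(struct local_dev *dev, const uint32_t *pkts,
                                 size_t n)
{
    if (dev->cb != NULL)
        dev->cb(pkts, n, dev->cb_param);
    return n;
}
```

In this model a process that never handled the request has no callback, so
its bursts never reach the ring, which matches the symptom in the report.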

> An alternative could be to pass the packets through a ring in shared memory. However, this method would add the ring processing latency of process B to the RX/TX latency of process A.
> 
> I think we can conclude that callbacks are one of the things that don't work with secondary processes.
> 
> With this decided, we can then consider how to best add packet capture. The concept of passing "data" (instead of calling functions) across processes obviously applies to this use case.
> 
>>
>> An example sequence would be:
>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to
>> register callback
>> 	2. secondary process calls rx_burst.
>> 	3. rx_burst sees the callback but it has pointer pdump_rx which
>> is not necessarily
>> 	   at same location in primary and secondary process.
>> 	4. indirect function call in secondary to bad location likely
>> causes crash.
>>
>> Some possible workarounds.
>> 	1. Keep callback list per-process: messy, but won't crash.
>> Capture won't work
>>            without other changes. In this primary would register
>> callback, but secondaries
>>            would not use them in rx/tx burst.
>>
>> 	2. Replace use of rx/tx callback in pdump with change to
>> rte_ethdev to have
>>            a capture flag. (i.e. don't use indirection).  Likely ABI
>> problems.
>>            Basically, ignore the rx/tx callback mechanism. This is my
>> preferred
>> 	   solution.
>>
>> 	3. Some fix up mechanism (in EAL mp support?) to have each
>> process fixup
>>            its callback mechanism.
>>
>> 	4. Do something in pdump_init to register the callback in same
>> process context
>> 	   (probably need callbacks to be per-process). Would mean
>> callback is always
>>            on independent of capture being enabled.
>>
>>         5. Get rid of indirect function call pointer, and replace it by
>> index into
>>            a static table of callback functions. Every process would
>> have same code
>>            (in this case pdump_rx) but at different address.  Requires
>> all callbacks
>>            to be statically defined at build time.
>>
>> The existing rx/tx callback is not safe if rx/tx burst is called from
>> different process
>> than where callback is registered.
>>
> 



* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-09 23:07     ` Stephen Hemminger
@ 2024-04-03 12:11       ` Ferruh Yigit
  0 siblings, 0 replies; 18+ messages in thread
From: Ferruh Yigit @ 2024-04-03 12:11 UTC (permalink / raw)
  To: Stephen Hemminger, Konstantin Ananyev
  Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On 1/9/2024 11:07 PM, Stephen Hemminger wrote:
> On Tue, 9 Jan 2024 15:06:47 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
>> On Mon, 8 Jan 2024 15:13:25 +0000
>> Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
>>
>>>> I have been looking at a problem reported by Sandesh
>>>> where packet capture does not work if rx/tx burst is done in secondary process.
>>>>
>>>> The root cause is that existing rx/tx callback model just doesn't work
>>>> unless the process doing the rx/tx burst calls is the same one that
>>>> registered the callbacks.
>>>>
>>>> An example sequence would be:
>>>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
>>>> 	2. secondary process calls rx_burst.
>>>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
>>>> 	   at same location in primary and secondary process.
>>>> 	4. indirect function call in secondary to bad location likely causes crash.    
>>>
>>> As I remember, RX/TX callbacks were never intended to work over multiple processes.
>>> Right now RX/TX callbacks are private for the process, different process simply should not
>>> see/execute them.
>>> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
>>> between processes.
>>> It should be normal, when for the same port/queue you will end up with different list of callbacks
>>> for different processes.  
>>> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
>>> From my understanding secondary process will never see/call primary's callbacks.
>>>
>>> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
>>> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
>>> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
>>> though I am not sure such option is supported right now. 
>>>    
>>
>> Did some more tests with modified testpmd, and reached some conclusions:
>>
>> The logical interface would be to allow rte_pdump_init() to be called by
>>    the process that would be using rx/tx burst API's.
>>
>>   This doesn't work as it should because the multi-process socket API
>>   assumes that it only runs the server in primary.  The secondary
>>   can start its own MP thread, but it won't work:
>>
>>   Primary EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>   Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5
>>
>>   The problem is when client (pdump or dumpcap) tries to run, it uses the mp_socket
>>   in the primary which causes: EAL: Cannot find action: mp_pdump
>>
>>   Looks like the whole MP socket mechanism is just not up to this.
>>
>> Maybe pdump needs to have its own socket and control thread?
>> Or MP socket needs to have some multicast fanout to all secondaries?
>>

I replied to the old email, but you seem to have already figured out the
root cause.

So when a secondary sends an MP message, the MP handler registered in
another secondary is not called.

As you suggested, fan-out to all secondaries with a flag in the message
can be an option.


One of the reasons the MP socket was added was so that, when a device is
hotplugged in a secondary, the new device is populated in both the primary
and the secondary.
As far as I know, if there are multiple secondaries, the device is populated
to all of them, not via communication between secondaries but via
communication from the primary to all secondaries.
So some kind of fan-out to all secondaries should already be happening for
the device hotplug use case.
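
A rough model of the fan-out option, in plain C and not using the real EAL
MP API. All names (proc, deliver, fanout) are hypothetical. Each process is
modelled as having at most one registered action; delivering to a process
without that action is the "Cannot find action" case, and the fan-out
function forwards the request to every secondary as well as the primary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one DPDK process and the single MP action it
 * may have registered (NULL means none). */
struct proc {
    const char *action;
    int handled;
};

/* Deliver a request to one process. Returns 0 on success, or -1 when the
 * process has no handler for the action ("Cannot find action"). */
static int deliver(struct proc *p, const char *action)
{
    if (p->action == NULL || strcmp(p->action, action) != 0)
        return -1;
    p->handled++;
    return 0;
}

/* Fan-out: try the primary's own handler, then forward the request to
 * every secondary. Returns how many processes handled it. */
static int fanout(struct proc *primary, struct proc *secs, size_t n_secs,
                  const char *action)
{
    int handled = 0;

    assert(primary != NULL);
    if (deliver(primary, action) == 0)
        handled++;
    for (size_t i = 0; i < n_secs; i++) {
        if (deliver(&secs[i], action) == 0)
            handled++;
    }
    return handled;
}
```

With only a secondary registered for "mp_pdump", deliver() to the primary
alone fails, while fanout() reaches the secondary that can act on it.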



* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-01-10 20:11     ` Konstantin Ananyev
@ 2024-04-03 12:20       ` Ferruh Yigit
  2024-04-04 13:26         ` Konstantin Ananyev
  0 siblings, 1 reply; 18+ messages in thread
From: Ferruh Yigit @ 2024-04-03 12:20 UTC (permalink / raw)
  To: Konstantin Ananyev, Stephen Hemminger
  Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On 1/10/2024 8:11 PM, Konstantin Ananyev wrote:
> 
> 
>> -----Original Message-----
>> From: Stephen Hemminger <stephen@networkplumber.org>
>> Sent: Tuesday, January 9, 2024 11:07 PM
>> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>> Cc: dev@dpdk.org; arshdeep.kaur@intel.com; Gowda, Sandesh <sandesh.gowda@intel.com>; Reshma Pattan
>> <reshma.pattan@intel.com>
>> Subject: Re: Issues around packet capture when secondary process is doing rx/tx
>>
>> On Mon, 8 Jan 2024 15:13:25 +0000
>> Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
>>
>>>> I have been looking at a problem reported by Sandesh
>>>> where packet capture does not work if rx/tx burst is done in secondary process.
>>>>
>>>> The root cause is that existing rx/tx callback model just doesn't work
>>>> unless the process doing the rx/tx burst calls is the same one that
>>>> registered the callbacks.
>>>>
>>>> An example sequence would be:
>>>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
>>>> 	2. secondary process calls rx_burst.
>>>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
>>>> 	   at same location in primary and secondary process.
>>>> 	4. indirect function call in secondary to bad location likely causes crash.
>>>
>>> As I remember, RX/TX callbacks were never intended to work over multiple processes.
>>> Right now RX/TX callbacks are private for the process, different process simply should not
>>> see/execute them.
>>> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
>>> between processes.
>>> It should be normal, when for the same port/queue you will end up with different list of callbacks
>>> for different processes.
>>> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
>>> From my understanding secondary process will never see/call primary's callbacks.
>>>
>>> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
>>> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
>>> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
>>> though I am not sure such option is supported right now.
>>>
>>
>> Did some more tests with modified testpmd, and reached some conclusions:
>>
>> The logical interface would be to allow rte_pdump_init() to be called by
>>    the process that would be using rx/tx burst API's.
>>
>>   This doesn't work as it should because the multi-process socket API
>>   assumes that it only runs the server in primary.  The secondary
>>   can start its own MP thread, but it won't work:
>>
>>   Primary EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>   Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5
>>
>>   The problem is when client (pdump or dumpcap) tries to run, it uses the mp_socket
>>   in the primary which causes: EAL: Cannot find action: mp_pdump
>>
>>   Looks like the whole MP socket mechanism is just not up to this.
>>
>> Maybe pdump needs to have its own socket and control thread?
>> Or MP socket needs to have some multicast fanout to all secondaries?
> 
> Might be we can do something simpler: pass to pdump_enable(), where we want to enable it:
> on primary (remote) process or secondary (local) process?
> And then for primary send a message over MP socket (as we are doing now), and for secondary (itself)
> just do actual pdump enablement on its own (install callbacks, etc.).
> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary,
> only on itself, but might be it is not needed?
> 
> 

How does a secondary, let's say the testpmd secondary, install callbacks
without getting the 'mp' & 'ring' info from the pdump secondary process?



* RE: Issues around packet capture when secondary process is doing rx/tx
  2024-04-03 12:20       ` Ferruh Yigit
@ 2024-04-04 13:26         ` Konstantin Ananyev
  2024-04-04 14:28           ` Ferruh Yigit
  0 siblings, 1 reply; 18+ messages in thread
From: Konstantin Ananyev @ 2024-04-04 13:26 UTC (permalink / raw)
  To: Ferruh Yigit, Stephen Hemminger
  Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan



> >> -----Original Message-----
> >> From: Stephen Hemminger <stephen@networkplumber.org>
> >> Sent: Tuesday, January 9, 2024 11:07 PM
> >> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >> Cc: dev@dpdk.org; arshdeep.kaur@intel.com; Gowda, Sandesh <sandesh.gowda@intel.com>; Reshma Pattan
> >> <reshma.pattan@intel.com>
> >> Subject: Re: Issues around packet capture when secondary process is doing rx/tx
> >>
> >> On Mon, 8 Jan 2024 15:13:25 +0000
> >> Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
> >>
> >>>> I have been looking at a problem reported by Sandesh
> >>>> where packet capture does not work if rx/tx burst is done in secondary process.
> >>>>
> >>>> The root cause is that existing rx/tx callback model just doesn't work
> >>>> unless the process doing the rx/tx burst calls is the same one that
> >>>> registered the callbacks.
> >>>>
> >>>> An example sequence would be:
> >>>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> >>>> 	2. secondary process calls rx_burst.
> >>>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> >>>> 	   at same location in primary and secondary process.
> >>>> 	4. indirect function call in secondary to bad location likely causes crash.
> >>>
> >>> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> >>> Right now RX/TX callbacks are private for the process, different process simply should not
> >>> see/execute them.
> >>> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> >>> between processes.
> >>> It should be normal, when for the same port/queue you will end up with different list of callbacks
> >>> for different processes.
> >>> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> >>> From my understanding secondary process will never see/call primary's callbacks.
> >>>
> >>> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
> >>> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
> >>> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> >>> though I am not sure such option is supported right now.
> >>>
> >>
> >> Did some more tests with modified testpmd, and reached some conclusions:
> >>
> >> The logical interface would be to allow rte_pdump_init() to be called by
> >>    the process that would be using rx/tx burst API's.
> >>
> >>   This doesn't work as it should because the multi-process socket API
> >>   assumes that it only runs the server in primary.  The secondary
> >>   can start its own MP thread, but it won't work:
> >>
> >>   Primary EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> >>   Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5
> >>
> >>   The problem is when client (pdump or dumpcap) tries to run, it uses the mp_socket
> >>   in the primary which causes: EAL: Cannot find action: mp_pdump
> >>
> >>   Looks like the whole MP socket mechanism is just not up to this.
> >>
> >> Maybe pdump needs to have its own socket and control thread?
> >> Or MP socket needs to have some multicast fanout to all secondaries?
> >
> > Might be we can do something simpler: pass to pdump_enable(), where we want to enable it:
> > on primary (remote) process or secondary (local) process?
> > And then for primary send a message over MP socket (as we are doing now), and for secondary (itself)
> > just do actual pdump enablement on its own (install callbacks, etc.).
> > Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary,
> > only on itself, but might be it is not needed?
> >
> >
> 
> How secondary, lets say testpmd secondary, install callbacks without
> getting 'mp' & 'ring' info from pdump secondary process?

Please see my comment above (I copied it here too):
> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary, only on itself, but maybe that is not needed?
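
To make the local-versus-remote suggestion concrete, here is a minimal
sketch in plain C. It is not the rte_pdump API: the enum, the state struct
and pdump_enable_sketch() are all hypothetical, and the two actions are
reduced to counters. It only shows the dispatch: a remote target results in
an MP request (as is done now), while a local target installs callbacks in
the calling process itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical selector for where capture should be enabled. */
enum pdump_target {
    PDUMP_TARGET_REMOTE_PRIMARY, /* ask the primary over the MP socket */
    PDUMP_TARGET_LOCAL           /* enable in the calling process itself */
};

/* Hypothetical bookkeeping standing in for the real side effects. */
struct pdump_state {
    bool local_callbacks_installed;
    int mp_requests_sent;
};

/* Dispatch on the target. Returns 0 on success, -1 on an unknown target. */
static int pdump_enable_sketch(struct pdump_state *st,
                               enum pdump_target target)
{
    assert(st != NULL);
    switch (target) {
    case PDUMP_TARGET_LOCAL:
        /* Install rx/tx callbacks directly; no message is needed. */
        st->local_callbacks_installed = true;
        return 0;
    case PDUMP_TARGET_REMOTE_PRIMARY:
        /* Send the enable request to the primary, as is done today. */
        st->mp_requests_sent++;
        return 0;
    }
    return -1;
}
```

The limitation mentioned above is visible here: there is no target value
that lets one secondary enable capture in a different secondary.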


 




* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-04-04 13:26         ` Konstantin Ananyev
@ 2024-04-04 14:28           ` Ferruh Yigit
  2024-04-04 15:21             ` Stephen Hemminger
  2024-04-04 16:18             ` Konstantin Ananyev
  0 siblings, 2 replies; 18+ messages in thread
From: Ferruh Yigit @ 2024-04-04 14:28 UTC (permalink / raw)
  To: Konstantin Ananyev, Stephen Hemminger
  Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On 4/4/2024 2:26 PM, Konstantin Ananyev wrote:
> 
> 
>>>> -----Original Message-----
>>>> From: Stephen Hemminger <stephen@networkplumber.org>
>>>> Sent: Tuesday, January 9, 2024 11:07 PM
>>>> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>>> Cc: dev@dpdk.org; arshdeep.kaur@intel.com; Gowda, Sandesh <sandesh.gowda@intel.com>; Reshma Pattan
>>>> <reshma.pattan@intel.com>
>>>> Subject: Re: Issues around packet capture when secondary process is doing rx/tx
>>>>
>>>> On Mon, 8 Jan 2024 15:13:25 +0000
>>>> Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
>>>>
>>>>>> I have been looking at a problem reported by Sandesh
>>>>>> where packet capture does not work if rx/tx burst is done in secondary process.
>>>>>>
>>>>>> The root cause is that existing rx/tx callback model just doesn't work
>>>>>> unless the process doing the rx/tx burst calls is the same one that
>>>>>> registered the callbacks.
>>>>>>
>>>>>> An example sequence would be:
>>>>>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
>>>>>> 	2. secondary process calls rx_burst.
>>>>>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
>>>>>> 	   at same location in primary and secondary process.
>>>>>> 	4. indirect function call in secondary to bad location likely causes crash.
>>>>>
>>>>> As I remember, RX/TX callbacks were never intended to work over multiple processes.
>>>>> Right now RX/TX callbacks are private for the process, different process simply should not
>>>>> see/execute them.
>>>>> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
>>>>> between processes.
>>>>> It should be normal, when for the same port/queue you will end up with different list of callbacks
>>>>> for different processes.
>>>>> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
>>>>> From my understanding secondary process will never see/call primary's callbacks.
>>>>>
>>>>> About pdump itself, it was a while when I looked at it last time, but as I remember to start it to work,
>>>>> server process has to call rte_pdump_init() which in turn registers PDUMP_MP handler.
>>>>> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
>>>>> though I am not sure such option is supported right now.
>>>>>
>>>>
>>>> Did some more tests with modified testpmd, and reached some conclusions:
>>>>
>>>> The logical interface would be to allow rte_pdump_init() to be called by
>>>>    the process that would be using rx/tx burst API's.
>>>>
>>>>   This doesn't work as it should because the multi-process socket API
>>>>   assumes that it only runs the server in primary.  The secondary
>>>>   can start its own MP thread, but it won't work:
>>>>
>>>>   Primary EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>>>   Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5
>>>>
>>>>   The problem is when client (pdump or dumpcap) tries to run, it uses the mp_socket
>>>>   in the primary which causes: EAL: Cannot find action: mp_pdump
>>>>
>>>>   Looks like the whole MP socket mechanism is just not up to this.
>>>>
>>>> Maybe pdump needs to have its own socket and control thread?
>>>> Or MP socket needs to have some multicast fanout to all secondaries?
>>>
>>> Might be we can do something simpler: pass to pdump_enable(), where we want to enable it:
>>> on primary (remote) process or secondary (local) process?
>>> And then for primary send a message over MP socket (as we are doing now), and for secondary (itself)
>>> just do actual pdump enablement on its own (install callbacks, etc.).
>>> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary,
>>> only on itself, but might be it is not needed?
>>>
>>>
>>
>> How secondary, lets say testpmd secondary, install callbacks without
>> getting 'mp' & 'ring' info from pdump secondary process?
> 
> Please see my comment above (I copied it here too):
>> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary, only on itself, but might be it is not needed?
> 

I saw it, Konstantin, but it wasn't clear to me what you are suggesting,
that is why I am asking for more.

Do you suggest that when testpmd runs as a secondary process and does the
forwarding, it should do the tasks of pdump itself and we don't use
pdump at all?



* Re: Issues around packet capture when secondary process is doing rx/tx
  2024-04-04 14:28           ` Ferruh Yigit
@ 2024-04-04 15:21             ` Stephen Hemminger
  2024-04-04 16:18             ` Konstantin Ananyev
  1 sibling, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2024-04-04 15:21 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Konstantin Ananyev, dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan


> >>>> Maybe pdump needs to have its own socket and control thread?
> >>>> Or MP socket needs to have some multicast fanout to all secondaries?  
> >>>
> >>> Might be we can do something simpler: pass to pdump_enable(), where we want to enable it:
> >>> on primary (remote) process or secondary (local) process?
> >>> And then for primary send a message over MP socket (as we are doing now), and for secondary (itself)
> >>> just do actual pdump enablement on its own (install callbacks, etc.).

> >>> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary,
> >>> only on itself, but might be it is not needed?
> >>>
> >>>  
> >>
> >> How secondary, lets say testpmd secondary, install callbacks without
> >> getting 'mp' & 'ring' info from pdump secondary process?  
> > 
> > Please see my comment above (I copied it here too):  
> >> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary, only on itself, but might be it is not needed?  
> >   
> 
> I saw it Konstantin, but it wasn't clear to me what you are suggesting,
> that is why I am asking more.
> 
> Do you suggest when testpmd run as secondary process and doing
> forwarding, it should do the tasks of pdump itself and we don't use
> pdump at all?
> 

I looked into starting pdump_init in the active secondary process,
but that won't work because the passive secondary won't talk to it
over the right Unix domain socket. It might be possible to have multiple
MP server sockets and use some form of AF_UNIX multicast, but that gets
complex to handle.

Probably best to skip callbacks for this and use a state flag in eth_dev_driver.


* RE: Issues around packet capture when secondary process is doing rx/tx
  2024-04-04 14:28           ` Ferruh Yigit
  2024-04-04 15:21             ` Stephen Hemminger
@ 2024-04-04 16:18             ` Konstantin Ananyev
  1 sibling, 0 replies; 18+ messages in thread
From: Konstantin Ananyev @ 2024-04-04 16:18 UTC (permalink / raw)
  To: Ferruh Yigit, Stephen Hemminger
  Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan


> >>>> -----Original Message-----
> >>>> From: Stephen Hemminger <stephen@networkplumber.org>
> >>>> Sent: Tuesday, January 9, 2024 11:07 PM
> >>>> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >>>> Cc: dev@dpdk.org; arshdeep.kaur@intel.com; Gowda, Sandesh <sandesh.gowda@intel.com>; Reshma Pattan
> >>>> <reshma.pattan@intel.com>
> >>>> Subject: Re: Issues around packet capture when secondary process is doing rx/tx
> >>>>
> >>>> On Mon, 8 Jan 2024 15:13:25 +0000
> >>>> Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:
> >>>>
> >>>>>> I have been looking at a problem reported by Sandesh
> >>>>>> where packet capture does not work if rx/tx burst is done in secondary process.
> >>>>>>
> >>>>>> The root cause is that existing rx/tx callback model just doesn't work
> >>>>>> unless the process doing the rx/tx burst calls is the same one that
> >>>>>> registered the callbacks.
> >>>>>>
> >>>>>> An example sequence would be:
> >>>>>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> >>>>>> 	2. secondary process calls rx_burst.
> >>>>>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> >>>>>> 	   at same location in primary and secondary process.
> >>>>>> 	4. indirect function call in secondary to bad location likely causes crash.
> >>>>>
> >>>>> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> >>>>> Right now RX/TX callbacks are private for the process, different process simply should not
> >>>>> see/execute them.
> >>>>> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> >>>>> between processes.
> >>>>> It should be normal, when for the same port/queue you will end up with different list of callbacks
> >>>>> for different processes.
> >>>>> So, unless I am missing something, I don't see how we can end-up with 3) and 4) from above:
> >>>>> From my understanding secondary process will never see/call primary's callbacks.
> >>>>>
> >>>>> About pdump itself, it has been a while since I last looked at it, but as I remember, for it to work
> >>>>> the server process has to call rte_pdump_init(), which in turn registers the PDUMP_MP handler.
> >>>>> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> >>>>> though I am not sure such an option is supported right now.
> >>>>>
> >>>>
> >>>> Did some more tests with modified testpmd, and reached some conclusions:
> >>>>
> >>>> The logical interface would be to allow rte_pdump_init() to be called by
> >>>>    the process that would be using the rx/tx burst APIs.
> >>>>
> >>>>   This doesn't work as it should because the multi-process socket API
> >>>>   assumes that it only runs the server in the primary.  The secondary
> >>>>   can start its own MP thread, but it won't work:
> >>>>
> >>>>   Primary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> >>>>   Secondary: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_6057_1ccd4157fd5
> >>>>
> >>>>   The problem is that when the client (pdump or dumpcap) tries to run, it uses the mp_socket
> >>>>   in the primary, which causes: EAL: Cannot find action: mp_pdump
> >>>>
> >>>>   Looks like the whole MP socket mechanism is just not up to this.
> >>>>
> >>>> Maybe pdump needs to have its own socket and control thread?
> >>>> Or MP socket needs to have some multicast fanout to all secondaries?
> >>>
> >>> Maybe we can do something simpler: pass to pdump_enable() where we want to enable it:
> >>> on the primary (remote) process or the secondary (local) process?
> >>> Then for the primary, send a message over the MP socket (as we are doing now), and for the secondary (itself)
> >>> just do the actual pdump enablement on its own (install callbacks, etc.).
> >>> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary,
> >>> only on itself, but maybe that is not needed?
> >>>
> >>>
> >>
> >> How would a secondary, let's say a testpmd secondary, install callbacks without
> >> getting 'mp' & 'ring' info from the pdump secondary process?
> >
> > Please see my comment above (I copied it here too):
> >> Yes, in that way, one secondary would not be able to enable/disable pdump on another secondary,
> >> only on itself, but might be it is not needed?
> >
> 
> I saw it, Konstantin, but it wasn't clear to me what you are suggesting,
> that is why I am asking more.
> 
> Do you suggest that when testpmd runs as a secondary process and does
> forwarding, it should do the tasks of pdump itself and we don't use
> pdump at all?

Sort of - we can still use the pdump API, but under the hood, instead of sending a request to the primary,
the secondary would just install an RX/TX callback for itself.
Again, with that scheme secondary<->secondary would not be supported.
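
To make the idea a bit more concrete, here is a rough, self-contained model in plain C. It is
not DPDK code and not a proposed patch: every name in it (capture_enable, capture_target,
send_enable_request_to_primary, model_rx_burst, and the simplified callback type) is made up
for illustration, and the MP socket request is reduced to a counter. It only shows the dispatch:
a "remote" enable sends a request to the primary, while a "local" enable installs the callback
in the calling process's own private table, so the function pointer is always valid in the
process that later runs the burst.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4

/* Simplified stand-in for an rx callback type (NOT the real DPDK signature). */
typedef uint16_t (*rx_cb_fn)(uint16_t port, uint16_t queue,
			     uint16_t nb_pkts, void *arg);

/* Process-private callback table. This models the callback lists that live
 * in 'struct rte_eth_dev' rather than in the shared rte_eth_dev.data. */
static rx_cb_fn local_rx_cb[MAX_QUEUES];
static void *local_rx_arg[MAX_QUEUES];

/* Hypothetical selector: where should capture be enabled? */
enum capture_target {
	CAPTURE_REMOTE_PRIMARY,	/* ask the primary over the MP socket */
	CAPTURE_LOCAL_SELF	/* install the callback in this process */
};

static unsigned int mp_requests_sent;	/* models requests sent to the primary */
static unsigned int captured_pkts;	/* models packets seen by the capture hook */

/* Capture hook: counts packets and passes them through unchanged. */
static uint16_t capture_rx(uint16_t port, uint16_t queue,
			   uint16_t nb_pkts, void *arg)
{
	unsigned int *counter = arg;

	(void)port;
	(void)queue;
	assert(counter != NULL);
	*counter += nb_pkts;
	return nb_pkts;
}

/* Placeholder for the existing path (message to the primary). */
static int send_enable_request_to_primary(uint16_t port, uint16_t queue)
{
	(void)port;
	(void)queue;
	mp_requests_sent++;
	return 0;
}

/* Hypothetical enable entry point with an explicit target. */
static int capture_enable(enum capture_target target,
			  uint16_t port, uint16_t queue)
{
	if (queue >= MAX_QUEUES)
		return -1;

	if (target == CAPTURE_REMOTE_PRIMARY)
		return send_enable_request_to_primary(port, queue);

	/* Local case: the pointer is taken in the same process that calls it. */
	local_rx_cb[queue] = capture_rx;
	local_rx_arg[queue] = &captured_pkts;
	return 0;
}

/* Models the rx burst path: only this process's own callbacks are run. */
static uint16_t model_rx_burst(uint16_t port, uint16_t queue, uint16_t nb_pkts)
{
	if (queue < MAX_QUEUES && local_rx_cb[queue] != NULL)
		return local_rx_cb[queue](port, queue, nb_pkts,
					  local_rx_arg[queue]);
	return nb_pkts;
}
```

In this model, calling capture_enable(CAPTURE_LOCAL_SELF, 0, 1) and then model_rx_burst(0, 1, 8)
makes the hook see those 8 packets without any request going to the primary. The remote case only
bumps the request counter and leaves the local table untouched, which is the secondary<->secondary
limitation mentioned above.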


Thread overview: 18+ messages
2024-01-08  1:59 Issues around packet capture when secondary process is doing rx/tx Stephen Hemminger
2024-01-08 10:41 ` Morten Brørup
2024-04-03 11:43   ` Ferruh Yigit
2024-01-08 15:13 ` Konstantin Ananyev
2024-01-08 17:02   ` Stephen Hemminger
2024-01-08 17:55   ` Stephen Hemminger
2024-01-09 23:06   ` Stephen Hemminger
2024-01-09 23:07     ` Stephen Hemminger
2024-04-03 12:11       ` Ferruh Yigit
2024-01-10 20:11     ` Konstantin Ananyev
2024-04-03 12:20       ` Ferruh Yigit
2024-04-04 13:26         ` Konstantin Ananyev
2024-04-04 14:28           ` Ferruh Yigit
2024-04-04 15:21             ` Stephen Hemminger
2024-04-04 16:18             ` Konstantin Ananyev
2024-04-03  0:14   ` Stephen Hemminger
2024-04-03 11:42   ` Ferruh Yigit
2024-01-09  1:30 ` Honnappa Nagarahalli