* [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
@ 2018-08-15 18:49 Sridhar Samudrala
  2018-08-27  8:40 ` [virtio-dev] " Cornelia Huck
  2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: Sridhar Samudrala @ 2018-08-15 18:49 UTC (permalink / raw)
  To: mst, cohuck, virtio-dev

The VIRTIO_NET_F_STANDBY feature lets the hypervisor indicate that a
virtio_net device acts as a standby for another device with the same MAC
address.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
---
 content.tex | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/content.tex b/content.tex
index be18234..42a0e7e 100644
--- a/content.tex
+++ b/content.tex
@@ -2525,6 +2525,9 @@ features.
 
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
     channel.
+
+\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary
+    device with the same MAC address.
 \end{description}
 
 \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
@@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do
 so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
 negotiated.
 
+If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act
+as a standby device for a primary device with the same MAC address.
+
 \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
 
 A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
@@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
 size exceeding the value of \field{mtu} (plus low level ethernet header length)
 with \field{gso_type} NONE or ECN.
 
+A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
+
 \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
 \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
 When using the legacy interface, transitional devices and drivers
-- 
2.14.4
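
As an illustration only (not part of the patch), here is a minimal C sketch
of how a driver might check for the new bit once features are negotiated.
The bit number 62 comes from the patch above; the helper names
negotiate_features() and has_feature() are hypothetical and do not
correspond to any existing driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number as proposed in the patch above. */
#define VIRTIO_NET_F_STANDBY 62

/* Hypothetical helper: a feature counts as negotiated only if the
 * device offers it and the driver accepts it. */
static uint64_t negotiate_features(uint64_t device_features,
                                   uint64_t driver_features)
{
    return device_features & driver_features;
}

/* Hypothetical helper: test a single bit in a 64-bit feature word. */
static bool has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    /* 1ULL keeps the shift 64 bits wide; shifting a plain int by 62
     * would be undefined behaviour. */
    return (features & (1ULL << bit)) != 0;
}
```

A driver that follows the SHOULD in the patch would set bit 62 in its
accepted features whenever the device offers it, and then treat the device
as a standby only if has_feature() returns true for the negotiated set.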


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-08-15 18:49 [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature Sridhar Samudrala
@ 2018-08-27  8:40 ` Cornelia Huck
  2018-08-27 12:34   ` Michael S. Tsirkin
  2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Cornelia Huck @ 2018-08-27  8:40 UTC (permalink / raw)
  To: Sridhar Samudrala, mst; +Cc: virtio-dev

On Wed, 15 Aug 2018 11:49:15 -0700
Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:

> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> device to act as a standby for another device with the same MAC address.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Acked-by: Cornelia Huck <cohuck@redhat.com>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18

I think you need to update the github issue to point to this (v4) patch.

> ---
>  content.tex | 8 ++++++++
>  1 file changed, 8 insertions(+)

Other than that, I'd vote (...) to start voting on this issue, but
AFAIK I can't do that. Michael?


* [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-08-27  8:40 ` [virtio-dev] " Cornelia Huck
@ 2018-08-27 12:34   ` Michael S. Tsirkin
  2018-08-27 16:50     ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-08-27 12:34 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Sridhar Samudrala, virtio-dev

On Mon, Aug 27, 2018 at 10:40:35AM +0200, Cornelia Huck wrote:
> On Wed, 15 Aug 2018 11:49:15 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> 
> > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > device to act as a standby for another device with the same MAC address.
> > 
> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> 
> I think you need to update the github issue to point to this (v4) patch.
> 
> > ---
> >  content.tex | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> 
> Other than that, I'd vote (...) to start voting on this issue, but
> AFAIK I can't do that. Michael?

OK, ballot started.


* [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-08-27 12:34   ` Michael S. Tsirkin
@ 2018-08-27 16:50     ` Samudrala, Sridhar
  2018-08-28 12:13       ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-08-27 16:50 UTC (permalink / raw)
  To: Michael S. Tsirkin, Cornelia Huck; +Cc: virtio-dev

On 8/27/2018 5:34 AM, Michael S. Tsirkin wrote:
> On Mon, Aug 27, 2018 at 10:40:35AM +0200, Cornelia Huck wrote:
>> On Wed, 15 Aug 2018 11:49:15 -0700
>> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>>
>>> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>>> device to act as a standby for another device with the same MAC address.
>>>
>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>> Acked-by: Cornelia Huck <cohuck@redhat.com>
>>> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
>> I think you need to update the github issue to point to this (v4) patch.

Updated the github issue with link to v4 patch.

>>
>>> ---
>>>   content.tex | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>> Other than that, I'd vote (...) to start voting on this issue, but
>> AFAIK I can't do that. Michael?
> OK, ballot started.



* Re: [virtio-dev] Re: [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-08-27 16:50     ` Samudrala, Sridhar
@ 2018-08-28 12:13       ` Michael S. Tsirkin
  0 siblings, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-08-28 12:13 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: Cornelia Huck, virtio-dev

On Mon, Aug 27, 2018 at 09:50:33AM -0700, Samudrala, Sridhar wrote:
> On 8/27/2018 5:34 AM, Michael S. Tsirkin wrote:
> > On Mon, Aug 27, 2018 at 10:40:35AM +0200, Cornelia Huck wrote:
> > > On Wed, 15 Aug 2018 11:49:15 -0700
> > > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> > > 
> > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > device to act as a standby for another device with the same MAC address.
> > > > 
> > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > > I think you need to update the github issue to point to this (v4) patch.
> 
> Updated the github issue with link to v4 patch.

Please verify it's the same link that was put in the ballot:
https://www.oasis-open.org/committees/ballot.php?id=3240

> > > 
> > > > ---
> > > >   content.tex | 8 ++++++++
> > > >   1 file changed, 8 insertions(+)
> > > Other than that, I'd vote (...) to start voting on this issue, but
> > > AFAIK I can't do that. Michael?
> > OK, ballot started.
> 
> 

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-08-15 18:49 [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature Sridhar Samudrala
  2018-08-27  8:40 ` [virtio-dev] " Cornelia Huck
@ 2018-09-07 21:34 ` Michael S. Tsirkin
  2018-09-12 15:17   ` Samudrala, Sridhar
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-07 21:34 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: cohuck, virtio-dev

On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> device to act as a standby for another device with the same MAC address.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Acked-by: Cornelia Huck <cohuck@redhat.com>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18

Applied but when do you plan to add documentation as pointed
out by Jan and Halil?

> ---
>  content.tex | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index be18234..42a0e7e 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2525,6 +2525,9 @@ features.
>  
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>      channel.
> +
> +\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary
> +    device with the same MAC address.
>  \end{description}
>  
>  \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
> @@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do
>  so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
>  negotiated.
>  
> +If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act
> +as a standby device for a primary device with the same MAC address.
> +
>  \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>  
>  A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
> @@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
>  size exceeding the value of \field{mtu} (plus low level ethernet header length)
>  with \field{gso_type} NONE or ECN.
>  
> +A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
> +
>  \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
>  \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
>  When using the legacy interface, transitional devices and drivers
> -- 
> 2.14.4
> 
> 

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin
@ 2018-09-12 15:17   ` Samudrala, Sridhar
  2018-09-12 15:22     ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-09-12 15:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: cohuck, virtio-dev



On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>> device to act as a standby for another device with the same MAC address.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> Acked-by: Cornelia Huck <cohuck@redhat.com>
>> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> Applied but when do you plan to add documentation as pointed
> out by Jan and Halil?

I thought additional documentation would be done as part of the QEMU enablement
patches, and I hope someone at RH is looking into it.

Does it make sense to add a link to the kernel documentation of this feature in
the spec?
  https://www.kernel.org/doc/html/latest/networking/net_failover.html


>
>> ---
>>   content.tex | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index be18234..42a0e7e 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -2525,6 +2525,9 @@ features.
>>   
>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>       channel.
>> +
>> +\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary
>> +    device with the same MAC address.
>>   \end{description}
>>   
>>   \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
>> @@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do
>>   so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
>>   negotiated.
>>   
>> +If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act
>> +as a standby device for a primary device with the same MAC address.
>> +
>>   \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>>   
>>   A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
>> @@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
>>   size exceeding the value of \field{mtu} (plus low level ethernet header length)
>>   with \field{gso_type} NONE or ECN.
>>   
>> +A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
>> +
>>   \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
>>   \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
>>   When using the legacy interface, transitional devices and drivers
>> -- 
>> 2.14.4
>>
>>

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-12 15:17   ` Samudrala, Sridhar
@ 2018-09-12 15:22     ` Michael S. Tsirkin
  2018-09-18 10:20       ` Cornelia Huck
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-12 15:22 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: cohuck, virtio-dev

On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> 
> 
> On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > device to act as a standby for another device with the same MAC address.
> > > 
> > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > Applied but when do you plan to add documentation as pointed
> > out by Jan and Halil?
> 
> I thought additional documentation will be done as part of the Qemu enablement
> patches and i hope someone in RH is looking into it.
> 
> Does it make sense to add a link to to the kernel documentation of this feature in
> the spec
>  https://www.kernel.org/doc/html/latest/networking/net_failover.html


I do not think this will address the comments posted.  Specifically, we
should probably include documentation for what a standby and a primary are:
what is expected of the driver (maintain configuration on standby, support
primary coming and going, transmit on standby only if there is no
primary) and of the device (have the same MAC for the standby as for the
primary).
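
To make the driver-side expectation concrete, a minimal C sketch of the
"transmit on standby only if there is no primary" rule follows. This is an
assumption-laden illustration, not spec text or existing driver code: the
struct failover_dev type and the select_tx_dev() function are hypothetical
names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one device in a standby/primary pair. */
struct failover_dev {
    const char *name;
    bool present;   /* false while the device is unplugged */
};

/* Pick the device to transmit on. The standby (virtio-net) always
 * exists, so it must be non-NULL; the primary may be NULL or absent
 * because it can come and go. */
static const struct failover_dev *
select_tx_dev(const struct failover_dev *primary,
              const struct failover_dev *standby)
{
    assert(standby != NULL);
    if (primary != NULL && primary->present) {
        return primary;
    }
    /* No usable primary: fall back to the standby. */
    return standby;
}
```

Because the driver keeps the configuration on the standby, the fallback
path needs no reconfiguration when the primary disappears.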


> 
> > 
> > > ---
> > >   content.tex | 8 ++++++++
> > >   1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index be18234..42a0e7e 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -2525,6 +2525,9 @@ features.
> > >   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > >       channel.
> > > +
> > > +\item[VIRTIO_NET_F_STANDBY(62)] Device may act as a standby for a primary
> > > +    device with the same MAC address.
> > >   \end{description}
> > >   \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
> > > @@ -2614,6 +2617,9 @@ level ethernet header length) size with \field{gso_type} NONE or ECN, and do
> > >   so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
> > >   negotiated.
> > > +If the driver negotiates the VIRTIO_NET_F_STANDBY feature, the device MAY act
> > > +as a standby device for a primary device with the same MAC address.
> > > +
> > >   \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
> > >   A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
> > > @@ -2636,6 +2642,8 @@ If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
> > >   size exceeding the value of \field{mtu} (plus low level ethernet header length)
> > >   with \field{gso_type} NONE or ECN.
> > > +A driver SHOULD negotiate the VIRTIO_NET_F_STANDBY feature if the device offers it.
> > > +
> > >   \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
> > >   \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
> > >   When using the legacy interface, transitional devices and drivers
> > > -- 
> > > 2.14.4
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-12 15:22     ` Michael S. Tsirkin
@ 2018-09-18 10:20       ` Cornelia Huck
  2018-09-18 10:37         ` Sameeh Jubran
  2018-09-18 13:35         ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: Cornelia Huck @ 2018-09-18 10:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Samudrala, Sridhar, virtio-dev

On Wed, 12 Sep 2018 11:22:12 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > 
> > 
> > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:  
> > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:  
> > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > device to act as a standby for another device with the same MAC address.
> > > > 
> > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18  
> > > Applied but when do you plan to add documentation as pointed
> > > out by Jan and Halil?  
> > 
> > I thought additional documentation will be done as part of the Qemu enablement
> > patches and i hope someone in RH is looking into it.
> > 
> > Does it make sense to add a link to to the kernel documentation of this feature in
> > the spec
> >  https://www.kernel.org/doc/html/latest/networking/net_failover.html  
> 
> 
> I do not think this will address the comments posted.  Specifically we
> should probably include documentation for what is a standby and primary:
> what is expected of driver (maintain configuration on standby, support
> primary coming and going, transmit on standby only if there is no
> primary) and of device (have same mac for standby as for standby).

Yes, we need some definitive statements of what a driver and a device
is supposed to do in order to conform; it might make sense to discuss
this in conjunction with discussion on any QEMU patches (have not
checked whether anything has been posted, just returned from vacation).

I assume that we still stick with the plan to implement/document
MAC-based handling first and then enhance with other methods later?


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 10:20       ` Cornelia Huck
@ 2018-09-18 10:37         ` Sameeh Jubran
  2018-09-18 13:25           ` Michael S. Tsirkin
  2018-09-18 13:35         ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-09-18 10:37 UTC (permalink / raw)
  To: cohuck; +Cc: Michael S. Tsirkin, sridhar.samudrala, virtio-dev

On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Wed, 12 Sep 2018 11:22:12 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > >
> > >
> > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > > device to act as a standby for another device with the same MAC address.
> > > > >
> > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > > > Applied but when do you plan to add documentation as pointed
> > > > out by Jan and Halil?
> > >
> > > I thought additional documentation will be done as part of the Qemu enablement
> > > patches and i hope someone in RH is looking into it.
> > >
> > > Does it make sense to add a link to to the kernel documentation of this feature in
> > > the spec
> > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >
> >
> > I do not think this will address the comments posted.  Specifically we
> > should probably include documentation for what is a standby and primary:
> > what is expected of driver (maintain configuration on standby, support
> > primary coming and going, transmit on standby only if there is no
> > primary) and of device (have same mac for standby as for standby).
>
> Yes, we need some definitive statements of what a driver and a device
> is supposed to do in order to conform; it might make sense to discuss
> this in conjunction with discussion on any QEMU patches (have not
> checked whether anything has been posted, just returned from vacation).
>
> I assume that we still stick with the plan to implement/document
> MAC-based handling first and then enhance with other methods later?

I am currently writing the patches for this feature. I have thought
about how it should be implemented and decided to go with a different
approach: the id of the vfio attached device will be specified in the
virtio-net arguments as follows:

-device virtio-net,standby=<device_id_of_vfio_device>
-vfio #address,id=<device_id_of_vfio_device>

This approach makes minimal changes to the current infrastructure and
does so elegantly, without adding unnecessary ids to the bridges.

The MAC address approach seems very complicated: there is no standard
way to find the MAC address of a given device, since it is vendor
dependent, which makes identifying the target standby device by its
MAC address very difficult.

Please share your thoughts so that I can move forward with the patches.

An initial patch, which hides the device from the PCI bus until the
feature is acked, is provided below:

commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
Author: Sameeh Jubran <sjubran@redhat.com>
Date:   Sun Sep 16 13:21:41 2018 +0300

    virtio-net: Implement standby feature

    Signed-off-by: Sameeh Jubran <sjubran@redhat.com>

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f154756e85..46386c0e1b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,7 +26,9 @@
 #include "qapi/qapi-events-net.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "hw/pci/pci.h"
 #include "standard-headers/linux/ethtool.h"
+#include "hw/vfio/vfio-common.h"

 #define VIRTIO_NET_VM_VERSION    11

@@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
     n->netclient_type = g_strdup(type);
 }

+static bool standby_device_present(VirtIONet *n, const char *id,
+        struct PCIDevice **pdev)
+{
+    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
+    vfio_is_vfio_pci(*pdev);
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1976,6 +1985,21 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
     }

+    if (n->net_conf.standby_id_str && standby_device_present(n,
+                n->net_conf.standby_id_str, &n->standby_pdev)) {
+        DeviceState *dev = DEVICE(n->standby_pdev);
+        DeviceClass *klass = DEVICE_GET_CLASS(dev);
+        /* Hide standby from pci till the feature is acked */
+        if (klass->hotpluggable)
+        {
+            qdev_unplug(dev, errp);
+            if (errp == NULL)
+            {
+                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
+            }
+        }
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);

@@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
     DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 866f0deeb7..593debe56e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
 #endif
 }

+bool vfio_is_vfio_pci(PCIDevice* pdev)
+{
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
+}
+
 static void vfio_intx_update(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 821def0565..26dfde805f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
                              hwaddr *pgsize);
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
+bool vfio_is_vfio_pci(PCIDevice* pdev);

 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c82ca..94388b40cb 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -42,6 +42,7 @@ typedef struct virtio_net_conf
     int32_t speed;
     char *duplex_str;
     uint8_t duplex;
+    char *standby_id_str;
 } virtio_net_conf;

 /* Maximum packet size we can receive from tap device: header + 64k */
@@ -103,6 +104,7 @@ typedef struct VirtIONet {
     int announce_counter;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    PCIDevice *standby_pdev;
 } VirtIONet;

 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,



-- 
Respectfully,
Sameeh Jubran
Software Engineer @ Daynix.


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 10:37         ` Sameeh Jubran
@ 2018-09-18 13:25           ` Michael S. Tsirkin
  2018-09-18 18:30             ` Siwei Liu
                               ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-18 13:25 UTC (permalink / raw)
  To: Sameeh Jubran; +Cc: cohuck, sridhar.samudrala, virtio-dev

On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote:
> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >
> > On Wed, 12 Sep 2018 11:22:12 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > > >
> > > >
> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > > > device to act as a standby for another device with the same MAC address.
> > > > > >
> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > > > > Applied but when do you plan to add documentation as pointed
> > > > > out by Jan and Halil?
> > > >
> > > > I thought additional documentation will be done as part of the Qemu enablement
> > > > patches and i hope someone in RH is looking into it.
> > > >
> > > > Does it make sense to add a link to the kernel documentation of this feature in
> > > > the spec
> > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > >
> > >
> > > I do not think this will address the comments posted.  Specifically we
> > > should probably include documentation for what is a standby and primary:
> > > what is expected of driver (maintain configuration on standby, support
> > > primary coming and going, transmit on standby only if there is no
> > > primary) and of device (have same mac for standby as for primary).
> >
> > Yes, we need some definitive statements of what a driver and a device
> > is supposed to do in order to conform; it might make sense to discuss
> > this in conjunction with discussion on any QEMU patches (have not
> > checked whether anything has been posted, just returned from vacation).
> >
> > I assume that we still stick with the plan to implement/document
> > MAC-based handling first and then enhance with other methods later?
> 
> I am currently in the process of writing the patches for this feature,
> I have thought about how the feature should be implemented
> and decided to go with a different approach. I've decided that the id
> of the vfio attached device will be specified in the virtio-net
> arguments as follows:
> 
> -device virtio-net,standby=<device_id_of_vfio_device>
> -vfio #address,id=<device_id_of_vfio_device>
> 
> This approach makes minimal changes to the current infrastructure and
> does so elegantly without adding unnecessary ids to the bridges.
> 
> The mac address approach seems to be very complicated as there is no
> standard way to find the mac address of a given device and it is
> vendor dependent,
> which makes the task of identifying the target standby device by it's
> mac address a very tough one.

Oh, the MAC address is used by the guest. I agree it's not a great QEMU
interface.
The idea was basically to have -vfio #address,primary=<id>


> Please share your thoughts so I'll move forward with the patches.

Can this actually support hotplug add and remove of the vfio device though?
E.g. hotplug add vfio device while VM is already running?
With the primary=<> it works because standby must always exist
even when primary isn't there.


> An initial patch which implements hiding the device from pci bus
> before the feature is acked is provided below:
> 
> commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
> Author: Sameeh Jubran <sjubran@redhat.com>
> Date:   Sun Sep 16 13:21:41 2018 +0300
> 
>     virtio-net: Implement standby feature
> 
>     Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index f154756e85..46386c0e1b 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -26,7 +26,9 @@
>  #include "qapi/qapi-events-net.h"
>  #include "hw/virtio/virtio-access.h"
>  #include "migration/misc.h"
> +#include "hw/pci/pci.h"
>  #include "standard-headers/linux/ethtool.h"
> +#include "hw/vfio/vfio-common.h"
> 
>  #define VIRTIO_NET_VM_VERSION    11
> 
> @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet
> *n, const char *name,
>      n->netclient_type = g_strdup(type);
>  }
> 
> +static bool standby_device_present(VirtIONet *n, const char *id,
> +        struct PCIDevice **pdev)
> +{
> +    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
> +    vfio_is_vfio_pci(*pdev);
> +}
> +
>  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -1976,6 +1985,21 @@ static void
> virtio_net_device_realize(DeviceState *dev, Error **errp)
>          n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
>      }
> 
> +    if (n->net_conf.standby_id_str && standby_device_present(n,
> +                n->net_conf.standby_id_str, &n->standby_pdev)) {
> +        DeviceState *dev = DEVICE(n->standby_pdev);
> +        DeviceClass *klass = DEVICE_GET_CLASS(dev);
> +        /* Hide standby from pci till the feature is acked */
> +        if (klass->hotpluggable)
> +        {
> +            qdev_unplug(dev, errp);


Does this really hide the device?
I see:
    hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl);
    if (hdc->unplug_request) {
        hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
    } else {
        hotplug_handler_unplug(hotplug_ctrl, dev, errp);
    }

which seems to just send an eject request to guest - the reverse of
what we want to do.

> +            if (errp == NULL)
> +            {
> +                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
> +            }

I'm not sure how this error handling is supposed to work.
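
To make the concern concrete, here is a minimal, self-contained sketch; it
is not QEMU code. The Error struct and mock_unplug() are hypothetical
stand-ins. Only the errp convention and the name error_propagate() come
from QEMU, and the propagation step is simplified to a plain assignment.
The sketch suggests that testing errp == NULL inspects the pointer the
caller passed in, not whether the call failed, and that the usual
alternative is to collect the error in a local variable first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Error type, for illustration only. */
typedef struct Error { const char *msg; } Error;

/* Hypothetical callee that reports failure through errp. */
static void mock_unplug(bool fail, Error **errp)
{
    static Error unplug_failed = { "unplug failed" };
    if (fail && errp) {
        *errp = &unplug_failed;
    }
}

/* Mirrors the check in the patch: tests the caller's pointer, not the result. */
static bool ok_as_in_patch(bool fail, Error **errp)
{
    mock_unplug(fail, errp);
    return errp == NULL;
}

/* Collect the error locally, test it, then hand it on to the caller. */
static bool ok_with_local_err(bool fail, Error **errp)
{
    Error *local_err = NULL;

    mock_unplug(fail, &local_err);
    if (local_err) {
        if (errp) {
            *errp = local_err; /* simplified stand-in for error_propagate() */
        }
        return false;
    }
    return true;
}
```

With a non-NULL errp, which is what a realize function normally receives,
ok_as_in_patch() reports failure even when the unplug succeeded, so in the
patch the feature bit would never be set.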

> +        }
> +    }
> +
>      virtio_net_set_config_size(n, n->host_features);
>      virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
> 
> @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = {
>                       true),
>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> +    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
>      DEFINE_PROP_END_OF_LIST(),
>  };
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 866f0deeb7..593debe56e 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
>  #endif
>  }
> 
> +bool vfio_is_vfio_pci(PCIDevice* pdev)
> +{
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
> +}
> +
>  static void vfio_intx_update(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 821def0565..26dfde805f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
>                               hwaddr *pgsize);
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space);
> +bool vfio_is_vfio_pci(PCIDevice* pdev);
> 
>  #endif /* HW_VFIO_VFIO_COMMON_H */
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 4d7f3c82ca..94388b40cb 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -42,6 +42,7 @@ typedef struct virtio_net_conf
>      int32_t speed;
>      char *duplex_str;
>      uint8_t duplex;
> +    char *standby_id_str;
>  } virtio_net_conf;
> 
>  /* Maximum packet size we can receive from tap device: header + 64k */
> @@ -103,6 +104,7 @@ typedef struct VirtIONet {
>      int announce_counter;
>      bool needs_vnet_hdr_swap;
>      bool mtu_bypass_backend;
> +    PCIDevice *standby_pdev;
>  } VirtIONet;
> 
>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> (END)
> 
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
> 
> 
> -- 
> Respectfully,
> Sameeh Jubran
> Linkedin
> Software Engineer @ Daynix.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 10:20       ` Cornelia Huck
  2018-09-18 10:37         ` Sameeh Jubran
@ 2018-09-18 13:35         ` Michael S. Tsirkin
  2018-09-18 15:13           ` Venu Busireddy
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-18 13:35 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> On Wed, 12 Sep 2018 11:22:12 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > > 
> > > 
> > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:  
> > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:  
> > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > > device to act as a standby for another device with the same MAC address.
> > > > > 
> > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18  
> > > > Applied but when do you plan to add documentation as pointed
> > > > out by Jan and Halil?  
> > > 
> > > I thought additional documentation will be done as part of the Qemu enablement
> > > patches and i hope someone in RH is looking into it.
> > > 
> > > Does it make sense to add a link to the kernel documentation of this feature in
> > > the spec
> > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html  
> > 
> > 
> > I do not think this will address the comments posted.  Specifically we
> > should probably include documentation for what is a standby and primary:
> > what is expected of driver (maintain configuration on standby, support
> > primary coming and going, transmit on standby only if there is no
> > primary) and of device (have same mac for standby as for primary).
> 
> Yes, we need some definitive statements of what a driver and a device
> is supposed to do in order to conform; it might make sense to discuss
> this in conjunction with discussion on any QEMU patches (have not
> checked whether anything has been posted, just returned from vacation).
> 
> I assume that we still stick with the plan to implement/document
> MAC-based handling first and then enhance with other methods later?

I'm fine with that at least. If someone wants to work on
other methods straight away, that's also fine by me.

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 13:35         ` Michael S. Tsirkin
@ 2018-09-18 15:13           ` Venu Busireddy
  2018-09-18 15:31             ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Venu Busireddy @ 2018-09-18 15:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Cornelia Huck, Samudrala, Sridhar, virtio-dev

On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> > On Wed, 12 Sep 2018 11:22:12 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > > > 
> > > > 
> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:  
> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:  
> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > > > device to act as a standby for another device with the same MAC address.
> > > > > > 
> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18  
> > > > > Applied but when do you plan to add documentation as pointed
> > > > > out by Jan and Halil?  
> > > > 
> > > > I thought additional documentation will be done as part of the Qemu enablement
> > > > patches and i hope someone in RH is looking into it.
> > > > 
> > > > Does it make sense to add a link to the kernel documentation of this feature in
> > > > the spec
> > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html  
> > > 
> > > 
> > > I do not think this will address the comments posted.  Specifically we
> > > should probably include documentation for what is a standby and primary:
> > > what is expected of driver (maintain configuration on standby, support
> > > primary coming and going, transmit on standby only if there is no
> > > primary) and of device (have same mac for standby as for primary).
> > 
> > Yes, we need some definitive statements of what a driver and a device
> > is supposed to do in order to conform; it might make sense to discuss
> > this in conjunction with discussion on any QEMU patches (have not
> > checked whether anything has been posted, just returned from vacation).
> > 
> > I assume that we still stick with the plan to implement/document
> > MAC-based handling first and then enhance with other methods later?
> 
> I'm fine with that at least. If someone wants to work on
> other methods straight away, that's also fine by me.

Patch set [1] implements the failover-group-id mechanism. Are you
thinking of some other method?

Venu

[1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html





^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 15:13           ` Venu Busireddy
@ 2018-09-18 15:31             ` Michael S. Tsirkin
  2018-09-18 18:48               ` Siwei Liu
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-18 15:31 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> > > On Wed, 12 Sep 2018 11:22:12 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > > > > 
> > > > > 
> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:  
> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:  
> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > > > > device to act as a standby for another device with the same MAC address.
> > > > > > > 
> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18  
> > > > > > Applied but when do you plan to add documentation as pointed
> > > > > > out by Jan and Halil?  
> > > > > 
> > > > > I thought additional documentation will be done as part of the Qemu enablement
> > > > > patches and i hope someone in RH is looking into it.
> > > > > 
> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> > > > > the spec
> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html  
> > > > 
> > > > 
> > > > I do not think this will address the comments posted.  Specifically we
> > > > should probably include documentation for what is a standby and primary:
> > > > what is expected of driver (maintain configuration on standby, support
> > > > primary coming and going, transmit on standby only if there is no
> > > > primary) and of device (have same mac for standby as for primary).
> > > 
> > > Yes, we need some definitive statements of what a driver and a device
> > > is supposed to do in order to conform; it might make sense to discuss
> > > this in conjunction with discussion on any QEMU patches (have not
> > > checked whether anything has been posted, just returned from vacation).
> > > 
> > > I assume that we still stick with the plan to implement/document
> > > MAC-based handling first and then enhance with other methods later?
> > 
> > I'm fine with that at least. If someone wants to work on
> > other methods straight away, that's also fine by me.
> 
> Patch set [1] implements the failover-group-id mechanism. Are you
> thinking of some other method?
> 
> Venu
> 
> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> 

Yes, the grouping mechanism seems fine to me (I don't remember
about the implementation, it's been a while).

It is not by itself sufficient though, is it?

MAC is assumed to be shared to avoid things like ARP/neighbor
rediscovery, right?
If true, that implies that, to avoid guest confusion, visibility of the
primary needs to be controlled by the standby's driver.
This makes this patchset incomplete.

For this work to be complete what is needed is:
- hypervisor: add control of primary's visibility to guest
- guest: add support for this grouping to the failover driver

We also need
- spec: document matching rules based on the pci bridge

and it's helpful to have a spec proposal with implementation, but I
would say at least proposed patches to one of the above 2 would be
helpful before we include this in spec.
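
As a rough illustration of the first hypervisor item above, the visibility
rule could be modelled as below. This is a sketch under stated assumptions,
not a proposed implementation: struct failover_pair and
primary_visible_to_guest() are hypothetical names, and only the feature bit
number (62) comes from the spec patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit number taken from the spec patch under discussion. */
#define VIRTIO_NET_F_STANDBY 62

/* Hypothetical model of a standby/primary pair; not a QEMU structure. */
struct failover_pair {
    uint64_t guest_features; /* features acked by the standby's driver */
    bool primary_plugged;    /* a primary is attached on the host side */
};

/*
 * The primary is shown to the guest only when it is attached on the host
 * side AND the standby's driver has acked VIRTIO_NET_F_STANDBY.
 */
static bool primary_visible_to_guest(const struct failover_pair *p)
{
    assert(p != NULL);
    bool standby_acked = (p->guest_features >> VIRTIO_NET_F_STANDBY) & 1;
    return p->primary_plugged && standby_acked;
}
```

Under this model a guest whose driver never acks the feature only ever sees
the standby, which is one reading of "visibility of the primary needs to be
controlled by the standby's driver".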

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 13:25           ` Michael S. Tsirkin
@ 2018-09-18 18:30             ` Siwei Liu
  2018-09-18 18:39               ` Michael S. Tsirkin
  2018-09-19  5:03             ` Samudrala, Sridhar
  2018-09-20  5:51             ` Sameeh Jubran
  2 siblings, 1 reply; 85+ messages in thread
From: Siwei Liu @ 2018-09-18 18:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote:
>> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
>> >
>> > On Wed, 12 Sep 2018 11:22:12 -0400
>> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >
>> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
>> > > >
>> > > >
>> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
>> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>> > > > > > device to act as a standby for another device with the same MAC address.
>> > > > > >
>> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
>> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
>> > > > > Applied but when do you plan to add documentation as pointed
>> > > > > out by Jan and Halil?
>> > > >
>> > > > I thought additional documentation will be done as part of the Qemu enablement
>> > > > patches and i hope someone in RH is looking into it.
>> > > >
>> > > > Does it make sense to add a link to the kernel documentation of this feature in
>> > > > the spec
>> > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
>> > >
>> > >
>> > > I do not think this will address the comments posted.  Specifically we
>> > > should probably include documentation for what is a standby and primary:
>> > > what is expected of driver (maintain configuration on standby, support
>> > > primary coming and going, transmit on standby only if there is no
>> > > primary) and of device (have same mac for standby as for primary).
>> >
>> > Yes, we need some definitive statements of what a driver and a device
>> > is supposed to do in order to conform; it might make sense to discuss
>> > this in conjunction with discussion on any QEMU patches (have not
>> > checked whether anything has been posted, just returned from vacation).
>> >
>> > I assume that we still stick with the plan to implement/document
>> > MAC-based handling first and then enhance with other methods later?
>>
>> I am currently in the process of writing the patches for this feature,
>> I have thought about how the feature should be implemented
>> and decided to go with a different approach. I've decided that the id
>> of the vfio attached device will be specified in the virtio-net
>> arguments as follows:
>>
>> -device virtio-net,standby=<device_id_of_vfio_device>
>> -vfio #address,id=<device_id_of_vfio_device>
>>
>> This approach makes minimal changes to the current infrastructure and
>> does so elegantly without adding unnecessary ids to the bridges.
>>
>> The mac address approach seems to be very complicated as there is no
>> standard way to find the mac address of a given device and it is
>> vendor dependent,
>> which makes the task of identifying the target standby device by it's
>> mac address a very tough one.
>
> Oh mac address is used by guest. I agree it's not a great qemu
> interface.
> The idea was basically to have -vfio #address,primary=<id>

Interesting... How do you make sure the MAC addresses are the same (grouped)
between vfio and virtio-net-pci (from the QEMU side)? I thought the spec
meant to make this a guest-host interface, right?

-Siwei

>
>
>> Please share your thoughts so I'll move forward with the patches.
>
> Can this actually support hotplug add and remove of the vfio device though?
> E.g. hotplug add vfio device while VM is already running?
> With the primary=<> it works because standby must always exist
> even when primary isn't there.
>
>
>> An initial patch which implements hiding the device from pci bus
>> before the feature is acked is provided below:
>>
>> commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
>> Author: Sameeh Jubran <sjubran@redhat.com>
>> Date:   Sun Sep 16 13:21:41 2018 +0300
>>
>>     virtio-net: Implement standby feature
>>
>>     Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index f154756e85..46386c0e1b 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -26,7 +26,9 @@
>>  #include "qapi/qapi-events-net.h"
>>  #include "hw/virtio/virtio-access.h"
>>  #include "migration/misc.h"
>> +#include "hw/pci/pci.h"
>>  #include "standard-headers/linux/ethtool.h"
>> +#include "hw/vfio/vfio-common.h"
>>
>>  #define VIRTIO_NET_VM_VERSION    11
>>
>> @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet
>> *n, const char *name,
>>      n->netclient_type = g_strdup(type);
>>  }
>>
>> +static bool standby_device_present(VirtIONet *n, const char *id,
>> +        struct PCIDevice **pdev)
>> +{
>> +    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
>> +    vfio_is_vfio_pci(*pdev);
>> +}
>> +
>>  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>>  {
>>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> @@ -1976,6 +1985,21 @@ static void
>> virtio_net_device_realize(DeviceState *dev, Error **errp)
>>          n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
>>      }
>>
>> +    if (n->net_conf.standby_id_str && standby_device_present(n,
>> +                n->net_conf.standby_id_str, &n->standby_pdev)) {
>> +        DeviceState *dev = DEVICE(n->standby_pdev);
>> +        DeviceClass *klass = DEVICE_GET_CLASS(dev);
>> +        /* Hide standby from pci till the feature is acked */
>> +        if (klass->hotpluggable)
>> +        {
>> +            qdev_unplug(dev, errp);
>
>
> Does this really hide the device?
> I see:
>     hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl);
>     if (hdc->unplug_request) {
>         hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
>     } else {
>         hotplug_handler_unplug(hotplug_ctrl, dev, errp);
>     }
>
> which seems to just send an eject request to guest - the reverse of
> what we want to do.
>
>> +            if (errp == NULL)
>> +            {
>> +                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
>> +            }
>
> I'm not sure how is this error handling supposed to work.
>
>> +        }
>> +    }
>> +
>>      virtio_net_set_config_size(n, n->host_features);
>>      virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
>>
>> @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = {
>>                       true),
>>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>> +    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 866f0deeb7..593debe56e 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
>>  #endif
>>  }
>>
>> +bool vfio_is_vfio_pci(PCIDevice* pdev)
>> +{
>> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> +    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
>> +}
>> +
>>  static void vfio_intx_update(PCIDevice *pdev)
>>  {
>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 821def0565..26dfde805f 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
>>                               hwaddr *pgsize);
>>  int vfio_spapr_remove_window(VFIOContainer *container,
>>                               hwaddr offset_within_address_space);
>> +bool vfio_is_vfio_pci(PCIDevice* pdev);
>>
>>  #endif /* HW_VFIO_VFIO_COMMON_H */
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 4d7f3c82ca..94388b40cb 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -42,6 +42,7 @@ typedef struct virtio_net_conf
>>      int32_t speed;
>>      char *duplex_str;
>>      uint8_t duplex;
>> +    char *standby_id_str;
>>  } virtio_net_conf;
>>
>>  /* Maximum packet size we can receive from tap device: header + 64k */
>> @@ -103,6 +104,7 @@ typedef struct VirtIONet {
>>      int announce_counter;
>>      bool needs_vnet_hdr_swap;
>>      bool mtu_bypass_backend;
>> +    PCIDevice *standby_pdev;
>>  } VirtIONet;
>>
>>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>> (END)
>>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>> >
>>
>>
>> --
>> Respectfully,
>> Sameeh Jubran
>> Linkedin
>> Software Engineer @ Daynix.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 18:30             ` Siwei Liu
@ 2018-09-18 18:39               ` Michael S. Tsirkin
  2018-09-18 19:10                 ` Siwei Liu
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-18 18:39 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 11:30:27AM -0700, Siwei Liu wrote:
> On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote:
> >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >> >
> >> > On Wed, 12 Sep 2018 11:22:12 -0400
> >> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >
> >> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >> > > >
> >> > > >
> >> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> >> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> >> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> >> > > > > > device to act as a standby for another device with the same MAC address.
> >> > > > > >
> >> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> >> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> >> > > > > Applied but when do you plan to add documentation as pointed
> >> > > > > out by Jan and Halil?
> >> > > >
> >> > > > I thought additional documentation will be done as part of the Qemu enablement
> >> > > > patches and i hope someone in RH is looking into it.
> >> > > >
> >> > > > Does it make sense to add a link to the kernel documentation of this feature in
> >> > > > the spec
> >> > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >> > >
> >> > >
> >> > > I do not think this will address the comments posted.  Specifically we
> >> > > should probably include documentation for what is a standby and primary:
> >> > > what is expected of driver (maintain configuration on standby, support
> >> > > primary coming and going, transmit on standby only if there is no
> >> > > primary) and of device (have same mac for standby as for primary).
> >> >
> >> > Yes, we need some definitive statements of what a driver and a device
> >> > is supposed to do in order to conform; it might make sense to discuss
> >> > this in conjunction with discussion on any QEMU patches (have not
> >> > checked whether anything has been posted, just returned from vacation).
> >> >
> >> > I assume that we still stick with the plan to implement/document
> >> > MAC-based handling first and then enhance with other methods later?
> >>
> >> I am currently in the process of writing the patches for this feature,
> >> I have thought about how the feature should be implemented
> >> and decided to go with a different approach. I've decided that the id
> >> of the vfio attached device will be specified in the virtio-net
> >> arguments as follows:
> >>
> >> -device virtio-net,standby=<device_id_of_vfio_device>
> >> -vfio #address,id=<device_id_of_vfio_device>
> >>
> >> This approach makes minimal changes to the current infrastructure and
> >> does so elegantly without adding unnecessary ids to the bridges.
> >>
> >> The mac address approach seems to be very complicated as there is no
> >> standard way to find the mac address of a given device and it is
> >> vendor dependent,
> >> which makes the task of identifying the target standby device by it's
> >> mac address a very tough one.
> >
> > Oh mac address is used by guest. I agree it's not a great qemu
> > interface.
> > The idea was basically to have -vfio #address,primary=<id>
> 
> Interesting... How do you make sure the MAC address are same (grouped)
> between vfio and virtio-net-pci (from QEMU side)? I thought the spec
> meant to make this a guest-host interface, right?
> 
> -Siwei

I guess at this point that can be up to the management tool.


> >
> >
> >> Please share your thoughts so I'll move forward with the patches.
> >
> > Can this actually support hotplug add and remove of the vfio device though?
> > E.g. hotplug add vfio device while VM is already running?
> > With the primary=<> it works because standby must always exist
> > even when primary isn't there.
> >
> >
> >> An initial patch which implements hiding the device from pci bus
> >> before the feature is acked is provided below:
> >>
> >> commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
> >> Author: Sameeh Jubran <sjubran@redhat.com>
> >> Date:   Sun Sep 16 13:21:41 2018 +0300
> >>
> >>     virtio-net: Implement standby feature
> >>
> >>     Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index f154756e85..46386c0e1b 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -26,7 +26,9 @@
> >>  #include "qapi/qapi-events-net.h"
> >>  #include "hw/virtio/virtio-access.h"
> >>  #include "migration/misc.h"
> >> +#include "hw/pci/pci.h"
> >>  #include "standard-headers/linux/ethtool.h"
> >> +#include "hw/vfio/vfio-common.h"
> >>
> >>  #define VIRTIO_NET_VM_VERSION    11
> >>
> >> @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet
> >> *n, const char *name,
> >>      n->netclient_type = g_strdup(type);
> >>  }
> >>
> >> +static bool standby_device_present(VirtIONet *n, const char *id,
> >> +        struct PCIDevice **pdev)
> >> +{
> >> +    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
> >> +    vfio_is_vfio_pci(*pdev);
> >> +}
> >> +
> >>  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> >>  {
> >>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >> @@ -1976,6 +1985,21 @@ static void
> >> virtio_net_device_realize(DeviceState *dev, Error **errp)
> >>          n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
> >>      }
> >>
> >> +    if (n->net_conf.standby_id_str && standby_device_present(n,
> >> +                n->net_conf.standby_id_str, &n->standby_pdev)) {
> >> +        DeviceState *dev = DEVICE(n->standby_pdev);
> >> +        DeviceClass *klass = DEVICE_GET_CLASS(dev);
> >> +        /* Hide standby from pci till the feature is acked */
> >> +        if (klass->hotpluggable)
> >> +        {
> >> +            qdev_unplug(dev, errp);
> >
> >
> > Does this really hide the device?
> > I see:
> >     hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl);
> >     if (hdc->unplug_request) {
> >         hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
> >     } else {
> >         hotplug_handler_unplug(hotplug_ctrl, dev, errp);
> >     }
> >
> > which seems to just send an eject request to guest - the reverse of
> > what we want to do.
> >
> >> +            if (errp == NULL)
> >> +            {
> >> +                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
> >> +            }
> >
> > I'm not sure how is this error handling supposed to work.
> >
> >> +        }
> >> +    }
> >> +
> >>      virtio_net_set_config_size(n, n->host_features);
> >>      virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
> >>
> >> @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = {
> >>                       true),
> >>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> >>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> >> +    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
> >>      DEFINE_PROP_END_OF_LIST(),
> >>  };
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 866f0deeb7..593debe56e 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
> >>  #endif
> >>  }
> >>
> >> +bool vfio_is_vfio_pci(PCIDevice* pdev)
> >> +{
> >> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> >> +    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
> >> +}
> >> +
> >>  static void vfio_intx_update(PCIDevice *pdev)
> >>  {
> >>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index 821def0565..26dfde805f 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
> >>                               hwaddr *pgsize);
> >>  int vfio_spapr_remove_window(VFIOContainer *container,
> >>                               hwaddr offset_within_address_space);
> >> +bool vfio_is_vfio_pci(PCIDevice* pdev);
> >>
> >>  #endif /* HW_VFIO_VFIO_COMMON_H */
> >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> >> index 4d7f3c82ca..94388b40cb 100644
> >> --- a/include/hw/virtio/virtio-net.h
> >> +++ b/include/hw/virtio/virtio-net.h
> >> @@ -42,6 +42,7 @@ typedef struct virtio_net_conf
> >>      int32_t speed;
> >>      char *duplex_str;
> >>      uint8_t duplex;
> >> +    char *standby_id_str;
> >>  } virtio_net_conf;
> >>
> >>  /* Maximum packet size we can receive from tap device: header + 64k */
> >> @@ -103,6 +104,7 @@ typedef struct VirtIONet {
> >>      int announce_counter;
> >>      bool needs_vnet_hdr_swap;
> >>      bool mtu_bypass_backend;
> >> +    PCIDevice *standby_pdev;
> >>  } VirtIONet;
> >>
> >>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> >> (END)
> >>
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> >
> >>
> >>
> >> --
> >> Respectfully,
> >> Sameeh Jubran
> >> Linkedin
> >> Software Engineer @ Daynix.
> >

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 15:31             ` Michael S. Tsirkin
@ 2018-09-18 18:48               ` Siwei Liu
  2018-09-20  3:11                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Siwei Liu @ 2018-09-18 18:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
>> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
>> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
>> > > On Wed, 12 Sep 2018 11:22:12 -0400
>> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > >
>> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
>> > > > >
>> > > > >
>> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
>> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>> > > > > > > device to act as a standby for another device with the same MAC address.
>> > > > > > >
>> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
>> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
>> > > > > > Applied but when do you plan to add documentation as pointed
>> > > > > > out by Jan and Halil?
>> > > > >
>> > > > > I thought additional documentation will be done as part of the Qemu enablement
>> > > > > patches and i hope someone in RH is looking into it.
>> > > > >
>> > > > > Does it make sense to add a link to the kernel documentation of this feature in
>> > > > > the spec
>> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
>> > > >
>> > > >
>> > > > I do not think this will address the comments posted.  Specifically we
>> > > > should probably include documentation for what is a standby and primary:
>> > > > what is expected of driver (maintain configuration on standby, support
>> > > > primary coming and going, transmit on standby only if there is no
>> > > > primary) and of device (have same mac for standby as for primary).
>> > >
>> > > Yes, we need some definitive statements of what a driver and a device
>> > > is supposed to do in order to conform; it might make sense to discuss
>> > > this in conjunction with discussion on any QEMU patches (have not
>> > > checked whether anything has been posted, just returned from vacation).
>> > >
>> > > I assume that we still stick with the plan to implement/document
>> > > MAC-based handling first and then enhance with other methods later?
>> >
>> > I'm fine with that at least. If someone wants to work on
>> > other methods straight away, that's also fine by me.
>>
>> Patch set [1] implements the failover-group-id mechanism. Are you
>> thinking of some other method?
>>
>> Venu
>>
>> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
>>
>
> Yes, the grouping mechanism seems fine to me (I don't remember
> about the implementation, it's been a while).
>
> It is not by itself sufficient though, is it?

I do understand that the group ID patch is incomplete, though it's a
base patch for the real work.

>
> MAC is assumed to be shared to avoid things like ARP/neighbor
> rediscovery, right?

True, but does this really need to be part of the guest-host
interface? Or rather, I don't see how MAC-based matching can be done
on the host side. Are you going to expose the MAC address to VFIO?

The thing is, the current MAC-based implementation has an intrinsic
flaw: it doesn't propagate errors to the hypervisor, and there's no
back channel for the guest to unwind the hotplug action upon failure
in probing or enslaving the primary. If you think about a more robust
implementation, a grouping mechanism other than MAC is pretty much
required.
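
To make the failure mode concrete, here is a minimal sketch of MAC-based
pairing as seen from the guest. The struct and function names are
hypothetical and are not taken from the kernel or QEMU; the only assumption
is that a primary is paired with a standby by comparing 6-byte MAC
addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAC_LEN 6

/* Hypothetical, simplified view of a guest network device. */
struct demo_netdev {
    const char *name;
    uint8_t mac[DEMO_MAC_LEN];
    bool standby;   /* true if VIRTIO_NET_F_STANDBY was negotiated */
};

/* Compare two MAC addresses byte for byte. */
static bool demo_mac_equal(const uint8_t *a, const uint8_t *b)
{
    return memcmp(a, b, DEMO_MAC_LEN) == 0;
}

/*
 * Return the standby device whose MAC matches the primary, or NULL.
 * The result is visible to the guest caller only; nothing here tells
 * the hypervisor whether pairing succeeded.
 */
static const struct demo_netdev *
demo_find_standby(const struct demo_netdev *devs, size_t n,
                  const struct demo_netdev *primary)
{
    assert(devs != NULL && primary != NULL);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].standby && &devs[i] != primary &&
            demo_mac_equal(devs[i].mac, primary->mac)) {
            return &devs[i];
        }
    }
    return NULL;
}
```

When no match is found, or a later enslave step fails, the guest is the
only party that knows; that is the missing back channel described above.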

Thanks,
-Siwei

> If true that implies that to avoid guest confusion visibility of the
> primary needs to be controlled by standby's driver.
> This makes this patchset incomplete.
>
> For this work to be complete what is needed is:
> - hypervisor: add control of primary's visibility to guest
> - guest: add support for this grouping to the failover driver
>
> We also need
> - spec: document matching rules based on the pci bridge
>
> and it's helpful to have a spec proposal with implementation, but I
> would say at least proposed patches to one of the above 2 would be
> helpful before we include this in spec.
>
> --
> MST
>

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 18:39               ` Michael S. Tsirkin
@ 2018-09-18 19:10                 ` Siwei Liu
  2018-09-20  3:04                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Siwei Liu @ 2018-09-18 19:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 11:39 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Sep 18, 2018 at 11:30:27AM -0700, Siwei Liu wrote:
>> On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote:
>> >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
>> >> >
>> >> > On Wed, 12 Sep 2018 11:22:12 -0400
>> >> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> >
>> >> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
>> >> > > >
>> >> > > >
>> >> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
>> >> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>> >> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>> >> > > > > > device to act as a standby for another device with the same MAC address.
>> >> > > > > >
>> >> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
>> >> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
>> >> > > > > Applied but when do you plan to add documentation as pointed
>> >> > > > > out by Jan and Halil?
>> >> > > >
>> >> > > > I thought additional documentation will be done as part of the Qemu enablement
>> >> > > > patches and i hope someone in RH is looking into it.
>> >> > > >
>> >> > > > Does it make sense to add a link to the kernel documentation of this feature in
>> >> > > > the spec
>> >> > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
>> >> > >
>> >> > >
>> >> > > I do not think this will address the comments posted.  Specifically we
>> >> > > should probably include documentation for what is a standby and primary:
>> >> > > what is expected of driver (maintain configuration on standby, support
>> >> > > primary coming and going, transmit on standby only if there is no
>> >> > > primary) and of device (have same mac for standby as for primary).
>> >> >
>> >> > Yes, we need some definitive statements of what a driver and a device
>> >> > is supposed to do in order to conform; it might make sense to discuss
>> >> > this in conjunction with discussion on any QEMU patches (have not
>> >> > checked whether anything has been posted, just returned from vacation).
>> >> >
>> >> > I assume that we still stick with the plan to implement/document
>> >> > MAC-based handling first and then enhance with other methods later?
>> >>
>> >> I am currently in the process of writing the patches for this feature,
>> >> I have thought about how the feature should be implemented
>> >> and decided to go with a different approach. I've decided that the id
>> >> of the vfio attached device will be specified in the virtio-net
>> >> arguments as follows:
>> >>
>> >> -device virtio-net,standby=<device_id_of_vfio_device>
>> >> -vfio #address,id=<device_id_of_vfio_device>
>> >>
>> >> This approach makes minimal changes to the current infrastructure and
>> >> does so elegantly without adding unnecessary ids to the bridges.
>> >>
>> >> The mac address approach seems to be very complicated as there is no
>> >> standard way to find the mac address of a given device and it is
>> >> vendor dependent,
>> >> which makes the task of identifying the target standby device by its
>> >> mac address a very tough one.
>> >
>> > Oh mac address is used by guest. I agree it's not a great qemu
>> > interface.
>> > The idea was basically to have -vfio #address,primary=<id>
>>
>> Interesting... How do you make sure the MAC addresses are the same (grouped)
>> between vfio and virtio-net-pci (from QEMU side)? I thought the spec
>> meant to make this a guest-host interface, right?
>>
>> -Siwei
>
> I guess at this point that can be up to the management tool.

Although it is still a guest-host interface, moving this device-driver
virtio requirement to the management toolstack is poor engineering
practice IMO.

-Siwei
>
>
>> >
>> >
>> >> Please share your thoughts so I'll move forward with the patches.
>> >
>> > Can this actually support hotplug add and remove of the vfio device though?
>> > E.g. hotplug add vfio device while VM is already running?
>> > With the primary=<> it works because standby must always exist
>> > even when primary isn't there.
>> >
>> >
>> >> An initial patch which implements hiding the device from pci bus
>> >> before the feature is acked is provided below:
>> >>
>> >> commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
>> >> Author: Sameeh Jubran <sjubran@redhat.com>
>> >> Date:   Sun Sep 16 13:21:41 2018 +0300
>> >>
>> >>     virtio-net: Implement standby feature
>> >>
>> >>     Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
>> >>
>> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> >> index f154756e85..46386c0e1b 100644
>> >> --- a/hw/net/virtio-net.c
>> >> +++ b/hw/net/virtio-net.c
>> >> @@ -26,7 +26,9 @@
>> >>  #include "qapi/qapi-events-net.h"
>> >>  #include "hw/virtio/virtio-access.h"
>> >>  #include "migration/misc.h"
>> >> +#include "hw/pci/pci.h"
>> >>  #include "standard-headers/linux/ethtool.h"
>> >> +#include "hw/vfio/vfio-common.h"
>> >>
>> >>  #define VIRTIO_NET_VM_VERSION    11
>> >>
>> >> @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet
>> >> *n, const char *name,
>> >>      n->netclient_type = g_strdup(type);
>> >>  }
>> >>
>> >> +static bool standby_device_present(VirtIONet *n, const char *id,
>> >> +        struct PCIDevice **pdev)
>> >> +{
>> >> +    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
>> >> +    vfio_is_vfio_pci(*pdev);
>> >> +}
>> >> +
>> >>  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>> >>  {
>> >>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> >> @@ -1976,6 +1985,21 @@ static void
>> >> virtio_net_device_realize(DeviceState *dev, Error **errp)
>> >>          n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
>> >>      }
>> >>
>> >> +    if (n->net_conf.standby_id_str && standby_device_present(n,
>> >> +                n->net_conf.standby_id_str, &n->standby_pdev)) {
>> >> +        DeviceState *dev = DEVICE(n->standby_pdev);
>> >> +        DeviceClass *klass = DEVICE_GET_CLASS(dev);
>> >> +        /* Hide standby from pci till the feature is acked */
>> >> +        if (klass->hotpluggable)
>> >> +        {
>> >> +            qdev_unplug(dev, errp);
>> >
>> >
>> > Does this really hide the device?
>> > I see:
>> >     hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl);
>> >     if (hdc->unplug_request) {
>> >         hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
>> >     } else {
>> >         hotplug_handler_unplug(hotplug_ctrl, dev, errp);
>> >     }
>> >
>> > which seems to just send an eject request to guest - the reverse of
>> > what we want to do.
>> >
>> >> +            if (errp == NULL)
>> >> +            {
>> >> +                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
>> >> +            }
>> >
>> > I'm not sure how is this error handling supposed to work.
>> >
>> >> +        }
>> >> +    }
>> >> +
>> >>      virtio_net_set_config_size(n, n->host_features);
>> >>      virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
>> >>
>> >> @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = {
>> >>                       true),
>> >>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>> >>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>> >> +    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
>> >>      DEFINE_PROP_END_OF_LIST(),
>> >>  };
>> >>
>> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> >> index 866f0deeb7..593debe56e 100644
>> >> --- a/hw/vfio/pci.c
>> >> +++ b/hw/vfio/pci.c
>> >> @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
>> >>  #endif
>> >>  }
>> >>
>> >> +bool vfio_is_vfio_pci(PCIDevice* pdev)
>> >> +{
>> >> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> >> +    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
>> >> +}
>> >> +
>> >>  static void vfio_intx_update(PCIDevice *pdev)
>> >>  {
>> >>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> >> index 821def0565..26dfde805f 100644
>> >> --- a/include/hw/vfio/vfio-common.h
>> >> +++ b/include/hw/vfio/vfio-common.h
>> >> @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
>> >>                               hwaddr *pgsize);
>> >>  int vfio_spapr_remove_window(VFIOContainer *container,
>> >>                               hwaddr offset_within_address_space);
>> >> +bool vfio_is_vfio_pci(PCIDevice* pdev);
>> >>
>> >>  #endif /* HW_VFIO_VFIO_COMMON_H */
>> >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> >> index 4d7f3c82ca..94388b40cb 100644
>> >> --- a/include/hw/virtio/virtio-net.h
>> >> +++ b/include/hw/virtio/virtio-net.h
>> >> @@ -42,6 +42,7 @@ typedef struct virtio_net_conf
>> >>      int32_t speed;
>> >>      char *duplex_str;
>> >>      uint8_t duplex;
>> >> +    char *standby_id_str;
>> >>  } virtio_net_conf;
>> >>
>> >>  /* Maximum packet size we can receive from tap device: header + 64k */
>> >> @@ -103,6 +104,7 @@ typedef struct VirtIONet {
>> >>      int announce_counter;
>> >>      bool needs_vnet_hdr_swap;
>> >>      bool mtu_bypass_backend;
>> >> +    PCIDevice *standby_pdev;
>> >>  } VirtIONet;
>> >>
>> >>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>> >> (END)
>> >>
>> >> >
>> >> >
>> >>
>> >>
>> >> --
>> >> Respectfully,
>> >> Sameeh Jubran
>> >> Linkedin
>> >> Software Engineer @ Daynix.
>> >

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 13:25           ` Michael S. Tsirkin
  2018-09-18 18:30             ` Siwei Liu
@ 2018-09-19  5:03             ` Samudrala, Sridhar
  2018-09-20  5:51             ` Sameeh Jubran
  2 siblings, 0 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-09-19  5:03 UTC (permalink / raw)
  To: Michael S. Tsirkin, Sameeh Jubran; +Cc: cohuck, virtio-dev

On 9/18/2018 6:25 AM, Michael S. Tsirkin wrote:
> On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote:
>> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
>>> On Wed, 12 Sep 2018 11:22:12 -0400
>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>
>>>> On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
>>>>>
>>>>> On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
>>>>>> On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>>>>>>> VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>>>>>>> device to act as a standby for another device with the same MAC address.
>>>>>>>
>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>>> Acked-by: Cornelia Huck <cohuck@redhat.com>
>>>>>>> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
>>>>>> Applied but when do you plan to add documentation as pointed
>>>>>> out by Jan and Halil?
>>>>> I thought additional documentation will be done as part of the Qemu enablement
>>>>> patches and i hope someone in RH is looking into it.
>>>>>
>>>>> Does it make sense to add a link to the kernel documentation of this feature in
>>>>> the spec
>>>>>   https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>
>>>> I do not think this will address the comments posted.  Specifically we
>>>> should probably include documentation for what is a standby and primary:
>>>> what is expected of driver (maintain configuration on standby, support
>>>> primary coming and going, transmit on standby only if there is no
>>>> primary) and of device (have same mac for standby as for primary).
>>> Yes, we need some definitive statements of what a driver and a device
>>> is supposed to do in order to conform; it might make sense to discuss
>>> this in conjunction with discussion on any QEMU patches (have not
>>> checked whether anything has been posted, just returned from vacation).
>>>
>>> I assume that we still stick with the plan to implement/document
>>> MAC-based handling first and then enhance with other methods later?
>> I am currently in the process of writing the patches for this feature,
>> I have thought about how the feature should be implemented
>> and decided to go with a different approach. I've decided that the id
>> of the vfio attached device will be specified in the virtio-net
>> arguments as follows:
>>
>> -device virtio-net,standby=<device_id_of_vfio_device>
>> -vfio #address,id=<device_id_of_vfio_device>
>>
>> This approach makes minimal changes to the current infrastructure and
>> does so elegantly without adding unnecessary ids to the bridges.
>>
>> The mac address approach seems to be very complicated as there is no
>> standard way to find the mac address of a given device and it is
>> vendor dependent,
>> which makes the task of identifying the target standby device by its
>> mac address a very tough one.
> Oh mac address is used by guest. I agree it's not a great qemu
> interface.
> The idea was basically to have -vfio #address,primary=<id>
>
>
>> Please share your thoughts so I'll move forward with the patches.
> Can this actually support hotplug add and remove of the vfio device though?
> E.g. hotplug add vfio device while VM is already running?
> With the primary=<> it works because standby must always exist
> even when primary isn't there.

Also, how do we want to handle a scenario where a VM has a directly attached
VF device and a virtio-net device in standby mode is hotplugged/unplugged?

What should the behavior be if the guest unloads a virtio-net driver that is
acting as a standby? Do we want QEMU to unplug the VF device too?


>
>
>> An initial patch which implements hiding the device from pci bus
>> before the feature is acked is provided below:
>>
>> commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
>> Author: Sameeh Jubran <sjubran@redhat.com>
>> Date:   Sun Sep 16 13:21:41 2018 +0300
>>
>>      virtio-net: Implement standby feature
>>
>>      Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index f154756e85..46386c0e1b 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -26,7 +26,9 @@
>>   #include "qapi/qapi-events-net.h"
>>   #include "hw/virtio/virtio-access.h"
>>   #include "migration/misc.h"
>> +#include "hw/pci/pci.h"
>>   #include "standard-headers/linux/ethtool.h"
>> +#include "hw/vfio/vfio-common.h"
>>
>>   #define VIRTIO_NET_VM_VERSION    11
>>
>> @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet
>> *n, const char *name,
>>       n->netclient_type = g_strdup(type);
>>   }
>>
>> +static bool standby_device_present(VirtIONet *n, const char *id,
>> +        struct PCIDevice **pdev)
>> +{
>> +    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
>> +    vfio_is_vfio_pci(*pdev);
>> +}
>> +
>>   static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>>   {
>>       VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> @@ -1976,6 +1985,21 @@ static void
>> virtio_net_device_realize(DeviceState *dev, Error **errp)
>>           n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
>>       }
>>
>> +    if (n->net_conf.standby_id_str && standby_device_present(n,
>> +                n->net_conf.standby_id_str, &n->standby_pdev)) {
>> +        DeviceState *dev = DEVICE(n->standby_pdev);
>> +        DeviceClass *klass = DEVICE_GET_CLASS(dev);
>> +        /* Hide standby from pci till the feature is acked */
>> +        if (klass->hotpluggable)
>> +        {
>> +            qdev_unplug(dev, errp);
>
> Does this really hide the device?
> I see:
>      hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl);
>      if (hdc->unplug_request) {
>          hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
>      } else {
>          hotplug_handler_unplug(hotplug_ctrl, dev, errp);
>      }
>
> which seems to just send an eject request to guest - the reverse of
> what we want to do.
>
>> +            if (errp == NULL)
>> +            {
>> +                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
>> +            }
> I'm not sure how is this error handling supposed to work.
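
For reference, a sketch of why testing `errp == NULL` does not detect a
failure: errp is the caller's out-pointer, so that test only asks whether
the caller wanted an error reported at all. The types and helpers below are
simplified stand-ins and are not QEMU's real Error API; in QEMU itself the
usual idiom is a local `Error *` passed to the callee and then handed to
error_propagate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's Error type; hypothetical and greatly simplified. */
typedef struct DemoError {
    const char *msg;
} DemoError;

/* Mock operation that may fail; it reports failure through *errp. */
static void demo_unplug(bool should_fail, DemoError **errp)
{
    static DemoError failed = { "unplug failed" };

    if (should_fail && errp) {
        *errp = &failed;
    }
}

/*
 * Returns true only if the unplug succeeded, so the caller can decide
 * whether to offer the feature bit. Failure is detected through a local
 * error object, never by comparing errp itself with NULL.
 */
static bool demo_offer_standby(bool should_fail, DemoError **errp)
{
    DemoError *local_err = NULL;

    /* By convention the caller passes either NULL or a pointer to NULL. */
    assert(errp == NULL || *errp == NULL);

    demo_unplug(should_fail, &local_err);
    if (local_err) {
        if (errp) {
            *errp = local_err;   /* hand the error to the caller */
        }
        return false;
    }
    return true;
}
```

With this shape the feature bit would be set only when the function returns
true, regardless of whether the caller supplied an error pointer.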
>
>> +        }
>> +    }
>> +
>>       virtio_net_set_config_size(n, n->host_features);
>>       virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
>>
>> @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = {
>>                        true),
>>       DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>       DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>> +    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
>>       DEFINE_PROP_END_OF_LIST(),
>>   };
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 866f0deeb7..593debe56e 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
>>   #endif
>>   }
>>
>> +bool vfio_is_vfio_pci(PCIDevice* pdev)
>> +{
>> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> +    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
>> +}
>> +
>>   static void vfio_intx_update(PCIDevice *pdev)
>>   {
>>       VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 821def0565..26dfde805f 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
>>                                hwaddr *pgsize);
>>   int vfio_spapr_remove_window(VFIOContainer *container,
>>                                hwaddr offset_within_address_space);
>> +bool vfio_is_vfio_pci(PCIDevice* pdev);
>>
>>   #endif /* HW_VFIO_VFIO_COMMON_H */
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 4d7f3c82ca..94388b40cb 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -42,6 +42,7 @@ typedef struct virtio_net_conf
>>       int32_t speed;
>>       char *duplex_str;
>>       uint8_t duplex;
>> +    char *standby_id_str;
>>   } virtio_net_conf;
>>
>>   /* Maximum packet size we can receive from tap device: header + 64k */
>> @@ -103,6 +104,7 @@ typedef struct VirtIONet {
>>       int announce_counter;
>>       bool needs_vnet_hdr_swap;
>>       bool mtu_bypass_backend;
>> +    PCIDevice *standby_pdev;
>>   } VirtIONet;
>>
>>   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>> (END)
>>
>>>
>>
>> -- 
>> Respectfully,
>> Sameeh Jubran
>> Linkedin
>> Software Engineer @ Daynix.



* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 19:10                 ` Siwei Liu
@ 2018-09-20  3:04                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-20  3:04 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Sameeh Jubran, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 12:10:54PM -0700, Siwei Liu wrote:
> On Tue, Sep 18, 2018 at 11:39 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Sep 18, 2018 at 11:30:27AM -0700, Siwei Liu wrote:
> >> On Tue, Sep 18, 2018 at 6:25 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote:
> >> >> On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >> >> >
> >> >> > On Wed, 12 Sep 2018 11:22:12 -0400
> >> >> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> >
> >> >> > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >> >> > > >
> >> >> > > >
> >> >> > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> >> >> > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> >> >> > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> >> >> > > > > > device to act as a standby for another device with the same MAC address.
> >> >> > > > > >
> >> >> > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> >> >> > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> >> >> > > > > Applied but when do you plan to add documentation as pointed
> >> >> > > > > out by Jan and Halil?
> >> >> > > >
> >> >> > > > I thought additional documentation will be done as part of the Qemu enablement
> >> >> > > > patches and i hope someone in RH is looking into it.
> >> >> > > >
> >> >> > > > Does it make sense to add a link to the kernel documentation of this feature in
> >> >> > > > the spec
> >> >> > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >> >> > >
> >> >> > >
> >> >> > > I do not think this will address the comments posted.  Specifically we
> >> >> > > should probably include documentation for what is a standby and primary:
> >> >> > > what is expected of driver (maintain configuration on standby, support
> >> >> > > primary coming and going, transmit on standby only if there is no
> >> >> > > primary) and of device (have same mac for standby as for primary).
> >> >> >
> >> >> > Yes, we need some definitive statements of what a driver and a device
> >> >> > is supposed to do in order to conform; it might make sense to discuss
> >> >> > this in conjunction with discussion on any QEMU patches (have not
> >> >> > checked whether anything has been posted, just returned from vacation).
> >> >> >
> >> >> > I assume that we still stick with the plan to implement/document
> >> >> > MAC-based handling first and then enhance with other methods later?
> >> >>
> >> >> I am currently in the process of writing the patches for this feature,
> >> >> I have thought about how the feature should be implemented
> >> >> and decided to go with a different approach. I've decided that the id
> >> >> of the vfio attached device will be specified in the virtio-net
> >> >> arguments as follows:
> >> >>
> >> >> -device virtio-net,standby=<device_id_of_vfio_device>
> >> >> -vfio #address,id=<device_id_of_vfio_device>
> >> >>
> >> >> This approach makes minimal changes to the current infrastructure and
> >> >> does so elegantly without adding unnecessary ids to the bridges.
> >> >>
> >> >> The mac address approach seems to be very complicated as there is no
> >> >> standard way to find the mac address of a given device and it is
> >> >> vendor dependent,
> >> >> which makes the task of identifying the target standby device by its
> >> >> mac address a very tough one.
> >> >
> >> > Oh mac address is used by guest. I agree it's not a great qemu
> >> > interface.
> >> > The idea was basically to have -vfio #address,primary=<id>
> >>
> >> Interesting... How do you make sure the MAC address are same (grouped)
> >> between vfio and virtio-net-pci (from QEMU side)? I thought the spec
> >> meant to make this a guest-host interface, right?
> >>
> >> -Siwei
> >
> > I guess at this point that can be up to the management tool.
> 
> Although still a guest-host interface, moving this device-driver
> virtio requirement to management toolstack is poor engineering
> practice IMO.
> 
> -Siwei

There are advantages to doing it outside QEMU, such as
security (libvirt has access to netlink, QEMU doesn't).
It doesn't look like an important detail to me:
such details are going to be up to whoever implements it.

Anyway, we are discussing this on the wrong list. Where the code belongs
(QEMU or libvirt) is a question for the qemu and libvirt lists;
the virtio spec does not care which host-side module does what.




* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 18:48               ` Siwei Liu
@ 2018-09-20  3:11                 ` Michael S. Tsirkin
  2018-09-20 23:57                   ` Siwei Liu
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-20  3:11 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> > >
> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >> > > > >
> >> > > > >
> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> >> > > > > > > device to act as a standby for another device with the same MAC address.
> >> > > > > > >
> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> >> > > > > > Applied but when do you plan to add documentation as pointed
> >> > > > > > out by Jan and Halil?
> >> > > > >
> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
> >> > > > > patches and i hope someone in RH is looking into it.
> >> > > > >
> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> >> > > > > the spec
> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >> > > >
> >> > > >
> >> > > > I do not think this will address the comments posted.  Specifically we
> >> > > > should probably include documentation for what is a standby and primary:
> >> > > > what is expected of driver (maintain configuration on standby, support
> >> > > > primary coming and going, transmit on standby only if there is no
> >> > > > primary) and of device (have same mac for standby as for primary).
> >> > >
> >> > > Yes, we need some definitive statements of what a driver and a device
> >> > > is supposed to do in order to conform; it might make sense to discuss
> >> > > this in conjunction with discussion on any QEMU patches (have not
> >> > > checked whether anything has been posted, just returned from vacation).
> >> > >
> >> > > I assume that we still stick with the plan to implement/document
> >> > > MAC-based handling first and then enhance with other methods later?
> >> >
> >> > I'm fine with that at least. If someone wants to work on
> >> > other methods straight away, that's also fine by me.
> >>
> >> Patch set [1] implements the failover-group-id mechanism. Are you
> >> thinking of some other method?
> >>
> >> Venu
> >>
> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> >>
> >
> > Yes, the grouping mechanism seems fine to me (I don't remember
> > about the implementation, it's been a while).
> >
> > It is not by itself sufficient though, is it?
> 
> I do understand that the group ID patch is incomplete though it's a
> base patch for the real work.
> 
> >
> > MAC is assumed to be shared to avoid things like ARP/neighbor
> > rediscovery, right?
> 
> True, but does this really need to be part of the guest-host
> interface? Or rather, I don't see how MAC based matching can be done
> on the host part.

MAC address matching does not need to affect the host side.
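
For illustration only, here is a minimal sketch of what guest-side
matching by MAC could look like: when a new device appears, the guest
looks for a standby virtio-net device carrying the same MAC and pairs
the two. The struct and function names are hypothetical and are not
taken from any driver or from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical guest-side view of a network device. */
struct guest_netdev {
    uint8_t mac[MAC_LEN];
    bool is_standby;   /* virtio-net that negotiated VIRTIO_NET_F_STANDBY */
};

/* Return the standby device sharing `mac`, or NULL if there is none. */
struct guest_netdev *find_standby_by_mac(struct guest_netdev *devs,
                                         size_t ndevs,
                                         const uint8_t mac[MAC_LEN])
{
    assert(devs != NULL || ndevs == 0);
    assert(mac != NULL);

    for (size_t i = 0; i < ndevs; i++) {
        /* Only a standby with an identical MAC is a valid pairing target. */
        if (devs[i].is_standby && memcmp(devs[i].mac, mac, MAC_LEN) == 0) {
            return &devs[i];
        }
    }
    return NULL;
}
```

Nothing in this lookup involves the host: the guest only compares
addresses it can already read from its own devices.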

> Are you going to expose MAC address to VFIO?

If mac of a VF is programmed by libvirt through the PF
(that's already the case), VFIO does not need to care about it.

> 
> The thing is the current MAC based implementation has intrinsic flaw
> that doesn't propagate errors to hypervisor, or there's no back
> channel for guest to unwind the hot plug action upon failure in
> probing or enslaving the primary.

I guess you can eject the primary if you like. But
why does hypervisor need to know? On error, just don't use primary,
use standby.
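
As a rough sketch of that fallback (hypothetical names, not the actual
guest failover driver code): the guest prefers the primary whenever it
is present and usable, and otherwise keeps transmitting on the standby.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state tracked by a guest failover driver. */
struct slave {
    bool present;   /* device is plugged and its driver probed successfully */
    bool link_up;   /* carrier is up */
};

struct failover {
    struct slave primary;  /* e.g. a passthrough VF; may come and go */
    struct slave standby;  /* virtio-net; expected to always exist */
};

bool slave_usable(const struct slave *s)
{
    return s->present && s->link_up;
}

/* Pick the transmit path; NULL means there is no usable path at all. */
const struct slave *select_tx_path(const struct failover *fo)
{
    assert(fo != NULL);

    if (slave_usable(&fo->primary)) {
        return &fo->primary;
    }
    if (slave_usable(&fo->standby)) {
        return &fo->standby;   /* primary absent or failed: fall back */
    }
    return NULL;
}
```

A failed probe of the primary simply leaves `present` false, so the
selection keeps returning the standby without the hypervisor being told.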

> If you think about a more robust
> implementation, another grouping mechanism rather than MAC is pretty
> much required.
> 
> Thanks,
> -Siwei

I don't really know what the flaw is, or how a grouping mechanism
fixes it. None of this motivation was ever described as part of the
work on an alternate grouping.

> > If true that implies that to avoid guest confusion visibility of the
> > primary needs to be controlled by standby's driver.
> > This makes this patchset incomplete.
> >
> > For this work to be complete what is needed is:
> > - hypervisor: add control of primary's visibility to guest
> > - guest: add support for this grouping to the failover driver
> >
> > We also need
> > - spec: document matching rules based on the pci bridge
> >
> > and it's helpful to have a spec proposal with implementation, but I
> > would say at least proposed patches to one of the above 2 would be
> > helpful before we include this in spec.
> >
> > --
> > MST
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-18 13:25           ` Michael S. Tsirkin
  2018-09-18 18:30             ` Siwei Liu
  2018-09-19  5:03             ` Samudrala, Sridhar
@ 2018-09-20  5:51             ` Sameeh Jubran
  2 siblings, 0 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-09-20  5:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: cohuck, sridhar.samudrala, virtio-dev

On Tue, Sep 18, 2018 at 4:25 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Sep 18, 2018 at 01:37:35PM +0300, Sameeh Jubran wrote:
> > On Tue, Sep 18, 2018 at 1:21 PM Cornelia Huck <cohuck@redhat.com> wrote:
> > >
> > > On Wed, 12 Sep 2018 11:22:12 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > > > >
> > > > >
> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > > > > > > device to act as a standby for another device with the same MAC address.
> > > > > > >
> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > > > > > Applied but when do you plan to add documentation as pointed
> > > > > > out by Jan and Halil?
> > > > >
> > > > > I thought additional documentation will be done as part of the Qemu enablement
> > > > > patches and i hope someone in RH is looking into it.
> > > > >
> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> > > > > the spec
> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > >
> > > >
> > > > I do not think this will address the comments posted.  Specifically we
> > > > should probably include documentation for what is a standby and primary:
> > > > what is expected of driver (maintain configuration on standby, support
> > > > primary coming and going, transmit on standby only if there is no
> > > > primary) and of device (have same mac for standby as for primary).
> > >
> > > Yes, we need some definitive statements of what a driver and a device
> > > is supposed to do in order to conform; it might make sense to discuss
> > > this in conjunction with discussion on any QEMU patches (have not
> > > checked whether anything has been posted, just returned from vacation).
> > >
> > > I assume that we still stick with the plan to implement/document
> > > MAC-based handling first and then enhance with other methods later?
> >
> > I am currently in the process of writing the patches for this feature,
> > I have thought about how the feature should be implemented
> > and decided to go with a different approach. I've decided that the id
> > of the vfio attached device will be specified in the virtio-net
> > arguments as follows:
> >
> > -device virtio-net,standby=<device_id_of_vfio_device>
> > -vfio #address,id=<device_id_of_vfio_device>
> >
> > This approach makes minimal changes to the current infrastructure and
> > does so elegantly without adding unnecessary ids to the bridges.
> >
> > The mac address approach seems to be very complicated as there is no
> > standard way to find the mac address of a given device and it is
> > vendor dependent,
> > which makes the task of identifying the target standby device by its
> > mac address a very tough one.
>
> Oh mac address is used by guest. I agree it's not a great qemu
> interface.
> The idea was basically to have -vfio #address,primary=<id>
>
>
> > Please share your thoughts so I'll move forward with the patches.
>
> Can this actually support hotplug add and remove of the vfio device though?
> E.g. hotplug add vfio device while VM is already running?
> With the primary=<> it works because standby must always exist
> even when primary isn't there.
Oh, I see what you are saying; what you are suggesting can easily be done too.
The primary searches for the standby device and, if it exists, hides itself
and somehow registers itself with the standby.

Now the idea of a group id starts to make sense to me, as it makes the
paired device identifiable from both devices without any additions.

However, this can easily be done by exposing an API from virtio-net for
the primary device to announce itself.

I don't really like the idea of the group id, as it seems like
unneeded logic to me, but I think I'm missing something.
Can someone explain the motivation behind the group id? Is it necessary?
>
>
> > An initial patch which implements hiding the device from pci bus
> > before the feature is acked is provided below:
> >
> > commit b716371bf4807fe16ffb4ffd901b69a110902a3c (HEAD -> failover)
> > Author: Sameeh Jubran <sjubran@redhat.com>
> > Date:   Sun Sep 16 13:21:41 2018 +0300
> >
> >     virtio-net: Implement standby feature
> >
> >     Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index f154756e85..46386c0e1b 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -26,7 +26,9 @@
> >  #include "qapi/qapi-events-net.h"
> >  #include "hw/virtio/virtio-access.h"
> >  #include "migration/misc.h"
> > +#include "hw/pci/pci.h"
> >  #include "standard-headers/linux/ethtool.h"
> > +#include "hw/vfio/vfio-common.h"
> >
> >  #define VIRTIO_NET_VM_VERSION    11
> >
> > @@ -1946,6 +1948,13 @@ void virtio_net_set_netclient_name(VirtIONet
> > *n, const char *name,
> >      n->netclient_type = g_strdup(type);
> >  }
> >
> > +static bool standby_device_present(VirtIONet *n, const char *id,
> > +        struct PCIDevice **pdev)
> > +{
> > +    return pci_qdev_find_device(id, pdev) >= 0 && pdev &&
> > +    vfio_is_vfio_pci(*pdev);
> > +}
> > +
> >  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> >  {
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > @@ -1976,6 +1985,21 @@ static void
> > virtio_net_device_realize(DeviceState *dev, Error **errp)
> >          n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
> >      }
> >
> > +    if (n->net_conf.standby_id_str && standby_device_present(n,
> > +                n->net_conf.standby_id_str, &n->standby_pdev)) {
> > +        DeviceState *dev = DEVICE(n->standby_pdev);
> > +        DeviceClass *klass = DEVICE_GET_CLASS(dev);
> > +        /* Hide standby from pci till the feature is acked */
> > +        if (klass->hotpluggable)
> > +        {
> > +            qdev_unplug(dev, errp);
>
>
> Does this really hide the device?
> I see:
>     hdc = HOTPLUG_HANDLER_GET_CLASS(hotplug_ctrl);
>     if (hdc->unplug_request) {
>         hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
>     } else {
>         hotplug_handler_unplug(hotplug_ctrl, dev, errp);
>     }
>
> which seems to just send an eject request to guest - the reverse of
> what we want to do.
You are right, it doesn't hide the device. I thought about registering
a pre-plug callback, called before the device is realized, that would
detach it from its parent, or about overriding the realize callback in
the device state with null. As far as I understand, this should hide the
device from the PCI bus.
>
> > +            if (errp == NULL)
> > +            {
> > +                n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
> > +            }
>
> I'm not sure how is this error handling supposed to work.
>
> > +        }
> > +    }
> > +
> >      virtio_net_set_config_size(n, n->host_features);
> >      virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
> >
> > @@ -2198,6 +2222,7 @@ static Property virtio_net_properties[] = {
> >                       true),
> >      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> >      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > +    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 866f0deeb7..593debe56e 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
> >  #endif
> >  }
> >
> > +bool vfio_is_vfio_pci(PCIDevice* pdev)
> > +{
> > +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > +    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
> > +}
> > +
> >  static void vfio_intx_update(PCIDevice *pdev)
> >  {
> >      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> > diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> > index 821def0565..26dfde805f 100644
> > --- a/include/hw/vfio/vfio-common.h
> > +++ b/include/hw/vfio/vfio-common.h
> > @@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
> >                               hwaddr *pgsize);
> >  int vfio_spapr_remove_window(VFIOContainer *container,
> >                               hwaddr offset_within_address_space);
> > +bool vfio_is_vfio_pci(PCIDevice* pdev);
> >
> >  #endif /* HW_VFIO_VFIO_COMMON_H */
> > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> > index 4d7f3c82ca..94388b40cb 100644
> > --- a/include/hw/virtio/virtio-net.h
> > +++ b/include/hw/virtio/virtio-net.h
> > @@ -42,6 +42,7 @@ typedef struct virtio_net_conf
> >      int32_t speed;
> >      char *duplex_str;
> >      uint8_t duplex;
> > +    char *standby_id_str;
> >  } virtio_net_conf;
> >
> >  /* Maximum packet size we can receive from tap device: header + 64k */
> > @@ -103,6 +104,7 @@ typedef struct VirtIONet {
> >      int announce_counter;
> >      bool needs_vnet_hdr_swap;
> >      bool mtu_bypass_backend;
> > +    PCIDevice *standby_pdev;
> >  } VirtIONet;
> >
> >  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > (END)
> >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
> >
> >
> > --
> > Respectfully,
> > Sameeh Jubran
> > Linkedin
> > Software Engineer @ Daynix.
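
Regarding the review comment on the `if (errp == NULL)` check in the
patch quoted above: that test only tells whether the caller supplied an
error out-parameter, not whether the unplug call failed. A minimal
sketch of the intended logic follows, using stand-in types and
functions (MockError, mock_unplug and try_offer_standby are
hypothetical and are not QEMU's API). In QEMU itself the usual pattern
is a local `Error *` passed by address and then handed on with
error_propagate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an error object; NOT QEMU's Error type. */
typedef struct MockError {
    const char *msg;
} MockError;

static MockError unplug_failed = { "unplug failed" };

/* Stand-in for an operation that reports failure through *errp. */
void mock_unplug(bool should_fail, MockError **errp)
{
    if (should_fail && errp != NULL) {
        *errp = &unplug_failed;
    }
}

/*
 * Returns true only if the operation succeeded, i.e. only then would it
 * be safe to set the feature bit. Failure is detected through a local
 * error variable, so it works even when the caller passes errp == NULL.
 */
bool try_offer_standby(bool should_fail, MockError **errp)
{
    MockError *local_err = NULL;

    /* The caller must hand in a clean error slot (or none at all). */
    assert(errp == NULL || *errp == NULL);

    mock_unplug(should_fail, &local_err);
    if (local_err != NULL) {
        if (errp != NULL) {
            *errp = local_err;   /* hand the error to the caller */
        }
        return false;
    }
    return true;
}
```

The point of the local variable is that success is judged by what the
callee reported, never by the value of the `errp` pointer itself.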



-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-20  3:11                 ` Michael S. Tsirkin
@ 2018-09-20 23:57                   ` Siwei Liu
  2018-09-21  2:23                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Siwei Liu @ 2018-09-20 23:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
>> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
>> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
>> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
>> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
>> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> > >
>> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
>> >> > > > >
>> >> > > > >
>> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
>> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>> >> > > > > > > device to act as a standby for another device with the same MAC address.
>> >> > > > > > >
>> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
>> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
>> >> > > > > > Applied but when do you plan to add documentation as pointed
>> >> > > > > > out by Jan and Halil?
>> >> > > > >
>> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
>> >> > > > > patches and i hope someone in RH is looking into it.
>> >> > > > >
>> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
>> >> > > > > the spec
>> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
>> >> > > >
>> >> > > >
>> >> > > > I do not think this will address the comments posted.  Specifically we
>> >> > > > should probably include documentation for what is a standby and primary:
>> >> > > > what is expected of driver (maintain configuration on standby, support
>> >> > > > primary coming and going, transmit on standby only if there is no
>> >> > > > primary) and of device (have same mac for standby as for primary).
>> >> > >
>> >> > > Yes, we need some definitive statements of what a driver and a device
>> >> > > is supposed to do in order to conform; it might make sense to discuss
>> >> > > this in conjunction with discussion on any QEMU patches (have not
>> >> > > checked whether anything has been posted, just returned from vacation).
>> >> > >
>> >> > > I assume that we still stick with the plan to implement/document
>> >> > > MAC-based handling first and then enhance with other methods later?
>> >> >
>> >> > I'm fine with that at least. If someone wants to work on
>> >> > other methods straight away, that's also fine by me.
>> >>
>> >> Patch set [1] implements the failover-group-id mechanism. Are you
>> >> thinking of some other method?
>> >>
>> >> Venu
>> >>
>> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
>> >>
>> >
>> > Yes, the grouping mechanism seems fine to me (I don't remember
>> > about the implementation, it's been a while).
>> >
>> > It is not by itself sufficient though, is it?
>>
>> I do understand that the group ID patch is incomplete though it's a
>> base patch for the real work.
>>
>> >
>> > MAC is assumed to be shared to avoid things like ARP/neighbor
>> > rediscovery, right?
>>
>> True, but does this really need to be part of the guest-host
>> interface? Or rather, I don't see how MAC based matching can be done
>> on the host part.
>
> mac address matching does not need to affect host side.

Did you realize that the host side can't have duplicate MAC address
filters for both PV and VF at the same time?

If a VF is hot-added with a duplicate MAC address filter already
programmed, the PV path for virtio on the host side is effectively
disabled. However, the fact that the VF gets hot plugged by QEMU/libvirt
does not mean it's ready and usable in the guest. You end up with
unusable guest networking, and the outage is temporary *only if the VF
is successfully probed and properly enslaved*. As of now, no guest-host
handshake is defined in the spec to make the virtio driver aware of the
hotplug event and thus of the VF's exposure, and no handshake is done to
switch the datapath when the VF driver is ready and usable in the guest.
The current implementation relies on luck: it assumes the entire hot plug
process will be successful in the guest.

BTW netvsc mitigates potential failures in hotplug and driver
probing by notifying the hypervisor through a DATAPATH_SWITCH
hypercall (VMbus message) once the VF driver is enslaved and ready; only
then does the hypervisor kick off datapath switching by moving the MAC
address filter.
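
To make the ordering concrete, here is a minimal sketch of such an
acknowledged switch from the host's point of view. It is not the netvsc
implementation and nothing like it is defined in the virtio spec; the
struct, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum datapath { DATAPATH_STANDBY, DATAPATH_PRIMARY };

/* Hypothetical host-side state for one standby/primary pair. */
struct host_pair {
    bool vf_plugged;          /* VF has been hot-added to the guest */
    bool guest_acked;         /* guest reported the VF as enslaved and ready */
    enum datapath mac_filter; /* which path currently owns the MAC filter */
};

/* Hot-adding the VF does not touch the MAC filter yet. */
void host_hotplug_vf(struct host_pair *p)
{
    assert(p != NULL);
    p->vf_plugged = true;
}

/* Guest acknowledgement: only now does the host move the MAC filter. */
void host_handle_guest_ack(struct host_pair *p)
{
    assert(p != NULL);
    if (!p->vf_plugged) {
        return;               /* stale ack for a VF that is gone: ignore */
    }
    p->guest_acked = true;
    p->mac_filter = DATAPATH_PRIMARY;
}

/* Removing the VF always returns the MAC filter to the standby. */
void host_unplug_vf(struct host_pair *p)
{
    assert(p != NULL);
    p->vf_plugged = false;
    p->guest_acked = false;
    p->mac_filter = DATAPATH_STANDBY;
}
```

With this ordering, a VF that never probes or never gets enslaved leaves
the MAC filter on the standby path, so guest networking keeps working.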

>
>> Are you going to expose MAC address to VFIO?
>
> If mac of a VF is programmed by libvirt through the PF
> (that's already the case), VFIO does not need to care about it.
>
>>
>> The thing is the current MAC based implementation has intrinsic flaw
>> that doesn't propagate errors to hypervisor, or there's no back
>> channel for guest to unwind the hot plug action upon failure in
>> probing or enslaving the primary.
>
> I guess you can eject the primary if you like. But
> why does hypervisor need to know? On error, just don't use primary,
> use standby.

Forget about the grouping mechanism for a moment. What guest kernel change
do you propose to make the virtio driver aware of every possible error?
Think about how many moving targets it needs to track or depend on
during the hot plug and driver probing process. If someone starts to
implement the code and thinks through the various error cases as a
whole, I bet it would become clearer why grouping is relevant in the
first place.

-Siwei

>
>> If you think about a more robust
>> implementation, another grouping mechanism rather than MAC is pretty
>> much required.
>>
>> Thanks,
>> -Siwei
>
> I don't really know what is the flaw, or how is it fixed by a grouping
> mechanism. All this motivation was never described as part of work on
> an alternate grouping.
>
>> > If true that implies that to avoid guest confusion visibility of the
>> > primary needs to be controlled by standby's driver.
>> > This makes this patchset incomplete.
>> >
>> > For this work to be complete what is needed is:
>> > - hypervisor: add control of primary's visibility to guest
>> > - guest: add support for this grouping to the failover driver
>> >
>> > We also need
>> > - spec: document matching rules based on the pci bridge
>> >
>> > and it's helpful to have a spec proposal with implementation, but I
>> > would say at least proposed patches to one of the above 2 would be
>> > helpful before we include this in spec.
>> >
>> > --
>> > MST
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>> >


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-20 23:57                   ` Siwei Liu
@ 2018-09-21  2:23                     ` Michael S. Tsirkin
  2018-09-21  2:34                       ` Michael S. Tsirkin
  2018-09-27  0:18                       ` Siwei Liu
  0 siblings, 2 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-21  2:23 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote:
> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> > >
> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> >> >> > > > > > > device to act as a standby for another device with the same MAC address.
> >> >> > > > > > >
> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> >> >> > > > > > Applied but when do you plan to add documentation as pointed
> >> >> > > > > > out by Jan and Halil?
> >> >> > > > >
> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
> >> >> > > > > patches and i hope someone in RH is looking into it.
> >> >> > > > >
> >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> >> >> > > > > the spec
> >> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >> >> > > >
> >> >> > > >
> >> >> > > > I do not think this will address the comments posted.  Specifically we
> >> >> > > > should probably include documentation for what is a standby and primary:
> >> >> > > > what is expected of driver (maintain configuration on standby, support
> >> >> > > > primary coming and going, transmit on standby only if there is no
> >> >> > > > primary) and of device (have same mac for standby as for primary).
> >> >> > >
> >> >> > > Yes, we need some definitive statements of what a driver and a device
> >> >> > > is supposed to do in order to conform; it might make sense to discuss
> >> >> > > this in conjunction with discussion on any QEMU patches (have not
> >> >> > > checked whether anything has been posted, just returned from vacation).
> >> >> > >
> >> >> > > I assume that we still stick with the plan to implement/document
> >> >> > > MAC-based handling first and then enhance with other methods later?
> >> >> >
> >> >> > I'm fine with that at least. If someone wants to work on
> >> >> > other methods straight away, that's also fine by me.
> >> >>
> >> >> Patch set [1] implements the failover-group-id mechanism. Are you
> >> >> thinking of some other method?
> >> >>
> >> >> Venu
> >> >>
> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> >> >>
> >> >
> >> > Yes, the grouping mechanism seems fine to me (I don't remember
> >> > about the implementation, it's been a while).
> >> >
> >> > It is not by itself sufficient though, is it?
> >>
> >> I do understand that the group ID patch is incomplete though it's a
> >> base patch for the real work.
> >>
> >> >
> >> > MAC is assumed to be shared to avoid things like ARP/neighbor
> >> > rediscovery, right?
> >>
> >> True, but does this really need to be part of the guest-host
> >> interface? Or rather, I don't see how MAC based matching can be done
> >> on the host part.
> >
> > mac address matching does not need to affect host side.
> 
> Did you realize that the host side can't have duplicate MAC address
> filters for both PV and VF at the same time?
> 
> If hot adding a VF with duplicate MAC address filter programmed in
> prior, the PV path for virtio in the host side is effectively
> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt
> does not mean it's ready and usable in the guest. You end up with
> unusable guest networking, *only temporarily if VF is successfully
> probed and properly enslaved*. As of now, no guest-host handshake was
> defined in the spec to make virtio driver aware of hotplug event thus
> VF's exposure, and zero handshake was done to switch the datapath when
> VF driver is ready and usable in guest. The current implementation
> relies on luck, i.e. that the entire hot plug process will be
> successful in the guest.

I think it's a PF bug then. PF driver should ignore filters
for VFs which have not been enabled by guest since reset. 

> BTW netvsc mitigate potential failure in the hotplug and driver
> probing by acknowledging the hypervisor through a DATAPATH_SWITCH
> hypercall (VMbus message) when VF driver is enslaved and ready, only
> then hypervisor will kick off datapath switching by moving the MAC
> address filter.

We can do it without need for PV.  We can detect e.g. bus master enable.
Move the filter when enabled, move it back when disabled e.g. by
VF reset. Or maybe MSE, or both.
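
A rough sketch of that idea, in Python for brevity. The class and
function names are made up for illustration and are not QEMU or kernel
APIs; only the two command-register bit values come from PCI. Where the
MSE state for a VF is actually read from is left open here.

```python
# Standard PCI command register bits.
PCI_COMMAND_MEMORY = 0x2  # Memory Space Enable (MSE)
PCI_COMMAND_MASTER = 0x4  # Bus Master Enable (BME)


def filter_owner(vf_command, require_mse=False):
    """Return which path should own the MAC filter: 'vf' or 'virtio'."""
    needed = PCI_COMMAND_MASTER
    if require_mse:
        needed |= PCI_COMMAND_MEMORY
    return "vf" if (vf_command & needed) == needed else "virtio"


class FilterSwitch:
    """Hypothetical host-side tracker that moves the filter on state changes."""

    def __init__(self):
        self.owner = "virtio"   # filter starts on the standby (PV) path
        self.moves = []         # history of (old_owner, new_owner)

    def on_command_write(self, vf_command, require_mse=False):
        new_owner = filter_owner(vf_command, require_mse)
        if new_owner != self.owner:
            self.moves.append((self.owner, new_owner))
            self.owner = new_owner
        return self.owner

    def on_vf_reset(self):
        # A reset clears the command register, so the filter moves back.
        return self.on_command_write(0)
```

The filter moves at most once per state change, so repeated writes of
the same command value are harmless.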

> >
> >> Are you going to expose MAC address to VFIO?
> >
> > If mac of a VF is programmed by libvirt through the PF
> > (that's already the case), VFIO does not need to care about it.
> >
> >>
> >> The thing is the current MAC based implementation has intrinsic flaw
> >> that doesn't propagate errors to hypervisor, or there's no back
> >> channel for guest to unwind the hot plug action upon failure in
> >> probing or enslaving the primary.
> >
> > I guess you can eject the primary if you like. But
> > why does hypervisor need to know? On error, just don't use primary,
> > use standby.
> 
> Forget about the grouping mechanism first.

OK :)

> What guest kernel change do
> you propose to make virtio driver know every possible error, think
> about how many moving targets it needs to specifically track with or
> has to depend on during the hot plug and driver probing process? If
> someone starts to implement the code and think about various error
> cases as a whole, I bet it would be more clear why grouping is
> relevant in the first place.
> 
> -Siwei

It just seems that no one's been motivated to do it so far.

> >
> >> If you think about a more robust
> >> implementation, another grouping mechanism rather than MAC is pretty
> >> much required.
> >>
> >> Thanks,
> >> -Siwei
> >
> > I don't really know what is the flaw, or how is it fixed by a grouping
> > mechanism. All this motivation was never described as part of work on
> > an alternate grouping.
> >
> >> > If true that implies that to avoid guest confusion visibility of the
> >> > primary needs to be controlled by standby's driver.
> >> > This makes this patchset incomplete.
> >> >
> >> > For this work to be complete what is needed is:
> >> > - hypervisor: add control of primary's visibility to guest
> >> > - guest: add support for this grouping to the failover driver
> >> >
> >> > We also need
> >> > - spec: document matching rules based on the pci bridge
> >> >
> >> > and it's helpful to have a spec proposal with implementation, but I
> >> > would say at least proposed patches to one of the above 2 would be
> >> > helpful before we include this in spec.
> >> >
> >> > --
> >> > MST
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> >

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-21  2:23                     ` Michael S. Tsirkin
@ 2018-09-21  2:34                       ` Michael S. Tsirkin
  2018-09-27  0:18                       ` Siwei Liu
  1 sibling, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-21  2:34 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Thu, Sep 20, 2018 at 10:23:22PM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote:
> > On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
> > >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> > >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> > >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> > >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
> > >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >> >> > >
> > >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > >> >> > > > >
> > >> >> > > > >
> > >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > >> >> > > > > > > device to act as a standby for another device with the same MAC address.
> > >> >> > > > > > >
> > >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > >> >> > > > > > Applied but when do you plan to add documentation as pointed
> > >> >> > > > > > out by Jan and Halil?
> > >> >> > > > >
> > >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
> > >> >> > > > > patches and i hope someone in RH is looking into it.
> > >> >> > > > >
> > >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> > >> >> > > > > the spec
> > >> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > I do not think this will address the comments posted.  Specifically we
> > >> >> > > > should probably include documentation for what is a standby and primary:
> > >> >> > > > what is expected of driver (maintain configuration on standby, support
> > >> >> > > > primary coming and going, transmit on standby only if there is no
> > >> >> > > > primary) and of device (have same mac for standby as for primary).
> > >> >> > >
> > >> >> > > Yes, we need some definitive statements of what a driver and a device
> > >> >> > > is supposed to do in order to conform; it might make sense to discuss
> > >> >> > > this in conjunction with discussion on any QEMU patches (have not
> > >> >> > > checked whether anything has been posted, just returned from vacation).
> > >> >> > >
> > >> >> > > I assume that we still stick with the plan to implement/document
> > >> >> > > MAC-based handling first and then enhance with other methods later?
> > >> >> >
> > >> >> > I'm fine with that at least. If someone wants to work on
> > >> >> > other methods straight away, that's also fine by me.
> > >> >>
> > >> >> Patch set [1] implements the failover-group-id mechanism. Are you
> > >> >> thinking of some other method?
> > >> >>
> > >> >> Venu
> > >> >>
> > >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> > >> >>
> > >> >
> > >> > Yes, the grouping mechanism seems fine to me (I don't remember
> > >> > about the implementation, it's been a while).
> > >> >
> > >> > It is not by itself sufficient though, is it?
> > >>
> > >> I do understand that the group ID patch is incomplete though it's a
> > >> base patch for the real work.
> > >>
> > >> >
> > >> > MAC is assumed to be shared to avoid things like ARP/neighbor
> > >> > rediscovery, right?
> > >>
> > >> True, but does this really need to be part of the guest-host
> > >> interface? Or rather, I don't see how MAC based matching can be done
> > >> on the host part.
> > >
> > > mac address matching does not need to affect host side.
> > 
> > Did you realize that the host side can't have duplicate MAC address
> > filters for both PV and VF at the same time?
> > 
> > If hot adding a VF with duplicate MAC address filter programmed in
> > prior, the PV path for virtio in the host side is effectively
> > disabled. However, the fact that VF gets hot plugged by QEMU/libvirt
> > does not mean it's ready and usable in the guest. You end up with
> > unusable guest networking, *only temporarily if VF is successfully
> > probed and properly enslaved*. As of now, no guest-host handshake was
> > defined in the spec to make virtio driver aware of hotplug event thus
> > VF's exposure, and zero handshake was done to switch the datapath when
> > VF driver is ready and usable in guest. The current implementation
> > relies on luck, i.e. that the entire hot plug process will be
> > successful in the guest.
> 
> I think it's a PF bug then. PF driver should ignore filters
> for VFs which have not been enabled by guest since reset. 
> 
> > BTW netvsc mitigate potential failure in the hotplug and driver
> > probing by acknowledging the hypervisor through a DATAPATH_SWITCH
> > hypercall (VMbus message) when VF driver is enslaved and ready, only
> > then hypervisor will kick off datapath switching by moving the MAC
> > address filter.
> 
> We can do it without need for PV.  We can detect e.g. bus master enable.
> Move the filter when enabled, move it back when disabled e.g. by
> VF reset. Or maybe MSE, or both.

One other issue that I think netvsc will also have would be the moving
of the MAC address.  We need to reserve resources for the filter,
otherwise we risk that an attempt to install a filter will fail.  Maybe
we can start the VF with a temporary MAC, then change it to the final
one when the guest tries to use it. It will work, but we run into the
fact that MACs are currently programmed by mgmt - in many setups qemu
does not have the rights to do it.
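
To make the reservation point concrete, here is a toy model in Python.
Everything in it (the FilterTable class, its methods, the slot count) is
a made-up illustration, not an existing PF driver or QEMU interface. The
VF claims its filter slot at plug time with a temporary MAC, so the
later switch to the final MAC reuses that slot and cannot fail for lack
of resources.

```python
class FilterTable:
    """Toy model of a fixed-size hardware MAC filter table (hypothetical)."""

    def __init__(self, slots):
        self.slots = slots
        self.mac_of = {}  # VF name -> MAC currently programmed in its slot

    def plug_vf(self, vf, temp_mac):
        # Fail early, at hotplug time, if no filter slot is available.
        if len(self.mac_of) >= self.slots:
            raise RuntimeError("no free filter slot; refuse the hotplug")
        self.mac_of[vf] = temp_mac

    def activate(self, vf, final_mac):
        # Reuses the slot claimed in plug_vf(), so no new resource is needed.
        if vf not in self.mac_of:
            raise KeyError(vf)
        self.mac_of[vf] = final_mac

    def unplug_vf(self, vf):
        # Releases the slot for other VFs.
        self.mac_of.pop(vf, None)
```

The sketch says nothing about who is allowed to call activate(), which
is exactly the mgmt permission question above.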

I'll try to ask some mgmt guys about the feasibility of this.

I'm less worried about errors and more worried about downtime - hotplug
on PCIe takes a while to complete (which maybe we should fix for linux
by some PV, but would be tricky to fix for windows).


> > >
> > >> Are you going to expose MAC address to VFIO?
> > >
> > > If mac of a VF is programmed by libvirt through the PF
> > > (that's already the case), VFIO does not need to care about it.
> > >
> > >>
> > >> The thing is the current MAC based implementation has intrinsic flaw
> > >> that doesn't propagate errors to hypervisor, or there's no back
> > >> channel for guest to unwind the hot plug action upon failure in
> > >> probing or enslaving the primary.
> > >
> > > I guess you can eject the primary if you like. But
> > > why does hypervisor need to know? On error, just don't use primary,
> > > use standby.
> > 
> > Forget about the grouping mechanism first.
> 
> OK :)
> 
> > What guest kernel change do
> > you propose to make virtio driver know every possible error, think
> > about how many moving targets it needs to specifically track with or
> > has to depend on during the hot plug and driver probing process? If
> > someone starts to implement the code and think about various error
> > cases as a whole, I bet it would be more clear why grouping is
> > relevant in the first place.
> > 
> > -Siwei
> 
> It just seems that no one's been motivated to do it so far.
> 
> > >
> > >> If you think about a more robust
> > >> implementation, another grouping mechanism rather than MAC is pretty
> > >> much required.
> > >>
> > >> Thanks,
> > >> -Siwei
> > >
> > > I don't really know what is the flaw, or how is it fixed by a grouping
> > > mechanism. All this motivation was never described as part of work on
> > > an alternate grouping.
> > >
> > >> > If true that implies that to avoid guest confusion visibility of the
> > >> > primary needs to be controlled by standby's driver.
> > >> > This makes this patchset incomplete.
> > >> >
> > >> > For this work to be complete what is needed is:
> > >> > - hypervisor: add control of primary's visibility to guest
> > >> > - guest: add support for this grouping to the failover driver
> > >> >
> > >> > We also need
> > >> > - spec: document matching rules based on the pci bridge
> > >> >
> > >> > and it's helpful to have a spec proposal with implementation, but I
> > >> > would say at least proposed patches to one of the above 2 would be
> > >> > helpful before we include this in spec.
> > >> >
> > >> > --
> > >> > MST
> > >> >
> > >> > ---------------------------------------------------------------------
> > >> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > >> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >> >



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-21  2:23                     ` Michael S. Tsirkin
  2018-09-21  2:34                       ` Michael S. Tsirkin
@ 2018-09-27  0:18                       ` Siwei Liu
  2018-09-27  7:17                         ` Sameeh Jubran
  2018-09-27 16:32                         ` Michael S. Tsirkin
  1 sibling, 2 replies; 85+ messages in thread
From: Siwei Liu @ 2018-09-27  0:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote:
>> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
>> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
>> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
>> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
>> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
>> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> >> > >
>> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
>> >> >> > > > >
>> >> >> > > > >
>> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
>> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
>> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
>> >> >> > > > > > > device to act as a standby for another device with the same MAC address.
>> >> >> > > > > > >
>> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
>> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
>> >> >> > > > > > Applied but when do you plan to add documentation as pointed
>> >> >> > > > > > out by Jan and Halil?
>> >> >> > > > >
>> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
>> >> >> > > > > patches and i hope someone in RH is looking into it.
>> >> >> > > > >
>> >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
>> >> >> > > > > the spec
>> >> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
>> >> >> > > >
>> >> >> > > >
>> >> >> > > > I do not think this will address the comments posted.  Specifically we
>> >> >> > > > should probably include documentation for what is a standby and primary:
>> >> >> > > > what is expected of driver (maintain configuration on standby, support
>> >> >> > > > primary coming and going, transmit on standby only if there is no
>> >> >> > > > primary) and of device (have same mac for standby as for primary).
>> >> >> > >
>> >> >> > > Yes, we need some definitive statements of what a driver and a device
>> >> >> > > is supposed to do in order to conform; it might make sense to discuss
>> >> >> > > this in conjunction with discussion on any QEMU patches (have not
>> >> >> > > checked whether anything has been posted, just returned from vacation).
>> >> >> > >
>> >> >> > > I assume that we still stick with the plan to implement/document
>> >> >> > > MAC-based handling first and then enhance with other methods later?
>> >> >> >
>> >> >> > I'm fine with that at least. If someone wants to work on
>> >> >> > other methods straight away, that's also fine by me.
>> >> >>
>> >> >> Patch set [1] implements the failover-group-id mechanism. Are you
>> >> >> thinking of some other method?
>> >> >>
>> >> >> Venu
>> >> >>
>> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
>> >> >>
>> >> >
>> >> > Yes, the grouping mechanism seems fine to me (I don't remember
>> >> > about the implementation, it's been a while).
>> >> >
>> >> > It is not by itself sufficient though, is it?
>> >>
>> >> I do understand that the group ID patch is incomplete though it's a
>> >> base patch for the real work.
>> >>
>> >> >
>> >> > MAC is assumed to be shared to avoid things like ARP/neighbor
>> >> > rediscovery, right?
>> >>
>> >> True, but does this really need to be part of the guest-host
>> >> interface? Or rather, I don't see how MAC based matching can be done
>> >> on the host part.
>> >
>> > mac address matching does not need to affect host side.
>>
>> Did you realize that the host side can't have duplicate MAC address
>> filters for both PV and VF at the same time?
>>
>> If hot adding a VF with duplicate MAC address filter programmed in
>> prior, the PV path for virtio in the host side is effectively
>> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt
>> does not mean it's ready and usable in the guest. You end up with
>> unusable guest networking, *only temporarily if VF is successfully
>> probed and properly enslaved*. As of now, no guest-host handshake was
>> defined in the spec to make virtio driver aware of hotplug event thus
>> VF's exposure, and zero handshake was done to switch the datapath when
>> VF driver is ready and usable in guest. The current implementation
>> relies on luck, i.e. that the entire hot plug process will be
>> successful in the guest.
>
> I think it's a PF bug then. PF driver should ignore filters
> for VFs which have not been enabled by guest since reset.

Even so, the fact is that if the design is tied to MAC based matching
you end up relying on that MAC address to pair devices, which
loses the flexibility to move the MAC filter at some point later after
assigning the VF to the guest.

>
>> BTW netvsc mitigate potential failure in the hotplug and driver
>> probing by acknowledging the hypervisor through a DATAPATH_SWITCH
>> hypercall (VMbus message) when VF driver is enslaved and ready, only
>> then hypervisor will kick off datapath switching by moving the MAC
>> address filter.
>
> We can do it without need for PV.  We can detect e.g. bus master enable.

I'm not sure it's valid to assume bus master enable/disable is the
right point to move the filter, although it is a bit better than doing
nothing. The thing is that the device (QEMU) knows nothing, and should
not assume too much, about the guest implementation. The right time to
move the filter is when the VF driver is fully ready in the guest and
properly handled by the bond driver (net_failover), so the primary can
take over the datapath going forward. Bus master enable usually happens
earlier than that, and does not indicate anything about readiness on
the control side, i.e. that the bond/failover driver can actually see
this VF and "manage" it. Strictly speaking, this does not form any
guest-host handshake to me. Think about what happens if the VM user
moves the VF to a different netns, or rebinds it to a DPDK PMD. These
just demonstrate a few things that can get well covered by this design,
and I suspect the errors in real life would be much more complex.
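
To illustrate what I mean by a handshake, here is a minimal Python
sketch. The class, method and message names are hypothetical; this is
neither the netvsc DATAPATH_SWITCH interface nor anything defined in the
virtio spec. The point is only the ordering: hot plugging the VF does
not move the filter, an explicit "primary ready" message from the guest
does.

```python
from enum import Enum


class Datapath(Enum):
    STANDBY = "virtio"
    PRIMARY = "vf"


class Hypervisor:
    """Hypothetical host side of a guest-host datapath handshake."""

    def __init__(self):
        self.datapath = Datapath.STANDBY
        self.vf_plugged = False

    def hotplug_vf(self):
        # Exposing the VF to the guest does not touch the MAC filter.
        self.vf_plugged = True

    def guest_ack_primary_ready(self):
        # Sent by the guest failover driver once the VF is enslaved.
        if not self.vf_plugged:
            return False
        self.datapath = Datapath.PRIMARY
        return True

    def guest_report_primary_gone(self):
        # Probe failure, netns move, rebind to another driver, unplug...
        self.datapath = Datapath.STANDBY
```

If the guest never sends the ready message, traffic simply stays on the
standby path.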

> Move the filter when enabled, move it back when disabled e.g. by
> VF reset. Or maybe MSE, or both.

MSE is on the PF and shared by all VFs, so why is it relevant?

>
>> >
>> >> Are you going to expose MAC address to VFIO?
>> >
>> > If mac of a VF is programmed by libvirt through the PF
>> > (that's already the case), VFIO does not need to care about it.
>> >
>> >>
>> >> The thing is the current MAC based implementation has intrinsic flaw
>> >> that doesn't propagate errors to hypervisor, or there's no back
>> >> channel for guest to unwind the hot plug action upon failure in
>> >> probing or enslaving the primary.
>> >
>> > I guess you can eject the primary if you like. But
>> > why does hypervisor need to know? On error, just don't use primary,
>> > use standby.
>>
>> Forget about the grouping mechanism first.
>
> OK :)
>
>> What guest kernel change do
>> you propose to make virtio driver know every possible error, think
>> about how many moving targets it needs to specifically track with or
>> has to depend on during the hot plug and driver probing process? If
>> someone starts to implement the code and think about various error
>> cases as a whole, I bet it would be more clear why grouping is
>> relevant in the first place.
>>
>> -Siwei
>
> It just seems that no one's been motivated to do it so far.

It's just that the MAC matching design is simply too broken. We have
root disk hosted on networked storage, i.e. iSCSI, that can't tolerate
any potential network failure if the design itself is not error proof.
IOW our criteria for network downtime and errors are super rigorous.

-Siwei

>
>> >
>> >> If you think about a more robust
>> >> implementation, another grouping mechanism rather than MAC is pretty
>> >> much required.
>> >>
>> >> Thanks,
>> >> -Siwei
>> >
>> > I don't really know what is the flaw, or how is it fixed by a grouping
>> > mechanism. All this motivation was never described as part of work on
>> > an alternate grouping.
>> >
>> >> > If true that implies that to avoid guest confusion visibility of the
>> >> > primary needs to be controlled by standby's driver.
>> >> > This makes this patchset incomplete.
>> >> >
>> >> > For this work to be complete what is needed is:
>> >> > - hypervisor: add control of primary's visibility to guest
>> >> > - guest: add support for this grouping to the failover driver
>> >> >
>> >> > We also need
>> >> > - spec: document matching rules based on the pci bridge
>> >> >
>> >> > and it's helpful to have a spec proposal with implementation, but I
>> >> > would say at least proposed patches to one of the above 2 would be
>> >> > helpful before we include this in spec.
>> >> >
>> >> > --
>> >> > MST
>> >> >



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27  0:18                       ` Siwei Liu
@ 2018-09-27  7:17                         ` Sameeh Jubran
  2018-09-27 16:17                           ` Michael S. Tsirkin
  2018-09-27 16:32                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-09-27  7:17 UTC (permalink / raw)
  To: loseweigh
  Cc: Michael S. Tsirkin, venu.busireddy, cohuck, sridhar.samudrala,
	virtio-dev

What do you think about the following alternative implementation, which
uses cross-ID validation?

-device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
-vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>
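
The check I have in mind can be sketched as below (Python, for
illustration only). The "standby" and "primary" property names come
from the proposed command line above and are not existing QEMU options;
the function name and the dict representation are made up.

```python
def cross_ids_match(virtio_net, vfio):
    """Return True only if the two devices name each other.

    Both arguments are dicts of device properties, e.g. parsed from the
    proposed -device options (hypothetical representation).
    """
    net_id = virtio_net.get("id")
    vfio_id = vfio.get("id")
    if net_id is None or vfio_id is None:
        return False
    # The standby must point at the VFIO device, and vice versa.
    return virtio_net.get("standby") == vfio_id and vfio.get("primary") == net_id
```

For example, a virtio-net device with id=net0,standby=vf0 pairs with a
VFIO device with id=vf0,primary=net0, and with nothing else.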

On Thu, Sep 27, 2018 at 3:19 AM Siwei Liu <loseweigh@gmail.com> wrote:
>
> On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote:
> >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
> >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
> >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> >> > >
> >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> >> >> >> > > > > > > device to act as a standby for another device with the same MAC address.
> >> >> >> > > > > > >
> >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> >> >> >> > > > > > Applied but when do you plan to add documentation as pointed
> >> >> >> > > > > > out by Jan and Halil?
> >> >> >> > > > >
> >> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
> >> >> >> > > > > patches and i hope someone in RH is looking into it.
> >> >> >> > > > >
> >> >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> >> >> >> > > > > the spec
> >> >> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > I do not think this will address the comments posted.  Specifically we
> >> >> >> > > > should probably include documentation for what is a standby and primary:
> >> >> >> > > > what is expected of driver (maintain configuration on standby, support
> >> >> >> > > > primary coming and going, transmit on standby only if there is no
> >> >> >> > > > primary) and of device (have same mac for standby as for primary).
> >> >> >> > >
> >> >> >> > > Yes, we need some definitive statements of what a driver and a device
> >> >> >> > > is supposed to do in order to conform; it might make sense to discuss
> >> >> >> > > this in conjunction with discussion on any QEMU patches (have not
> >> >> >> > > checked whether anything has been posted, just returned from vacation).
> >> >> >> > >
> >> >> >> > > I assume that we still stick with the plan to implement/document
> >> >> >> > > MAC-based handling first and then enhance with other methods later?
> >> >> >> >
> >> >> >> > I'm fine with that at least. If someone wants to work on
> >> >> >> > other methods straight away, that's also fine by me.
> >> >> >>
> >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you
> >> >> >> thinking of some other method?
> >> >> >>
> >> >> >> Venu
> >> >> >>
> >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> >> >> >>
> >> >> >
> >> >> > Yes, the grouping mechanism seems fine to me (I don't remember
> >> >> > about the implementation, it's been a while).
> >> >> >
> >> >> > It is not by itself sufficient though, is it?
> >> >>
> >> >> I do understand that the group ID patch is incomplete though it's a
> >> >> base patch for the real work.
> >> >>
> >> >> >
> >> >> > MAC is assumed to be shared to avoid things like ARP/neighbor
> >> >> > rediscovery, right?
> >> >>
> >> >> True, but does this really need to be part of the guest-host
> >> >> interface? Or rather, I don't see how MAC based matching can be done
> >> >> on the host part.
> >> >
> >> > mac address matching does not need to affect host side.
> >>
> >> Did you realize that the host side can't have duplicate MAC address
> >> filters for both PV and VF at the same time?
> >>
> >> If hot adding a VF with duplicate MAC address filter programmed in
> >> prior, the PV path for virtio in the host side is effectively
> >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt
> >> does not mean it's ready and usable in the guest. You end up with
> >> unusable guest networking, *only temporarily if VF is successfully
> >> probed and properly enslaved*. As of now, no guest-host handshake was
> >> defined in the spec to make virtio driver aware of hotplug event thus
> >> VF's exposure, and zero handshake was done to switch the datapath when
> >> VF driver is ready and usable in guest. The current implementation
> >> relies on luck, i.e. that the entire hot plug process will be
> >> successful in the guest.
> >
> > I think it's a PF bug then. PF driver should ignore filters
> > for VFs which have not been enabled by guest since reset.
>
> Even so, the fact is that if the design is tied to MAC based matching
> you end up relying on that MAC address to pair devices, which
> loses the flexibility to move the MAC filter at some point later after
> assigning the VF to the guest.
>
> >
> >> BTW netvsc mitigate potential failure in the hotplug and driver
> >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH
> >> hypercall (VMbus message) when VF driver is enslaved and ready, only
> >> then hypervisor will kick off datapath switching by moving the MAC
> >> address filter.
> >
> > We can do it without need for PV.  We can detect e.g. bus master enable.
>
> I'm not sure if it's valid to assume master enable/disable is the
> right point to move the filter, although it improves a bit than do
> nothing. The thing is that from device (QEMU) perspective it knows
> nothing and should not assume too much about guest implementation -
> the time to move the filter around means the VF driver is fully ready
> in guest and properly handled by the bond driver (net_failover), so
> the primary can take over the datapath going forward. While the bus
> master enable usually happens earlier than that, which does not
> indicate anything about readiness on the control side that the
> bond/failover driver can actually see this VF and "manage" it. This
> strictly does not form any guest-host handshake to me. Think about
> what if VM user changes VF to a different netns, or rebind it to DPDK
> PMD? These just demonstrate a few things that can get well covered by
> this design, and I suspect the errors in the real life would be much
> more complex.
>
> > Move the filter when enabled, move it back when disabled e.g. by
> > VF reset. Or maybe MSE, or both.
>
> MSE is on the PF and shared by all VFs, why it's relevant?
>
> >
> >> >
> >> >> Are you going to expose MAC address to VFIO?
> >> >
> >> > If mac of a VF is programmed by libvirt through the PF
> >> > (that's already the case), VFIO does not need to care about it.
> >> >
> >> >>
> >> >> The thing is the current MAC based implementation has intrinsic flaw
> >> >> that doesn't propagate errors to hypervisor, or there's no back
> >> >> channel for guest to unwind the hot plug action upon failure in
> >> >> probing or enslaving the primary.
> >> >
> >> > I guess you can eject the primary if you like. But
> >> > why does hypervisor need to know? On error, just don't use primary,
> >> > use standby.
> >>
> >> Forget about the grouping mechanism first.
> >
> > OK :)
> >
> >> What guest kernel change do
> >> you propose to make virtio driver know every possible error, think
> >> about how many moving targets it needs to specifically track with or
> >> has to depend on during the hot plug and driver probing process? If
> >> someone starts to implement the code and think about various error
> >> cases as a whole, I bet it would be more clear why grouping is
> >> relevant in the first place.
> >>
> >> -Siwei
> >
> > It just seems that no one's been motivated to do it so far.
>
> It's just that the MAC matching design is simply too broken. We have
> root disk hosted on networked storage, i.e. iSCSI, that can't tolerate
> any potential network failure if the design itself is not error proof.
> IOW our criteria for network downtime and errors is super rigorous..
>
> -Siwei
>
> >
> >> >
> >> >> If you think about a more robust
> >> >> implementation, another grouping mechanism rather than MAC is pretty
> >> >> much required.
> >> >>
> >> >> Thanks,
> >> >> -Siwei
> >> >
> >> > I don't really know what is the flaw, or how is it fixed by a grouping
> >> > mechanism. All this motivation was never described as part of work on
> >> > an alternate grouping.
> >> >
> >> >> > If true that implies that to avoid guest confusion visibility of the
> >> >> > primary needs to be controlled by standby's driver.
> >> >> > This makes this patchset incomplete.
> >> >> >
> >> >> > For this work to be complete what is needed is:
> >> >> > - hypervisor: add control of primary's visibility to guest
> >> >> > - guest: add support for this grouping to the failover driver
> >> >> >
> >> >> > We also need
> >> >> > - spec: document matching rules based on the pci bridge
> >> >> >
> >> >> > and it's helpful to have a spec proposal with implementation, but I
> >> >> > would say at least proposed patches to one of the above 2 would be
> >> >> > helpful before we include this in spec.
> >> >> >
> >> >> > --
> >> >> > MST
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> >> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> >> >
>


-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27  7:17                         ` Sameeh Jubran
@ 2018-09-27 16:17                           ` Michael S. Tsirkin
  2018-09-27 17:23                             ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-27 16:17 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: loseweigh, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
> What do you think about the following alternative implementation which
> uses cross id validation.
> 
> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>

virtio is a standby device, isn't it?

Besides that I don't see issues with this API.

-- 
MST


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27  0:18                       ` Siwei Liu
  2018-09-27  7:17                         ` Sameeh Jubran
@ 2018-09-27 16:32                         ` Michael S. Tsirkin
  2018-10-02  8:42                           ` Siwei Liu
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-27 16:32 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Wed, Sep 26, 2018 at 05:18:38PM -0700, Siwei Liu wrote:
> On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote:
> >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
> >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
> >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> >> > >
> >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> >> >> >> > > > > > > device to act as a standby for another device with the same MAC address.
> >> >> >> > > > > > >
> >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> >> >> >> > > > > > Applied but when do you plan to add documentation as pointed
> >> >> >> > > > > > out by Jan and Halil?
> >> >> >> > > > >
> >> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
> >> >> >> > > > > patches and i hope someone in RH is looking into it.
> >> >> >> > > > >
> >> >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> >> >> >> > > > > the spec
> >> >> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> >> >> >> > > >
> >> >> >> > > >
> >> >> >> > > > I do not think this will address the comments posted.  Specifically we
> >> >> >> > > > should probably include documentation for what is a standby and primary:
> >> >> >> > > > what is expected of driver (maintain configuration on standby, support
> >> >> >> > > > primary coming and going, transmit on standby only if there is no
> >> >> >> > > > primary) and of device (have same mac for standby as for primary).
> >> >> >> > >
> >> >> >> > > Yes, we need some definitive statements of what a driver and a device
> >> >> >> > > is supposed to do in order to conform; it might make sense to discuss
> >> >> >> > > this in conjunction with discussion on any QEMU patches (have not
> >> >> >> > > checked whether anything has been posted, just returned from vacation).
> >> >> >> > >
> >> >> >> > > I assume that we still stick with the plan to implement/document
> >> >> >> > > MAC-based handling first and then enhance with other methods later?
> >> >> >> >
> >> >> >> > I'm fine with that at least. If someone wants to work on
> >> >> >> > other methods straight away, that's also fine by me.
> >> >> >>
> >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you
> >> >> >> thinking of some other method?
> >> >> >>
> >> >> >> Venu
> >> >> >>
> >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> >> >> >>
> >> >> >
> >> >> > Yes, the grouping mechanism seems fine to me (I don't remember
> >> >> > about the implementation, it's been a while).
> >> >> >
> >> >> > It is not by itself sufficient though, is it?
> >> >>
> >> >> I do understand that the group ID patch is incomplete though it's a
> >> >> base patch for the real work.
> >> >>
> >> >> >
> >> >> > MAC is assumed to be shared to avoid things like ARP/neighbour
> >> >> > rediscovery, right?
> >> >>
> >> >> True, but does this really need to be part of the guest-host
> >> >> interface? Or rather, I don't see how MAC based matching can be done
> >> >> on the host part.
> >> >
> >> > mac address matching does not need to affect host side.
> >>
> >> Did you realize that the host side can't have duplicate MAC address
> >> filters for both PV and VF at the same time?
> >>
> >> If hot adding a VF with duplicate MAC address filter programmed in
> >> prior, the PV path for virtio in the host side is effectively
> >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt
> >> does not mean it's ready and usable in the guest. You end up with
> >> unusable guest networking, *temporarily only when VF is successfully
> >> probed and properly enslaved*. As of now, no guest-host handshake was
> >> defined in the spec to make virtio driver aware of hotplug event thus
> >> VF's exposure, and zero handshake was done to switch the datapath when
> >> VF driver is ready and usable in guest. The current implementation
> >> relies on luck: it assumes the entire hot plug process will be
> >> successful in the guest.
> >
> > I think it's a PF bug then. PF driver should ignore filters
> > for VFs which have not been enabled by guest since reset.
> 
> Even so, the fact is that if the design is tied to MAC based matching
> you end up with relying on that MAC address to pair device, which
> loses the flexibility to move MAC filter at some point later after
> assigning VF to guest.

Whatever you use for pairing, you still need to reuse the same MAC
to avoid redoing neighbour discovery/ARP, right?

> >
> >> BTW netvsc mitigate potential failure in the hotplug and driver
> >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH
> >> hypercall (VMbus message) when VF driver is enslaved and ready, only
> >> then hypervisor will kick off datapath switching by moving the MAC
> >> address filter.
> >
> > We can do it without need for PV.  We can detect e.g. bus master enable.
> 
> I'm not sure if it's valid to assume master enable/disable is the
> right point to move the filter, although it improves a bit than do
> nothing. The thing is that from device (QEMU) perspective it knows
> nothing and should not assume too much about guest implementation -
> the time to move the filter around means the VF driver is fully ready
> in guest and properly handled by the bond driver (net_failover), so
> the primary can take over the datapath going forward. While the bus
> master enable usually happens earlier than that, which does not
> indicate anything about readiness on the control side that the
> bond/failover driver can actually see this VF and "manage" it. This
> strictly does not form any guest-host handshake to me. Think about
> what if VM user changes VF to a different netns, or rebind it to DPDK
> PMD?

OK. What then?

> These just demonstrate a few things that can get well covered by
> this design, and I suspect the errors in the real life would be much
> more complex.

What's missing is actual design though. The only thing that I saw so far
is bridge group identifier for qemu, which is a start but doesn't
actually solve any problems by itself. You even said "forget about the
grouping mechanism" yourself below.

> > Move the filter when enabled, move it back when disabled e.g. by
> > VF reset. Or maybe MSE, or both.
> 
> MSE is on the PF and shared by all VFs, why it's relevant?

Oh, right. Just FLR then.
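
As an illustration only, the policy discussed above (move the MAC filter to
the VF on bus master enable, move it back on disable or FLR) can be sketched
as a small state machine. The names filter_owner, vf_event and filter_next are
hypothetical and do not come from QEMU or any PF driver. The sketch also does
not answer the objection raised above, namely that bus master enable does not
prove the failover driver has enslaved the VF.

```c
#include <assert.h>

/* Hypothetical model: which datapath currently owns the shared MAC filter. */
enum filter_owner { OWNER_STANDBY, OWNER_PRIMARY };

/* Hypothetical events the host can observe on the VF without a PV handshake. */
enum vf_event {
    VF_BUS_MASTER_ENABLE,   /* guest set the bus master bit on the VF */
    VF_BUS_MASTER_DISABLE,  /* guest cleared the bus master bit */
    VF_FLR                  /* function-level reset of the VF */
};

/* Return the owner of the MAC filter after the given event. */
static enum filter_owner filter_next(enum filter_owner cur, enum vf_event ev)
{
    assert(cur == OWNER_STANDBY || cur == OWNER_PRIMARY);

    switch (ev) {
    case VF_BUS_MASTER_ENABLE:
        /* VF can DMA: hand the filter to the primary (VF) path. */
        return OWNER_PRIMARY;
    case VF_BUS_MASTER_DISABLE:
    case VF_FLR:
        /* VF is no longer usable: fall back to the standby (virtio) path. */
        return OWNER_STANDBY;
    }
    return cur; /* unknown event: leave the filter where it is */
}
```

A caller would keep one filter_owner per standby/primary pair and reprogram
the real filter only when filter_next() returns a value different from the
current one.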

> >
> >> >
> >> >> Are you going to expose MAC address to VFIO?
> >> >
> >> > If mac of a VF is programmed by libvirt through the PF
> >> > (that's already the case), VFIO does not need to care about it.
> >> >
> >> >>
> >> >> The thing is the current MAC based implementation has intrinsic flaw
> >> >> that doesn't propagate errors to hypervisor, or there's no back
> >> >> channel for guest to unwind the hot plug action upon failure in
> >> >> probing or enslaving the primary.
> >> >
> >> > I guess you can eject the primary if you like. But
> >> > why does hypervisor need to know? On error, just don't use primary,
> >> > use standby.
> >>
> >> Forget about the grouping mechanism first.
> >
> > OK :)
> >
> >> What guest kernel change do
> >> you propose to make virtio driver know every possible error, think
> >> about how many moving targets it needs to specifically track with or
> >> has to depend on during the hot plug and driver probing process? If
> >> someone starts to implement the code and think about various error
> >> cases as a whole, I bet it would be more clear why grouping is
> >> relevant in the first place.
> >>
> >> -Siwei
> >
> > It just seems that no one's been motivated to do it so far.
> 
> It's just that the MAC matching design is simply too broken.

Too broken to even bother coding up any alternatives?

> We have
> root disk hosted on networked storage, i.e. iSCSI, that can't tolerate
> any potential network failure if the design itself is not error proof.
> IOW our criteria for network downtime and errors is super rigorous..
> 
> -Siwei

IMO that's a very interesting use case to address! I'll be happy to merge
patches that help reduce downtime; spec-wise, I'll be happy to propose
them for a TC vote.

> >
> >> >
> >> >> If you think about a more robust
> >> >> implementation, another grouping mechanism rather than MAC is pretty
> >> >> much required.
> >> >>
> >> >> Thanks,
> >> >> -Siwei
> >> >
> >> > I don't really know what is the flaw, or how is it fixed by a grouping
> >> > mechanism. All this motivation was never described as part of work on
> >> > an alternate grouping.
> >> >
> >> >> > If true that implies that to avoid guest confusion visibility of the
> >> >> > primary needs to be controlled by standby's driver.
> >> >> > This makes this patchset incomplete.
> >> >> >
> >> >> > For this work to be complete what is needed is:
> >> >> > - hypervisor: add control of primary's visibility to guest
> >> >> > - guest: add support for this grouping to the failover driver
> >> >> >
> >> >> > We also need
> >> >> > - spec: document matching rules based on the pci bridge
> >> >> >
> >> >> > and it's helpful to have a spec proposal with implementation, but I
> >> >> > would say at least proposed patches to one of the above 2 would be
> >> >> > helpful before we include this in spec.
> >> >> >
> >> >> > --
> >> >> > MST
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> >> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> >> >


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27 16:17                           ` Michael S. Tsirkin
@ 2018-09-27 17:23                             ` Samudrala, Sridhar
  2018-09-27 23:45                               ` Michael S. Tsirkin
  2018-09-30  9:17                               ` Sameeh Jubran
  0 siblings, 2 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-09-27 17:23 UTC (permalink / raw)
  To: Michael S. Tsirkin, Sameeh Jubran
  Cc: loseweigh, venu.busireddy, cohuck, virtio-dev


On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
>> What do you think about the following alternative implementation which
>> uses cross id validation.
>>
>> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
>> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>
> virtio is a standby device, isn't it?
>
> Besides that I don't see issues with this API.

Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work.

-device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
-vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device>

It should be OK to have virtio-net in standby mode without an associated vfio primary device,
but a vfio primary device should not be allowed without a virtio-net standby device.
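
A minimal sketch of the cross-id check, under the rules stated above, could
look like the following. It is not QEMU code: struct dev_opts and
failover_pair_valid() are hypothetical names, and the vfio argument stands
for a device that is meant to act as a primary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one "-device" entry from the command line. */
struct dev_opts {
    const char *id;    /* id=<...> */
    const char *peer;  /* primary=<...> on virtio-net, standby=<...> on vfio;
                          NULL when the option is absent */
};

static bool same_id(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/*
 * virtio_net must be non-NULL. vfio may be NULL, meaning that no primary
 * device is configured (yet). Returns true if the pairing is acceptable.
 */
static bool failover_pair_valid(const struct dev_opts *virtio_net,
                                const struct dev_opts *vfio)
{
    assert(virtio_net != NULL);

    if (vfio == NULL) {
        /* A standby without an associated primary is allowed. */
        return true;
    }
    /* A primary must name its virtio-net standby ... */
    if (!same_id(vfio->peer, virtio_net->id)) {
        return false;
    }
    /* ... and the standby must point back at the same primary. */
    return same_id(virtio_net->peer, vfio->id);
}
```

With this shape, a vfio entry whose standby= option is missing or names a
different device is rejected, while a lone virtio-net standby is accepted.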

Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM?

Thanks
Sridhar

	




* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27 17:23                             ` Samudrala, Sridhar
@ 2018-09-27 23:45                               ` Michael S. Tsirkin
  2018-09-30  9:17                               ` Sameeh Jubran
  1 sibling, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-09-27 23:45 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Sameeh Jubran, loseweigh, venu.busireddy, cohuck, virtio-dev

On Thu, Sep 27, 2018 at 10:23:17AM -0700, Samudrala, Sridhar wrote:
> 
> On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote:
> > On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
> > > What do you think about the following alternative implementation which
> > > uses cross id validation.
> > > 
> > > -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> > > -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>
> > virtio is a standby device, isn't it?
> > 
> > Besides that I don't see issues with this API.
> 
> Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work.
> 
> -device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> -vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device>
> 
> It should be OK to have virtio-net in standby mode without an associated vfio primary device,
> but a vfio primary device should not allowed without a virtio-net standby device.
> 
> Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM?
> 
> Thanks
> Sridhar
> 
> 	

We can request removal, yes.

-- 
MST


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27 17:23                             ` Samudrala, Sridhar
  2018-09-27 23:45                               ` Michael S. Tsirkin
@ 2018-09-30  9:17                               ` Sameeh Jubran
  2018-09-30 13:50                                 ` Sameeh Jubran
  1 sibling, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-09-30  9:17 UTC (permalink / raw)
  To: sridhar.samudrala
  Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck, virtio-dev

On Thu, Sep 27, 2018 at 8:25 PM Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
>
> On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote:
> > On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
> >> What do you think about the following alternative implementation which
> >> uses cross id validation.
> >>
> >> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> >> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>
> > virtio is a standby device, isn't it?
> >
> > Besides that I don't see issues with this API.
>
> Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work.
Hmm, I thought of standby as a property of virtio-net, meaning that it
is a standby for the primary device, and vice versa for the vfio device.
However, this can be viewed from different angles and isn't a deal
breaker :)
>
> -device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> -vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device>
>
> It should be OK to have virtio-net in standby mode without an associated vfio primary device,
> but a vfio primary device should not allowed without a virtio-net standby device.
>
> Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM?
>
> Thanks
> Sridhar
>


-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-30  9:17                               ` Sameeh Jubran
@ 2018-09-30 13:50                                 ` Sameeh Jubran
  0 siblings, 0 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-09-30 13:50 UTC (permalink / raw)
  To: sridhar.samudrala
  Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck, virtio-dev

I have created the following patch, which implements the basic
functionality of hiding and plugging the primary device upon acking
the standby feature.
It's currently implemented for e1000 rather than for vfio devices, as I
don't have access to a vfio device on my current setup; however, the
implementation should be similar. I am facing an issue with the hotplug
handler being NULL when calling qdev_get_hotplug_handler, even though it
did work for me before. If anyone has any idea what the issue could be,
it would be much appreciated.

I think there should be a structure which describes the state of the
primary device in order to implement the migration feature.

Please share your thoughts and insights

commit 39a350ee65a26ab6ede4c08b3ca3b9e945fcf305 (HEAD -> failover)
Author: Sameeh Jubran <sjubran@redhat.com>
Date:   Sun Sep 16 13:21:41 2018 +0300

    virtio-net: Implement standby feature

    Signed-off-by: Sameeh Jubran <sjubran@redhat.com>

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 13a9494a8d..026b8631ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -36,6 +36,8 @@
 #include "qemu/range.h"

 #include "e1000x_common.h"
+#include "hw/virtio/virtio-net.h"
+#include "hw/virtio/virtio-pci.h"

 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

@@ -118,6 +120,7 @@ typedef struct E1000State_st {
     bool mit_timer_on;         /* Mitigation timer is running. */
     bool mit_irq_level;        /* Tracks interrupt pin level. */
     uint32_t mit_ide;          /* Tracks E1000_TXD_CMD_IDE bit. */
+    char *primary_id_str;

 /* Compatibility flags for migration to/from qemu 1.3.0 and older */
 #define E1000_FLAG_AUTONEG_BIT 0
@@ -1652,9 +1655,16 @@ static void e1000_write_config(PCIDevice *pci_dev, uint32_t address,
     }
 }

+static bool standby_device_present(const char *id,
+        struct PCIDevice **pdev)
+{
+    return pci_qdev_find_device(id, pdev) >= 0;
+}
+
 static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)
 {
     DeviceState *dev = DEVICE(pci_dev);
+    PCIDevice *standby_pci_dev;
     E1000State *d = E1000(pci_dev);
     uint8_t *pci_conf;
     uint8_t *macaddr;
@@ -1690,6 +1700,12 @@ static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)

     d->autoneg_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, e1000_autoneg_timer, d);
     d->mit_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, e1000_mit_timer, d);
+    if (d->primary_id_str && standby_device_present(
+            d->primary_id_str, &standby_pci_dev) && standby_pci_dev) {
+        VirtIOPCIProxy *proxy = VIRTIO_PCI(standby_pci_dev);
+        VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+        virtio_net_register_primary_device(DEVICE(vdev), dev);
+    }
 }

 static void qdev_e1000_reset(DeviceState *dev)
@@ -1708,6 +1724,7 @@ static Property e1000_properties[] = {
                     compat_flags, E1000_FLAG_MAC_BIT, true),
     DEFINE_PROP_BIT("migrate_tso_props", E1000State,
                     compat_flags, E1000_FLAG_TSO_BIT, true),
+    DEFINE_PROP_STRING("primary", E1000State, primary_id_str),
     DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f154756e85..fbe10f4fe1 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,7 +26,9 @@
 #include "qapi/qapi-events-net.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "hw/pci/pci.h"
 #include "standard-headers/linux/ethtool.h"
+#include "hw/vfio/vfio-common.h"

 #define VIRTIO_NET_VM_VERSION    11

@@ -312,9 +314,14 @@ static void virtio_net_set_link_status(NetClientState *nc)
     uint16_t old_status = n->status;

     if (nc->link_down)
+    {
         n->status &= ~VIRTIO_NET_S_LINK_UP;
+    }
     else
+    {
+
         n->status |= VIRTIO_NET_S_LINK_UP;
+    }

     if (n->status != old_status)
         virtio_notify_config(vdev);
@@ -721,6 +728,16 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
     } else {
         memset(n->vlans, 0xff, MAX_VLAN >> 3);
     }
+
+    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
+        Error *errp = NULL;
+        DeviceState *pdev = DEVICE(n->primary_pdev);
+        DeviceClass *klass = DEVICE_GET_CLASS(pdev);
+        if (klass->hotpluggable && n->primary_hph)
+        {
+            hotplug_handler_plug(n->primary_hph, pdev, &errp);
+        }
+    }
 }

 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -1946,6 +1963,41 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
     n->netclient_type = g_strdup(type);
 }

+static bool primary_device_present(const char *id, struct PCIDevice **pdev)
+{
+    return pci_qdev_find_device(id, pdev) >= 0 &&
+    vfio_is_vfio_pci(*pdev);
+}
+
+bool virtio_net_register_primary_device(DeviceState *dev, DeviceState *primary_dev)
+{
+    bool ret = false;
+    VirtIONet *n = VIRTIO_NET(dev);
+    Error *errp = NULL;
+    DeviceClass *klass = DEVICE_GET_CLASS(primary_dev);
+
+    if (n->primary_pdev == NULL)
+    {
+        n->primary_pdev = PCI_DEVICE(primary_dev);
+    }
+
+    if (n->primary_hph == NULL)
+    {
+        n->primary_hph = qdev_get_hotplug_handler(primary_dev);
+    }
+
+    /* Hide primary from PCI until the feature is acked */
+    if (klass->hotpluggable && n->primary_hph)
+    {
+        object_ref(OBJECT(primary_dev));
+        qdev_simple_device_unplug_cb(n->primary_hph, primary_dev, &errp);
+        n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
+        ret = true;
+    }
+
+    return ret;
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1976,6 +2028,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
     }

+    if (n->net_conf.standby_id_str && primary_device_present(
+        n->net_conf.standby_id_str, &n->primary_pdev)) {
+        virtio_net_register_primary_device(dev, DEVICE(n->primary_pdev));
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);

@@ -2198,6 +2255,7 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
     DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 866f0deeb7..593debe56e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
 #endif
 }

+bool vfio_is_vfio_pci(PCIDevice* pdev)
+{
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
+}
+
 static void vfio_intx_update(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 821def0565..26dfde805f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
                              hwaddr *pgsize);
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
+bool vfio_is_vfio_pci(PCIDevice* pdev);

 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c82ca..3b86f17805 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -42,6 +42,7 @@ typedef struct virtio_net_conf
     int32_t speed;
     char *duplex_str;
     uint8_t duplex;
+    char *standby_id_str;
 } virtio_net_conf;

 /* Maximum packet size we can receive from tap device: header + 64k */
@@ -103,9 +104,13 @@ typedef struct VirtIONet {
     int announce_counter;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    PCIDevice *primary_pdev;
+    HotplugHandler *primary_hph;
 } VirtIONet;

 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
                                    const char *type);
+bool virtio_net_register_primary_device(DeviceState *vdev, DeviceState *pdev);
+

 #endif
On Sun, Sep 30, 2018 at 12:17 PM Sameeh Jubran <sameeh@daynix.com> wrote:
>
> On Thu, Sep 27, 2018 at 8:25 PM Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
> >
> >
> > On 9/27/2018 9:17 AM, Michael S. Tsirkin wrote:
> > > On Thu, Sep 27, 2018 at 10:17:37AM +0300, Sameeh Jubran wrote:
> > >> What do you think about the following alternative implementation which
> > >> uses cross id validation.
> > >>
> > >> -device virtio-net,standby=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> > >> -vfio #address,id=<device_id_of_vfio_device>,primary=<device_id_of_virtio_net_device>
> > > virtio is a standby device, isn't it?
> > >
> > > Besides that I don't see issues with this API.
> >
> > Yes. I think 'standby' and 'primary' are reversed in the above suggestion. This should work.
> hmm, I thought about standby being a property that virtio-net has, and
> this property is that it is a standby for the
> primary device and vice versa for the vfio. However this can be viewed
> from different aspects and isn't a deal breaker :)
> >
> > -device virtio-net,primary=<device_id_of_vfio_device>,id=<device_id_of_virtio_net_device>
> > -vfio #address,id=<device_id_of_vfio_device>,standby=<device_id_of_virtio_net_device>
> >
> > It should be OK to have virtio-net in standby mode without an associated vfio primary device,
> > but a vfio primary device should not be allowed without a virtio-net standby device.
> >
> > Will it be possible to remove vfio device when virtio_net driver is unloaded in the VM?
> >
> > Thanks
> > Sridhar
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
>
>
> --
> Respectfully,
> Sameeh Jubran
> Linkedin
> Software Engineer @ Daynix.



-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.
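
For illustration only, the cross-ID validation discussed above could be
sketched as below. This is a standalone C sketch, not QEMU code:
struct dev_opts, pairing_valid() and the field names are hypothetical.
It assumes the naming Sridhar suggests (virtio-net carries primary=<id>,
the vfio device carries standby=<id>) and his rule that a standby may
exist alone while a primary may not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the relevant -device options (not a QEMU type). */
struct dev_opts {
    const char *id;      /* id=<...> of this device */
    const char *peer_id; /* primary=<...> on virtio-net, standby=<...> on vfio */
};

/*
 * Check a virtio-net standby / vfio primary pair:
 * - a standby without a primary is accepted;
 * - a primary without a standby is rejected;
 * - if both exist, each must name the other's id.
 */
static bool pairing_valid(const struct dev_opts *standby,
                          const struct dev_opts *primary)
{
    if (primary == NULL) {
        return standby != NULL;
    }
    if (standby == NULL) {
        return false;
    }
    if (!standby->id || !primary->id ||
        !standby->peer_id || !primary->peer_id) {
        return false;
    }
    return strcmp(standby->peer_id, primary->id) == 0 &&
           strcmp(primary->peer_id, standby->id) == 0;
}
```

Checking both directions is what makes this "cross" validation: a typo
in either id is caught instead of silently pairing with the wrong device.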

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-09-27 16:32                         ` Michael S. Tsirkin
@ 2018-10-02  8:42                           ` Siwei Liu
  2018-10-02 12:43                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Siwei Liu @ 2018-10-02  8:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Thu, Sep 27, 2018 at 9:32 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 26, 2018 at 05:18:38PM -0700, Siwei Liu wrote:
> > On Thu, Sep 20, 2018 at 7:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Thu, Sep 20, 2018 at 04:57:56PM -0700, Siwei Liu wrote:
> > >> On Wed, Sep 19, 2018 at 8:11 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >> > On Tue, Sep 18, 2018 at 11:48:46AM -0700, Siwei Liu wrote:
> > >> >> On Tue, Sep 18, 2018 at 8:31 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >> >> > On Tue, Sep 18, 2018 at 10:13:37AM -0500, Venu Busireddy wrote:
> > >> >> >> On 2018-09-18 09:35:48 -0400, Michael S. Tsirkin wrote:
> > >> >> >> > On Tue, Sep 18, 2018 at 12:20:52PM +0200, Cornelia Huck wrote:
> > >> >> >> > > On Wed, 12 Sep 2018 11:22:12 -0400
> > >> >> >> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >> >> >> > >
> > >> >> >> > > > On Wed, Sep 12, 2018 at 08:17:45AM -0700, Samudrala, Sridhar wrote:
> > >> >> >> > > > >
> > >> >> >> > > > >
> > >> >> >> > > > > On 9/7/2018 2:34 PM, Michael S. Tsirkin wrote:
> > >> >> >> > > > > > On Wed, Aug 15, 2018 at 11:49:15AM -0700, Sridhar Samudrala wrote:
> > >> >> >> > > > > > > VIRTIO_NET_F_STANDBY feature enables hypervisor to indicate virtio_net
> > >> >> >> > > > > > > device to act as a standby for another device with the same MAC address.
> > >> >> >> > > > > > >
> > >> >> >> > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > >> >> >> > > > > > > Acked-by: Cornelia Huck <cohuck@redhat.com>
> > >> >> >> > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/18
> > >> >> >> > > > > > Applied but when do you plan to add documentation as pointed
> > >> >> >> > > > > > out by Jan and Halil?
> > >> >> >> > > > >
> > >> >> >> > > > > I thought additional documentation will be done as part of the Qemu enablement
> > >> >> >> > > > > patches and i hope someone in RH is looking into it.
> > >> >> >> > > > >
> > >> >> >> > > > > Does it make sense to add a link to the kernel documentation of this feature in
> > >> >> >> > > > > the spec
> > >> >> >> > > > >  https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > >> >> >> > > >
> > >> >> >> > > >
> > >> >> >> > > > I do not think this will address the comments posted.  Specifically we
> > >> >> >> > > > should probably include documentation for what is a standby and primary:
> > >> >> >> > > > what is expected of driver (maintain configuration on standby, support
> > >> >> >> > > > primary coming and going, transmit on standby only if there is no
> > >> >> >> > > > primary) and of device (have same mac for standby as for primary).
> > >> >> >> > >
> > >> >> >> > > Yes, we need some definitive statements of what a driver and a device
> > >> >> >> > > is supposed to do in order to conform; it might make sense to discuss
> > >> >> >> > > this in conjunction with discussion on any QEMU patches (have not
> > >> >> >> > > checked whether anything has been posted, just returned from vacation).
> > >> >> >> > >
> > >> >> >> > > I assume that we still stick with the plan to implement/document
> > >> >> >> > > MAC-based handling first and then enhance with other methods later?
> > >> >> >> >
> > >> >> >> > I'm fine with that at least. If someone wants to work on
> > >> >> >> > other methods straight away, that's also fine by me.
> > >> >> >>
> > >> >> >> Patch set [1] implements the failover-group-id mechanism. Are you
> > >> >> >> thinking of some other method?
> > >> >> >>
> > >> >> >> Venu
> > >> >> >>
> > >> >> >> [1] https://lists.oasis-open.org/archives/virtio-dev/201806/msg00384.html
> > >> >> >>
> > >> >> >
> > >> >> > Yes, the grouping mechanism seems fine to me (I don't remember
> > >> >> > about the implementation, it's been a while).
> > >> >> >
> > >> >> > It is not by itself sufficient though, is it?
> > >> >>
> > >> >> I do understand that the group ID patch is incomplete though it's a
> > >> >> base patch for the real work.
> > >> >>
> > >> >> >
> > >> >> > MAC is assumed to be shared to avoid things like ARP/neighbor
> > >> >> > rediscovery, right?
> > >> >>
> > >> >> True, but does this really need to be part of the guest-host
> > >> >> interface? Or rather, I don't see how MAC based matching can be done
> > >> >> on the host part.
> > >> >
> > >> > mac address matching does not need to affect host side.
> > >>
> > >> Did you realize that the host side can't have duplicate MAC address
> > >> filters for both PV and VF at the same time?
> > >>
> > >> If hot adding a VF with duplicate MAC address filter programmed in
> > >> prior, the PV path for virtio in the host side is effectively
> > >> disabled. However, the fact that VF gets hot plugged by QEMU/libvirt
> > >> does not mean it's ready and usable in the guest. You end up with
> > >> unusable guest networking, *temporarily only when VF is successfully
> > >> probed and properly enslaved*. As of now, no guest-host handshake was
> > >> defined in the spec to make virtio driver aware of hotplug event thus
> > >> VF's exposure, and zero handshake was done to switch the datapath when
> > >> VF driver is ready and usable in guest. The current implementation
> > >> relies on the luck that the entire hot plug process will be
> > >> successful in the guest.
> > >
> > > I think it's a PF bug then. PF driver should ignore filters
> > > for VFs which have not been enabled by guest since reset.
> >
> > Even so, the fact is that if the design is tied to MAC based matching
> > you end up with relying on that MAC address to pair device, which
> > loses the flexibility to move MAC filter at some point later after
> > assigning VF to guest.
>
> Whatever you use for pairing, you still need to reuse same MAC
> to avoid redoing neighbour discovery/arp, right?

The VF's MAC can be updated by PF/host on the fly at any time. One can
start with a random MAC but use group ID to pair device instead. And
only update MAC address to the real one when moving MAC filter around
after PV says OK to switch datapath.

Do you see any problem with this design?
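
To make the proposal concrete, here is a minimal C sketch of the
sequence described above. It is not taken from any posted patch:
struct vf_state, vf_pair(), vf_datapath_switch() and the group_id field
are hypothetical names, and the "PV says OK" signal is modelled as a
plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical host-side view of a VF paired with a virtio-net standby. */
struct vf_state {
    uint32_t group_id;       /* failover group ID used for pairing */
    uint8_t  mac[MAC_LEN];   /* MAC currently programmed on the VF */
    bool     datapath_on_vf; /* true once the MAC filter has moved */
};

/* Pair by group ID only; the MAC is deliberately not consulted. */
static bool vf_pair(const struct vf_state *vf, uint32_t standby_group_id)
{
    return vf->group_id == standby_group_id;
}

/*
 * Move the datapath to the VF only after the PV (virtio) side has
 * acknowledged that the VF is ready. Until then the VF keeps its
 * temporary MAC and traffic stays on the standby.
 */
static bool vf_datapath_switch(struct vf_state *vf,
                               const uint8_t real_mac[MAC_LEN],
                               bool pv_ack)
{
    if (!pv_ack) {
        return false;
    }
    memcpy(vf->mac, real_mac, MAC_LEN);
    vf->datapath_on_vf = true;
    return true;
}
```

The point of the split is that pairing and datapath ownership become
two separate decisions, so a VF that fails to probe never takes the MAC.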

>
> > >
> > >> BTW netvsc mitigate potential failure in the hotplug and driver
> > >> probing by acknowledging the hypervisor through a DATAPATH_SWITCH
> > >> hypercall (VMbus message) when VF driver is enslaved and ready, only
> > >> then hypervisor will kick off datapath switching by moving the MAC
> > >> address filter.
> > >
> > > We can do it without need for PV.  We can detect e.g. bus master enable.
> >
> > I'm not sure if it's valid to assume master enable/disable is the
> > right point to move the filter, although it improves a bit than do
> > nothing. The thing is that from device (QEMU) perspective it knows
> > nothing and should not assume too much about guest implementation -
> > the time to move the filter around means the VF driver is fully ready
> > in guest and properly handled by the bond driver (net_failover), so
> > the primary can take over the datapath going forward. While the bus
> > master enable usually happens earlier than that, which does not
> > indicate anything about readiness on the control side that the
> > bond/failover driver can actually see this VF and "manage" it. This
> > strictly does not form any guest-host handshake to me. Think about
> > what if VM user changes VF to a different netns, or rebind it to DPDK
> > PMD?
>
> OK. What then?

The guest should have liberty to switch datapath for its own. Host
never knows when VF will be ready and useful in guest. The assumption
that MAC based matching can blindly switch host datapath at the time
of hot plugging is pretty fragile. There's no guarantee of the time or
availability for a useful VF path within the VM. I think the number
one goal of live migration is to ensure the connections are alive
rather than migrate anyway without caring guest activity.

>
> > These just demonstrate a few things that can get well covered by
> > this design, and I suspect the errors in the real life would be much
> > more complex.
>
> What's missing is actual design though. The only thing that I saw so far
> is bridge group identifier for qemu, which is a start but doesn't
> actually solve any problems by itself. You even said "forget about the
> grouping mechanism" yourself below.

Then please come up with a more robust design sticking to MAC based
matching. The current one does not seem appealing at all to run in
production.

>
> > > Move the filter when enabled, move it back when disabled e.g. by
> > > VF reset. Or maybe MSE, or both.
> >
> > MSE is on the PF and shared by all VFs, why it's relevant?
>
> Oh, right. Just FLR then.
>
> > >
> > >> >
> > >> >> Are you going to expose MAC address to VFIO?
> > >> >
> > >> > If mac of a VF is programmed by libvirt through the PF
> > >> > (that's already the case), VFIO does not need to care about it.
> > >> >
> > >> >>
> > >> >> The thing is the current MAC based implementation has intrinsic flaw
> > >> >> that doesn't propagate errors to hypervisor, or there's no back
> > >> >> channel for guest to unwind the hot plug action upon failure in
> > >> >> probing or enslaving the primary.
> > >> >
> > >> > I guess you can eject the primary if you like. But
> > >> > why does hypervisor need to know? On error, just don't use primary,
> > >> > use standby.
> > >>
> > >> Forget about the grouping mechanism first.
> > >
> > > OK :)
> > >
> > >> What guest kernel change do
> > >> you propose to make virtio driver know every possible error, think
> > >> about how many moving targets it needs to specifically track with or
> > >> has to depend on during the hot plug and driver probing process? If
> > >> someone starts to implement the code and think about various error
> > >> cases as a whole, I bet it would be more clear why grouping is
> > >> relevant in the first place.
> > >>
> > >> -Siwei
> > >
> > > It just seems that no one's been motivated to do it so far.
> >
> > It's just that the MAC matching design is simply too broken.
>
> Too broken to even bother coding up any alternatives?

No, but we need to make sure everything works for our iSCSI setup
before posting patches back. And we've been in active discussions
internally for some interesting scenarios and requirements.

>
> > We have
> > root disk hosted on networked storage, i.e. iSCSI, that can't tolerate
> > any potential network failure if the design itself is not error proof.
> > IOW our criteria for network downtime and errors is super rigorous..
> >
> > -Siwei
>
> IMO that's a very interesting usecase to address! I'll be happy to merge
> patches that help reduce downtime, spec-wise I'll be happy to propose
> them for TC vote.

Good. Hopefully we'll come back soon. :)

-Siwei
>
> > >
> > >> >
> > >> >> If you think about a more robust
> > >> >> implementation, another grouping mechanism rather than MAC is pretty
> > >> >> much required.
> > >> >>
> > >> >> Thanks,
> > >> >> -Siwei
> > >> >
> > >> > I don't really know what is the flaw, or how is it fixed by a grouping
> > >> > mechanism. All this motivation was never described as part of work on
> > >> > an alternate grouping.
> > >> >
> > >> >> > If true that implies that to avoid guest confusion visibility of the
> > >> >> > primary needs to be controlled by standby's driver.
> > >> >> > This makes this patchset incomplete.
> > >> >> >
> > >> >> > For this work to be complete what is needed is:
> > >> >> > - hypervisor: add control of primary's visibility to guest
> > >> >> > - guest: add support for this grouping to the failover driver
> > >> >> >
> > >> >> > We also need
> > >> >> > - spec: document matching rules based on the pci bridge
> > >> >> >
> > >> >> > and it's helpful to have a spec proposal with implementation, but I
> > >> >> > would say at least proposed patches to one of the above 2 would be
> > >> >> > helpful before we include this in spec.
> > >> >> >
> > >> >> > --
> > >> >> > MST
> > >> >> >


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-02  8:42                           ` Siwei Liu
@ 2018-10-02 12:43                             ` Michael S. Tsirkin
  2018-10-05  0:03                               ` Siwei Liu
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-02 12:43 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> The VF's MAC can be updated by PF/host on the fly at any time. One can
> start with a random MAC but use group ID to pair device instead. And
> only update MAC address to the real one when moving MAC filter around
> after PV says OK to switch datapath.
> 
> Do you see any problem with this design?

Isn't this what I proposed:
	Maybe we can
	start VF with a temporary MAC, then change it to a final one when guest
	tries to use it. It will work but we run into fact that MACs are
	currently programmed by mgmnt - in many setups qemu does not have the
	rights to do it.

?

If yes I don't see a problem with the interface design, even though
implementation wise it's more work as it will have to include management
changes.

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-02 12:43                             ` Michael S. Tsirkin
@ 2018-10-05  0:03                               ` Siwei Liu
  2018-10-05  5:17                                 ` Samudrala, Sridhar
  2018-10-05 19:18                                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: Siwei Liu @ 2018-10-05  0:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > start with a random MAC but use group ID to pair device instead. And
> > only update MAC address to the real one when moving MAC filter around
> > after PV says OK to switch datapath.
> >
> > Do you see any problem with this design?
>
> Isn't this what I proposed:
>         Maybe we can
>         start VF with a temporary MAC, then change it to a final one when guest
>         tries to use it. It will work but we run into fact that MACs are
>         currently programmed by mgmnt - in many setups qemu does not have the
>         rights to do it.
>
> ?
>
> If yes I don't see a problem with the interface design, even though
> implementation wise it's more work as it will have to include management
> changes.

I thought we discussed this design a while back:
https://www.spinics.net/lists/netdev/msg512232.html

... plug in a VF with a random MAC filter programmed in prior, and
initially use that random MAC within guest. This would require:
a) not relying on permanent MAC address to do pairing during the
initial discovery, e.g. use the failover group ID as in this
discussion
b) host to toggle the MAC address filter: which includes taking down
the tap device to return the MAC back to PF, followed by assigning
that MAC to VF using "ip link ... set vf ..."
c) notify guest to reload/reset VF driver for the change of hardware MAC address
d) until VF reloads the driver it won't be able to use the datapath,
so very short period of network outage is (still) expected

though I still don't think this design can eliminate downtime. However,
it looks like as of today the MAC matching still hasn't addressed the
datapath switching and error handling in a clean way. As said, for
SR-IOV live migration on iSCSI root disk there will be a lot of
dancing parts going along the way; reliable network connectivity and
dedicated handshakes are critical to this kind of setup.
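
Steps a) to d) above can be read as a small state machine. The sketch
below is only an illustration under that reading: the enum names and
fo_next() are hypothetical and do not come from net_failover or any
other driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for the sequence a)-d); not from any real driver. */
enum fo_state {
    FO_PAIRED_RANDOM_MAC, /* a) VF paired by group ID, random MAC in use */
    FO_FILTER_MOVED,      /* b) host moved the MAC filter from tap to VF */
    FO_GUEST_NOTIFIED,    /* c) guest told to reload/reset the VF driver */
    FO_VF_ACTIVE          /* d) VF driver reloaded, datapath usable */
};

enum fo_event {
    FO_EV_HOST_MOVE_FILTER,
    FO_EV_NOTIFY_GUEST,
    FO_EV_VF_DRIVER_RELOADED
};

/* Advance one step; events arriving out of order leave the state unchanged. */
static enum fo_state fo_next(enum fo_state s, enum fo_event ev)
{
    switch (s) {
    case FO_PAIRED_RANDOM_MAC:
        return ev == FO_EV_HOST_MOVE_FILTER ? FO_FILTER_MOVED : s;
    case FO_FILTER_MOVED:
        return ev == FO_EV_NOTIFY_GUEST ? FO_GUEST_NOTIFIED : s;
    case FO_GUEST_NOTIFIED:
        return ev == FO_EV_VF_DRIVER_RELOADED ? FO_VF_ACTIVE : s;
    case FO_VF_ACTIVE:
        return s;
    }
    return s;
}

/* The VF datapath carries traffic only in the final state. */
static bool fo_vf_usable(enum fo_state s)
{
    return s == FO_VF_ACTIVE;
}
```

In this model the two middle states are the outage window from d): the
tap has already given up the MAC, but the VF is not yet usable.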

-Siwei

>
> --
> MST
>


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-05  0:03                               ` Siwei Liu
@ 2018-10-05  5:17                                 ` Samudrala, Sridhar
  2018-10-10 14:40                                   ` Michael S. Tsirkin
  2018-10-05 19:18                                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-10-05  5:17 UTC (permalink / raw)
  To: Siwei Liu, Michael S. Tsirkin; +Cc: Venu Busireddy, Cornelia Huck, virtio-dev

On 10/4/2018 5:03 PM, Siwei Liu wrote:
> On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
>>> The VF's MAC can be updated by PF/host on the fly at any time. One can
>>> start with a random MAC but use group ID to pair device instead. And
>>> only update MAC address to the real one when moving MAC filter around
>>> after PV says OK to switch datapath.
>>>
>>> Do you see any problem with this design?
>> Isn't this what I proposed:
>>          Maybe we can
>>          start VF with a temporary MAC, then change it to a final one when guest
>>          tries to use it. It will work but we run into fact that MACs are
>>          currently programmed by mgmnt - in many setups qemu does not have the
>>          rights to do it.
>>
>> ?
>>
>> If yes I don't see a problem with the interface design, even though
>> implementation wise it's more work as it will have to include management
>> changes.
> I thought we discussed this design a while back:
> https://www.spinics.net/lists/netdev/msg512232.html
>
> ... plug in a VF with a random MAC filter programmed in prior, and
> initially use that random MAC within guest. This would require:
> a) not relying on permanent MAC address to do pairing during the
> initial discovery, e.g. use the failover group ID as in this
> discussion
> b) host to toggle the MAC address filter: which includes taking down
> the tap device to return the MAC back to PF, followed by assigning
> that MAC to VF using "ip link ... set vf ..."
> c) notify guest to reload/reset VF driver for the change of hardware MAC address
> d) until VF reloads the driver it won't be able to use the datapath,
> so very short period of network outage is (still) expected
>
> though I still don't think this design can eliminate downtime. However,
> it looks like as of today the MAC matching still hasn't addressed the
> datapath switching and error handling in a clean way.

I am not sure what the issue is with datapath switching with the net_failover solution.

Do you see any issues with the migration management layer automating the steps
that are listed in the example script in the documentation?
https://www.kernel.org/doc/html/latest/networking/net_failover.html

Now that we are considering making the VF visible only when the standby negotiation
is completed, I am not sure why we need a random MAC.
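
As a rough sketch of that visibility rule (an illustration, not the
QEMU implementation): the bit number 62 is the one proposed in the spec
patch at the top of this thread, while primary_visible() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number as proposed in the spec patch in this thread. */
#define VIRTIO_NET_F_STANDBY 62

/*
 * The primary (VF) is exposed to the guest only once the driver has
 * acked VIRTIO_NET_F_STANDBY during feature negotiation.
 */
static bool primary_visible(uint64_t negotiated_features)
{
    return (negotiated_features >> VIRTIO_NET_F_STANDBY) & 1ULL;
}
```

A guest driver that never acks the feature therefore never sees the VF
and keeps running on virtio-net alone.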


> As said, for
> SR-IOV live migration on iSCSI root disk there will be a lot of
> dancing parts going along the way, reliable network connectivity and
> dedicated handshakes are critical to this kind of setup.
>
> -Siwei
>
>> --
>> MST
>>


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-05  0:03                               ` Siwei Liu
  2018-10-05  5:17                                 ` Samudrala, Sridhar
@ 2018-10-05 19:18                                 ` Michael S. Tsirkin
  2018-10-08 22:06                                   ` Sameeh Jubran
  2018-10-11  1:26                                   ` Siwei Liu
  1 sibling, 2 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-05 19:18 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > start with a random MAC but use group ID to pair device instead. And
> > > only update MAC address to the real one when moving MAC filter around
> > > after PV says OK to switch datapath.
> > >
> > > Do you see any problem with this design?
> >
> > Isn't this what I proposed:
> >         Maybe we can
> >         start VF with a temporary MAC, then change it to a final one when guest
> >         tries to use it. It will work but we run into fact that MACs are
> >         currently programmed by mgmnt - in many setups qemu does not have the
> >         rights to do it.
> >
> > ?
> >
> > If yes I don't see a problem with the interface design, even though
> > implementation wise it's more work as it will have to include management
> > changes.
> 
> I thought we discussed this design a while back:
> https://www.spinics.net/lists/netdev/msg512232.html
> 
> ... plug in a VF with a random MAC filter programmed in prior, and
> initially use that random MAC within guest. This would require:
> a) not relying on permanent MAC address to do pairing during the
> initial discovery, e.g. use the failover group ID as in this
> discussion
> b) host to toggle the MAC address filter: which includes taking down
> the tap device to return the MAC back to PF, followed by assigning
> that MAC to VF using "ip link ... set vf ..."
> c) notify guest to reload/reset VF driver for the change of hardware MAC address
> d) until VF reloads the driver it won't be able to use the datapath,
> so very short period of network outage is (still) expected
>
> though I still don't think this design can eliminate downtime.


No, my idea is somewhat different. As you say there is a problem
of delay at point (c). Further, the need to poke at PF filters
with set vf does not match the current security model where
any security related configuration such as MAC filtering is done upfront.


So I have two suggestions:

1. Teach pf driver not to program the filter until vf driver actually goes up.

   How do we know it went up? For example, it is highly likely
   that driver will send some kind of command on init.
   E.g. linux seems to always try to set the mac address during init.
   We can have any kind of command received by the PF enable
   the filter, until reset.

   In absence of an appropriate command, QEMU can detect bus master
   enable and do that.

2. Create a variant of trusted VF where it starts out without a valid
   MAC, guest can set a softmac MAC but only can set it to the specific
   value that matches virtio.
   Alternatively - if it's preferred for some reason - allow
   guest to program just two MACs, the original one and the virtio one.
   Any other value is denied.
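
The two suggestions above can be sketched as a small standalone model.
This is only an illustration under stated assumptions: struct vf_state
and the vf_* helpers are hypothetical names, not an existing PF driver
interface, and a real implementation would live in the PF driver and
its VF mailbox handling.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical PF-side bookkeeping for one VF (not a real driver API). */
struct vf_state {
    uint8_t virtio_mac[MAC_LEN];   /* MAC shared with the virtio standby */
    uint8_t original_mac[MAC_LEN]; /* MAC the VF started out with */
    uint8_t current_mac[MAC_LEN];  /* MAC the guest has programmed */
    bool filter_active;            /* is the RX filter programmed? */
};

/* Management configures the MACs upfront; no filter is programmed yet. */
void vf_init(struct vf_state *vf, const uint8_t virtio_mac[MAC_LEN],
             const uint8_t original_mac[MAC_LEN])
{
    memcpy(vf->virtio_mac, virtio_mac, MAC_LEN);
    memcpy(vf->original_mac, original_mac, MAC_LEN);
    memcpy(vf->current_mac, original_mac, MAC_LEN);
    vf->filter_active = false;
}

/* Suggestion 1: any command from the VF driver shows that it is up. */
void vf_on_command(struct vf_state *vf)
{
    vf->filter_active = true;
}

/* A reset drops the filter until the driver talks to the PF again. */
void vf_on_reset(struct vf_state *vf)
{
    vf->filter_active = false;
}

/* Suggestion 2: accept only the virtio MAC or the original MAC. */
bool vf_guest_set_mac(struct vf_state *vf, const uint8_t mac[MAC_LEN])
{
    bool allowed = memcmp(mac, vf->virtio_mac, MAC_LEN) == 0 ||
                   memcmp(mac, vf->original_mac, MAC_LEN) == 0;

    if (allowed) {
        memcpy(vf->current_mac, mac, MAC_LEN);
        vf_on_command(vf); /* setting the MAC is itself a command */
    }
    return allowed;
}
```

In this model the RX filter stays unprogrammed from MAC assignment
until the first accepted command and is dropped again on reset, so
traffic is never steered to a VF whose driver is not up.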



> However,
> it looks like as of today the MAC matching still hasn't addressed the
> datapath switching and error handling in a clean way. As said, for
> SR-IOV live migration on iSCSI root disk there will be a lot of
> dancing parts going along the way, reliable network connectivity and
> dedicated handshakes are critical to this kind of setup.
> 
> -Siwei

I think MAC matching removes downtime when the device is removed but not
when it's re-added, yes. It has the advantage of already-present
Linux driver support, but if you are prepared to work on
adding e.g. bridge based matching, that will go away.


> >
> > --
> > MST
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-05 19:18                                 ` Michael S. Tsirkin
@ 2018-10-08 22:06                                   ` Sameeh Jubran
  2018-10-10 14:43                                     ` Michael S. Tsirkin
  2018-10-11  1:26                                   ` Siwei Liu
  1 sibling, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-10-08 22:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev,
	ehabkost

Hi All,

I have been busy trying to figure out how to implement the feature and
got very confused by the current open questions and their impact.

As I have stated earlier, I thought about doing the following:

1 - Have an id for virtio-net (the standby device) and one for the
vfio device (primary).
2 - On realize of virtio-net check for the existence of the primary
device and hide the standby feature.
3 - Once the feature is acked by the guest, the device would be
plugged back by virtio-net.
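
A minimal standalone model of these three steps, assuming (as the
prototype at the end of this email does) that the primary is the device
that gets hidden while the standby offers the feature bit. The struct
and function names are hypothetical and are not QEMU APIs; only the
feature bit number (62) comes from the spec patch under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_STANDBY 62 /* bit number from the spec patch */

/* Hypothetical pairing state kept by the standby (virtio-net) device. */
struct failover_pair {
    bool primary_hidden;    /* primary is currently not guest visible */
    uint64_t host_features; /* feature bits offered to the guest */
};

/* Steps 1-2: on standby realize, hide the primary and offer STANDBY. */
void standby_realize(struct failover_pair *p, bool primary_present)
{
    p->host_features = 0;
    p->primary_hidden = false;
    if (primary_present) {
        p->primary_hidden = true;
        p->host_features |= 1ULL << VIRTIO_NET_F_STANDBY;
    }
}

/* Step 3: returns true if the primary was plugged back by this call. */
bool standby_set_features(struct failover_pair *p, uint64_t guest_features)
{
    /* Only bits that were actually offered can be acked. */
    uint64_t acked = guest_features & p->host_features;

    if (p->primary_hidden && (acked & (1ULL << VIRTIO_NET_F_STANDBY))) {
        p->primary_hidden = false;
        return true;
    }
    return false;
}
```

A guest that never acks the feature leaves the primary hidden, which is
the behaviour wanted for guests without failover support.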

I faced a few issues while implementing this, which I overcame. At the
end of this email I've included a prototype that implements the basic
functionality using e1000 instead of a vfio net device. I'm sharing it
as a draft only (it has many flaws), as parts of it are valid for the
actual implementation.

Issues that I've faced:

* I've used device_listener callbacks to get the device to register
itself with virtio-net. This makes virtio-net listen to the realization
of the device. I don't think this approach is right, as it makes
virtio-net listen to every device, which could be avoided by extending
the current implementation of the device listener. Moreover, this
doesn't solve the migration issues: as far as I understand, the
realize function doesn't get called after the migration process, which
means this doesn't work. (Correct me if I'm wrong.)

* When testing with the PC machine type, which uses PIIX4 as the
hotplug handler, the hotplug handler gets set after the virtio-net
and e1000 devices have been realized. This means that I can't save the
hotplug handler before detaching the device, and so I can't plug it
back, because an unplugged device is detached from its parent bus.
I resolved this by saving a pointer to the parent bus instead; when
replugging the device, the parent bus can be used to get the hotplug
handler. Note that unplugging the device using
"qdev_simple_device_unplug_cb" doesn't require the hotplug handler, as
this function simply detaches the device object from its parent object
(the pci bus).
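
The parent-bus workaround from the second issue can be illustrated with
a standalone model. The types below are hypothetical stand-ins, not
QEMU's BusState or HotplugHandler; the point is only that the handler
is looked up through the saved bus at replug time rather than cached at
realize time, when it may not be set yet.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a hotplug handler and a bus. */
struct hotplug_handler {
    int plugged; /* number of devices plugged through this handler */
};

struct bus {
    struct hotplug_handler *hotplug_handler; /* set late by the machine */
};

/* What the standby remembers about the hidden primary. */
struct saved_primary {
    struct bus *parent_bus;
};

/* Called before detaching: the handler may still be NULL here. */
void save_parent(struct saved_primary *s, struct bus *b)
{
    s->parent_bus = b;
}

/* Resolve the handler lazily; returns 0 on success, -1 if unavailable. */
int replug(struct saved_primary *s)
{
    if (s->parent_bus == NULL || s->parent_bus->hotplug_handler == NULL) {
        return -1;
    }
    s->parent_bus->hotplug_handler->plugged++;
    return 0;
}
```

Caching the handler pointer itself at save time would have stored NULL
in this scenario; holding the bus defers the lookup until the machine
has installed its handler.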

I've talked to Eduardo and he mentioned that he and Michael had
discussed the following approach: using a property (for pci devices
currently and maybe for others in the future?) which tells QEMU to
hide the device from the bus upon init. This approach leaves the
responsibility of managing the failover device to the management. The
management can send commands to plug the hidden device or hide it back
as well. I think that I like this approach better as it is proof against
issues that can come up when trying to handle the failure of
unplug/plug requests to the guest.

Please share your thoughts on this approach versus the draft implementation.

_____________________________________________________________________________________________________

commit 06afc24a613b2cb31c064859e89b709ec54fecdc (HEAD -> failover)
Author: Sameeh Jubran <sjubran@redhat.com>
Date:   Sun Sep 16 13:21:41 2018 +0300

    virtio-net: Implement standby feature

    Signed-off-by: Sameeh Jubran <sjubran@redhat.com>

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 13a9494a8d..387d8856c0 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -36,6 +36,8 @@
 #include "qemu/range.h"

 #include "e1000x_common.h"
+#include "hw/virtio/virtio-net.h"
+#include "hw/virtio/virtio-pci.h"

 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

@@ -118,6 +120,7 @@ typedef struct E1000State_st {
     bool mit_timer_on;         /* Mitigation timer is running. */
     bool mit_irq_level;        /* Tracks interrupt pin level. */
     uint32_t mit_ide;          /* Tracks E1000_TXD_CMD_IDE bit. */
+    char *primary_id_str;

 /* Compatibility flags for migration to/from qemu 1.3.0 and older */
 #define E1000_FLAG_AUTONEG_BIT 0
@@ -1652,9 +1655,16 @@ static void e1000_write_config(PCIDevice *pci_dev, uint32_t address,
     }
 }

+static bool standby_device_present(const char *id,
+        struct PCIDevice **pdev)
+{
+    return pci_qdev_find_device(id, pdev) >= 0;
+}
+
 static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)
 {
     DeviceState *dev = DEVICE(pci_dev);
+    PCIDevice *standby_pci_dev;
     E1000State *d = E1000(pci_dev);
     uint8_t *pci_conf;
     uint8_t *macaddr;
@@ -1690,6 +1700,12 @@ static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)

     d->autoneg_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, e1000_autoneg_timer, d);
     d->mit_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, e1000_mit_timer, d);
+    if (d->primary_id_str && standby_device_present(
+            d->primary_id_str, &standby_pci_dev) && standby_pci_dev) {
+        VirtIOPCIProxy *proxy = VIRTIO_PCI(standby_pci_dev);
+        VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+        virtio_net_register_primary_device(DEVICE(vdev));
+    }
 }

 static void qdev_e1000_reset(DeviceState *dev)
@@ -1708,6 +1724,7 @@ static Property e1000_properties[] = {
                     compat_flags, E1000_FLAG_MAC_BIT, true),
     DEFINE_PROP_BIT("migrate_tso_props", E1000State,
                     compat_flags, E1000_FLAG_TSO_BIT, true),
+    DEFINE_PROP_STRING("primary", E1000State, primary_id_str),
     DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f154756e85..b831ba438b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,7 +26,9 @@
 #include "qapi/qapi-events-net.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "hw/pci/pci.h"
 #include "standard-headers/linux/ethtool.h"
+#include "hw/vfio/vfio-common.h"

 #define VIRTIO_NET_VM_VERSION    11

@@ -721,6 +723,20 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
     } else {
         memset(n->vlans, 0xff, MAX_VLAN >> 3);
     }
+
+    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
+        Error *errp = NULL;
+        DeviceState *pdev = DEVICE(n->primary_pdev);
+        DeviceClass *klass = DEVICE_GET_CLASS(pdev);
+
+        /* Plug the primary device back to the pci bus */
+        if (klass->hotpluggable && n->primary_parent_bus)
+        {
+            BusState *qbus = BUS(n->primary_parent_bus);
+            hotplug_handler_plug(qbus->hotplug_handler, pdev,
+                    &errp);
+        }
+    }
 }

 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -1946,6 +1962,52 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
     n->netclient_type = g_strdup(type);
 }

+static bool primary_device_present(const char *id, struct PCIDevice **pdev)
+{
+    return pci_qdev_find_device(id, pdev) >= 0 &&
+    vfio_is_vfio_pci(*pdev);
+}
+
+
+static void primary_device_realize(DeviceListener *listener,
+                              DeviceState *dev)
+{
+    VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE) && dev->id
+    && !strcmp(dev->id, n->net_conf.standby_id_str))
+    {
+        Error *errp = NULL;
+        DeviceClass *klass = DEVICE_GET_CLASS(dev);
+
+        if (n->primary_pdev == NULL)
+        {
+            n->primary_pdev = PCI_DEVICE(dev);
+        }
+
+        if (n->primary_parent_bus == NULL)
+        {
+            n->primary_parent_bus = qdev_get_parent_bus(dev);
+        }
+
+        /* Hide the primary from the pci bus until the feature is acked */
+        if (klass->hotpluggable && n->primary_parent_bus)
+        {
+            object_ref(OBJECT(dev));
+            qdev_simple_device_unplug_cb(NULL ,dev, &errp);
+            n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
+        }
+    }
+}
+
+void virtio_net_register_primary_device(DeviceState *dev)
+{
+    VirtIONet *n = VIRTIO_NET(dev);
+    n->primary_listener.realize = primary_device_realize;
+    n->primary_listener.unrealize = NULL;
+    device_listener_register(&n->primary_listener);
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -1976,6 +2038,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
     }

+    if (n->net_conf.standby_id_str && primary_device_present(
+        n->net_conf.standby_id_str, &n->primary_pdev)) {
+        virtio_net_register_primary_device(dev);
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);

@@ -2198,6 +2265,7 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_STRING("standby", VirtIONet, net_conf.standby_id_str),
     DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 866f0deeb7..593debe56e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -220,6 +220,12 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
 #endif
 }

+bool vfio_is_vfio_pci(PCIDevice* pdev)
+{
+    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+    return vdev->vbasedev.type == VFIO_DEVICE_TYPE_PCI;
+}
+
 static void vfio_intx_update(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 821def0565..26dfde805f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -195,5 +195,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
                              hwaddr *pgsize);
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
+bool vfio_is_vfio_pci(PCIDevice* pdev);

 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c82ca..cfb8843a77 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -42,6 +42,7 @@ typedef struct virtio_net_conf
     int32_t speed;
     char *duplex_str;
     uint8_t duplex;
+    char *standby_id_str;
 } virtio_net_conf;

 /* Maximum packet size we can receive from tap device: header + 64k */
@@ -103,9 +104,14 @@ typedef struct VirtIONet {
     int announce_counter;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    PCIDevice *primary_pdev;
+    BusState *primary_parent_bus;
+    DeviceListener primary_listener;
 } VirtIONet;

 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
                                    const char *type);
+void virtio_net_register_primary_device(DeviceState *vdev);
+

 #endif
On Fri, Oct 5, 2018 at 10:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > start with a random MAC but use group ID to pair device instead. And
> > > > only update MAC address to the real one when moving MAC filter around
> > > > after PV says OK to switch datapath.
> > > >
> > > > Do you see any problem with this design?
> > >
> > > Isn't this what I proposed:
> > >         Maybe we can
> > >         start VF with a temporary MAC, then change it to a final one when guest
> > >         tries to use it. It will work but we run into fact that MACs are
> > >         currently programmed by mgmnt - in many setups qemu does not have the
> > >         rights to do it.
> > >
> > > ?
> > >
> > > If yes I don't see a problem with the interface design, even though
> > > implementation wise it's more work as it will have to include management
> > > changes.
> >
> > I thought we discussed this design a while back:
> > https://www.spinics.net/lists/netdev/msg512232.html
> >
> > ... plug in a VF with a random MAC filter programmed in prior, and
> > initially use that random MAC within guest. This would require:
> > a) not relying on permanent MAC address to do pairing during the
> > initial discovery, e.g. use the failover group ID as in this
> > discussion
> > b) host to toggle the MAC address filter: which includes taking down
> > the tap device to return the MAC back to PF, followed by assigning
> > that MAC to VF using "ip link ... set vf ..."
> > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > d) until VF reloads the driver it won't be able to use the datapath,
> > so very short period of network outage is (still) expected
> >
> > though I still don't think this design can eliminate downtime.
>
>
> No, my idea is somewhat different. As you say there is a problem
> of delay at point (c). Further, the need to poke at PF filters
> with set vf does not match the current security model where
> any security related configuration such as MAC filtering is done upfront.
>
>
> So I have two suggestions:
>
> 1. Teach pf driver not to program the filter until vf driver actually goes up.
>
>    How do we know it went up? For example, it is highly likely
>    that driver will send some kind of command on init.
>    E.g. linux seems to always try to set the mac address during init.
>    We can have any kind of command received by the PF enable
>    the filter, until reset.
>
>    In absence of an appropriate command, QEMU can detect bus master
>    enable and do that.
>
> 2. Create a variant of trusted VF where it starts out without a valid
>    MAC, guest can set a softmac MAC but only can set it to the specific
>    value that matches virtio.
>    Alternatively - if it's preferred for some reason - allow
>    guest to program just two MACs, the original one and the virtio one.
>    Any other value is denied.
>
>
>
> > However,
> > it looks like as of today the MAC matching still hasn't addressed the
> > datapath switching and error handling in a clean way. As said, for
> > SR-IOV live migration on iSCSI root disk there will be a lot of
> > dancing parts going along the way, reliable network connectivity and
> > dedicated handshakes are critical to this kind of setup.
> >
> > -Siwei
>
> I think MAC matching removes downtime when device is removed but not
> when it's re-added, yes. It has the advantage of an already present
> linux driver support, but if you are prepared to work on
> adding e.g. bridge based matching, that will go away.
>
>
> > >
> > > --
> > > MST
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.



^ permalink raw reply related	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-05  5:17                                 ` Samudrala, Sridhar
@ 2018-10-10 14:40                                   ` Michael S. Tsirkin
  2018-10-11  0:16                                     ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-10 14:40 UTC (permalink / raw)
  To: Samudrala, Sridhar; +Cc: Siwei Liu, Venu Busireddy, Cornelia Huck, virtio-dev

On Thu, Oct 04, 2018 at 10:17:04PM -0700, Samudrala, Sridhar wrote:
> On 10/4/2018 5:03 PM, Siwei Liu wrote:
> > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > start with a random MAC but use group ID to pair device instead. And
> > > > only update MAC address to the real one when moving MAC filter around
> > > > after PV says OK to switch datapath.
> > > > 
> > > > Do you see any problem with this design?
> > > Isn't this what I proposed:
> > >          Maybe we can
> > >          start VF with a temporary MAC, then change it to a final one when guest
> > >          tries to use it. It will work but we run into fact that MACs are
> > >          currently programmed by mgmnt - in many setups qemu does not have the
> > >          rights to do it.
> > > 
> > > ?
> > > 
> > > If yes I don't see a problem with the interface design, even though
> > > implementation wise it's more work as it will have to include management
> > > changes.
> > I thought we discussed this design a while back:
> > https://www.spinics.net/lists/netdev/msg512232.html
> > 
> > ... plug in a VF with a random MAC filter programmed in prior, and
> > initially use that random MAC within guest. This would require:
> > a) not relying on permanent MAC address to do pairing during the
> > initial discovery, e.g. use the failover group ID as in this
> > discussion
> > b) host to toggle the MAC address filter: which includes taking down
> > the tap device to return the MAC back to PF, followed by assigning
> > that MAC to VF using "ip link ... set vf ..."
> > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > d) until VF reloads the driver it won't be able to use the datapath,
> > so very short period of network outage is (still) expected
> > 
> > though I still don't think this design can eliminate downtime. However,
> > it looks like as of today the MAC matching still hasn't addressed the
> > datapath switching and error handling in a clean way.
> 
> I am not sure what is the issue with datapath switching with the net_failover solution.
> 
> Do you see any issues with the migration management layer to automate the steps
> that are listed in the example script in the documentation.
> https://www.kernel.org/doc/html/latest/networking/net_failover.html
> 
> Now that we are considering making the VF visible only when the standby negotiation
> is completed, i am not sure why we need a random MAC.
> 

The claim is that some pfs update MAC RX filter immediately once vf is
created, not when its driver attaches. That will mean on hot-plug there
is downtime until device is guest visible and driver initialized.

Can you confirm that isn't the case for intel cards?


> > As said, for
> > SR-IOV live migration on iSCSI root disk there will be a lot of
> > dancing parts going along the way, reliable network connectivity and
> > dedicated handshakes are critical to this kind of setup.
> > 
> > -Siwei
> > 
> > > --
> > > MST
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > 



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-08 22:06                                   ` Sameeh Jubran
@ 2018-10-10 14:43                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-10 14:43 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev,
	ehabkost

On Tue, Oct 09, 2018 at 01:06:59AM +0300, Sameeh Jubran wrote:
> Hi All,
> 
> I have been busy trying to figure out how to implement the feature and
> got very confused with the current open questions and tier impact.
> 
> As I have stated earlier, I thought about doing the following:
> 
> 1 - Have an id for virtio-net (the standby device) and one for the
> vfio device (primary).
> 2 - On realize of virtio-net check for the existence of the primary
> device and hide the standby feature.
> 3 - Once the feature is acked by the guest, the device would be
> plugged back by virtio-net.
> 
> I've faced few issues when I tried to implement this which I overcame.
> At the end of the email I've included a prototype which implements
> this basic functionality using e1000 instead of vfio net device, I'm
> sharing this as a draft only (it has many flaws) as parts of it are
> valid for the actual implementation.
> 
> Issues that I've faced:
> 
> * I've used a device_listener callbacks it get the device to register
> itself for virtio-net. This makes virtio-net listen to the realization
> of the device. I don't think this approach is right, as it makes the
> virtio-net listen to every device which can be avoided by extending
> the current implementation of the device listner, Moreover, this
> doesn't solve the migration issues, as far as I understand, the
> realize function doesn't get called after the migration process which
> means this doesn't work. (correct me if I'm wrong)
> 
> * When testing with PC machine type which uses the PIIX4 as the
> hotplug handler, the hotplug handler get's set after the virtio-net
> and e1000 device has been realized. This means that I can't save the
> hotplug handler before detaching the device which means I can't plug
> it back as when the device is unplugged it is unattached from it's
> parent bus. This was resolved by saving a pointer to the parent bus
> instead and when attempting to replug the device then the parent can
> be used to get the hotplug handler. Note that unplugging the device
> using "qdev_simple_device_unplug_cb" doesn't require the hotplug
> handler as this function simply detaches the device object from it's
> parent object (the pci bus).
> 
> I've talked to Eduardo and he mentioned that he and Michael had
> discussed the following approach: using a property (for pci devices
> currently and maybe for others in the future?) which tells Qemu to
> hide the device from the bus upon init. This approach leaves the
> responsibility of managing the failover device to the management. The
> management can send commands to plug the hidden device or hide it back
> as well. I think that I like this approach better as it is proof of
> issues that can come up when trying to handle the failure of
> unplug/plug requests to the guest.
> 
> Please share your thoughts on this approach versus the draft implementation.

I would think just an internal flag on the pci device that
controls whether it's guest visible would be enough.
Not sure why management would need to be involved.
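
As a rough illustration of what such a flag could mean from the guest's
point of view: config reads of an absent PCI function return all ones,
so a hidden device can be modelled as one whose vendor ID reads back as
0xFFFF. The struct and helper below are hypothetical and are not QEMU
code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_NONE 0xFFFF /* read value when nothing responds */

/* Hypothetical device model with an internal visibility flag. */
struct pci_dev_model {
    uint16_t vendor_id;
    bool guest_visible;
};

/* What the guest sees when it probes the slot's vendor ID register. */
uint16_t guest_read_vendor_id(const struct pci_dev_model *d)
{
    return d->guest_visible ? d->vendor_id : PCI_VENDOR_ID_NONE;
}
```

Flipping guest_visible is then the only state change needed to hide or
reveal the device during bus enumeration in this model.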

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-10 14:40                                   ` Michael S. Tsirkin
@ 2018-10-11  0:16                                     ` Samudrala, Sridhar
  0 siblings, 0 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-10-11  0:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Siwei Liu, Venu Busireddy, Cornelia Huck, virtio-dev

On 10/10/2018 7:40 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 04, 2018 at 10:17:04PM -0700, Samudrala, Sridhar wrote:
>> On 10/4/2018 5:03 PM, Siwei Liu wrote:
>>> On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
>>>>> The VF's MAC can be updated by PF/host on the fly at any time. One can
>>>>> start with a random MAC but use group ID to pair device instead. And
>>>>> only update MAC address to the real one when moving MAC filter around
>>>>> after PV says OK to switch datapath.
>>>>>
>>>>> Do you see any problem with this design?
>>>> Isn't this what I proposed:
>>>>           Maybe we can
>>>>           start VF with a temporary MAC, then change it to a final one when guest
>>>>           tries to use it. It will work but we run into fact that MACs are
>>>>           currently programmed by mgmnt - in many setups qemu does not have the
>>>>           rights to do it.
>>>>
>>>> ?
>>>>
>>>> If yes I don't see a problem with the interface design, even though
>>>> implementation wise it's more work as it will have to include management
>>>> changes.
>>> I thought we discussed this design a while back:
>>> https://www.spinics.net/lists/netdev/msg512232.html
>>>
>>> ... plug in a VF with a random MAC filter programmed in prior, and
>>> initially use that random MAC within guest. This would require:
>>> a) not relying on permanent MAC address to do pairing during the
>>> initial discovery, e.g. use the failover group ID as in this
>>> discussion
>>> b) host to toggle the MAC address filter: which includes taking down
>>> the tap device to return the MAC back to PF, followed by assigning
>>> that MAC to VF using "ip link ... set vf ..."
>>> c) notify guest to reload/reset VF driver for the change of hardware MAC address
>>> d) until VF reloads the driver it won't be able to use the datapath,
>>> so very short period of network outage is (still) expected
>>>
>>> though I still don't think this design can eliminate downtime. However,
>>> it looks like as of today the MAC matching still hasn't addressed the
>>> datapath switching and error handling in a clean way.
>> I am not sure what is the issue with datapath switching with the net_failover solution.
>>
>> Do you see any issues with the migration management layer to automate the steps
>> that are listed in the example script in the documentation.
>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>
>> Now that we are considering making the VF visible only when the standby negotiation
>> is completed, i am not sure why we need a random MAC.
>>
> The claim is that some pfs update MAC RX filter immediately once vf is
> created, not when its driver attaches. That will mean on hot-plug there
> is downtime until device is guest visible and driver initialized.
>
> Can you confirm that isn't the case for intel cards?

For an untrusted VF, the MAC address is assigned by the management layer and is set via
an ndo_set_vf_mac() call to the PF on the hypervisor. This does cause the MAC RX filter
to be programmed immediately.

If possible, we could delay setting the MAC RX filter until the device is guest visible, but
before the driver is loaded. If the VF driver comes up with a random MAC before the
MAC address is set via the PF, it will require a VF reset to get the right MAC, which would
also result in downtime.
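
The ordering problem in the last sentence can be shown with a small
standalone model. The names are hypothetical and this is not driver
code; it assumes the VF driver latches its MAC once at load time,
taking the PF-assigned address if one exists and a random one
otherwise.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical view of one VF as seen by the PF and the VF driver. */
struct vf_model {
    uint8_t pf_mac[MAC_LEN];     /* MAC assigned through the PF */
    bool pf_mac_set;
    uint8_t driver_mac[MAC_LEN]; /* MAC the VF driver is using */
    bool driver_up;
};

/* Models the effect of the management layer setting the VF MAC. */
void pf_set_vf_mac(struct vf_model *vf, const uint8_t mac[MAC_LEN])
{
    memcpy(vf->pf_mac, mac, MAC_LEN);
    vf->pf_mac_set = true;
}

/* The driver picks the PF MAC if available, else a random one. */
void vf_driver_load(struct vf_model *vf, const uint8_t random_mac[MAC_LEN])
{
    memcpy(vf->driver_mac, vf->pf_mac_set ? vf->pf_mac : random_mac,
           MAC_LEN);
    vf->driver_up = true;
}

/* A reset is needed when the running driver holds a stale MAC. */
bool vf_needs_reset(const struct vf_model *vf)
{
    return vf->driver_up && vf->pf_mac_set &&
           memcmp(vf->driver_mac, vf->pf_mac, MAC_LEN) != 0;
}
```

Loading the driver before the PF sets the MAC leaves vf_needs_reset()
true, while the opposite order does not, which is why the MAC has to be
in place before the guest driver comes up.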







^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-05 19:18                                 ` Michael S. Tsirkin
  2018-10-08 22:06                                   ` Sameeh Jubran
@ 2018-10-11  1:26                                   ` Siwei Liu
  2018-10-18 23:20                                     ` Siwei Liu
  2018-10-19  3:45                                     ` Michael S. Tsirkin
  1 sibling, 2 replies; 85+ messages in thread
From: Siwei Liu @ 2018-10-11  1:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > start with a random MAC but use group ID to pair device instead. And
> > > > only update MAC address to the real one when moving MAC filter around
> > > > after PV says OK to switch datapath.
> > > >
> > > > Do you see any problem with this design?
> > >
> > > Isn't this what I proposed:
> > >         Maybe we can
> > >         start VF with a temporary MAC, then change it to a final one when guest
> > >         tries to use it. It will work but we run into fact that MACs are
> > >         currently programmed by mgmnt - in many setups qemu does not have the
> > >         rights to do it.
> > >
> > > ?
> > >
> > > If yes I don't see a problem with the interface design, even though
> > > implementation wise it's more work as it will have to include management
> > > changes.
> >
> > I thought we discussed this design a while back:
> > https://www.spinics.net/lists/netdev/msg512232.html
> >
> > ... plug in a VF with a random MAC filter programmed in prior, and
> > initially use that random MAC within guest. This would require:
> > a) not relying on permanent MAC address to do pairing during the
> > initial discovery, e.g. use the failover group ID as in this
> > discussion
> > b) host to toggle the MAC address filter: which includes taking down
> > the tap device to return the MAC back to PF, followed by assigning
> > that MAC to VF using "ip link ... set vf ..."
> > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > d) until VF reloads the driver it won't be able to use the datapath,
> > so very short period of network outage is (still) expected
> >
> > though I still don't think this design can elimnate downtime.
>
>
> No, my idea is somewhat different. As you say there is a problem
> of delay at point (c).
That's true, I never said the downtime can be avoided, because of this
delay on the guest side. But with this the downtime gets to the bare
minimum, and in most situations packets won't be lost on reception as
long as the PF sets up the filter in a timely manner.

> Further, the need to poke at PF filters
> with set vf does not match the current security model where
> any security related configuration such as MAC filtering is done upfront.

The security model belongs to the VM policy, not the VF, right? I think
the same MAC address will always be used on the VM, as it starts with
virtio. Why is it a security issue that the VF starts with an unused MAC
before it's able to be used in the guest?

>
>
> So I have two suggestions:
>
> 1. Teach pf driver not to program the filter until vf driver actually goes up.
>
>    How do we know it went up? For example, it is highly likely
>    that driver will send some kind of command on init.
>    E.g. linux seems to always try to set the mac address during init.
>    We can have any kind of command received by the PF enable
>    the filter, until reset.

I'm not sure that's a valid assumption for every guest, say Windows. The
VF can start with the MAC address advertised from the PF in the first
reset, and the MAC filter generally will be activated at that point.
Some other PF/VF variants enable the filter only later, once the VF is
brought up in the guest, while some others enable the filter even before
the VF gets assigned to the guest. Trying to assume the behaviour of a
specific guest or a specific NIC device is a slippery slope. The only
thing that's reliable is the semantics of the ndo_vf_xxx interface for
the PF. You seem to assume too much about specific PF behaviour, which
is not defined in the interface itself.

>
>    In absence of an appropriate command, QEMU can detect bus master
>    enable and do that.
>
> 2. Create a variant of trusted VF where it starts out without a valid
>    MAC, guest can set a softmac MAC but only can set it to the specific
>    value that matches virtio.
>    Alternatively - if it's preferred for some reason - allow
>    guest to program just two MACs, the original one and the virtio one.
>    Any other value is denied.

I am getting confused; I don't know why that's even needed. The
management tool can set any predefined MAC that is deemed safe for the VF
to start with. Why does it need to be that complicated? What is the
purpose of another model for trusted VF and softmac? It's the PF that
changes the MAC, not the VF.

>
>
>
> > However,
> > it looks like as of today the MAC matching still haven't addressed the
> > datapath switching and error handling in a clean way. As said, for
> > SR-IOV live migration on iSCSI root disk there will be a lot of
> > dancing parts going along the way, reliable network connectity and
> > dedicated handshakes are critical to this kind of setup.
> >
> > -Siwei
>
> I think MAC matching removes downtime when device is removed but not
> when it's re-added, yes. It has the advantage of an already present
> linux driver support, but if you are prepared to work on
> adding e.g. bridge based matching, that will go away.

The removal order and consequences will be the same for MAC
matching and group ID based matching. It's just the initial discovery
that's slightly different. Why do you think the downtime will be
different for the removal scenario? And why do you think it's necessary
to alter the current PF driver behavior to support bridge based
matching? Sorry, I'm really confused about your suggestion. Those PF
driver model changes are not actually needed. The fact is that
bridge based matching is supposed to work quite well for any PF driver
implementation, no matter when the MAC address filters get added or
enabled.
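
To make the difference in initial discovery concrete, here is a minimal
sketch of the two pairing rules. The NetDev type, the group_id field and
the device names are hypothetical illustrations, not an existing driver
or QEMU interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NetDev:
    """Hypothetical view of a guest-visible NIC used for pairing."""
    name: str
    mac: str
    group_id: Optional[int] = None  # assumed failover group ID, not a real field


def pair_by_mac(standby: NetDev, candidates: List[NetDev]) -> List[NetDev]:
    # MAC matching: the primary must already carry the standby's MAC.
    return [c for c in candidates if c.mac.lower() == standby.mac.lower()]


def pair_by_group(standby: NetDev, candidates: List[NetDev]) -> List[NetDev]:
    # Group ID matching: the MAC is ignored, so the VF may start with a
    # temporary MAC and still be discovered as the primary.
    if standby.group_id is None:
        return []
    return [c for c in candidates if c.group_id == standby.group_id]


standby = NetDev("virtio0", "8a:f7:20:29:3b:cb", group_id=1)
vf_with_temp_mac = NetDev("vf0", "02:00:00:00:00:01", group_id=1)

print(pair_by_mac(standby, [vf_with_temp_mac]))    # no match: MACs differ
print(pair_by_group(standby, [vf_with_temp_mac]))  # matched via group ID
```

Only the discovery predicate differs between the two; nothing in the
sketch touches the removal path.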

Thanks,
-Siwei


>
>
> > >
> > > --
> > > MST
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-11  1:26                                   ` Siwei Liu
@ 2018-10-18 23:20                                     ` Siwei Liu
  2018-10-18 23:40                                       ` Michael S. Tsirkin
  2018-10-19  3:45                                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Siwei Liu @ 2018-10-18 23:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev,
	liran.alon

To be honest, I don't understand why there's resistance to using PV to
initiate datapath switching, or the point of relying on very specific
PF behavior for datapath switching. There's not even a point in altering
the SR-IOV driver model to accommodate the (zero) downtime requirement:
how could Windows possibly work with that? The current MAC based
scheme is very fragile when dealing with errors, and past lessons
lead me to believe that driver errors in the hot plug
path, even if not common, are NOT negligible.

-Siwei

On Wed, Oct 10, 2018 at 6:26 PM Siwei Liu <loseweigh@gmail.com> wrote:
>
> On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > > start with a random MAC but use group ID to pair device instead. And
> > > > > only update MAC address to the real one when moving MAC filter around
> > > > > after PV says OK to switch datapath.
> > > > >
> > > > > Do you see any problem with this design?
> > > >
> > > > Isn't this what I proposed:
> > > >         Maybe we can
> > > >         start VF with a temporary MAC, then change it to a final one when guest
> > > >         tries to use it. It will work but we run into fact that MACs are
> > > >         currently programmed by mgmnt - in many setups qemu does not have the
> > > >         rights to do it.
> > > >
> > > > ?
> > > >
> > > > If yes I don't see a problem with the interface design, even though
> > > > implementation wise it's more work as it will have to include management
> > > > changes.
> > >
> > > I thought we discussed this design a while back:
> > > https://www.spinics.net/lists/netdev/msg512232.html
> > >
> > > ... plug in a VF with a random MAC filter programmed in prior, and
> > > initially use that random MAC within guest. This would require:
> > > a) not relying on permanent MAC address to do pairing during the
> > > initial discovery, e.g. use the failover group ID as in this
> > > discussion
> > > b) host to toggle the MAC address filter: which includes taking down
> > > the tap device to return the MAC back to PF, followed by assigning
> > > that MAC to VF using "ip link ... set vf ..."
> > > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > > d) until VF reloads the driver it won't be able to use the datapath,
> > > so very short period of network outage is (still) expected
> > >
> > > though I still don't think this design can elimnate downtime.
> >
> >
> > No, my idea is somewhat different. As you say there is a problem
> > of delay at point (c).
> That's true, I never say the downtime can be avoided because of this
> delay in the guest side. But with this the downtime gets to the bare
> minimum and in most situations packets won't be lost on reception as
> long as the PF sets up the filter in timely manner.
>
> > Further, the need to poke at PF filters
> > with set vf does not match the current security model where
> > any security related configuration such as MAC filtering is done upfront.
>
> The security model belongs to the VM policy not the VF, right? I think
> same MAC address will always be used on the VM as it starts with
> virtio. Why it is a security issue that VF starts with an unused MAC
> before it's able to be used in the guest?
>
> >
> >
> > So I have two suggestions:
> >
> > 1. Teach pf driver not to program the filter until vf driver actually goes up.
> >
> >    How do we know it went up? For example, it is highly likely
> >    that driver will send some kind of command on init.
> >    E.g. linux seems to always try to set the mac address during init.
> >    We can have any kind of command received by the PF enable
> >    the filter, until reset.
>
> I'm not sure it's a valid assumption for any guest, say Windows. The
> VF can start with the MAC address advertised from PF in the first
> reset, and the MAC filter generally will be activated at that point.
> Some other PF/VF variants enable the filter after that until the VF is
> brought up in guest, while some others enable the filter even before
> the VF gets assigned to guest. Trying to assume the behaviour on
> specific guest or specific NIC device is a slippery slope. The only
> thing that's reliable is the semantics of ndo_vf_xxx interface for the
> PF. You seem to overly assume too much on the specific PF behaviour
> which is not defined in the interface itself.
>
> >
> >    In absence of an appropriate command, QEMU can detect bus master
> >    enable and do that.
> >
> > 2. Create a variant of trusted VF where it starts out without a valid
> >    MAC, guest can set a softmac MAC but only can set it to the specific
> >    value that matches virtio.
> >    Alternatively - if it's preferred for some reason - allow
> >    guest to program just two MACs, the original one and the virtio one.
> >    Any other value is denied.
>
> I am getting confused, I don't know why that's even needed. The
> management tool can set any predefined MAC that is deemed safe for VF
> to start with. Why it needs to be that complicated? What is the
> purpose of another model for trusted VF and softmac? It's the PF that
> changes the MAC not the VF.
>
> >
> >
> >
> > > However,
> > > it looks like as of today the MAC matching still haven't addressed the
> > > datapath switching and error handling in a clean way. As said, for
> > > SR-IOV live migration on iSCSI root disk there will be a lot of
> > > dancing parts going along the way, reliable network connectity and
> > > dedicated handshakes are critical to this kind of setup.
> > >
> > > -Siwei
> >
> > I think MAC matching removes downtime when device is removed but not
> > when it's re-added, yes. It has the advantage of an already present
> > linux driver support, but if you are prepared to work on
> > adding e.g. bridge based matching, that will go away.
>
> The removal order and consequence will be the same between MAC
> matching and group ID based matching. It's just the initial discovery
> that's slightly different. Why do you think the downtime will be
> different for the removal scenario? And why do you think it's needed
> to alter the current PF driver behavior to support bridge based
> matching? Sorry I'm really confused about your suggestion. Those PF
> driver model changes are not needed acutally. The fact is that the
> bridge based matching is supposed to work quite well for any PF driver
> implementation no matter when the MAC address filters gets added or
> enabled.
>
> Thanks,
> -Siwei
>
>
> >
> >
> > > >
> > > > --
> > > > MST
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-18 23:20                                     ` Siwei Liu
@ 2018-10-18 23:40                                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-18 23:40 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev,
	liran.alon

On Thu, Oct 18, 2018 at 04:20:13PM -0700, Siwei Liu wrote:
> To be honest, I don't understand why there's resistance of using PV to
> initiate datapath switching,

I see no resistance. I see lack of man-power.  Any extension needs:
1- guest driver support
2- qemu support
3- spec documentation

Current scheme has (1) by now, a beginning of (3) and a rudimentary
prototype of (2).

For your proposed extension, you can either do some of this work, or do
some testing to demonstrate problems and motivate others. So far,
emailing the list about theoretical issues has not seemed to motivate
anyone.

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-11  1:26                                   ` Siwei Liu
  2018-10-18 23:20                                     ` Siwei Liu
@ 2018-10-19  3:45                                     ` Michael S. Tsirkin
  2018-11-21 15:39                                       ` Sameeh Jubran
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-10-19  3:45 UTC (permalink / raw)
  To: Siwei Liu; +Cc: Venu Busireddy, Cornelia Huck, Samudrala, Sridhar, virtio-dev

On Wed, Oct 10, 2018 at 06:26:50PM -0700, Siwei Liu wrote:
> On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > > start with a random MAC but use group ID to pair device instead. And
> > > > > only update MAC address to the real one when moving MAC filter around
> > > > > after PV says OK to switch datapath.
> > > > >
> > > > > Do you see any problem with this design?
> > > >
> > > > Isn't this what I proposed:
> > > >         Maybe we can
> > > >         start VF with a temporary MAC, then change it to a final one when guest
> > > >         tries to use it. It will work but we run into fact that MACs are
> > > >         currently programmed by mgmnt - in many setups qemu does not have the
> > > >         rights to do it.
> > > >
> > > > ?
> > > >
> > > > If yes I don't see a problem with the interface design, even though
> > > > implementation wise it's more work as it will have to include management
> > > > changes.
> > >
> > > I thought we discussed this design a while back:
> > > https://www.spinics.net/lists/netdev/msg512232.html
> > >
> > > ... plug in a VF with a random MAC filter programmed in prior, and
> > > initially use that random MAC within guest. This would require:
> > > a) not relying on permanent MAC address to do pairing during the
> > > initial discovery, e.g. use the failover group ID as in this
> > > discussion
> > > b) host to toggle the MAC address filter: which includes taking down
> > > the tap device to return the MAC back to PF, followed by assigning
> > > that MAC to VF using "ip link ... set vf ..."
> > > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > > d) until VF reloads the driver it won't be able to use the datapath,
> > > so very short period of network outage is (still) expected
> > >
> > > though I still don't think this design can elimnate downtime.
> >
> >
> > No, my idea is somewhat different. As you say there is a problem
> > of delay at point (c).
> That's true, I never say the downtime can be avoided because of this
> delay in the guest side. But with this the downtime gets to the bare
> minimum and in most situations packets won't be lost on reception as
> long as the PF sets up the filter in timely manner.

It's not really the bare minimum IMHO. E.g. fixing the PF to
defer filter update will give you less downtime.
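
A rough sketch of what deferring the filter update could look like,
following suggestion 1 as quoted in this thread: the PF keeps the filter
off until it receives any command from the VF, and turns it off again on
reset. The class and method names are hypothetical and do not correspond
to any existing PF driver.

```python
class DeferredMacFilter:
    """Toy model of a PF-side MAC filter that is armed lazily."""

    def __init__(self, mac: str) -> None:
        self.mac = mac.lower()
        self.enabled = False  # filter is not programmed at VF creation

    def on_vf_command(self, command: str) -> None:
        # Any command from the VF driver counts as "the driver went up".
        self.enabled = True

    def on_vf_reset(self) -> None:
        # A reset returns the filter to its unprogrammed state.
        self.enabled = False

    def accepts(self, dst_mac: str) -> bool:
        # Frames reach the VF only while the filter is armed.
        return self.enabled and dst_mac.lower() == self.mac


flt = DeferredMacFilter("8a:f7:20:29:3b:cb")
print(flt.accepts("8a:f7:20:29:3b:cb"))  # filter not armed yet
flt.on_vf_command("set_mac")
print(flt.accepts("8a:f7:20:29:3b:cb"))  # armed after the first command
```

Until the first command arrives, frames for that MAC keep flowing to
whichever device owned the address before, which is where the reduced
downtime would come from in this model.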

> > Further, the need to poke at PF filters
> > with set vf does not match the current security model where
> > any security related configuration such as MAC filtering is done upfront.
> 
> The security model belongs to the VM policy not the VF, right? I think
> same MAC address will always be used on the VM as it starts with
> virtio. Why it is a security issue that VF starts with an unused MAC
> before it's able to be used in the guest?

Basically if guest is able to trigger MAC changes,
it might be able to exploit some bug to escalate that to
full network access. Completely blocking configuration
changes after setup feels safer.

Case in point: with QEMU, a typical SELinux policy will block
attempts to change MACs, so that task will have to be
delegated to a suitably privileged tool.


> 
> >
> >
> > So I have two suggestions:
> >
> > 1. Teach pf driver not to program the filter until vf driver actually goes up.
> >
> >    How do we know it went up? For example, it is highly likely
> >    that driver will send some kind of command on init.
> >    E.g. linux seems to always try to set the mac address during init.
> >    We can have any kind of command received by the PF enable
> >    the filter, until reset.
> 
> I'm not sure it's a valid assumption for any guest, say Windows. The
> VF can start with the MAC address advertised from PF in the first
> reset, and the MAC filter generally will be activated at that point.
> Some other PF/VF variants enable the filter after that until the VF is
> brought up in guest, while some others enable the filter even before
> the VF gets assigned to guest. Trying to assume the behaviour on
> specific guest or specific NIC device is a slippery slope.


Is all this just theoretical or do you observe any problems in practice?

> The only
> thing that's reliable is the semantics of ndo_vf_xxx interface for the
> PF.

ndo_vf_xxx is an internal Linux interface. It is not guaranteed to be
stable at all. I think you mean the netlink interface that triggers
it. That should be stable, but if what you say above is true, it isn't
fully defined.

> You seem to overly assume too much on the specific PF behaviour
> which is not defined in the interface itself.

So IMHO it's something that we should fix in Linux,
making all devices behave consistently.

> >
> >    In absence of an appropriate command, QEMU can detect bus master
> >    enable and do that.
> >
> > 2. Create a variant of trusted VF where it starts out without a valid
> >    MAC, guest can set a softmac MAC but only can set it to the specific
> >    value that matches virtio.
> >    Alternatively - if it's preferred for some reason - allow
> >    guest to program just two MACs, the original one and the virtio one.
> >    Any other value is denied.
> 
> I am getting confused, I don't know why that's even needed. The
> management tool can set any predefined MAC that is deemed safe for VF
> to start with. Why it needs to be that complicated? What is the
> purpose of another model for trusted VF and softmac? It's the PF that
> changes the MAC not the VF.

This will give us a simple solution without guest driver changes for
when VF is trusted. In particular it will work e.g. for PFs as well.
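
As an illustration of the alternative form of suggestion 2 (the guest may
program only the original MAC or the virtio MAC, anything else is
denied), a minimal sketch of the check follows. The function name and the
example addresses are hypothetical, not an existing PF or QEMU API.

```python
from typing import Callable


def make_mac_policy(original_mac: str, virtio_mac: str) -> Callable[[str], bool]:
    """Return a predicate that tells whether a guest MAC request is allowed."""
    allowed = {original_mac.lower(), virtio_mac.lower()}

    def may_set(requested_mac: str) -> bool:
        # Only the two MACs configured up front are accepted.
        return requested_mac.lower() in allowed

    return may_set


may_set = make_mac_policy("02:00:00:00:00:01", "8a:f7:20:29:3b:cb")
print(may_set("8a:f7:20:29:3b:cb"))  # the virtio MAC
print(may_set("de:ad:be:ef:00:01"))  # an arbitrary MAC
```

The allowed set is fixed when the VF is configured, so no configuration
change is needed after setup.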

> >
> >
> >
> > > However,
> > > it looks like as of today the MAC matching still haven't addressed the
> > > datapath switching and error handling in a clean way. As said, for
> > > SR-IOV live migration on iSCSI root disk there will be a lot of
> > > dancing parts going along the way, reliable network connectity and
> > > dedicated handshakes are critical to this kind of setup.
> > >
> > > -Siwei
> >
> > I think MAC matching removes downtime when device is removed but not
> > when it's re-added, yes. It has the advantage of an already present
> > linux driver support, but if you are prepared to work on
> > adding e.g. bridge based matching, that will go away.
> 
> The removal order and consequence will be the same between MAC
> matching and group ID based matching. It's just the initial discovery
> that's slightly different. Why do you think the downtime will be
> different for the removal scenario? And why do you think it's needed
> to alter the current PF driver behavior to support bridge based
> matching? Sorry I'm really confused about your suggestion. Those PF
> driver model changes are not needed acutally. The fact is that the
> bridge based matching is supposed to work quite well for any PF driver
> implementation no matter when the MAC address filters gets added or
> enabled.
> 
> Thanks,
> -Siwei

It seems that it requires a bunch of changes for all VF drivers
though.

> 
> >
> >
> > > >
> > > > --
> > > > MST
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-10-19  3:45                                     ` Michael S. Tsirkin
@ 2018-11-21 15:39                                       ` Sameeh Jubran
  2018-11-21 18:41                                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-11-21 15:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

[-- Attachment #1: Type: text/plain, Size: 17753 bytes --]

Hi all,

It's been a while since the last discussion here. I have been working on
implementing the standby feature in Qemu. I have tried multiple approaches
for implementation and in the end decided to implement using the
hotplug/unplug infrastructure for multiple reasons which I'll go over when
I send the patches. For now you can find the implementation here:
https://github.com/sameehj/qemu/tree/failover_hidden_opts (the full command
line I used can be found at the end of the email)

I have tested my implementation in Qemu with a Fedora 29 guest. I can see
the failover interface and assign an IP address to it. The feature is
acked and the primary device is plugged in with no issues.

I have created a setup with two hosts (host A and host B) with X710
10G cards connected back to back. On host A I have configured a bridge
with both the PF interface and virtio-net's interface (standby) attached
to it. I ran the guest with the patched Qemu on host A and pinged the
bridge successfully. Ping also works between host A and host B. However,
I can't ping host B from the VM or vice versa. This only happens when the
feature is enabled, for a reason I have yet to figure out.

I haven't tested migration yet, but I am about to.

Since I couldn't ping from the VM to host B, I ran an iperf test between
the VM and host A with the feature enabled. During the test I unplugged
the SR-IOV device. The device was unplugged successfully and no drops
were observed, as you can see in the results below:

[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.156.44  netmask 255.255.248.0  broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb  prefixlen 64  scopeid 0x0<global>
        ether 56:cc:c1:01:cc:21  txqueuelen 1000  (Ethernet)
        RX packets 12258  bytes 870822 (850.4 KiB)
        RX errors 11  dropped 0  overruns 0  frame 11
        TX packets 294  bytes 32432 (31.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.17  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e  prefixlen 64  scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 41052  bytes 2775833 (2.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 47468  bytes 15629 (15.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 214  bytes 14966 (14.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 163  bytes 26498 (25.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 41052  bytes 2775833 (2.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 47468  bytes 2889827541 (2.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 176  bytes 19712 (19.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 176  bytes 19712 (19.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@dhcp156-44 ~]# iperf -c 192.168.1.117 -t 100 -i 1
------------------------------------------------------------
Client connecting to 192.168.1.117, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.17 port 40368 connected with 192.168.1.117 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  3.47 GBytes  29.8 Gbits/sec
[  3]  1.0- 2.0 sec  4.35 GBytes  37.4 Gbits/sec
[  3]  2.0- 3.0 sec  4.10 GBytes  35.2 Gbits/sec
[  3]  3.0- 4.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3]  4.0- 5.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3]  5.0- 6.0 sec  4.07 GBytes  34.9 Gbits/sec
[  3]  6.0- 7.0 sec  4.53 GBytes  38.9 Gbits/sec
[  3]  7.0- 8.0 sec  4.38 GBytes  37.6 Gbits/sec
[  3]  8.0- 9.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3]  9.0-10.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3] 10.0-11.0 sec  4.56 GBytes  39.2 Gbits/sec
[  3] 11.0-12.0 sec  4.70 GBytes  40.4 Gbits/sec
[  3] 12.0-13.0 sec  4.65 GBytes  39.9 Gbits/sec
[  3] 13.0-14.0 sec  4.51 GBytes  38.7 Gbits/sec
[  3] 14.0-15.0 sec  4.48 GBytes  38.5 Gbits/sec
[  3] 15.0-16.0 sec  4.67 GBytes  40.2 Gbits/sec
[  3] 16.0-17.0 sec  4.37 GBytes  37.5 Gbits/sec
[  3] 17.0-18.0 sec  4.68 GBytes  40.2 Gbits/sec
[  3] 18.0-19.0 sec  4.99 GBytes  42.9 Gbits/sec
[  3] 19.0-20.0 sec  5.00 GBytes  42.9 Gbits/sec
[  3] 20.0-21.0 sec  4.90 GBytes  42.1 Gbits/sec
[  3] 21.0-22.0 sec  4.72 GBytes  40.5 Gbits/sec
[  3] 22.0-23.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3] 23.0-24.0 sec  4.72 GBytes  40.6 Gbits/sec
[  3] 24.0-25.0 sec  4.42 GBytes  38.0 Gbits/sec
[  3] 25.0-26.0 sec  4.44 GBytes  38.2 Gbits/sec
[  3] 26.0-27.0 sec  4.18 GBytes  35.9 Gbits/sec
[  3] 27.0-28.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3] 28.0-29.0 sec  4.27 GBytes  36.7 Gbits/sec
[  3] 29.0-30.0 sec  4.16 GBytes  35.7 Gbits/sec
[  3] 30.0-31.0 sec  4.14 GBytes  35.6 Gbits/sec
[  3] 31.0-32.0 sec  4.13 GBytes  35.4 Gbits/sec
[  3] 32.0-33.0 sec  4.16 GBytes  35.7 Gbits/sec
[  3] 33.0-34.0 sec  4.33 GBytes  37.2 Gbits/sec
[  3] 34.0-35.0 sec  4.31 GBytes  37.0 Gbits/sec
[  3] 35.0-36.0 sec  4.26 GBytes  36.6 Gbits/sec
[  3] 36.0-37.0 sec  4.36 GBytes  37.5 Gbits/sec
[  3] 37.0-38.0 sec  4.11 GBytes  35.3 Gbits/sec
[  3] 38.0-39.0 sec  4.00 GBytes  34.4 Gbits/sec
[  3] 39.0-40.0 sec  4.53 GBytes  38.9 Gbits/sec
[  3] 40.0-41.0 sec  4.06 GBytes  34.9 Gbits/sec
[  3] 41.0-42.0 sec  4.17 GBytes  35.8 Gbits/sec
[  3] 42.0-43.0 sec  4.14 GBytes  35.6 Gbits/sec
[  3] 43.0-44.0 sec  4.07 GBytes  34.9 Gbits/sec
^C[  3]  0.0-44.5 sec   195 GBytes  37.5 Gbits/sec
[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.156.44  netmask 255.255.248.0  broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb  prefixlen 64  scopeid 0x0<global>
        ether 56:cc:c1:01:cc:21  txqueuelen 1000  (Ethernet)
        RX packets 12547  bytes 889713 (868.8 KiB)
        RX errors 11  dropped 0  overruns 0  frame 11
        TX packets 373  bytes 45723 (44.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.17  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e  prefixlen 64  scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 2862498  bytes 192898865 (183.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3414905  bytes 209192841687 (194.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 2862498  bytes 192898865 (183.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3414905  bytes 212082653599 (197.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 176  bytes 19712 (19.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 176  bytes 19712 (19.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
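
As a cross-check of the output above, the TX byte counters from the two
ifconfig runs can be compared with the iperf total. The numbers are
copied from the output; the 2^30-byte unit for "GBytes" and "GiB" is an
assumption that happens to agree with the reported figures.

```python
GIB = 2 ** 30  # assumed size of the "GBytes"/"GiB" units in the output

# TX byte counters copied from the ifconfig output before and after iperf.
tx_before = {"ens4": 15629, "ens4nsby": 2889827541}
tx_after = {"ens4": 209192841687, "ens4nsby": 212082653599}


def tx_delta_gib(name: str) -> float:
    """Bytes sent on an interface during the test, in GiB."""
    return (tx_after[name] - tx_before[name]) / GIB


def gbytes_to_gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf transfer (2^30-byte units) to 10^9 bits per second."""
    return gbytes * GIB * 8 / seconds / 1e9


for name in tx_before:
    print(f"{name}: {tx_delta_gib(name):.1f} GiB sent during the test")
print(f"first interval: {gbytes_to_gbits_per_sec(3.47, 1.0):.1f} Gbits/sec")
```

Both interfaces advance by the same number of bytes, roughly 194.8 GiB,
which is in line with the 195 GBytes that iperf reports for the run.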

__________________________________________________________________________________________________________________

The command line I used:

/root/qemu/x86_64-softmmu/qemu-system-x86_64 \
-netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
-device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
-netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
-device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
-device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
-enable-kvm \
-name netkvm \
-m 3000M \
-drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
-smp 4 \
-vga qxl \
-spice port=6110,disable-ticketing \
-device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
-chardev spicevmc,name=vdagent,id=vdagent \
-device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
-chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
-device virtio-serial \
-device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
-monitor stdio


On Fri, Oct 19, 2018 at 6:45 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, Oct 10, 2018 at 06:26:50PM -0700, Siwei Liu wrote:
> > On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > > > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > > > start with a random MAC but use group ID to pair device instead. And
> > > > > > only update MAC address to the real one when moving MAC filter around
> > > > > > after PV says OK to switch datapath.
> > > > > >
> > > > > > Do you see any problem with this design?
> > > > >
> > > > > Isn't this what I proposed:
> > > > >         Maybe we can
> > > > >         start VF with a temporary MAC, then change it to a final one when guest
> > > > >         tries to use it. It will work but we run into fact that MACs are
> > > > >         currently programmed by mgmnt - in many setups qemu does not have the
> > > > >         rights to do it.
> > > > >
> > > > > ?
> > > > >
> > > > > If yes I don't see a problem with the interface design, even though
> > > > > implementation wise it's more work as it will have to include management
> > > > > changes.
> > > >
> > > > I thought we discussed this design a while back:
> > > > https://www.spinics.net/lists/netdev/msg512232.html
> > > >
> > > > ... plug in a VF with a random MAC filter programmed in prior, and
> > > > initially use that random MAC within guest. This would require:
> > > > a) not relying on permanent MAC address to do pairing during the
> > > > initial discovery, e.g. use the failover group ID as in this
> > > > discussion
> > > > b) host to toggle the MAC address filter: which includes taking down
> > > > the tap device to return the MAC back to PF, followed by assigning
> > > > that MAC to VF using "ip link ... set vf ..."
> > > > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > > > d) until VF reloads the driver it won't be able to use the datapath,
> > > > so very short period of network outage is (still) expected
> > > >
> > > > though I still don't think this design can eliminate downtime.
> > >
> > >
> > > No, my idea is somewhat different. As you say there is a problem
> > > of delay at point (c).
> > That's true, I never say the downtime can be avoided because of this
> > delay in the guest side. But with this the downtime gets to the bare
> > minimum and in most situations packets won't be lost on reception as
> > long as the PF sets up the filter in timely manner.
>
> It's not really the bare minimum IMHO. E.g. fixing the PF to
> defer filter update will give you less downtime.
>
> > > Further, the need to poke at PF filters
> > > with set vf does not match the current security model where
> > > any security related configuration such as MAC filtering is done upfront.
> >
> > The security model belongs to the VM policy not the VF, right? I think
> > same MAC address will always be used on the VM as it starts with
> > virtio. Why it is a security issue that VF starts with an unused MAC
> > before it's able to be used in the guest?
>
> Basically if guest is able to trigger MAC changes,
> it might be able to exploit some bug to escalate that to
> full network access. Completely blocking configuration
> changes after setup feels safer.
>
> Case in point, with QEMU a typical selinux policy will block
> attempts to change MACs, that task will have to be
> delegated to a suitably privileged tool.
>
>
> >
> > >
> > >
> > > So I have two suggestions:
> > >
> > > 1. Teach pf driver not to program the filter until vf driver actually goes up.
> > >
> > >    How do we know it went up? For example, it is highly likely
> > >    that driver will send some kind of command on init.
> > >    E.g. linux seems to always try to set the mac address during init.
> > >    We can have any kind of command received by the PF enable
> > >    the filter, until reset.
> >
> > I'm not sure it's a valid assumption for any guest, say Windows. The
> > VF can start with the MAC address advertised from PF in the first
> > reset, and the MAC filter generally will be activated at that point.
> > Some other PF/VF variants enable the filter after that until the VF is
> > brought up in guest, while some others enable the filter even before
> > the VF gets assigned to guest. Trying to assume the behaviour on
> > specific guest or specific NIC device is a slippery slope.
>
>
> Is all this just theoretical or do you observe any problems in practice?
>
> > The only
> > thing that's reliable is the semantics of ndo_vf_xxx interface for the
> > PF.
>
> ndo_vf_xxx is an internal Linux interface. That's not guaranteed to be
> stable at all. I think you mean the netlink interface that triggers
> that. That should be stable but if what you say above is true isn't
> fully defined.
>
> > You seem to overly assume too much on the specific PF behaviour
> > which is not defined in the interface itself.
>
> So IMHO it's something that we should fix in Linux,
> making all devices behave consistently.
>
> > >
> > >    In absence of an appropriate command, QEMU can detect bus master
> > >    enable and do that.
> > >
> > > 2. Create a variant of trusted VF where it starts out without a valid
> > >    MAC, guest can set a softmac MAC but only can set it to the specific
> > >    value that matches virtio.
> > >    Alternatively - if it's preferred for some reason - allow
> > >    guest to program just two MACs, the original one and the virtio one.
> > >    Any other value is denied.
> >
> > I am getting confused, I don't know why that's even needed. The
> > management tool can set any predefined MAC that is deemed safe for VF
> > to start with. Why it needs to be that complicated? What is the
> > purpose of another model for trusted VF and softmac? It's the PF that
> > changes the MAC not the VF.
>
> This will give us a simple solution without guest driver changes for
> when VF is trusted. In particular it will work e.g. for PFs as well.
>
> > >
> > >
> > >
> > > > However,
> > > > it looks like as of today the MAC matching still hasn't addressed the
> > > > datapath switching and error handling in a clean way. As said, for
> > > > SR-IOV live migration on iSCSI root disk there will be a lot of
> > > > dancing parts going along the way, reliable network connectivity and
> > > > dedicated handshakes are critical to this kind of setup.
> > > >
> > > > -Siwei
> > >
> > > I think MAC matching removes downtime when device is removed but not
> > > when it's re-added, yes. It has the advantage of an already present
> > > linux driver support, but if you are prepared to work on
> > > adding e.g. bridge based matching, that will go away.
> >
> > The removal order and consequence will be the same between MAC
> > matching and group ID based matching. It's just the initial discovery
> > that's slightly different. Why do you think the downtime will be
> > different for the removal scenario? And why do you think it's needed
> > to alter the current PF driver behavior to support bridge based
> > matching? Sorry I'm really confused about your suggestion. Those PF
> > driver model changes are not needed actually. The fact is that the
> > bridge based matching is supposed to work quite well for any PF driver
> > implementation no matter when the MAC address filters gets added or
> > enabled.
> >
> > Thanks,
> > -Siwei
>
> It seems that it requires a bunch of changes for all VF drivers
> though.
>

-- 
Respectfully,
Sameeh Jubran
Linkedin <https://il.linkedin.com/pub/sameeh-jubran/87/747/a8a>
Software Engineer @ Daynix <http://www.daynix.com>.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-21 15:39                                       ` Sameeh Jubran
@ 2018-11-21 18:41                                         ` Michael S. Tsirkin
  2018-11-21 20:04                                           ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-21 18:41 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

Great to see you making progress on this!
Some comments below:

On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
> I have created a setup which has two hosts (host A and host B) with X710 10G
> cards connected back to back. On one host (I'll refer to this host as host A) I
> have configured a bridge with the PF interface as well as virtio-net's interface
> (standby) both attached to it. 

...

> The command line I used:
> 
> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \

What's e1000 doing here?
Can this be reason you can not talk to host?

> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
> -enable-kvm \
> -name netkvm \
> -m 3000M \
> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
> -smp 4 \
> -vga qxl \
> -spice port=6110,disable-ticketing \
> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
> -chardev spicevmc,name=vdagent,id=vdagent \
> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
> -device virtio-serial \
> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
> -monitor stdio


...

> Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> host A with the feature enabled and during the test I have unplugged the sriov
> device, the device was unplugged successfully and no drops were observed as
> you can see in the results below:
> 
> [root@dhcp156-44 ~]# ifconfig

Well I suspect this won't tell you anything, this shows packet drops at
the hardware level. When e.g. link is down linux won't send any packets
out. The simplest test is to monitor latency and throughput and see that
while it is lower for the duration of migration, there are no huge
spikes around the switch.

-- 
MST

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-21 18:41                                         ` Michael S. Tsirkin
@ 2018-11-21 20:04                                           ` Sameeh Jubran
  2018-11-21 23:51                                             ` Samudrala, Sridhar
  2018-11-22 18:27                                             ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-11-21 20:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Great to see you making progress on this!
> Some comments below:
>
> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
> > I have created a setup which has two hosts (host A and host B) with X710 10G
> > cards connected back to back. On one host (I'll refer to this host as host A) I
> > have configured a bridge with the PF interface as well as virtio-net's interface
> > (standby) both attached to it.
>
> ...
>
> > The command line I used:
> >
> > /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
> > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
> > -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
>
> What's e1000 doing here?
> Can this be reason you can not talk to host?
I don't think so, the e1000 is for enabling WAN connection on the
guest for downloading packages and ssh connection. It is connected to
a separate bridge which is connected to the external interface of the
host.
>
> > -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
> > -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
> > -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
> > -enable-kvm \
> > -name netkvm \
> > -m 3000M \
> > -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
> > -smp 4 \
> > -vga qxl \
> > -spice port=6110,disable-ticketing \
> > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
> > -chardev spicevmc,name=vdagent,id=vdagent \
> > -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
> > -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
> > -device virtio-serial \
> > -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
> > -monitor stdio
>
>
> ...
>
> > Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> > host A with the feature enabled and during the test I have unplugged the sriov
> > device, the device was unplugged successfully and no drops were observed as
> > you can see in the results below:
> >
> > [root@dhcp156-44 ~]# ifconfig
>
> Well I suspect this won't tell you anything, this shows packet drops at
> the hardware level. When e.g. link is down linux won't send any packets
> out. The simplest test is to monitor latency and throughput and see that
> while it is lower for the duration of migration, there are no huge
> spikes around the switch.
Oh, okay will do that.

I noticed some nasty lag when I tried to ssh to the VM over the
failover interface, which I did not experience with the e1000.
Sridhar, any idea what might be the cause?
>
> --
> MST



-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-21 20:04                                           ` Sameeh Jubran
@ 2018-11-21 23:51                                             ` Samudrala, Sridhar
  2018-11-22 13:55                                               ` Sameeh Jubran
  2018-11-22 18:27                                             ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-21 23:51 UTC (permalink / raw)
  To: Sameeh Jubran, Michael S. Tsirkin
  Cc: Siwei Liu, venu.busireddy, cohuck, virtio-dev

On 11/21/2018 12:04 PM, Sameeh Jubran wrote:
> On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> Great to see you making progress on this!
>> Some comments below:
>>
>> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
>>> I have created a setup which has two hosts (host A and host B) with X710 10G
>>> cards connected back to back. On one host (I'll refer to this host as host A) I
>>> have configured a bridge with the PF interface as well as virtio-net's interface
>>> (standby) both attached to it.
>> ...
>>
>>> The command line I used:
>>>
>>> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
>>> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
>>> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
>> What's e1000 doing here?
>> Can this be reason you can not talk to host?
> I don't think so, the e1000 is for enabling WAN connection on the
> guest for downloading packages and ssh connection. It is connected to
> a separate bridge which is connected to the external interface of the
> host.
>>> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
>>> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
>>> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
>>> -enable-kvm \
>>> -name netkvm \
>>> -m 3000M \
>>> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
>>> -smp 4 \
>>> -vga qxl \
>>> -spice port=6110,disable-ticketing \
>>> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
>>> -chardev spicevmc,name=vdagent,id=vdagent \
>>> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
>>> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
>>> -device virtio-serial \
>>> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
>>> -monitor stdio
>>
>> ...
>>
>>> Since I couldn't ping from VM to host B, I did an iperf test between the VM and
>>> host A with the feature enabled and during the test I have unplugged the sriov
>>> device, the device was unplugged successfully and no drops were observed as
>>> you can see in the results below:
>>>
>>> [root@dhcp156-44 ~]# ifconfig
>> Well I suspect this won't tell you anything, this shows packet drops at
>> the hardware level. When e.g. link is down linux won't send any packets
>> out. The simplest test is to monitor latency and throughput and see that
>> while it is lower for the duration of migration, there are no huge
>> spikes around the switch.
> Oh, okay will do that.
>
> I have noticed some nasty lag when I tried to ssh to the VM using the
> failover interface while I didn't experience that with the e1000.
> Sridhar Any idea what might be the cause?
>
When using the failover interface, I guess you have the VF interface plugged in
and UP, so you should be using the primary interface.

Do you see the VF's MAC configured correctly at the host PF?
You can do
     ip link show dev <pf>
and it should show the MACs of all the VFs associated with that PF.






^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-21 23:51                                             ` Samudrala, Sridhar
@ 2018-11-22 13:55                                               ` Sameeh Jubran
  0 siblings, 0 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-11-22 13:55 UTC (permalink / raw)
  To: sridhar.samudrala
  Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck, virtio-dev

On Thu, Nov 22, 2018 at 1:51 AM Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
> On 11/21/2018 12:04 PM, Sameeh Jubran wrote:
> > On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> Great to see you making progress on this!
> >> Some comments below:
> >>
> >> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
> >>> I have created a setup which has two hosts (host A and host B) with X710 10G
> >>> cards connected back to back. On one host (I'll refer to this host as host A) I
> >>> have configured a bridge with the PF interface as well as virtio-net's interface
> >>> (standby) both attached to it.
> >> ...
> >>
> >>> The command line I used:
> >>>
> >>> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
> >>> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
> >>> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
> >> What's e1000 doing here?
> >> Can this be reason you can not talk to host?
> > I don't think so, the e1000 is for enabling WAN connection on the
> > guest for downloading packages and ssh connection. It is connected to
> > a separate bridge which is connected to the external interface of the
> > host.
> >>> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
> >>> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
> >>> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
> >>> -enable-kvm \
> >>> -name netkvm \
> >>> -m 3000M \
> >>> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
> >>> -smp 4 \
> >>> -vga qxl \
> >>> -spice port=6110,disable-ticketing \
> >>> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
> >>> -chardev spicevmc,name=vdagent,id=vdagent \
> >>> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
> >>> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
> >>> -device virtio-serial \
> >>> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
> >>> -monitor stdio
> >>
> >> ...
> >>
> >>> Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> >>> host A with the feature enabled and during the test I have unplugged the sriov
> >>> device, the device was unplugged successfully and no drops were observed as
> >>> you can see in the results below:
> >>>
> >>> [root@dhcp156-44 ~]# ifconfig
> >> Well I suspect this won't tell you anything, this shows packet drops at
> >> the hardware level. When e.g. link is down linux won't send any packets
> >> out. The simplest test is to monitor latency and throughput and see that
> >> while it is lower for the duration of migration, there are no huge
> >> spikes around the switch.
> > Oh, okay will do that.
> >
> > I have noticed some nasty lag when I tried to ssh to the VM using the
> > failover interface while I didn't experience that with the e1000.
> > Sridhar Any idea what might be the cause?
> >
> When using failover interface, i guess you have the VF interface plugged in and UP.
> So you should be using the primary interface.
>
> Do you see the VFs MAC configured correctly at the host PF?
> You can do
>      ip link show dev <pf>
> and it should show the MACs of all the VFs associated with that PF.
You are correct, the VF MAC was set to all zeroes. It can be set using
"ip link set ens2f0 vf 1 mac 8a:f7:20:29:3b:cb", for example.
[root@virtlab517 netkvm_dev]#  ip link show dev ens2f0
2: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master test_br0 state UP mode DEFAULT group default qlen 1000
    link/ether f8:f2:1e:33:43:30 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 1 MAC 8a:f7:20:29:3b:cb, spoof checking on, link-state auto, trust off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
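
For anyone who wants to automate this check, here is a minimal sketch that
scans "ip link show dev <pf>" output for VFs whose MAC is still all zeroes.
The function name unset_vf_macs and the SAMPLE string are made up for
illustration; the regular expression only assumes the "vf N MAC xx:..." line
format shown in the listing above.

```python
import re

# Matches the per-VF lines printed by `ip link show dev <pf>`, e.g.
#   "vf 1 MAC 8a:f7:20:29:3b:cb, spoof checking on, ..."
VF_LINE = re.compile(r"^\s*vf\s+(\d+)\s+MAC\s+([0-9a-fA-F:]{17})")


def unset_vf_macs(ip_link_output):
    """Return the indices of VFs whose MAC is still all zeroes."""
    unset = []
    for line in ip_link_output.splitlines():
        m = VF_LINE.match(line)
        if m and m.group(2) == "00:00:00:00:00:00":
            unset.append(int(m.group(1)))
    return unset


# Hypothetical sample, shortened from the listing above.
SAMPLE = """\
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
    vf 1 MAC 8a:f7:20:29:3b:cb, spoof checking on, link-state auto, trust off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
"""

print(unset_vf_macs(SAMPLE))  # VFs 0 and 2 have no MAC assigned
```

If the VF passed to the guest shows up in that list, the PF has no MAC
programmed for it.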

I have now repeated the test above, this time running iperf and ping from
the VM to host B, and ejecting the primary device with the device_del
command during the test. This is the same mechanism my implementation
uses during migration; however, the traffic is now lost, as you can see
below. Did you test the failover interface in such scenarios?

[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.156.44  netmask 255.255.248.0  broadcast 10.19.159.255
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::d306:561f:9f43:ff77  prefixlen 64  scopeid 0x20<link>
        ether 56:cc:c1:01:cc:21  txqueuelen 1000  (Ethernet)
        RX packets 55201  bytes 3496532 (3.3 MiB)
        RX errors 72  dropped 0  overruns 0  frame 72
        TX packets 738  bytes 70323 (68.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.17  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e  prefixlen 64  scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 41260  bytes 3067573 (2.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1156043  bytes 1749160131 (1.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 40183  bytes 2848249 (2.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1156043  bytes 1749160131 (1.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 1077  bytes 219324 (214.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 3  bytes 264 (264.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 264 (264.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

_____________________________________________________________________________________________________________________________________

[root@dhcp156-44 ~]# iperf -c 192.168.1.118 -t 100 -i 1
------------------------------------------------------------
Client connecting to 192.168.1.118, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.17 port 42210 connected with 192.168.1.118 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  1.0- 2.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  2.0- 3.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3]  3.0- 4.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  4.0- 5.0 sec  1.09 GBytes  9.40 Gbits/sec
[  3]  5.0- 6.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  6.0- 7.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3]  7.0- 8.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3]  8.0- 9.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3]  9.0-10.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 10.0-11.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 11.0-12.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 12.0-13.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 13.0-14.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 14.0-15.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 15.0-16.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 16.0-17.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 17.0-18.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 18.0-19.0 sec  1.09 GBytes  9.40 Gbits/sec
[  3] 19.0-20.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 20.0-21.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 21.0-22.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 22.0-23.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 23.0-24.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 24.0-25.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 25.0-26.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 26.0-27.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 27.0-28.0 sec  1.10 GBytes  9.42 Gbits/sec
[  3] 28.0-29.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 29.0-30.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 30.0-31.0 sec   318 MBytes  2.66 Gbits/sec
[  3] 31.0-32.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 32.0-33.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 33.0-34.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 34.0-35.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 35.0-36.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 36.0-37.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 37.0-38.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 38.0-39.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 39.0-40.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 40.0-41.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 41.0-42.0 sec  0.00 Bytes  0.00 bits/sec
^C[  3] 42.0-43.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-43.4 sec  33.2 GBytes  6.57 Gbits/sec
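
To make the outage window easier to spot in longer runs, a small sketch along
these lines can pick out the zero-throughput intervals from the iperf report.
It assumes the iperf 2 text format shown above with "-i 1"; the names
stalled_intervals, INTERVAL, SCALE and SAMPLE are illustrative only.

```python
import re

# Matches iperf (version 2) interval lines such as
#   "[  3] 30.0-31.0 sec   318 MBytes  2.66 Gbits/sec"
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-\s*(\d+\.\d+)\s+sec\s+"
    r"[\d.]+\s+\w+\s+([\d.]+)\s+(\w+)/sec"
)

# Multipliers to convert the reported bandwidth unit to bits per second.
SCALE = {"bits": 1.0, "Kbits": 1e3, "Mbits": 1e6, "Gbits": 1e9}


def stalled_intervals(iperf_output, interval=1.0):
    """Return (start, end) pairs for report intervals with zero throughput."""
    stalls = []
    for line in iperf_output.splitlines():
        m = INTERVAL.search(line)
        if not m:
            continue
        start, end = float(m.group(1)), float(m.group(2))
        if abs((end - start) - interval) > 0.05:
            continue  # skip the final summary line, which spans the whole run
        rate = float(m.group(3)) * SCALE[m.group(4)]
        if rate == 0.0:
            stalls.append((start, end))
    return stalls


# Hypothetical sample, shortened from the report above.
SAMPLE = """\
[  3] 29.0-30.0 sec  1.10 GBytes  9.41 Gbits/sec
[  3] 30.0-31.0 sec   318 MBytes  2.66 Gbits/sec
[  3] 31.0-32.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 32.0-33.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-43.4 sec  33.2 GBytes  6.57 Gbits/sec
"""

print(stalled_intervals(SAMPLE))
```

In the full report above, throughput drops in the 30-31 sec interval and
every interval from 31 sec until the test was interrupted reports zero.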
_____________________________________________________________________________________________________________________________________
[root@dhcp156-44 ~]# ping 192.168.1.118
PING 192.168.1.118 (192.168.1.118) 56(84) bytes of data.
64 bytes from 192.168.1.118: icmp_seq=1 ttl=64 time=0.264 ms
64 bytes from 192.168.1.118: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 192.168.1.118: icmp_seq=3 ttl=64 time=0.168 ms
64 bytes from 192.168.1.118: icmp_seq=4 ttl=64 time=0.174 ms
64 bytes from 192.168.1.118: icmp_seq=5 ttl=64 time=0.168 ms
64 bytes from 192.168.1.118: icmp_seq=6 ttl=64 time=0.282 ms
64 bytes from 192.168.1.118: icmp_seq=7 ttl=64 time=0.179 ms
64 bytes from 192.168.1.118: icmp_seq=8 ttl=64 time=0.141 ms
64 bytes from 192.168.1.118: icmp_seq=9 ttl=64 time=0.165 ms
64 bytes from 192.168.1.118: icmp_seq=10 ttl=64 time=0.168 ms
64 bytes from 192.168.1.118: icmp_seq=11 ttl=64 time=0.153 ms
64 bytes from 192.168.1.118: icmp_seq=12 ttl=64 time=0.296 ms
64 bytes from 192.168.1.118: icmp_seq=13 ttl=64 time=0.258 ms
64 bytes from 192.168.1.118: icmp_seq=14 ttl=64 time=0.236 ms
64 bytes from 192.168.1.118: icmp_seq=15 ttl=64 time=0.190 ms
64 bytes from 192.168.1.118: icmp_seq=16 ttl=64 time=0.245 ms
64 bytes from 192.168.1.118: icmp_seq=17 ttl=64 time=0.187 ms
64 bytes from 192.168.1.118: icmp_seq=18 ttl=64 time=0.222 ms
64 bytes from 192.168.1.118: icmp_seq=19 ttl=64 time=0.231 ms
64 bytes from 192.168.1.118: icmp_seq=20 ttl=64 time=0.279 ms
64 bytes from 192.168.1.118: icmp_seq=21 ttl=64 time=0.271 ms
64 bytes from 192.168.1.118: icmp_seq=22 ttl=64 time=0.319 ms
64 bytes from 192.168.1.118: icmp_seq=23 ttl=64 time=0.350 ms
64 bytes from 192.168.1.118: icmp_seq=24 ttl=64 time=0.311 ms
64 bytes from 192.168.1.118: icmp_seq=25 ttl=64 time=0.249 ms
64 bytes from 192.168.1.118: icmp_seq=26 ttl=64 time=0.258 ms
64 bytes from 192.168.1.118: icmp_seq=27 ttl=64 time=0.220 ms
64 bytes from 192.168.1.118: icmp_seq=28 ttl=64 time=0.299 ms
64 bytes from 192.168.1.118: icmp_seq=29 ttl=64 time=0.281 ms
64 bytes from 192.168.1.118: icmp_seq=30 ttl=64 time=0.271 ms
64 bytes from 192.168.1.118: icmp_seq=31 ttl=64 time=0.241 ms
64 bytes from 192.168.1.118: icmp_seq=32 ttl=64 time=0.245 ms
64 bytes from 192.168.1.118: icmp_seq=33 ttl=64 time=0.245 ms
64 bytes from 192.168.1.118: icmp_seq=34 ttl=64 time=0.287 ms
64 bytes from 192.168.1.118: icmp_seq=35 ttl=64 time=0.322 ms
64 bytes from 192.168.1.118: icmp_seq=36 ttl=64 time=0.247 ms
64 bytes from 192.168.1.118: icmp_seq=37 ttl=64 time=0.316 ms
64 bytes from 192.168.1.118: icmp_seq=38 ttl=64 time=0.255 ms
64 bytes from 192.168.1.118: icmp_seq=39 ttl=64 time=0.308 ms
64 bytes from 192.168.1.118: icmp_seq=40 ttl=64 time=0.250 ms
64 bytes from 192.168.1.118: icmp_seq=41 ttl=64 time=0.260 ms


^C
--- 192.168.1.118 ping statistics ---
63 packets transmitted, 41 received, 34.9206% packet loss, time 568ms
rtt min/avg/max/mdev = 0.141/0.243/0.350/0.054 ms
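
As a sanity check on the summary above, the loss percentage follows
directly from the two counters that ping reports. A minimal sketch in
shell (the helper name loss_percent is made up for illustration):

```shell
#!/bin/sh
# Packet loss in percent, given transmitted and received counts.
loss_percent() {
    awk -v tx="$1" -v rx="$2" 'BEGIN { printf "%.4f\n", (tx - rx) * 100 / tx }'
}

# Counters taken from the ping statistics above: 63 sent, 41 received.
loss_percent 63 41
```

With these counters, 22 of the 63 requests went unanswered.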



-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-21 20:04                                           ` Sameeh Jubran
  2018-11-21 23:51                                             ` Samudrala, Sridhar
@ 2018-11-22 18:27                                             ` Michael S. Tsirkin
  2018-11-26 15:13                                               ` Sameeh Jubran
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-22 18:27 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev

On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote:
> On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Great to see you making progress on this!
> > Some comments below:
> >
> > On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
> > > I have created a setup which has two hosts (host A and host B) with X710 10G
> > > cards connected back to back. On one host (I'll refer to this host as host A) I
> > > have configured a bridge with the PF interface as well as virtio-net's interface
> > > (standby) both attached to it.
> >
> > ...
> >
> > > The command line I used:
> > >
> > > /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
> > > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=
> > > cc17 \
> > > -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
> >
> > What's e1000 doing here?
> > Can this be the reason you cannot talk to the host?
> I don't think so, the e1000 is for enabling WAN connection on the
> guest for downloading packages and ssh connection. It is connected to
> a separate bridge which is connected to the external interface of the
> host.
> >
> > > -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=
> > > no,ifname=cc1_72,queues=4 \
> > > -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=
> > > cc1_72,vectors=10,mq=on,primary=cc1_71 \
> > > -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
> > > -enable-kvm \
> > > -name netkvm \
> > > -m 3000M \
> > > -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
> > > -smp 4 \
> > > -vga qxl \
> > > -spice port=6110,disable-ticketing \
> > > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
> > > -chardev spicevmc,name=vdagent,id=vdagent \
> > > -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=
> > > com.redhat.spice.0 \
> > > -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
> > > -device virtio-serial \
> > > -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
> > > -monitor stdio
> >
> >
> > ...
> >
> > > Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> > > host A with the feature enabled and during the test I have unplugged the sriov
> > > device, the device was unplugged successfully and no drops were observed as
> > > you can see in the results below:
> > >
> > > [root@dhcp156-44 ~]# ifconfig
> >
> > Well I suspect this won't tell you anything, this shows packet drops at
> > the hardware level. When e.g. link is down linux won't send any packets
> > out. The simplest test is to monitor latency and throughput and see that
> > while it is lower for the duration of migration, there are no huge
> > spikes around the switch.
> Oh, okay will do that.
> 
> I have noticed some nasty lag when I tried to ssh to the VM using the
> failover interface while I didn't experience that with the e1000.
> Sridhar Any idea what might be the cause?

Try tcpdump?

> >
> > --
> > MST
> 
> 
> 
> -- 
> Respectfully,
> Sameeh Jubran
> Linkedin
> Software Engineer @ Daynix.




* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-22 18:27                                             ` Michael S. Tsirkin
@ 2018-11-26 15:13                                               ` Sameeh Jubran
  2018-11-26 15:43                                                 ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-11-26 15:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev,
	liran.alon, Yan Vugenfirer

On Thu, Nov 22, 2018 at 8:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote:
> > On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > Great to see you making progress on this!
> > > Some comments below:
> > >
> > > On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
> > > > I have created a setup which has two hosts (host A and host B) with X710 10G
> > > > cards connected back to back. On one host (I'll refer to this host as host A) I
> > > > have configured a bridge with the PF interface as well as virtio-net's interface
> > > > (standby) both attached to it.
> > >
> > > ...
> > >
> > > > The command line I used:
> > > >
> > > > /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
> > > > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=
> > > > cc17 \
> > > > -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
> > >
> > > What's e1000 doing here?
> > > Can this be the reason you cannot talk to the host?
> > I don't think so, the e1000 is for enabling WAN connection on the
> > guest for downloading packages and ssh connection. It is connected to
> > a separate bridge which is connected to the external interface of the
> > host.
> > >
> > > > -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=
> > > > no,ifname=cc1_72,queues=4 \
> > > > -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=
> > > > cc1_72,vectors=10,mq=on,primary=cc1_71 \
> > > > -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
> > > > -enable-kvm \
> > > > -name netkvm \
> > > > -m 3000M \
> > > > -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
> > > > -smp 4 \
> > > > -vga qxl \
> > > > -spice port=6110,disable-ticketing \
> > > > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
> > > > -chardev spicevmc,name=vdagent,id=vdagent \
> > > > -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=
> > > > com.redhat.spice.0 \
> > > > -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
> > > > -device virtio-serial \
> > > > -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
> > > > -monitor stdio
> > >
> > >
> > > ...
> > >
> > > > Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> > > > host A with the feature enabled and during the test I have unplugged the sriov
> > > > device, the device was unplugged successfully and no drops were observed as
> > > > you can see in the results below:
> > > >
> > > > [root@dhcp156-44 ~]# ifconfig
> > >
> > > Well I suspect this won't tell you anything, this shows packet drops at
> > > the hardware level. When e.g. link is down linux won't send any packets
> > > out. The simplest test is to monitor latency and throughput and see that
> > > while it is lower for the duration of migration, there are no huge
> > > spikes around the switch.
> > Oh, okay will do that.
> >
> > I have noticed some nasty lag when I tried to ssh to the VM using the
> > failover interface while I didn't experience that with the e1000.
> > Sridhar Any idea what might be the cause?
>
> Try tcpdump?
I have investigated this, and here is what I have so far. Maybe you can
help me with some insights to figure out what's going on.
The setup is as follows:



|_VM_|
__||___
|host A|----X710---------back-to-back--------X710---|host B|

_______________________________________________________________________
- On the host A:

I have the following interfaces attached to the "test_br0" bridge:

virtio-net's netdev, cc1_72
X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
in the back to back setup)

The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.117
_______________________________________________________________________
- On the host B:

I have the following interfaces attached to the "test_br0" bridge:

X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
in the back to back setup)

The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.118
_______________________________________________________________________
- On the VM:
The failover interface has the ip: 192.168.1.17
_______________________________________________________________________

I can successfully ping 118 from 17 (host B from the VM); however, I
can't see the ICMP requests anywhere on host A!
I can see them on host B on ens2f0, and I can see them in the VM on
the failover interface, but not on host A:
not on the bridge (test_br0) as I would expect, not on the ens2f0
interface, not on the cc1_72 (virtio-net) interface, and of course not
on the world interface.
This leads me to think that the ICMP requests are sent on the "vf"
interface, which I can't see on the host. What further confirms my
theory is that when I use device_del to unplug the primary interface,
the ping gets disconnected. Using tcpdump I can see that the ping
requests arrive at host B and there is a suitable ping reply; however,
the reply is not present anywhere on host A or in the VM. Moreover,
once the primary is disconnected I start seeing the ping requests on
host A on "test_br0" and "ens2f0".

Liran, do you think this is related to the mac vtables and vfs issue
that you mentioned at the monthly meeting?
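
One way to repeat the per-interface check described above is to run a
short capture on each host A interface in turn. This is only a sketch:
the interface names and host B's address are the ones from this setup,
while the helper name capture_icmp and the DRY_RUN switch are
assumptions for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Interfaces on host A named in this thread, and host B's address.
IFACES="test_br0 ens2f0 cc1_72"
PEER=192.168.1.118

# Print (DRY_RUN=1, the default) or run a bounded ICMP capture per interface.
# -n: no name resolution, -e: show MAC addresses, -c 10: stop after 10 packets.
# 'timeout 10' keeps the capture from hanging on an interface with no traffic.
capture_icmp() {
    for ifc in $IFACES; do
        cmd="timeout 10 tcpdump -n -e -c 10 -i $ifc icmp and host $PEER"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

capture_icmp
```

Running it with DRY_RUN=0 as root performs the captures; the -e output
makes it easy to see which MAC address the requests and replies carry.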


>
> > >
> > > --
> > > MST
> >
> >
> >
> > --
> > Respectfully,
> > Sameeh Jubran
> > Linkedin
> > Software Engineer @ Daynix.



-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.




* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-26 15:13                                               ` Sameeh Jubran
@ 2018-11-26 15:43                                                 ` Sameeh Jubran
  2018-11-26 20:22                                                   ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-11-26 15:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Siwei Liu, venu.busireddy, cohuck, sridhar.samudrala, virtio-dev,
	liran.alon, Yan Vugenfirer

On Mon, Nov 26, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
>
> On Thu, Nov 22, 2018 at 8:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote:
> > > On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > Great to see you making progress on this!
> > > > Some comments below:
> > > >
> > > > On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
> > > > > I have created a setup which has two hosts (host A and host B) with X710 10G
> > > > > cards connected back to back. On one host (I'll refer to this host as host A) I
> > > > > have configured a bridge with the PF interface as well as virtio-net's interface
> > > > > (standby) both attached to it.
> > > >
> > > > ...
> > > >
> > > > > The command line I used:
> > > > >
> > > > > /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
> > > > > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=
> > > > > cc17 \
> > > > > -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
> > > >
> > > > What's e1000 doing here?
> > > > Can this be the reason you cannot talk to the host?
> > > I don't think so, the e1000 is for enabling WAN connection on the
> > > guest for downloading packages and ssh connection. It is connected to
> > > a separate bridge which is connected to the external interface of the
> > > host.
> > > >
> > > > > -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=
> > > > > no,ifname=cc1_72,queues=4 \
> > > > > -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=
> > > > > cc1_72,vectors=10,mq=on,primary=cc1_71 \
> > > > > -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
> > > > > -enable-kvm \
> > > > > -name netkvm \
> > > > > -m 3000M \
> > > > > -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
> > > > > -smp 4 \
> > > > > -vga qxl \
> > > > > -spice port=6110,disable-ticketing \
> > > > > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
> > > > > -chardev spicevmc,name=vdagent,id=vdagent \
> > > > > -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=
> > > > > com.redhat.spice.0 \
> > > > > -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
> > > > > -device virtio-serial \
> > > > > -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
> > > > > -monitor stdio
> > > >
> > > >
> > > > ...
> > > >
> > > > > Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> > > > > host A with the feature enabled and during the test I have unplugged the sriov
> > > > > device, the device was unplugged successfully and no drops were observed as
> > > > > you can see in the results below:
> > > > >
> > > > > [root@dhcp156-44 ~]# ifconfig
> > > >
> > > > Well I suspect this won't tell you anything, this shows packet drops at
> > > > the hardware level. When e.g. link is down linux won't send any packets
> > > > out. The simplest test is to monitor latency and throughput and see that
> > > > while it is lower for the duration of migration, there are no huge
> > > > spikes around the switch.
> > > Oh, okay will do that.
> > >
> > > I have noticed some nasty lag when I tried to ssh to the VM using the
> > > failover interface while I didn't experience that with the e1000.
> > > Sridhar Any idea what might be the cause?
> >
> > Try tcpdump?
> I have investigated this, and here is what I have so far. Maybe you can
> help me with some insights to figure out what's going on.
> The setup is as follows:
>
>
>
> |_VM_|
> __||___
> |host A|----X710---------back-to-back--------X710---|host B|
>
> _______________________________________________________________________
> - On the host A:
>
> I have the following interfaces attached to the "test_br0" bridge:
>
> virtio-net's netdev, cc1_72
> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
> in the back to back setup)
>
> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.117
> _______________________________________________________________________
> - On the host B:
>
> I have the following interfaces attached to the "test_br0" bridge:
>
> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
> in the back to back setup)
>
> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.118
> _______________________________________________________________________
> - On the VM:
> The failover interface has the ip: 192.168.1.17
> _______________________________________________________________________
>
> I can successfully ping 118 from 17 (host B from the VM); however, I
> can't see the ICMP requests anywhere on host A!
> I can see them on host B on ens2f0, and I can see them in the VM on
> the failover interface, but not on host A:
> not on the bridge (test_br0) as I would expect, not on the ens2f0
> interface, not on the cc1_72 (virtio-net) interface, and of course not
> on the world interface.
> This leads me to think that the ICMP requests are sent on the "vf"
> interface, which I can't see on the host. What further confirms my
> theory is that when I use device_del to unplug the primary interface,
> the ping gets disconnected. Using tcpdump I can see that the ping
> requests arrive at host B and there is a suitable ping reply; however,
> the reply is not present anywhere on host A or in the VM. Moreover,
> once the primary is disconnected I start seeing the ping requests on
> host A on "test_br0" and "ens2f0".
>
> Liran, do you think this is related to the mac vtables and vfs issue
> that you mentioned at the monthly meeting?
>
>
Update:
I have just set the VF's MAC address to 0 (ip link set ens2f0 vf 1 mac
00:00:00:00:00:00) after unplugging it (the primary device), and the
pings started working again on the failover interface. So it seems
the frames were arriving at the VF on the host.

> >
> > > >
> > > > --
> > > > MST
> > >
> > >
> > >
> > > --
> > > Respectfully,
> > > Sameeh Jubran
> > > Linkedin
> > > Software Engineer @ Daynix.
>
>
>
> --
> Respectfully,
> Sameeh Jubran
> Linkedin
> Software Engineer @ Daynix.



-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.




* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-26 15:43                                                 ` Sameeh Jubran
@ 2018-11-26 20:22                                                   ` Samudrala, Sridhar
  2018-11-27 11:24                                                     ` Sameeh Jubran
  2018-11-28 17:08                                                     ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-26 20:22 UTC (permalink / raw)
  To: Sameeh Jubran, Michael S. Tsirkin
  Cc: Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon,
	Yan Vugenfirer

On 11/26/2018 7:43 AM, Sameeh Jubran wrote:
> On Mon, Nov 26, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
>> On Thu, Nov 22, 2018 at 8:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote:
>>>> On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> Great to see you making progress on this!
>>>>> Some comments below:
>>>>>
>>>>> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
>>>>>> I have created a setup which has two hosts (host A and host B) with X710 10G
>>>>>> cards connected back to back. On one host (I'll refer to this host as host A) I
>>>>>> have configured a bridge with the PF interface as well as virtio-net's interface
>>>>>> (standby) both attached to it.
>>>>> ...
>>>>>
>>>>>> The command line I used:
>>>>>>
>>>>>> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
>>>>>> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=
>>>>>> cc17 \
>>>>>> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
>>>>> What's e1000 doing here?
>>>>> Can this be the reason you cannot talk to the host?
>>>> I don't think so, the e1000 is for enabling WAN connection on the
>>>> guest for downloading packages and ssh connection. It is connected to
>>>> a separate bridge which is connected to the external interface of the
>>>> host.
>>>>>> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=
>>>>>> no,ifname=cc1_72,queues=4 \
>>>>>> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=
>>>>>> cc1_72,vectors=10,mq=on,primary=cc1_71 \
>>>>>> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
>>>>>> -enable-kvm \
>>>>>> -name netkvm \
>>>>>> -m 3000M \
>>>>>> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
>>>>>> -smp 4 \
>>>>>> -vga qxl \
>>>>>> -spice port=6110,disable-ticketing \
>>>>>> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
>>>>>> -chardev spicevmc,name=vdagent,id=vdagent \
>>>>>> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=
>>>>>> com.redhat.spice.0 \
>>>>>> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
>>>>>> -device virtio-serial \
>>>>>> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
>>>>>> -monitor stdio
>>>>>
>>>>> ...
>>>>>
>>>>>> Since I couldn't ping from VM to host B, I did an iperf test between the VM and
>>>>>> host A with the feature enabled and during the test I have unplugged the sriov
>>>>>> device, the device was unplugged successfully and no drops were observed as
>>>>>> you can see in the results below:
>>>>>>
>>>>>> [root@dhcp156-44 ~]# ifconfig
>>>>> Well I suspect this won't tell you anything, this shows packet drops at
>>>>> the hardware level. When e.g. link is down linux won't send any packets
>>>>> out. The simplest test is to monitor latency and throughput and see that
>>>>> while it is lower for the duration of migration, there are no huge
>>>>> spikes around the switch.
>>>> Oh, okay will do that.
>>>>
>>>> I have noticed some nasty lag when I tried to ssh to the VM using the
>>>> failover interface while I didn't experience that with the e1000.
>>>> Sridhar Any idea what might be the cause?
>>> Try tcpdump?
>> I have investigated this, and here is what I have so far. Maybe you can
>> help me with some insights to figure out what's going on.
>> The setup is as follows:
>>
>>
>>
>> |_VM_|
>> __||___
>> |host A|----X710---------back-to-back--------X710---|host B|
>>
>> _______________________________________________________________________
>> - On the host A:
>>
>> I have the following interfaces attached to the "test_br0" bridge:
>>
>> virtio-net's netdev, cc1_72
>> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
>> in the back to back setup)
>>
>> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.117
>> _______________________________________________________________________
>> - On the host B:
>>
>> I have the following interfaces attached to the "test_br0" bridge:
>>
>> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
>> in the back to back setup)
>>
>> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.118
>> _______________________________________________________________________
>> - On the VM:
>> The failover interface has the ip: 192.168.1.17
>> _______________________________________________________________________
>>
>> I can successfully ping 118 from 17 (host B from the VM); however, I
>> can't see the ICMP requests anywhere on host A!
>> I can see them on host B on ens2f0, and I can see them in the VM on
>> the failover interface, but not on host A:
>> not on the bridge (test_br0) as I would expect, not on the ens2f0
>> interface, not on the cc1_72 (virtio-net) interface, and of course not
>> on the world interface.

This is the expected behavior when the VF is directly attached to the VM and is
being used as the primary interface. You won't see any packets on Host A.

>> This leads me to think that the ICMP requests are sent on the "vf"
>> interface, which I can't see on the host. What further confirms my
>> theory is that when I use device_del to unplug the primary interface,
>> the ping gets disconnected. Using tcpdump I can see that the ping
>> requests arrive at host B and there is a suitable ping reply; however,
>> the reply is not present anywhere on host A or in the VM. Moreover,
>> once the primary is disconnected I start seeing the ping requests on
>> host A on "test_br0" and "ens2f0".
>>
>> Liran, do you think this is related to the mac vtables and vfs issue
>> that you mentioned at the monthly meeting?
>>
>>
> Update:
> I have just set the VF's MAC address to 0 (ip link set ens2f0 vf 1 mac
> 00:00:00:00:00:00) after unplugging it (the primary device), and the
> pings started working again on the failover interface. So it seems
> the frames were arriving at the VF on the host.
>
>

Yes. When the VF is unplugged, you need to reset the VF's MAC so that packets
with the VM's MAC start flowing via VF, bridge and the virtio interface.

Have you looked at this documentation that shows a sample script to initiate live
migration?
https://www.kernel.org/doc/html/latest/networking/net_failover.html
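
For illustration, the host-side MAC reset can be sketched as below,
assuming PF ens2f0 and VF index 1 as in the command quoted above. This
is not the script from the linked documentation; the helper name
reset_vf_mac and the DRY_RUN switch are hypothetical, and by default the
command is only printed.

```shell
#!/bin/sh
PF=ens2f0                    # PF netdev on host A, from this thread
VF_NUM=1                     # VF index used in the quoted 'ip link' command
ZERO_MAC=00:00:00:00:00:00   # clears the MAC assigned to the VF

# Print (DRY_RUN=1, the default) or run the VF MAC reset on the host.
reset_vf_mac() {
    cmd="ip link set $PF vf $VF_NUM mac $ZERO_MAC"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Intended to run after 'device_del cc1_71' has completed in the QEMU monitor.
reset_vf_mac
```

Running it with DRY_RUN=0 needs root on the host that owns the PF.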

-Sridhar





* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-26 20:22                                                   ` Samudrala, Sridhar
@ 2018-11-27 11:24                                                     ` Sameeh Jubran
  2018-11-28 17:08                                                     ` Michael S. Tsirkin
  1 sibling, 0 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-11-27 11:24 UTC (permalink / raw)
  To: sridhar.samudrala
  Cc: Michael S. Tsirkin, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer

On Mon, Nov 26, 2018 at 10:22 PM Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
> On 11/26/2018 7:43 AM, Sameeh Jubran wrote:
> > On Mon, Nov 26, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
> >> On Thu, Nov 22, 2018 at 8:27 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> On Wed, Nov 21, 2018 at 10:04:53PM +0200, Sameeh Jubran wrote:
> >>>> On Wed, Nov 21, 2018 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>> Great to see you making progress on this!
> >>>>> Some comments below:
> >>>>>
> >>>>> On Wed, Nov 21, 2018 at 05:39:38PM +0200, Sameeh Jubran wrote:
> >>>>>> I have created a setup which has two hosts (host A and host B) with X710 10G
> >>>>>> cards connected back to back. On one host (I'll refer to this host as host A) I
> >>>>>> have configured a bridge with the PF interface as well as virtio-net's interface
> >>>>>> (standby) both attached to it.
> >>>>> ...
> >>>>>
> >>>>>> The command line I used:
> >>>>>>
> >>>>>> /root/qemu/x86_64-softmmu/qemu-system-x86_64 \
> >>>>>> -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=
> >>>>>> cc17 \
> >>>>>> -device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
> >>>>> What's e1000 doing here?
> >>>>> Can this be the reason you cannot talk to the host?
> >>>> I don't think so, the e1000 is for enabling WAN connection on the
> >>>> guest for downloading packages and ssh connection. It is connected to
> >>>> a separate bridge which is connected to the external interface of the
> >>>> host.
> >>>>>> -netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=
> >>>>>> no,ifname=cc1_72,queues=4 \
> >>>>>> -device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=
> >>>>>> cc1_72,vectors=10,mq=on,primary=cc1_71 \
> >>>>>> -device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
> >>>>>> -enable-kvm \
> >>>>>> -name netkvm \
> >>>>>> -m 3000M \
> >>>>>> -drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
> >>>>>> -smp 4 \
> >>>>>> -vga qxl \
> >>>>>> -spice port=6110,disable-ticketing \
> >>>>>> -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
> >>>>>> -chardev spicevmc,name=vdagent,id=vdagent \
> >>>>>> -device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=
> >>>>>> com.redhat.spice.0 \
> >>>>>> -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
> >>>>>> -device virtio-serial \
> >>>>>> -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
> >>>>>> -monitor stdio
> >>>>>
> >>>>> ...
> >>>>>
> >>>>>> Since I couldn't ping from VM to host B, I did an iperf test between the VM and
> >>>>>> host A with the feature enabled and during the test I have unplugged the sriov
> >>>>>> device, the device was unplugged successfully and no drops were observed as
> >>>>>> you can see in the results below:
> >>>>>>
> >>>>>> [root@dhcp156-44 ~]# ifconfig
> >>>>> Well I suspect this won't tell you anything, this shows packet drops at
> >>>>> the hardware level. When e.g. link is down linux won't send any packets
> >>>>> out. The simplest test is to monitor latency and throughput and see that
> >>>>> while it is lower for the duration of migration, there are no huge
> >>>>> spikes around the switch.
> >>>> Oh, okay will do that.
> >>>>
> >>>> I have noticed some nasty lag when I tried to ssh to the VM using the
> >>>> failover interface while I didn't experience that with the e1000.
> >>>> Sridhar Any idea what might be the cause?
> >>> Try tcpdump?
> >> I have investigated this, and here is what I have so far. Maybe you can
> >> help me with some insights to figure out what's going on.
> >> The setup is as follows:
> >>
> >>
> >>
> >> |_VM_|
> >> __||___
> >> |host A|----X710---------back-to-back--------X710---|host B|
> >>
> >> _______________________________________________________________________
> >> - On the host A:
> >>
> >> I have the following interfaces attached to the "test_br0" bridge:
> >>
> >> virtio-net's netdev, cc1_72
> >> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
> >> in the back to back setup)
> >>
> >> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.117
> >> _______________________________________________________________________
> >> - On the host B:
> >>
> >> I have the following interfaces attached to the "test_br0" bridge:
> >>
> >> X710 device PF interfaces: ens2f0 and ens2f1 (only ens2f0 is connected
> >> in the back to back setup)
> >>
> >> The bridge has the mac address of the PF ens2f0 and ip : 192.168.1.118
> >> _______________________________________________________________________
> >> - On the VM:
> >> The failover interface has the ip: 192.168.1.17
> >> _______________________________________________________________________
> >>
> >> I can successfully ping 118 from 17 (host B from the VM); however, I
> >> can't see the ICMP requests anywhere on host A!
> >> I can see them on host B on ens2f0, and I can see them in the VM on
> >> the failover interface, but not on host A:
> >> not on the bridge (test_br0) as I would expect, not on the ens2f0
> >> interface, not on the cc1_72 (virtio-net) interface, and of course not
> >> on the world interface.
>
> This is the expected behavior when the VF is directly attached to the VM and is
> being used as the primary interface. You won't see any packets on Host A.
>
> >> This leads me to think that the ICMP requests are sent on the "vf"
> >> interface, which I can't see on the host. What further confirms my
> >> theory is that when I use device_del to unplug the primary interface,
> >> the ping gets disconnected. Using tcpdump I can see that the ping
> >> requests arrive at host B and there is a suitable ping reply; however,
> >> the reply is not present anywhere on host A or in the VM. Moreover,
> >> once the primary is disconnected I start seeing the ping requests on
> >> host A on "test_br0" and "ens2f0".
> >>
> >> Liran do you think this is related to the mac vtables and vfs issue
> >> that you've mentioned on the monthly meeting?
> >>
> >>
> > Update:
> > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > pings started working again on the failover interface. So it seems
> > like the frames were arriving to the vf on the host.
> >
> >
>
> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> with VMs MAC start flowing via VF, bridge and the virtio interface.
>
> Have you looked at this documentation that shows a sample script to initiate live
> migration?
> https://www.kernel.org/doc/html/latest/networking/net_failover.html

This means that we need cooperation from the management layer to do this,
and a QEMU/driver-only implementation won't suffice. What do you suggest
as an alternative?
Please share your thoughts.
>
> -Sridhar
>


-- 
Respectfully,
Sameeh Jubran
Software Engineer @ Daynix.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-26 20:22                                                   ` Samudrala, Sridhar
  2018-11-27 11:24                                                     ` Sameeh Jubran
@ 2018-11-28 17:08                                                     ` Michael S. Tsirkin
  2018-11-28 17:31                                                       ` Samudrala, Sridhar
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-28 17:08 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev,
	liran.alon, Yan Vugenfirer

On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > Update:
> > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > pings started working again on the failover interface. So it seems
> > like the frames were arriving to the vf on the host.
> > 
> > 
> 
> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> with VMs MAC start flowing via VF, bridge and the virtio interface.
> 
> Have you looked at this documentation that shows a sample script to initiate live
> migration?
> https://www.kernel.org/doc/html/latest/networking/net_failover.html
> 
> -Sridhar

Interesting, I didn't notice it does this. So in fact
just defining the VF MAC will immediately divert packets
to the VF? Given that the guest driver has not initialized the VF
yet, won't a bunch of packets be dropped?

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 17:08                                                     ` Michael S. Tsirkin
@ 2018-11-28 17:31                                                       ` Samudrala, Sridhar
  2018-11-28 17:35                                                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-28 17:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev,
	liran.alon, Yan Vugenfirer



On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>> Update:
>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>> pings started working again on the failover interface. So it seems
>>> like the frames were arriving to the vf on the host.
>>>
>>>
>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>
>> Have you looked at this documentation that shows a sample script to initiate live
>> migration?
>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>
>> -Sridhar
> Interesting I didn't notice it does this. So in fact
> just defining VF mac will immediately divert packets
> to the VF? Given guest driver did not initialize VF
> yet won't a bunch of packets be dropped?

There is a typo in my statement above (VF -> PF). It should read:
When the VF is unplugged, you need to reset the VF's MAC so that the packets
with the VM's MAC start flowing via the PF, bridge and the virtio interface.
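
The host-side reset described above can be sketched roughly as follows. This is
only an illustration, not the script from the net_failover documentation: the
helper names vf_mac_reset_cmd and run are hypothetical, and the PF name and VF
index are the ones used earlier in this thread. By default the command is only
printed, since actually running it needs root and an SR-IOV capable PF.

```python
import subprocess

ZERO_MAC = "00:00:00:00:00:00"


def vf_mac_reset_cmd(pf_ifname, vf_index):
    """Build the iproute2 command that clears the MAC assigned to a VF."""
    if vf_index < 0:
        raise ValueError("vf_index must be non-negative")
    return ["ip", "link", "set", pf_ifname, "vf", str(vf_index), "mac", ZERO_MAC]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

Calling run(vf_mac_reset_cmd("ens2f0", 1)) after the primary has been unplugged
prints the same command that was used in the update quoted above.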

When the VF is plugged in, ideally the MAC filter for the VF should be added to
the HW once the guest driver comes up and can receive packets. Currently with Intel
drivers, the filter gets added to HW as soon as the host admin sets the VF's MAC via
the ndo_set_vf_mac() API. So potentially there could be packet drops until the VF driver
comes up in the VM.
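
To make the drop window concrete, here is a toy timeline model. It is a sketch
under simplifying assumptions, not driver code: the function name
dropped_packets and its parameters are made up for illustration, time is an
abstract counter, and packets that are not steered to the VF are assumed to
reach the VM via the PF, bridge and virtio path.

```python
def dropped_packets(arrivals, mac_set_at, driver_up_at, filter_on_driver_up):
    """Count packets lost between VF MAC assignment and guest VF driver start.

    arrivals: arrival times of packets destined for the VM's MAC.
    filter_on_driver_up: False models a HW filter installed as soon as the
    VF MAC is set; True models a filter installed only once the guest VF
    driver is up and able to receive.
    """
    filter_at = driver_up_at if filter_on_driver_up else mac_set_at
    dropped = 0
    for t in arrivals:
        steered_to_vf = t >= filter_at
        if steered_to_vf and t < driver_up_at:
            dropped += 1  # the VF owns the MAC but nothing is receiving on it
    return dropped
```

With packets arriving at times 0..5, the MAC set at time 2 and the guest driver
up at time 4, the early-filter case loses the packets at times 2 and 3, while
the deferred-filter case loses none.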






^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 17:31                                                       ` Samudrala, Sridhar
@ 2018-11-28 17:35                                                         ` Michael S. Tsirkin
  2018-11-28 18:39                                                           ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-28 17:35 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev,
	liran.alon, Yan Vugenfirer

On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
> 
> 
> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > > > Update:
> > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > > > pings started working again on the failover interface. So it seems
> > > > like the frames were arriving to the vf on the host.
> > > > 
> > > > 
> > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > with VMs MAC start flowing via VF, bridge and the virtio interface.
> > > 
> > > Have you looked at this documentation that shows a sample script to initiate live
> > > migration?
> > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > 
> > > -Sridhar
> > Interesting I didn't notice it does this. So in fact
> > just defining VF mac will immediately divert packets
> > to the VF? Given guest driver did not initialize VF
> > yet won't a bunch of packets be dropped?
> 
> There is typo in my stmt above (VF->PF)
> When the VF is unplugged, you need to reset the VFs MAC so that the packets
> with VMs MAC start flowing via PF, bridge and the virtio interface.
> 
> When the VF is plugged in, ideally the MAC filter for the VF should be added to
> the HW once the guest driver comes up and can receive packets. Currently with intel
> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> comes up in the VM.
> 
> 

Can this be fixed in the intel drivers?

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 17:35                                                         ` Michael S. Tsirkin
@ 2018-11-28 18:39                                                           ` Samudrala, Sridhar
  2018-11-28 18:51                                                             ` Michael S. Tsirkin
  2018-11-28 20:06                                                             ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-28 18:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Carolyn
  Cc: Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck, virtio-dev,
	liran.alon, Yan Vugenfirer, Brandeburg, Jesse

On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>
>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>> Update:
>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>> pings started working again on the failover interface. So it seems
>>>>> like the frames were arriving to the vf on the host.
>>>>>
>>>>>
>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>
>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>> migration?
>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>
>>>> -Sridhar
>>> Interesting I didn't notice it does this. So in fact
>>> just defining VF mac will immediately divert packets
>>> to the VF? Given guest driver did not initialize VF
>>> yet won't a bunch of packets be dropped?
>> There is typo in my stmt above (VF->PF)
>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>
>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>> the HW once the guest driver comes up and can receive packets. Currently with intel
>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>> comes up in the VM.
>>
>>
> Can this be fixed in the intel drivers?

I just checked and it looks like this has been addressed in the
ice 100Gb driver. I will bring this issue up internally to see if we can change this
behavior in the i40e/ixgbe drivers.




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 18:39                                                           ` Samudrala, Sridhar
@ 2018-11-28 18:51                                                             ` Michael S. Tsirkin
  2018-11-29  6:29                                                               ` Samudrala, Sridhar
  2018-11-28 20:06                                                             ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-28 18:51 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse

On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
> > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
> > > 
> > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > > > > > Update:
> > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > > > > > pings started working again on the failover interface. So it seems
> > > > > > like the frames were arriving to the vf on the host.
> > > > > > 
> > > > > > 
> > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > with VMs MAC start flowing via VF, bridge and the virtio interface.
> > > > > 
> > > > > Have you looked at this documentation that shows a sample script to initiate live
> > > > > migration?
> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > > > 
> > > > > -Sridhar
> > > > Interesting I didn't notice it does this. So in fact
> > > > just defining VF mac will immediately divert packets
> > > > to the VF? Given guest driver did not initialize VF
> > > > yet won't a bunch of packets be dropped?
> > > There is typo in my stmt above (VF->PF)
> > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > 
> > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > comes up in the VM.
> > > 
> > > 
> > Can this be fixed in the intel drivers?
> 
> I just checked and it looks like this seems to have been addressed in the
> ice 100Gb driver.

Thanks! Could you please point out the relevant code/commit ID?

> Will bring this up issue internally to see if we can change this
> behavior in i40e/ixgbe drivers.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 18:39                                                           ` Samudrala, Sridhar
  2018-11-28 18:51                                                             ` Michael S. Tsirkin
@ 2018-11-28 20:06                                                             ` Michael S. Tsirkin
  2018-11-28 20:28                                                               ` si-wei liu
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-28 20:06 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse

On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
> > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
> > > 
> > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > > > > > Update:
> > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > > > > > pings started working again on the failover interface. So it seems
> > > > > > like the frames were arriving to the vf on the host.
> > > > > > 
> > > > > > 
> > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > with VMs MAC start flowing via VF, bridge and the virtio interface.
> > > > > 
> > > > > Have you looked at this documentation that shows a sample script to initiate live
> > > > > migration?
> > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > > > 
> > > > > -Sridhar
> > > > Interesting I didn't notice it does this. So in fact
> > > > just defining VF mac will immediately divert packets
> > > > to the VF? Given guest driver did not initialize VF
> > > > yet won't a bunch of packets be dropped?
> > > There is typo in my stmt above (VF->PF)
> > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > 
> > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > comes up in the VM.
> > > 
> > > 
> > Can this be fixed in the intel drivers?
> 
> I just checked and it looks like this seems to have been addressed in the
> ice 100Gb driver. Will bring this up issue internally to see if we can change this
> behavior in i40e/ixgbe drivers.

Also, what happens if the MAC is programmed both in the PF (e.g. with
macvtap) and in the VF? Ideally the VF will take precedence.

-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 20:06                                                             ` Michael S. Tsirkin
@ 2018-11-28 20:28                                                               ` si-wei liu
  2018-11-28 20:43                                                                 ` Michael S. Tsirkin
  2018-11-29  1:15                                                                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: si-wei liu @ 2018-11-28 20:28 UTC (permalink / raw)
  To: Michael S. Tsirkin, Samudrala, Sridhar
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse



On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>> Update:
>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>
>>>>>>>
>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>
>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>> migration?
>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>
>>>>>> -Sridhar
>>>>> Interesting I didn't notice it does this. So in fact
>>>>> just defining VF mac will immediately divert packets
>>>>> to the VF? Given guest driver did not initialize VF
>>>>> yet won't a bunch of packets be dropped?
>>>> There is typo in my stmt above (VF->PF)
>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>
>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>> comes up in the VM.
>>>>
>>>>
>>> Can this be fixed in the intel drivers?
>> I just checked and it looks like this seems to have been addressed in the
>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>> behavior in i40e/ixgbe drivers.
> Also what happens if the mac is programmed both in PF (e.g. with
> macvtap) and VF? Ideally VF will take precedence.
I seriously doubt that legacy Intel NIC hardware can do that without
a software workaround in the PF driver. Actually, the same applies to
other NIC vendors when the hardware sees duplicate filters: there is no
control over which one takes precedence over the other.


-Siwei







^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 20:28                                                               ` si-wei liu
@ 2018-11-28 20:43                                                                 ` Michael S. Tsirkin
  2018-11-28 20:47                                                                   ` si-wei liu
  2018-11-29  1:15                                                                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-28 20:43 UTC (permalink / raw)
  To: si-wei liu
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse

On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
> 
> 
> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
> > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
> > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
> > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > > > > > > > Update:
> > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > > > > > > > pings started working again on the failover interface. So it seems
> > > > > > > > like the frames were arriving to the vf on the host.
> > > > > > > > 
> > > > > > > > 
> > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface.
> > > > > > > 
> > > > > > > Have you looked at this documentation that shows a sample script to initiate live
> > > > > > > migration?
> > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > > > > > 
> > > > > > > -Sridhar
> > > > > > Interesting I didn't notice it does this. So in fact
> > > > > > just defining VF mac will immediately divert packets
> > > > > > to the VF? Given guest driver did not initialize VF
> > > > > > yet won't a bunch of packets be dropped?
> > > > > There is typo in my stmt above (VF->PF)
> > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > > > 
> > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > > > comes up in the VM.
> > > > > 
> > > > > 
> > > > Can this be fixed in the intel drivers?
> > > I just checked and it looks like this seems to have been addressed in the
> > > ice 100Gb driver. Will bring this up issue internally to see if we can change this
> > > behavior in i40e/ixgbe drivers.
> > Also what happens if the mac is programmed both in PF (e.g. with
> > macvtap) and VF? Ideally VF will take precedence.
> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
> mucking around with software workaround in the PF driver. Actually, the same
> applies to other NIC vendors when hardware sees duplicate filters. There's
> no such control of precedence on one over the other.
> 
> 
> -Siwei
> 
> 
> 

OK I guess we will need another feature bit for a software
workaround then.


-- 
MST



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 20:43                                                                 ` Michael S. Tsirkin
@ 2018-11-28 20:47                                                                   ` si-wei liu
  0 siblings, 0 replies; 85+ messages in thread
From: si-wei liu @ 2018-11-28 20:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse



On 11/28/2018 12:43 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>
>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>>>> Update:
>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>>>
>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>>>> migration?
>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>>>
>>>>>>>> -Sridhar
>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>> just defining VF mac will immediately divert packets
>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>> yet won't a bunch of packets be dropped?
>>>>>> There is typo in my stmt above (VF->PF)
>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>
>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>> comes up in the VM.
>>>>>>
>>>>>>
>>>>> Can this be fixed in the intel drivers?
>>>> I just checked and it looks like this seems to have been addressed in the
>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>> behavior in i40e/ixgbe drivers.
>>> Also what happens if the mac is programmed both in PF (e.g. with
>>> macvtap) and VF? Ideally VF will take precedence.
>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>> mucking around with software workaround in the PF driver. Actually, the same
>> applies to other NIC vendors when hardware sees duplicate filters. There's
>> no such control of precedence on one over the other.
>>
>>
>> -Siwei
>>
>>
>>
> OK I guess we will need another feature bit for a software
> workaround then.
>
OK, but then what if the NIC vendor is not willing to take a software
workaround for this feature?

-Siwei









^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 20:28                                                               ` si-wei liu
  2018-11-28 20:43                                                                 ` Michael S. Tsirkin
@ 2018-11-29  1:15                                                                 ` Michael S. Tsirkin
  2018-11-29  6:37                                                                   ` Samudrala, Sridhar
  2018-11-29 20:14                                                                   ` si-wei liu
  1 sibling, 2 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-29  1:15 UTC (permalink / raw)
  To: si-wei liu
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse

On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
> 
> 
> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
> > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
> > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
> > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > > > > > > > Update:
> > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > > > > > > > pings started working again on the failover interface. So it seems
> > > > > > > > like the frames were arriving to the vf on the host.
> > > > > > > > 
> > > > > > > > 
> > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface.
> > > > > > > 
> > > > > > > Have you looked at this documentation that shows a sample script to initiate live
> > > > > > > migration?
> > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > > > > > 
> > > > > > > -Sridhar
> > > > > > Interesting I didn't notice it does this. So in fact
> > > > > > just defining VF mac will immediately divert packets
> > > > > > to the VF? Given guest driver did not initialize VF
> > > > > > yet won't a bunch of packets be dropped?
> > > > > There is typo in my stmt above (VF->PF)
> > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > > > 
> > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > > > comes up in the VM.
> > > > > 
> > > > > 
> > > > Can this be fixed in the intel drivers?
> > > I just checked and it looks like this seems to have been addressed in the
> > > ice 100Gb driver. Will bring this up issue internally to see if we can change this
> > > behavior in i40e/ixgbe drivers.
> > Also what happens if the mac is programmed both in PF (e.g. with
> > macvtap) and VF? Ideally VF will take precedence.
> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
> mucking around with software workaround in the PF driver. Actually, the same
> applies to other NIC vendors when hardware sees duplicate filters. There's
> no such control of precedence on one over the other.
> 
> 
> -Siwei
> 
> 

Well, removing a MAC from the PF filter when we are adding it to the VF
filter should always be possible. We need to keep it in a separate list and
re-add it when removing the MAC from the VF filter.  This can be handled in
the net core, no need for driver-specific hacks.
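
A minimal userspace sketch of that bookkeeping, assuming a toy model rather
than real net core code: struct mac_table, struct pf_state, vf_mac_added()
and vf_mac_removed() are hypothetical names made up for illustration. It
only shows the "park the PF entry, restore it later" idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN  6
#define MAX_MACS 8

/* Hypothetical model of one filter table: a small set of unicast MACs. */
struct mac_table {
    uint8_t addr[MAX_MACS][MAC_LEN];
    int count;
};

/* Return the index of `mac` in the table, or -1 if it is absent. */
static int mac_find(const struct mac_table *t, const uint8_t *mac)
{
    for (int i = 0; i < t->count; i++)
        if (memcmp(t->addr[i], mac, MAC_LEN) == 0)
            return i;
    return -1;
}

static bool mac_add(struct mac_table *t, const uint8_t *mac)
{
    if (mac_find(t, mac) >= 0)
        return true;                 /* already present */
    if (t->count == MAX_MACS)
        return false;                /* table full */
    memcpy(t->addr[t->count++], mac, MAC_LEN);
    return true;
}

static bool mac_del(struct mac_table *t, const uint8_t *mac)
{
    int i = mac_find(t, mac);

    if (i < 0)
        return false;
    t->count--;
    if (i != t->count)               /* move the last entry into the hole */
        memcpy(t->addr[i], t->addr[t->count], MAC_LEN);
    return true;
}

struct pf_state {
    struct mac_table active;         /* PF filters currently in effect */
    struct mac_table stashed;        /* PF filters parked while a VF owns the MAC */
};

/* A VF filter for `mac` is being added: park the PF's copy, if it has one. */
static bool vf_mac_added(struct pf_state *pf, struct mac_table *vf,
                         const uint8_t *mac)
{
    assert(pf && vf && mac);
    bool had_pf = mac_find(&pf->active, mac) >= 0;

    if (had_pf && !mac_add(&pf->stashed, mac))
        return false;                /* nowhere to remember it: change nothing */
    if (!mac_add(vf, mac)) {
        if (had_pf)
            mac_del(&pf->stashed, mac);
        return false;
    }
    if (had_pf)
        mac_del(&pf->active, mac);
    return true;
}

/* The VF filter for `mac` is going away: give the MAC back to the PF. */
static bool vf_mac_removed(struct pf_state *pf, struct mac_table *vf,
                           const uint8_t *mac)
{
    assert(pf && vf && mac);
    mac_del(vf, mac);
    if (mac_find(&pf->stashed, mac) < 0)
        return true;                 /* the PF never had this MAC */
    if (!mac_add(&pf->active, mac))
        return false;                /* PF table full: keep it stashed */
    mac_del(&pf->stashed, mac);
    return true;
}
```

In this model the PF filter is absent exactly while the VF holds the MAC,
and comes back once the VF filter is removed.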


Still, let's prioritize things correctly.  IMHO it's fine if we
initially assume promisc mode on the PF.  macvlan has this mode too
after all.

The question is how userspace knows the driver isn't broken in this respect.
Should we add a "vf failover" flag somewhere so this can be probed?

-- 
MST

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-28 18:51                                                             ` Michael S. Tsirkin
@ 2018-11-29  6:29                                                               ` Samudrala, Sridhar
  0 siblings, 0 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-29  6:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse



On 11/28/2018 10:51 AM, Michael S. Tsirkin wrote:
> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>> Update:
>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>
>>>>>>>
>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>
>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>> migration?
>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>
>>>>>> -Sridhar
>>>>> Interesting I didn't notice it does this. So in fact
>>>>> just defining VF mac will immediately divert packets
>>>>> to the VF? Given guest driver did not initialize VF
>>>>> yet won't a bunch of packets be dropped?
>>>> There is typo in my stmt above (VF->PF)
>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>
>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>> comes up in the VM.
>>>>
>>>>
>>> Can this be fixed in the intel drivers?
>> I just checked and it looks like this seems to have been addressed in the
>> ice 100Gb driver.
> Thanks! Could you pls point out the relevant code/commit id?

You can look into ice_set_vf_mac().  Compare it with i40e_ndo_set_vf_mac()
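
To make the difference concrete, here is a rough model of the two
behaviours discussed above. It is not the actual ice or i40e code; the
types and functions (vf_model, admin_set_vf_mac(), guest_driver_state(),
rx_path_for_vm_mac()) are hypothetical. One policy programs the hardware
filter as soon as the host admin sets the VF MAC, the other waits until
the guest driver is up.

```c
#include <assert.h>
#include <stdbool.h>

/* When does the PF driver program the VF's MAC filter into hardware? */
enum filter_policy {
    FILTER_ON_ADMIN_SET,   /* as soon as the host admin sets the VF MAC */
    FILTER_ON_DRIVER_UP    /* only once the guest VF driver is running */
};

/* Where does a frame addressed to the VM's MAC end up? */
enum rx_path { RX_VIA_PF, RX_VIA_VF, RX_DROPPED };

struct vf_model {
    enum filter_policy policy;
    bool admin_mac_set;     /* host admin has assigned the VM's MAC to the VF */
    bool guest_driver_up;   /* VF driver in the guest can receive packets */
    bool hw_filter;         /* MAC filter for the VF is present in hardware */
};

/* Recompute the hardware filter state after any change. */
static void update_filter(struct vf_model *vf)
{
    if (!vf->admin_mac_set)
        vf->hw_filter = false;
    else if (vf->policy == FILTER_ON_ADMIN_SET)
        vf->hw_filter = true;
    else
        vf->hw_filter = vf->guest_driver_up;
}

static void admin_set_vf_mac(struct vf_model *vf)
{
    assert(vf);
    vf->admin_mac_set = true;
    update_filter(vf);
}

static void admin_clear_vf_mac(struct vf_model *vf)
{
    assert(vf);
    vf->admin_mac_set = false;
    update_filter(vf);
}

static void guest_driver_state(struct vf_model *vf, bool up)
{
    assert(vf);
    vf->guest_driver_up = up;
    update_filter(vf);
}

/* Without a VF filter the frame falls through to the PF (default port). */
static enum rx_path rx_path_for_vm_mac(const struct vf_model *vf)
{
    if (!vf->hw_filter)
        return RX_VIA_PF;
    return vf->guest_driver_up ? RX_VIA_VF : RX_DROPPED;
}
```

With FILTER_ON_ADMIN_SET there is a window (MAC set, guest driver not yet
up) in which frames are dropped; with FILTER_ON_DRIVER_UP they keep
flowing via the PF until the VF can take them.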





* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29  1:15                                                                 ` Michael S. Tsirkin
@ 2018-11-29  6:37                                                                   ` Samudrala, Sridhar
  2018-11-29 20:14                                                                   ` si-wei liu
  1 sibling, 0 replies; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-29  6:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, si-wei liu
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse

On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>
>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>>>> Update:
>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>>>
>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>>>> migration?
>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>>>
>>>>>>>> -Sridhar
>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>> just defining VF mac will immediately divert packets
>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>> yet won't a bunch of packets be dropped?
>>>>>> There is typo in my stmt above (VF->PF)
>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>
>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>> comes up in the VM.
>>>>>>
>>>>>>
>>>>> Can this be fixed in the intel drivers?
>>>> I just checked and it looks like this seems to have been addressed in the
>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>> behavior in i40e/ixgbe drivers.
>>> Also what happens if the mac is programmed both in PF (e.g. with
>>> macvtap) and VF? Ideally VF will take precedence.
>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>> mucking around with software workaround in the PF driver. Actually, the same
>> applies to other NIC vendors when hardware sees duplicate filters. There's
>> no such control of precedence on one over the other.
>>
>>
>> -Siwei
>>
>>
> Well removing a MAC from the PF filter when we are adding it to the VF
> filter should always be possible. Need to keep it in a separate list and
> re-add it when removing the MAC from VF filter.  This can be handled in
> the net core, no need for driver specific hacks.

We don't explicitly add the MAC to the PF filter list. Just resetting the VF's MAC will
cause the frames to reach the PF, as that is the default port.

Also, setting MACs is a privileged operation, and I think managing them during live
migration should be handled by a management layer.


>
>
> Still, let's prioritize things correctly.  IMHO it's fine if we
> initially assume promisc mode on the PF.  macvlan has this mode too
> after all.

Yes. We don't need to explicitly set the MAC on the PF.

>
> Question is how does userspace know driver isn't broken in this respect?
> Let's add a "vf failover" flag somewhere so this can be probed?

I don't understand the need for this additional flag.



* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29  1:15                                                                 ` Michael S. Tsirkin
  2018-11-29  6:37                                                                   ` Samudrala, Sridhar
@ 2018-11-29 20:14                                                                   ` si-wei liu
  2018-11-29 21:17                                                                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: si-wei liu @ 2018-11-29 20:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse, Boris Ostrovsky



On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>
>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>>>> Update:
>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>>>
>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>>>> migration?
>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>>>
>>>>>>>> -Sridhar
>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>> just defining VF mac will immediately divert packets
>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>> yet won't a bunch of packets be dropped?
>>>>>> There is typo in my stmt above (VF->PF)
>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>
>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>> comes up in the VM.
>>>>>>
>>>>>>
>>>>> Can this be fixed in the intel drivers?
>>>> I just checked and it looks like this seems to have been addressed in the
>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>> behavior in i40e/ixgbe drivers.
>>> Also what happens if the mac is programmed both in PF (e.g. with
>>> macvtap) and VF? Ideally VF will take precedence.
>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>> mucking around with software workaround in the PF driver. Actually, the same
>> applies to other NIC vendors when hardware sees duplicate filters. There's
>> no such control of precedence on one over the other.
>>
>>
>> -Siwei
>>
>>
> Well removing a MAC from the PF filter when we are adding it to the VF
> filter should always be possible. Need to keep it in a separate list and
> re-add it when removing the MAC from VF filter.  This can be handled in
> the net core, no need for driver specific hacks.
That is what I have been saying all along: essentially what you need is a
netdev API, rather than dirty hacks in each driver. That is fine, but how
would you implement it? Note there's no equivalent driver-level .ndo API
to "move" filters, and all existing .ndo APIs operate at the MAC address
level as opposed to filters. Are you going to convince netdev that this is
the right thing to do and that we should add such an API to the net core
and each individual driver?


> Still, let's prioritize things correctly.  IMHO it's fine if we
> initially assume promisc mode on the PF.  macvlan has this mode too
> after all.
I'm not sure which promisc mode you are talking about. As far as I
understand it, for macvlan/macvtap the NIC is only put into promisc mode
when it runs out of MAC filter entries. Before that, all MAC addresses are
added to the NIC as unicast filters. In addition, people prefer
macvlan/macvtap for isolation in a multi-tenant cloud as well as for
avoiding the performance penalty due to noisy neighbors. I'd rather hear it
stated plainly that the current MAC-based pairing scheme doesn't work well
with macvtap and only works with a bridged setup that has promisc enabled.
That would help people understand the situation better.
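
The fallback described above can be sketched roughly as follows. This is a
toy model, not code from any driver: struct nic_model, nic_sync_unicast()
and nic_accepts() are hypothetical names, and the slot count is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical NIC with a fixed number of unicast MAC filter slots. */
struct nic_model {
    size_t filter_slots;   /* how many exact-match filters the NIC has */
    size_t programmed;     /* how many are currently in use */
    bool promisc;          /* true once the NIC stops filtering on MAC */
};

/* Sync `wanted` unicast addresses (e.g. one per macvlan) to the NIC. */
static void nic_sync_unicast(struct nic_model *nic, size_t wanted)
{
    assert(nic);
    if (wanted > nic->filter_slots) {
        /* Out of filter entries: fall back to promiscuous mode. */
        nic->promisc = true;
        nic->programmed = 0;
    } else {
        /* Everything fits: use exact-match unicast filters only. */
        nic->promisc = false;
        nic->programmed = wanted;
    }
}

/* Does the NIC pass up a frame whose destination MAC is / is not listed? */
static bool nic_accepts(const struct nic_model *nic, bool mac_in_list)
{
    return nic->promisc || mac_in_list;
}
```

As long as the addresses fit, a frame for a MAC that macvtap did not
register is not delivered to the PF, which is why the "just reset the VF
MAC" trick depends on a promiscuous bridged setup.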

Thanks,
-Siwei


>
> Question is how does userspace know driver isn't broken in this respect?
> Let's add a "vf failover" flag somewhere so this can be probed?
>



* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29 20:14                                                                   ` si-wei liu
@ 2018-11-29 21:17                                                                     ` Michael S. Tsirkin
  2018-11-29 22:53                                                                       ` si-wei liu
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-29 21:17 UTC (permalink / raw)
  To: si-wei liu
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse, Boris Ostrovsky

On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
> 
> 
> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
> > On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
> > > 
> > > On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
> > > > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
> > > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > > > > > > > > > Update:
> > > > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > > > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > > > > > > > > > pings started working again on the failover interface. So it seems
> > > > > > > > > > like the frames were arriving to the vf on the host.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface.
> > > > > > > > > 
> > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live
> > > > > > > > > migration?
> > > > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > > > > > > > 
> > > > > > > > > -Sridhar
> > > > > > > > Interesting I didn't notice it does this. So in fact
> > > > > > > > just defining VF mac will immediately divert packets
> > > > > > > > to the VF? Given guest driver did not initialize VF
> > > > > > > > yet won't a bunch of packets be dropped?
> > > > > > > There is typo in my stmt above (VF->PF)
> > > > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > > > > > 
> > > > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > > > > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > > > > > comes up in the VM.
> > > > > > > 
> > > > > > > 
> > > > > > Can this be fixed in the intel drivers?
> > > > > I just checked and it looks like this seems to have been addressed in the
> > > > > ice 100Gb driver. Will bring this up issue internally to see if we can change this
> > > > > behavior in i40e/ixgbe drivers.
> > > > Also what happens if the mac is programmed both in PF (e.g. with
> > > > macvtap) and VF? Ideally VF will take precedence.
> > > I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
> > > mucking around with software workaround in the PF driver. Actually, the same
> > > applies to other NIC vendors when hardware sees duplicate filters. There's
> > > no such control of precedence on one over the other.
> > > 
> > > 
> > > -Siwei
> > > 
> > > 
> > Well removing a MAC from the PF filter when we are adding it to the VF
> > filter should always be possible. Need to keep it in a separate list and
> > re-add it when removing the MAC from VF filter.  This can be handled in
> > the net core, no need for driver specific hacks.
> So that is what I ever said - essentially what you need is a netdev API,
> rather than to add dirty hacks on each driver. That is fine, but how would
> you implement it? Note there's no equivalent driver level .ndo API to "move"
> filters, and all existing .ndo APIs manipulate at the MAC address level as
> opposed to filters. Are you going to convince netdev this is the right thing
> to do and we should add such API to the net core and each individual driver?

There's no need for a new API IMO.
You drop it from list of uc macs, then call .ndo_set_rx_mode.
This can be done without changing existing drivers.
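
Roughly what I mean, as a userspace model only (struct netdev_model,
uc_add(), uc_del() and model_set_rx_mode() are hypothetical stand-ins, not
kernel code): the core edits the unicast list and then invokes the
driver's rx-mode callback, which rewrites the hardware table from the
list. The driver never sees a "move" operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define MAX_UC  4

struct netdev_model;
typedef void (*set_rx_mode_fn)(struct netdev_model *dev);

/* Hypothetical stand-in for a net device. */
struct netdev_model {
    uint8_t uc[MAX_UC][MAC_LEN];   /* software unicast list */
    int uc_count;
    uint8_t hw[MAX_UC][MAC_LEN];   /* what the NIC is filtering on */
    int hw_count;
    set_rx_mode_fn set_rx_mode;    /* driver callback, in the role of .ndo_set_rx_mode */
};

/* Model driver callback: rewrite the hardware table from the list. */
static void model_set_rx_mode(struct netdev_model *dev)
{
    memcpy(dev->hw, dev->uc, sizeof(dev->uc));
    dev->hw_count = dev->uc_count;
}

/* Return the index of `mac` in `table`, or -1 if it is absent. */
static int find_mac(uint8_t (*table)[MAC_LEN], int count, const uint8_t *mac)
{
    for (int i = 0; i < count; i++)
        if (memcmp(table[i], mac, MAC_LEN) == 0)
            return i;
    return -1;
}

/* Add to the list, then let the driver resync its filters. */
static bool uc_add(struct netdev_model *dev, const uint8_t *mac)
{
    assert(dev && dev->set_rx_mode && mac);
    if (find_mac(dev->uc, dev->uc_count, mac) >= 0)
        return true;
    if (dev->uc_count == MAX_UC)
        return false;
    memcpy(dev->uc[dev->uc_count++], mac, MAC_LEN);
    dev->set_rx_mode(dev);
    return true;
}

/* Drop from the list, then let the driver resync its filters. */
static bool uc_del(struct netdev_model *dev, const uint8_t *mac)
{
    assert(dev && dev->set_rx_mode && mac);
    int i = find_mac(dev->uc, dev->uc_count, mac);

    if (i < 0)
        return false;
    dev->uc_count--;
    if (i != dev->uc_count)        /* move the last entry into the hole */
        memcpy(dev->uc[i], dev->uc[dev->uc_count], MAC_LEN);
    dev->set_rx_mode(dev);
    return true;
}

/* Is `mac` currently in the hardware filter table? */
static bool hw_has(struct netdev_model *dev, const uint8_t *mac)
{
    return find_mac(dev->hw, dev->hw_count, mac) >= 0;
}
```

Whether the real net core can do this on behalf of macvtap without
surprising its users is of course the open question in this thread.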

> 
> > Still, let's prioritize things correctly.  IMHO it's fine if we
> > initially assume promisc mode on the PF.  macvlan has this mode too
> > after all.
> I'm not sure what promisc mode you talked about. As far as I understand it
> for macvlan/macvtap the NIC is only put into promisc mode when running out
> of MAC filter entries. Before that all MAC addresses will be added to the
> NIC as unicast filters. In addition, people prefer macvlan/macvtap for
> adding isolation in a multi-tenant cloud as well as avoiding performance
> penalty due to noisy neighbors. I'd rather to hear that claim to be that the
> current MAC-based pairing scheme doesn't work well with macvtap and only
> works with bridged setup which has promisc enabled. That would be more
> helpful for people to understand the situation better.
> 
> Thanks,
> -Siwei
> 

As a first step that's fine. Still this assumes just creating a VF
doesn't yet program the on-card filter to cause packet drops. Let's
assume drivers are fixed to do that. How does userspace know
that's the case? We might need some kind of attribute so
userspace can detect it.

> > 
> > Question is how does userspace know driver isn't broken in this respect?
> > Let's add a "vf failover" flag somewhere so this can be probed?
> > 


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29 21:17                                                                     ` Michael S. Tsirkin
@ 2018-11-29 22:53                                                                       ` si-wei liu
  2018-11-29 23:53                                                                         ` Samudrala, Sridhar
  2018-11-30  6:21                                                                         ` Michael S. Tsirkin
  0 siblings, 2 replies; 85+ messages in thread
From: si-wei liu @ 2018-11-29 22:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse, Boris Ostrovsky



On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
> On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
>>
>> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>>>>>> Update:
>>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>>>>>
>>>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>>>>>> migration?
>>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>>>>>
>>>>>>>>>> -Sridhar
>>>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>>>> just defining VF mac will immediately divert packets
>>>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>>>> yet won't a bunch of packets be dropped?
>>>>>>>> There is typo in my stmt above (VF->PF)
>>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>>>
>>>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>>>> comes up in the VM.
>>>>>>>>
>>>>>>>>
>>>>>>> Can this be fixed in the intel drivers?
>>>>>> I just checked and it looks like this seems to have been addressed in the
>>>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>>>> behavior in i40e/ixgbe drivers.
>>>>> Also what happens if the mac is programmed both in PF (e.g. with
>>>>> macvtap) and VF? Ideally VF will take precedence.
>>>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>>>> mucking around with software workaround in the PF driver. Actually, the same
>>>> applies to other NIC vendors when hardware sees duplicate filters. There's
>>>> no such control of precedence on one over the other.
>>>>
>>>>
>>>> -Siwei
>>>>
>>>>
>>> Well removing a MAC from the PF filter when we are adding it to the VF
>>> filter should always be possible. Need to keep it in a separate list and
>>> re-add it when removing the MAC from VF filter.  This can be handled in
>>> the net core, no need for driver specific hacks.
>> So that is what I ever said - essentially what you need is a netdev API,
>> rather than to add dirty hacks on each driver. That is fine, but how would
>> you implement it? Note there's no equivalent driver level .ndo API to "move"
>> filters, and all existing .ndo APIs manipulate at the MAC address level as
>> opposed to filters. Are you going to convince netdev this is the right thing
>> to do and we should add such API to the net core and each individual driver?
> There's no need for a new API IMO.
> You drop it from list of uc macs, then call .ndo_set_rx_mode.
Then you still need a new netlink API: effectively it alters the running
state of macvtap, as it steals certain filters from the NIC, which affects
the macvtap datapath. I assume we are talking about some kernel mechanism
that does automatic datapath switching without involving the userspace
management stack/orchestration software. From the kernel's (net core's)
point of view, that also needs some weak binding/coordination between the
VF and the macvtap to decide whose MAC filter should be active. To me this
still looks like a new API, rather than tweaking the long-standing default
behavior and making it work transparently just for this case. Otherwise,
without introducing a new API, how does userspace infer that the running
kernel supports this new behavior?

> This can be done without changing existing drivers.
>
>>> Still, let's prioritize things correctly.  IMHO it's fine if we
>>> initially assume promisc mode on the PF.  macvlan has this mode too
>>> after all.
>> I'm not sure what promisc mode you talked about. As far as I understand it
>> for macvlan/macvtap the NIC is only put into promisc mode when running out
>> of MAC filter entries. Before that all MAC addresses will be added to the
>> NIC as unicast filters. In addition, people prefer macvlan/macvtap for
>> adding isolation in a multi-tenant cloud as well as avoiding performance
>> penalty due to noisy neighbors. I'd rather to hear that claim to be that the
>> current MAC-based pairing scheme doesn't work well with macvtap and only
>> works with bridged setup which has promisc enabled. That would be more
>> helpful for people to understand the situation better.
>>
>> Thanks,
>> -Siwei
>>
> As a first step that's fine.
Well, I specifically called out a year ago, when this work started, that
macvtap is what we are looking at (we don't care about a bridge with
promiscuous mode enabled), and the answer I got at that point was that the
current model would work well for macvtap too (which I have doubted from
the very beginning). It eventually turns out that this is not true, and it
looks like this is slowly converging on what Hyper-V netvsc has supported
for quite a few years, if not a decade, sigh...

>   Still this assumes just creating a VF
> doesn't yet program the on-card filter to cause packet drops.
Suppose this behavior is fixable in legacy Intel NICs; you would still
need to evacuate the filter previously programmed by macvtap when the VF's
filter gets activated (typically when the VF's netdev is netif_running() in
a Linux guest). That's what we and NetVSC call "datapath switching", and
where this should be handled (driver, net core, or userspace) is the core
of the architectural design that I spent much time on.

Having said that, I wouldn't want to wait on one vendor to fix a legacy
driver it has little motivation to change, with no work getting done in
the meantime. If you go that way, please make sure Intel can change their
driver first.

>   Let's
> assume drivers are fixed to do that. How does userspace know
> that's the case? We might need some kind of attribute so
> userspace can detect it.
Where do you envision the new attribute living? Presumably it would be
exposed by the kernel, which constitutes a new API or an API change.


Thanks,
-Siwei
>
>>> Question is how does userspace know driver isn't broken in this respect?
>>> Let's add a "vf failover" flag somewhere so this can be probed?
>>>



* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29 22:53                                                                       ` si-wei liu
@ 2018-11-29 23:53                                                                         ` Samudrala, Sridhar
  2018-11-30  0:24                                                                           ` si-wei liu
  2018-11-30  6:21                                                                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-29 23:53 UTC (permalink / raw)
  To: si-wei liu, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse,
	Boris Ostrovsky

On 11/29/2018 2:53 PM, si-wei liu wrote:
>
>
> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
>> On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
>>>
>>> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>>>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>>>>>>> Update:
>>>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>>>>>>
>>>>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>>>>>>> migration?
>>>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>>>>>>
>>>>>>>>>>> -Sridhar
>>>>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>>>>> just defining VF mac will immediately divert packets
>>>>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>>>>> yet won't a bunch of packets be dropped?
>>>>>>>>> There is typo in my stmt above (VF->PF)
>>>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>>>>
>>>>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>>>>> comes up in the VM.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Can this be fixed in the intel drivers?
>>>>>>> I just checked and it looks like this seems to have been addressed in the
>>>>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>>>>> behavior in i40e/ixgbe drivers.
>>>>>> Also what happens if the mac is programmed both in PF (e.g. with
>>>>>> macvtap) and VF? Ideally VF will take precedence.
>>>>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>>>>> mucking around with software workaround in the PF driver. Actually, the same
>>>>> applies to other NIC vendors when hardware sees duplicate filters. There's
>>>>> no such control of precedence on one over the other.
>>>>>
>>>>>
>>>>> -Siwei
>>>>>
>>>>>
>>>> Well removing a MAC from the PF filter when we are adding it to the VF
>>>> filter should always be possible. Need to keep it in a separate list and
>>>> re-add it when removing the MAC from VF filter.  This can be handled in
>>>> the net core, no need for driver specific hacks.
>>> So that is what I ever said - essentially what you need is a netdev API,
>>> rather than to add dirty hacks on each driver. That is fine, but how would
>>> you implement it? Note there's no equivalent driver level .ndo API 
>>> to "move"
>>> filters, and all existing .ndo APIs manipulate at the MAC address 
>>> level as
>>> opposed to filters. Are you going to convince netdev this is the 
>>> right thing
>>> to do and we should add such API to the net core and each individual 
>>> driver?
>> There's no need for a new API IMO.
>> You drop it from list of uc macs, then call .ndo_set_rx_mode.
> Then still you need a new netlink API - effectively it alters the 
> running state of macvtap as it steals certain filters out from the NIC 
> that affects the datapath of macvtap. I assume we talk about some 
> kernel mechanism to do automatic datapath switching without involving 
> userspace management stack/orchestration software. In the kernel's 
> (net core's) view that also needs some weak binding/coordination 
> between the VF and the macvtap for which MAC filter needs to be 
> activated. Still this senses to me a new API rather than tweaking the 
> current and long-existing default behavior and making it work 
> transparently just for this case. Otherwise, without introducing a new 
> API, how does the userspace infer that the running kernel supports 
> this new behavior.

In the case of virtio backed by macvtap, you can change the MAC address of the
macvtap interface. When the VF is plugged in, change macvtap's MAC to an
unassigned MAC and bring the virtio link down.
When the VF is unplugged, set macvtap's MAC to the VM's MAC and bring the
virtio link up.
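
The two transitions described above can be sketched as a small helper that
only builds the commands. This is a minimal illustration, not a tested
procedure: the interface name macvtap0, the device name net0, both MAC
addresses, and the use of QEMU's QMP set_link command to toggle the virtio
link are assumptions made for the example.

```python
import json


def ip_set_mac(ifname, mac):
    """Build the iproute2 argv that sets a netdev's MAC address."""
    return ["ip", "link", "set", "dev", ifname, "address", mac]


def qmp_set_link(device_name, up):
    """Build a QMP 'set_link' message toggling a guest NIC's link state."""
    return {"execute": "set_link", "arguments": {"name": device_name, "up": up}}


def on_vf_plugged(macvtap, spare_mac, virtio_name):
    # The VF now owns the VM's MAC: park macvtap on an unassigned MAC,
    # then report link down on the virtio (standby) device.
    return [ip_set_mac(macvtap, spare_mac), qmp_set_link(virtio_name, False)]


def on_vf_unplugged(macvtap, vm_mac, virtio_name):
    # The VF is gone: give the VM's MAC back to macvtap,
    # then report link up on the virtio (standby) device.
    return [ip_set_mac(macvtap, vm_mac), qmp_set_link(virtio_name, True)]


if __name__ == "__main__":
    # All names and addresses below are placeholders for illustration.
    steps = on_vf_plugged("macvtap0", "02:00:00:00:00:01", "net0")
    steps += on_vf_unplugged("macvtap0", "52:54:00:12:34:56", "net0")
    for step in steps:
        print(" ".join(step) if isinstance(step, list) else json.dumps(step))
```

The helper only returns the commands; a management tool would run the ip
command on the host and send the QMP message to the VM's monitor socket.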



---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29 23:53                                                                         ` Samudrala, Sridhar
@ 2018-11-30  0:24                                                                           ` si-wei liu
  2018-11-30  3:08                                                                             ` Samudrala, Sridhar
  0 siblings, 1 reply; 85+ messages in thread
From: si-wei liu @ 2018-11-30  0:24 UTC (permalink / raw)
  To: Samudrala, Sridhar, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse,
	Boris Ostrovsky



On 11/29/2018 3:53 PM, Samudrala, Sridhar wrote:
> On 11/29/2018 2:53 PM, si-wei liu wrote:
>>
>>
>> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
>>> On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
>>>>
>>>> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>>>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>>>>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar 
>>>>>>>>> wrote:
>>>>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Update:
>>>>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set 
>>>>>>>>>>>>> ens2f0 vf 1 mac
>>>>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary 
>>>>>>>>>>>>> device) and the
>>>>>>>>>>>>> pings started working again on the failover interface. So 
>>>>>>>>>>>>> it seems
>>>>>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs 
>>>>>>>>>>>> MAC so that the packets
>>>>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio 
>>>>>>>>>>>> interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Have you looked at this documentation that shows a sample 
>>>>>>>>>>>> script to initiate live
>>>>>>>>>>>> migration?
>>>>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Sridhar
>>>>>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>>>>>> just defining VF mac will immediately divert packets
>>>>>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>>>>>> yet won't a bunch of packets be dropped?
>>>>>>>>>> There is typo in my stmt above (VF->PF)
>>>>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so 
>>>>>>>>>> that the packets
>>>>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio 
>>>>>>>>>> interface.
>>>>>>>>>>
>>>>>>>>>> When the VF is plugged in, ideally the MAC filter for the VF 
>>>>>>>>>> should be added to
>>>>>>>>>> the HW once the guest driver comes up and can receive 
>>>>>>>>>> packets. Currently with intel
>>>>>>>>>> drivers, the filter gets added to HW as soon as the host 
>>>>>>>>>> admin sets the VFs MAC via
>>>>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet 
>>>>>>>>>> drops until the VF driver
>>>>>>>>>> comes up in the VM.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Can this be fixed in the intel drivers?
>>>>>>>> I just checked and it looks like this seems to have been 
>>>>>>>> addressed in the
>>>>>>>> ice 100Gb driver. Will bring this up issue internally to see if 
>>>>>>>> we can change this
>>>>>>>> behavior in i40e/ixgbe drivers.
>>>>>>> Also what happens if the mac is programmed both in PF (e.g. with
>>>>>>> macvtap) and VF? Ideally VF will take precedence.
>>>>>> I'm seriously doubtful that legacy Intel NIC hardware can do that 
>>>>>> instead of
>>>>>> mucking around with software workaround in the PF driver. 
>>>>>> Actually, the same
>>>>>> applies to other NIC vendors when hardware sees duplicate 
>>>>>> filters. There's
>>>>>> no such control of precedence on one over the other.
>>>>>>
>>>>>>
>>>>>> -Siwei
>>>>>>
>>>>>>
>>>>> Well removing a MAC from the PF filter when we are adding it to 
>>>>> the VF
>>>>> filter should always be possible. Need to keep it in a separate 
>>>>> list and
>>>>> re-add it when removing the MAC from VF filter.  This can be 
>>>>> handled in
>>>>> the net core, no need for driver specific hacks.
>>>> So that is what I ever said - essentially what you need is a netdev 
>>>> API,
>>>> rather than to add dirty hacks on each driver. That is fine, but 
>>>> how would
>>>> you implement it? Note there's no equivalent driver level .ndo API 
>>>> to "move"
>>>> filters, and all existing .ndo APIs manipulate at the MAC address 
>>>> level as
>>>> opposed to filters. Are you going to convince netdev this is the 
>>>> right thing
>>>> to do and we should add such API to the net core and each 
>>>> individual driver?
>>> There's no need for a new API IMO.
>>> You drop it from list of uc macs, then call .ndo_set_rx_mode.
>> Then still you need a new netlink API - effectively it alters the 
>> running state of macvtap as it steals certain filters out from the 
>> NIC that affects the datapath of macvtap. I assume we talk about some 
>> kernel mechanism to do automatic datapath switching without involving 
>> userspace management stack/orchestration software. In the kernel's 
>> (net core's) view that also needs some weak binding/coordination 
>> between the VF and the macvtap for which MAC filter needs to be 
>> activated. Still this senses to me a new API rather than tweaking the 
>> current and long-existing default behavior and making it work 
>> transparently just for this case. Otherwise, without introducing a 
>> new API, how does the userspace infer that the running kernel 
>> supports this new behavior.
>
> In case of virtio backed by macvtap, you can change the mac address of 
> the macvtap
> interface. When VF is plugged in, change macvtap's MAC to an 
> unassigned MAC and bring
> the virtio link down.
> When VF in unplugged, set macvtap's MAC to VMs mac and bring up virtio 
> link.
>
This needs management software to orchestrate, right?  What MST and I
are discussing is how to do this switching automatically without
involving management software.


-Siwei





^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-30  0:24                                                                           ` si-wei liu
@ 2018-11-30  3:08                                                                             ` Samudrala, Sridhar
  2018-11-30  4:46                                                                               ` si-wei liu
  0 siblings, 1 reply; 85+ messages in thread
From: Samudrala, Sridhar @ 2018-11-30  3:08 UTC (permalink / raw)
  To: si-wei liu, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse,
	Boris Ostrovsky

On 11/29/2018 4:24 PM, si-wei liu wrote:
>
>
> On 11/29/2018 3:53 PM, Samudrala, Sridhar wrote:
>> On 11/29/2018 2:53 PM, si-wei liu wrote:
>>>
>>>
>>> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
>>>> On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
>>>>>
>>>>> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
>>>>>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>>>>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar 
>>>>>>>> wrote:
>>>>>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar 
>>>>>>>>>> wrote:
>>>>>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, 
>>>>>>>>>>>> Sridhar wrote:
>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set 
>>>>>>>>>>>>>> ens2f0 vf 1 mac
>>>>>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary 
>>>>>>>>>>>>>> device) and the
>>>>>>>>>>>>>> pings started working again on the failover interface. So 
>>>>>>>>>>>>>> it seems
>>>>>>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs 
>>>>>>>>>>>>> MAC so that the packets
>>>>>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio 
>>>>>>>>>>>>> interface.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have you looked at this documentation that shows a sample 
>>>>>>>>>>>>> script to initiate live
>>>>>>>>>>>>> migration?
>>>>>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html 
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Sridhar
>>>>>>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>>>>>>> just defining VF mac will immediately divert packets
>>>>>>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>>>>>>> yet won't a bunch of packets be dropped?
>>>>>>>>>>> There is typo in my stmt above (VF->PF)
>>>>>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so 
>>>>>>>>>>> that the packets
>>>>>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio 
>>>>>>>>>>> interface.
>>>>>>>>>>>
>>>>>>>>>>> When the VF is plugged in, ideally the MAC filter for the VF 
>>>>>>>>>>> should be added to
>>>>>>>>>>> the HW once the guest driver comes up and can receive 
>>>>>>>>>>> packets. Currently with intel
>>>>>>>>>>> drivers, the filter gets added to HW as soon as the host 
>>>>>>>>>>> admin sets the VFs MAC via
>>>>>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet 
>>>>>>>>>>> drops until the VF driver
>>>>>>>>>>> comes up in the VM.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Can this be fixed in the intel drivers?
>>>>>>>>> I just checked and it looks like this seems to have been 
>>>>>>>>> addressed in the
>>>>>>>>> ice 100Gb driver. Will bring this up issue internally to see 
>>>>>>>>> if we can change this
>>>>>>>>> behavior in i40e/ixgbe drivers.
>>>>>>>> Also what happens if the mac is programmed both in PF (e.g. with
>>>>>>>> macvtap) and VF? Ideally VF will take precedence.
>>>>>>> I'm seriously doubtful that legacy Intel NIC hardware can do 
>>>>>>> that instead of
>>>>>>> mucking around with software workaround in the PF driver. 
>>>>>>> Actually, the same
>>>>>>> applies to other NIC vendors when hardware sees duplicate 
>>>>>>> filters. There's
>>>>>>> no such control of precedence on one over the other.
>>>>>>>
>>>>>>>
>>>>>>> -Siwei
>>>>>>>
>>>>>>>
>>>>>> Well removing a MAC from the PF filter when we are adding it to 
>>>>>> the VF
>>>>>> filter should always be possible. Need to keep it in a separate 
>>>>>> list and
>>>>>> re-add it when removing the MAC from VF filter.  This can be 
>>>>>> handled in
>>>>>> the net core, no need for driver specific hacks.
>>>>> So that is what I ever said - essentially what you need is a 
>>>>> netdev API,
>>>>> rather than to add dirty hacks on each driver. That is fine, but 
>>>>> how would
>>>>> you implement it? Note there's no equivalent driver level .ndo API 
>>>>> to "move"
>>>>> filters, and all existing .ndo APIs manipulate at the MAC address 
>>>>> level as
>>>>> opposed to filters. Are you going to convince netdev this is the 
>>>>> right thing
>>>>> to do and we should add such API to the net core and each 
>>>>> individual driver?
>>>> There's no need for a new API IMO.
>>>> You drop it from list of uc macs, then call .ndo_set_rx_mode.
>>> Then still you need a new netlink API - effectively it alters the 
>>> running state of macvtap as it steals certain filters out from the 
>>> NIC that affects the datapath of macvtap. I assume we talk about 
>>> some kernel mechanism to do automatic datapath switching without 
>>> involving userspace management stack/orchestration software. In the 
>>> kernel's (net core's) view that also needs some weak 
>>> binding/coordination between the VF and the macvtap for which MAC 
>>> filter needs to be activated. Still this senses to me a new API 
>>> rather than tweaking the current and long-existing default behavior 
>>> and making it work transparently just for this case. Otherwise, 
>>> without introducing a new API, how does the userspace infer that the 
>>> running kernel supports this new behavior.
>>
>> In case of virtio backed by macvtap, you can change the mac address 
>> of the macvtap
>> interface. When VF is plugged in, change macvtap's MAC to an 
>> unassigned MAC and bring
>> the virtio link down.
>> When VF in unplugged, set macvtap's MAC to VMs mac and bring up 
>> virtio link.
>>
> This needs management software to orchestrate, right? 

Yes. Isn't that a good option, given that live migration is initiated and
orchestrated via management software?

> What MST and I are discussing is to how to do this switching 
> automatically without involving management software.

OK. I agree that it would be nice if we could do all this automatically via
QEMU when the orchestration software initiates live migration, rather than
the management software having to do some pre- and post-migration steps.
It may be possible to do these pre- and post-migration steps in QEMU, via
the netlink API to the kernel, to update the MAC addresses, as we are now
associating the primary and standby interfaces.





^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-30  3:08                                                                             ` Samudrala, Sridhar
@ 2018-11-30  4:46                                                                               ` si-wei liu
  0 siblings, 0 replies; 85+ messages in thread
From: si-wei liu @ 2018-11-30  4:46 UTC (permalink / raw)
  To: Samudrala, Sridhar, Michael S. Tsirkin
  Cc: Carolyn, Sameeh Jubran, Siwei Liu, venu.busireddy, cohuck,
	virtio-dev, liran.alon, Yan Vugenfirer, Brandeburg, Jesse,
	Boris Ostrovsky



On 11/29/2018 07:08 PM, Samudrala, Sridhar wrote:
> On 11/29/2018 4:24 PM, si-wei liu wrote:
>>
>>
>> On 11/29/2018 3:53 PM, Samudrala, Sridhar wrote:
>>> On 11/29/2018 2:53 PM, si-wei liu wrote:
>>>>
>>>>
>>>> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
>>>>>>
>>>>>> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>>>>>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar 
>>>>>>>>> wrote:
>>>>>>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, 
>>>>>>>>>>>>> Sridhar wrote:
>>>>>>>>>>>>>>> Update:
>>>>>>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set 
>>>>>>>>>>>>>>> ens2f0 vf 1 mac
>>>>>>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary 
>>>>>>>>>>>>>>> device) and the
>>>>>>>>>>>>>>> pings started working again on the failover interface. 
>>>>>>>>>>>>>>> So it seems
>>>>>>>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs 
>>>>>>>>>>>>>> MAC so that the packets
>>>>>>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio 
>>>>>>>>>>>>>> interface.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Have you looked at this documentation that shows a sample 
>>>>>>>>>>>>>> script to initiate live
>>>>>>>>>>>>>> migration?
>>>>>>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Sridhar
>>>>>>>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>>>>>>>> just defining VF mac will immediately divert packets
>>>>>>>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>>>>>>>> yet won't a bunch of packets be dropped?
>>>>>>>>>>>> There is typo in my stmt above (VF->PF)
>>>>>>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so 
>>>>>>>>>>>> that the packets
>>>>>>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio 
>>>>>>>>>>>> interface.
>>>>>>>>>>>>
>>>>>>>>>>>> When the VF is plugged in, ideally the MAC filter for the 
>>>>>>>>>>>> VF should be added to
>>>>>>>>>>>> the HW once the guest driver comes up and can receive 
>>>>>>>>>>>> packets. Currently with intel
>>>>>>>>>>>> drivers, the filter gets added to HW as soon as the host 
>>>>>>>>>>>> admin sets the VFs MAC via
>>>>>>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet 
>>>>>>>>>>>> drops until the VF driver
>>>>>>>>>>>> comes up in the VM.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Can this be fixed in the intel drivers?
>>>>>>>>>> I just checked and it looks like this seems to have been 
>>>>>>>>>> addressed in the
>>>>>>>>>> ice 100Gb driver. Will bring this up issue internally to see 
>>>>>>>>>> if we can change this
>>>>>>>>>> behavior in i40e/ixgbe drivers.
>>>>>>>>> Also what happens if the mac is programmed both in PF (e.g. with
>>>>>>>>> macvtap) and VF? Ideally VF will take precedence.
>>>>>>>> I'm seriously doubtful that legacy Intel NIC hardware can do 
>>>>>>>> that instead of
>>>>>>>> mucking around with software workaround in the PF driver. 
>>>>>>>> Actually, the same
>>>>>>>> applies to other NIC vendors when hardware sees duplicate 
>>>>>>>> filters. There's
>>>>>>>> no such control of precedence on one over the other.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Siwei
>>>>>>>>
>>>>>>>>
>>>>>>> Well removing a MAC from the PF filter when we are adding it to 
>>>>>>> the VF
>>>>>>> filter should always be possible. Need to keep it in a separate 
>>>>>>> list and
>>>>>>> re-add it when removing the MAC from VF filter.  This can be 
>>>>>>> handled in
>>>>>>> the net core, no need for driver specific hacks.
>>>>>> So that is what I ever said - essentially what you need is a 
>>>>>> netdev API,
>>>>>> rather than to add dirty hacks on each driver. That is fine, but 
>>>>>> how would
>>>>>> you implement it? Note there's no equivalent driver level .ndo 
>>>>>> API to "move"
>>>>>> filters, and all existing .ndo APIs manipulate at the MAC address 
>>>>>> level as
>>>>>> opposed to filters. Are you going to convince netdev this is the 
>>>>>> right thing
>>>>>> to do and we should add such API to the net core and each 
>>>>>> individual driver?
>>>>> There's no need for a new API IMO.
>>>>> You drop it from list of uc macs, then call .ndo_set_rx_mode.
>>>> Then still you need a new netlink API - effectively it alters the 
>>>> running state of macvtap as it steals certain filters out from the 
>>>> NIC that affects the datapath of macvtap. I assume we talk about 
>>>> some kernel mechanism to do automatic datapath switching without 
>>>> involving userspace management stack/orchestration software. In the 
>>>> kernel's (net core's) view that also needs some weak 
>>>> binding/coordination between the VF and the macvtap for which MAC 
>>>> filter needs to be activated. Still this senses to me a new API 
>>>> rather than tweaking the current and long-existing default behavior 
>>>> and making it work transparently just for this case. Otherwise, 
>>>> without introducing a new API, how does the userspace infer that 
>>>> the running kernel supports this new behavior.
>>>
>>> In case of virtio backed by macvtap, you can change the mac address 
>>> of the macvtap
>>> interface. When VF is plugged in, change macvtap's MAC to an 
>>> unassigned MAC and bring
>>> the virtio link down.
>>> When VF in unplugged, set macvtap's MAC to VMs mac and bring up 
>>> virtio link.
>>>
>> This needs management software to orchestrate, right? 
>
> Yes. Isn't that a good option as live migration is initiated and 
> orchestrated via mgmt. software.
The motivation is to reduce the downtime to zero, to get on par with
Hyper-V, or maybe do even better. But you won't be able to achieve that if
datapath switching is initiated from userspace via management software.

>
>> What MST and I are discussing is to how to do this switching 
>> automatically without involving management software.
>
> OK. I agree that it would be nice if we can do all this automatically 
> via Qemu when the orchestration sw
> initiates live migration rather than the mgmt. sw having to do some 
> pre and post migration steps.
> It may be possible to do these pre and post migration steps in qemu 
> via netlink api to the kernel to
> update the MAC addresses as we are now associating the primary and 
> standby interfaces.
The number one blocker for that approach now is whether the Intel ixgbe and
i40e drivers can be fixed to defer adding the MAC filter to the NIC until
the VF is up and running in the guest. In particular, we'd limit the fix to
the PF side only and leave the VF driver intact, using the existing mailbox
or adminq interface.
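
To make the requested behavior concrete, here is a toy model of the deferral
(a sketch only, not ixgbe or i40e code). The class and method names are
hypothetical; vf_driver_up() stands in for whatever notification the PF
would receive over the existing mailbox or adminq interface, which is not
modeled here.

```python
ZERO_MAC = "00:00:00:00:00:00"


class DeferredVfMac:
    """Toy model of a PF tracking a single VF's MAC filter."""

    def __init__(self):
        self.admin_mac = None   # MAC assigned by the host admin, if any
        self.vf_up = False      # has the guest VF driver reported ready?
        self.hw_filter = None   # MAC currently programmed into the NIC

    def _sync(self):
        # Program the filter only when a MAC is assigned AND the VF is up.
        self.hw_filter = self.admin_mac if self.vf_up else None

    def set_vf_mac(self, mac):
        # Models the host admin setting (or zeroing) the VF MAC.
        self.admin_mac = None if mac == ZERO_MAC else mac
        self._sync()

    def vf_driver_up(self):
        # Models the guest VF driver coming up.
        self.vf_up = True
        self._sync()

    def vf_driver_down(self):
        # Models the guest VF driver going away (unplug, reset).
        self.vf_up = False
        self._sync()
```

In this model, setting the VF MAC before the guest driver is ready leaves
hw_filter empty, so frames for the VM's MAC would keep following the
existing path until vf_driver_up() is called.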

Thanks,
-Siwei






^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-29 22:53                                                                       ` si-wei liu
  2018-11-29 23:53                                                                         ` Samudrala, Sridhar
@ 2018-11-30  6:21                                                                         ` Michael S. Tsirkin
  2018-12-04  2:09                                                                           ` si-wei liu
  1 sibling, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-11-30  6:21 UTC (permalink / raw)
  To: si-wei liu
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse, Boris Ostrovsky

On Thu, Nov 29, 2018 at 02:53:08PM -0800, si-wei liu wrote:
> 
> 
> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
> > On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
> > > 
> > > On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
> > > > > On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
> > > > > > > On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
> > > > > > > > > On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
> > > > > > > > > > > > Update:
> > > > > > > > > > > > I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
> > > > > > > > > > > > 00:00:00:00:00:00) after unplugging it (the primary device) and the
> > > > > > > > > > > > pings started working again on the failover interface. So it seems
> > > > > > > > > > > > like the frames were arriving to the vf on the host.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > > > > > > > with VMs MAC start flowing via VF, bridge and the virtio interface.
> > > > > > > > > > > 
> > > > > > > > > > > Have you looked at this documentation that shows a sample script to initiate live
> > > > > > > > > > > migration?
> > > > > > > > > > > https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > > > > > > > > > > 
> > > > > > > > > > > -Sridhar
> > > > > > > > > > Interesting I didn't notice it does this. So in fact
> > > > > > > > > > just defining VF mac will immediately divert packets
> > > > > > > > > > to the VF? Given guest driver did not initialize VF
> > > > > > > > > > yet won't a bunch of packets be dropped?
> > > > > > > > > There is typo in my stmt above (VF->PF)
> > > > > > > > > When the VF is unplugged, you need to reset the VFs MAC so that the packets
> > > > > > > > > with VMs MAC start flowing via PF, bridge and the virtio interface.
> > > > > > > > > 
> > > > > > > > > When the VF is plugged in, ideally the MAC filter for the VF should be added to
> > > > > > > > > the HW once the guest driver comes up and can receive packets. Currently with intel
> > > > > > > > > drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
> > > > > > > > > ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
> > > > > > > > > comes up in the VM.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > Can this be fixed in the intel drivers?
> > > > > > > I just checked and it looks like this seems to have been addressed in the
> > > > > > > ice 100Gb driver. Will bring this up issue internally to see if we can change this
> > > > > > > behavior in i40e/ixgbe drivers.
> > > > > > Also what happens if the mac is programmed both in PF (e.g. with
> > > > > > macvtap) and VF? Ideally VF will take precedence.
> > > > > I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
> > > > > mucking around with software workaround in the PF driver. Actually, the same
> > > > > applies to other NIC vendors when hardware sees duplicate filters. There's
> > > > > no such control of precedence on one over the other.
> > > > > 
> > > > > 
> > > > > -Siwei
> > > > > 
> > > > > 
> > > > Well removing a MAC from the PF filter when we are adding it to the VF
> > > > filter should always be possible. Need to keep it in a separate list and
> > > > re-add it when removing the MAC from VF filter.  This can be handled in
> > > > the net core, no need for driver specific hacks.
> > > So that is what I ever said - essentially what you need is a netdev API,
> > > rather than to add dirty hacks on each driver. That is fine, but how would
> > > you implement it? Note there's no equivalent driver level .ndo API to "move"
> > > filters, and all existing .ndo APIs manipulate at the MAC address level as
> > > opposed to filters. Are you going to convince netdev this is the right thing
> > > to do and we should add such API to the net core and each individual driver?
> > There's no need for a new API IMO.
> > You drop it from list of uc macs, then call .ndo_set_rx_mode.
> Then still you need a new netlink API
> - effectively it alters the running
> state of macvtap as it steals certain filters out from the NIC that affects
> the datapath of macvtap. I assume we talk about some kernel mechanism to do
> automatic datapath switching without involving userspace management
> stack/orchestration software. In the kernel's (net core's) view that also
> needs some weak binding/coordination between the VF and the macvtap for
> which MAC filter needs to be activated. Still this senses to me a new API
> rather than tweaking the current and long-existing default behavior and
> making it work transparently just for this case. Otherwise, without
> introducing a new API, how does the userspace infer that the running kernel
> supports this new behavior.

I agree. But a single flag is not much of an extension. We don't even
need it in netlink; it can be anywhere, e.g. in sysfs.

> > This can be done without changing existing drivers.
> > 
> > > > Still, let's prioritize things correctly.  IMHO it's fine if we
> > > > initially assume promisc mode on the PF.  macvlan has this mode too
> > > > after all.
> > > I'm not sure what promisc mode you talked about. As far as I understand it
> > > for macvlan/macvtap the NIC is only put into promisc mode when running out
> > > of MAC filter entries. Before that all MAC addresses will be added to the
> > > NIC as unicast filters. In addition, people prefer macvlan/macvtap for
> > > adding isolation in a multi-tenant cloud as well as avoiding performance
> > > penalty due to noisy neighbors. I'd rather to hear that claim to be that the
> > > current MAC-based pairing scheme doesn't work well with macvtap and only
> > > works with bridged setup which has promisc enabled. That would be more
> > > helpful for people to understand the situation better.
> > > 
> > > Thanks,
> > > -Siwei
> > > 
> > As a first step that's fine.
> Well, I specifically called it out one year ago as this work was started
> that macvtap is what we look into (we don't care about bridge with
> promiscuous enabled) and the answer I got at the point was that the current
> model would work well for macvtap too (which I've been very doubtful from
> the very beginning).

At least I personally did not realize it's about macvtap.  I wish there
were example command lines showing what's broken.  Liran got hold of me
at the KVM forum and explained that it's about macvlan; that's the first
I heard about it, but that was offline, so others might be hearing it
just now for the first time.

The issue between macvlan and configuring a VF can be
tested with a couple of simple commands maybe using e.g. netsniff
with no need for a VM at all.
Pity these were never posted - interested in posting a test
tool that can be used to demonstrate/test the issue on various cards?

> Eventually turns out this is not true and it looks like
> this is slowly converging to what Hyper-V netvsc already supported quite a
> few years if not a decade ago, sighs...

Oh we'll see.

Meanwhile what's missing and was missing all along for the change you
seem to be advocating for to get off the ground is people who
are ready to actually send e.g. spec, guest driver, test patches.

> >   Still this assumes just creating a VF
> > doesn't yet program the on-card filter to cause packet drops.
> Suppose this behavior is fixable in legacy Intel NIC, you would still need
> to evacuate the filter programmed by macvtap previously when VF's filter
> gets activated (typically when VF's netdev is netif_running() in a Linux
> guest). That's what we and NetVSC call as "datapath switching", and where
> this could be handled (driver, net core, or userspace) is the core for the
> architectural design that I spent much time on.
> 
> Having said it, I don't expect or would desperately wait on one vendor to
> fix a legacy driver which wasn't quite motivated, then no work would be done
> on that.

Then that device can't be used with the mechanism in question.
Or if there are lots of drivers like this maybe someone will be
motivated enough to post a better implementation with a new
feature bit. It's not that I'm arguing against that.

But given the options of teaching management to play with
netlink API in response to guest actions, and with VCPU stopped,
and doing it all in host kernel drivers, I know I'll prefer host kernel
changes.

> If you'd go the way, please make sure Intel could change their
> driver first.

We'll see what happens with that. It's Sridhar from Intel that implemented
the guest changes after all, so I expect he's motivated to make them
work well.


> >   Let's
> > assume drivers are fixed to do that. How does userspace know
> > that's the case? We might need some kind of attribute so
> > userspace can detect it.
> Where do you envision the new attribute could be at? Supposedly it'd be
> exposed by the kernel, which constitutes a new API or API changes.
> 
> 
> Thanks,
> -Siwei

People add e.g. new attributes in sysfs left and right.  It's unlikely
to be a matter of serious contention.
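
A minimal sketch of how userspace might probe for such a flag, assuming a
hypothetical attribute named vf_failover under the PF's sysfs device
directory (the name and location are made up for illustration; nothing
like it exists today):

```shell
#!/bin/sh
# supports_vf_failover is a hypothetical helper. The "vf_failover"
# attribute and its path are assumptions, not an existing kernel interface.
#   $1: PF netdev name
#   $2: sysfs root (defaults to /sys; overridable so it can be tested)
supports_vf_failover() {
    attr="${2:-/sys}/class/net/$1/device/vf_failover"
    # Succeed only if the attribute is readable and contains "1".
    [ -r "$attr" ] && [ "$(cat "$attr")" = "1" ]
}

# Usage:
#   if supports_vf_failover enp59s0; then
#       echo "PF driver reports VF failover support"
#   else
#       echo "no flag found; assume legacy filter behaviour"
#   fi
```

A missing attribute is treated the same as "0", so management on an older
kernel would simply fall back to whatever it does today.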

> > 
> > > > Question is how does userspace know driver isn't broken in this respect?
> > > > Let's add a "vf failover" flag somewhere so this can be probed?
> > > > 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-11-30  6:21                                                                         ` Michael S. Tsirkin
@ 2018-12-04  2:09                                                                           ` si-wei liu
  2018-12-04  3:59                                                                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: si-wei liu @ 2018-12-04  2:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse, Boris Ostrovsky



On 11/29/2018 10:21 PM, Michael S. Tsirkin wrote:
> On Thu, Nov 29, 2018 at 02:53:08PM -0800, si-wei liu wrote:
>>
>> On 11/29/2018 1:17 PM, Michael S. Tsirkin wrote:
>>> On Thu, Nov 29, 2018 at 12:14:46PM -0800, si-wei liu wrote:
>>>> On 11/28/2018 5:15 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Nov 28, 2018 at 12:28:42PM -0800, si-wei liu wrote:
>>>>>> On 11/28/2018 12:06 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Nov 28, 2018 at 10:39:55AM -0800, Samudrala, Sridhar wrote:
>>>>>>>> On 11/28/2018 9:35 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Nov 28, 2018 at 09:31:32AM -0800, Samudrala, Sridhar wrote:
>>>>>>>>>> On 11/28/2018 9:08 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Mon, Nov 26, 2018 at 12:22:56PM -0800, Samudrala, Sridhar wrote:
>>>>>>>>>>>>> Update:
>>>>>>>>>>>>> I have just set the vf mac's address to 0 (ip link set ens2f0 vf 1 mac
>>>>>>>>>>>>> 00:00:00:00:00:00) after unplugging it (the primary device) and the
>>>>>>>>>>>>> pings started working again on the failover interface. So it seems
>>>>>>>>>>>>> like the frames were arriving to the vf on the host.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Yes. When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>>>>>> with VMs MAC start flowing via VF, bridge and the virtio interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Have you looked at this documentation that shows a sample script to initiate live
>>>>>>>>>>>> migration?
>>>>>>>>>>>> https://www.kernel.org/doc/html/latest/networking/net_failover.html
>>>>>>>>>>>>
>>>>>>>>>>>> -Sridhar
>>>>>>>>>>> Interesting I didn't notice it does this. So in fact
>>>>>>>>>>> just defining VF mac will immediately divert packets
>>>>>>>>>>> to the VF? Given guest driver did not initialize VF
>>>>>>>>>>> yet won't a bunch of packets be dropped?
>>>>>>>>>> There is typo in my stmt above (VF->PF)
>>>>>>>>>> When the VF is unplugged, you need to reset the VFs MAC so that the packets
>>>>>>>>>> with VMs MAC start flowing via PF, bridge and the virtio interface.
>>>>>>>>>>
>>>>>>>>>> When the VF is plugged in, ideally the MAC filter for the VF should be added to
>>>>>>>>>> the HW once the guest driver comes up and can receive packets. Currently with intel
>>>>>>>>>> drivers, the filter gets added to HW as soon as the host admin sets the VFs MAC via
>>>>>>>>>> ndo_set_vf_mac() api. So potentially there could be packet drops until the VF driver
>>>>>>>>>> comes up in the VM.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Can this be fixed in the intel drivers?
>>>>>>>> I just checked and it looks like this seems to have been addressed in the
>>>>>>>> ice 100Gb driver. Will bring this up issue internally to see if we can change this
>>>>>>>> behavior in i40e/ixgbe drivers.
>>>>>>> Also what happens if the mac is programmed both in PF (e.g. with
>>>>>>> macvtap) and VF? Ideally VF will take precedence.
>>>>>> I'm seriously doubtful that legacy Intel NIC hardware can do that instead of
>>>>>> mucking around with software workaround in the PF driver. Actually, the same
>>>>>> applies to other NIC vendors when hardware sees duplicate filters. There's
>>>>>> no such control of precedence on one over the other.
>>>>>>
>>>>>>
>>>>>> -Siwei
>>>>>>
>>>>>>
>>>>> Well removing a MAC from the PF filter when we are adding it to the VF
>>>>> filter should always be possible. Need to keep it in a separate list and
>>>>> re-add it when removing the MAC from VF filter.  This can be handled in
>>>>> the net core, no need for driver specific hacks.
>>>> So that is what I ever said - essentially what you need is a netdev API,
>>>> rather than to add dirty hacks on each driver. That is fine, but how would
>>>> you implement it? Note there's no equivalent driver level .ndo API to "move"
>>>> filters, and all existing .ndo APIs manipulate at the MAC address level as
>>>> opposed to filters. Are you going to convince netdev this is the right thing
>>>> to do and we should add such API to the net core and each individual driver?
>>> There's no need for a new API IMO.
>>> You drop it from list of uc macs, then call .ndo_set_rx_mode.
>> Then still you need a new netlink API
>> - effectively it alters the running
>> state of macvtap as it steals certain filters out from the NIC that affects
>> the datapath of macvtap. I assume we talk about some kernel mechanism to do
>> automatic datapath switching without involving userspace management
>> stack/orchestration software. In the kernel's (net core's) view that also
>> needs some weak binding/coordination between the VF and the macvtap for
>> which MAC filter needs to be activated. Still this senses to me a new API
>> rather than tweaking the current and long-existing default behavior and
>> making it work transparently just for this case. Otherwise, without
>> introducing a new API, how does the userspace infer that the running kernel
>> supports this new behavior.
> I agree. But a single flag is not much of an extension. We don't even
> need it in netlink, can be anywhere in e.g. sysfs.
I think a sysfs attribute is for exposing the capability, while you would
still need to set up macvtap with some special mode via netlink. That way
it doesn't break current behavior: when the VF's MAC filter is added,
macvtap would need to react by removing its filter from the NIC, and add
it back when the VF's MAC is removed.

>
>>> This can be done without changing existing drivers.
>>>
>>>>> Still, let's prioritize things correctly.  IMHO it's fine if we
>>>>> initially assume promisc mode on the PF.  macvlan has this mode too
>>>>> after all.
>>>> I'm not sure what promisc mode you talked about. As far as I understand it
>>>> for macvlan/macvtap the NIC is only put into promisc mode when running out
>>>> of MAC filter entries. Before that all MAC addresses will be added to the
>>>> NIC as unicast filters. In addition, people prefer macvlan/macvtap for
>>>> adding isolation in a multi-tenant cloud as well as avoiding performance
>>>> penalty due to noisy neighbors. I'd rather to hear that claim to be that the
>>>> current MAC-based pairing scheme doesn't work well with macvtap and only
>>>> works with bridged setup which has promisc enabled. That would be more
>>>> helpful for people to understand the situation better.
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>> As a first step that's fine.
>> Well, I specifically called it out one year ago as this work was started
>> that macvtap is what we look into (we don't care about bridge with
>> promiscuous enabled) and the answer I got at the point was that the current
>> model would work well for macvtap too (which I've been very doubtful from
>> the very beginning).
> At least I personally did not realize it's about macvtap.
Wouldn't macvtap be a very common backend that any virtio-net feature has
to support? I thought it has tighter integration with vhost-net than
bridge and tap.

>   I wish there
> were example command lines showing what's broken.  Liran got hold of me
> at the KVM forum and explained it's about macvlan that's the first I
> heard about it, but that was offline, others might hear just now first.
>
> The issue between macvlan and configuring a VF can be
> tested with a couple of simple commands maybe using e.g. netsniff
> with no need for a VM at all.
> Pity these were never posted - interested in posting a test
> tool that can be used to demonstrate/test the issue on various cards?
>
>> Eventually turns out this is not true and it looks like
>> this is slowly converging to what Hyper-V netvsc already supported quite a
>> few years if not a decade ago, sighs...
> Oh we'll see.
>
> Meanwhile what's missing and was missing all along for the change you
> seem to be advocating for to get off the ground is people who
> are ready to actually send e.g. spec, guest driver, test patches.
Partly because we hadn't converged on the best way to do it (even though
the group ID mechanism with PCI bridge can address our need, you don't
seem to think it is valuable). The in-kernel approach looks fine on the
surface, but I personally don't believe changing every legacy driver
is the way to go. It's an implementation choice, and IMHO there is
nothing wrong with what has been implemented in those drivers today.

>
>>>    Still this assumes just creating a VF
>>> doesn't yet program the on-card filter to cause packet drops.
>> Suppose this behavior is fixable in legacy Intel NIC, you would still need
>> to evacuate the filter programmed by macvtap previously when VF's filter
>> gets activated (typically when VF's netdev is netif_running() in a Linux
>> guest). That's what we and NetVSC call as "datapath switching", and where
>> this could be handled (driver, net core, or userspace) is the core for the
>> architectural design that I spent much time on.
>>
>> Having said it, I don't expect or would desperately wait on one vendor to
>> fix a legacy driver which wasn't quite motivated, then no work would be done
>> on that.
> Then that device can't be used with the mechanism in question.
> Or if there are lots of drivers like this maybe someone will be
> motivated enough to post a better implementation with a new
> feature bit. It's not that I'm arguing against that.
>
> But given the options of teaching management to play with
> netlink API in response to guest actions, and with VCPU stopped,
> and doing it all in host kernel drivers, I know I'll prefer host kernel
> changes.
We have some internal patches that leverage management to respond to 
various guest actions. If you're interested we can post them. The thing 
is no one would like to work on the libvirt changes, while internally we 
have our own orchestration software which is not libvirt. But if you 
think it's fine we can definitely share our QEMU patches while leaving 
out libvirt.

Thanks,
-Siwei
>
>> If you'd go the way, please make sure Intel could change their
>> driver first.
> We'll see what happens with that. It's Sridhar from intel that implemented
> the guest changes after all, so I expect he's motivated to make them
> work well.
>
>
>>>    Let's
>>> assume drivers are fixed to do that. How does userspace know
>>> that's the case? We might need some kind of attribute so
>>> userspace can detect it.
>> Where do you envision the new attribute could be at? Supposedly it'd be
>> exposed by the kernel, which constitutes a new API or API changes.
>>
>>
>> Thanks,
>> -Siwei
> People add e.g. new attributes in sysfs left and right.  It's unlikely
> to be a matter of serious contention.
>
>>>>> Question is how does userspace know driver isn't broken in this respect?
>>>>> Let's add a "vf failover" flag somewhere so this can be probed?
>>>>>

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-04  2:09                                                                           ` si-wei liu
@ 2018-12-04  3:59                                                                             ` Michael S. Tsirkin
  2018-12-05 16:18                                                                               ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-12-04  3:59 UTC (permalink / raw)
  To: si-wei liu
  Cc: Samudrala, Sridhar, Carolyn, Sameeh Jubran, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	Brandeburg, Jesse, Boris Ostrovsky

On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote:
> > I agree. But a single flag is not much of an extension. We don't even
> > need it in netlink, can be anywhere in e.g. sysfs.
> I think sysfs attribute is for exposing the capability, while you still need
> to set up macvtap with some special mode via netlink. That way it doesn't
> break current behavior, and when VF's MAC filter is added macvtap would need
> to react to remove the filter from NIC. And add the one back when VF's MAC
> is removed.

All this will be up to the developers actually working on it. My
understanding is that Intel is going to just change the behaviour
unconditionally, and it's already the case for Mellanox.
That creates a critical mass large enough that maybe others
just need to confirm.

...


> > Meanwhile what's missing and was missing all along for the change you
> > seem to be advocating for to get off the ground is people who
> > are ready to actually send e.g. spec, guest driver, test patches.
> Partly because it hadn't been converged to the best way to do it (even the
> group ID mechanism with PCI bridge can address our need you don't seem to
> think it is valuable). The in-kernel approach is fine at its appearance, but
> I personally don't believe changing every legacy driver is the way to go.
> It's the choice of implementation and what has been implemented in those
> drivers today IMHO is nothing wrong.

It's not a question of being wrong as such.
A standard behaviour is clearly better than each driver doing its
own thing which is the case now. As long as we are standardizing,
let's standardize on something that matches our needs?
But I really see no problem with also supporting other options,
as long as someone is prepared to actually put in the work.


> > 
> > > >    Still this assumes just creating a VF
> > > > doesn't yet program the on-card filter to cause packet drops.
> > > Suppose this behavior is fixable in legacy Intel NIC, you would still need
> > > to evacuate the filter programmed by macvtap previously when VF's filter
> > > gets activated (typically when VF's netdev is netif_running() in a Linux
> > > guest). That's what we and NetVSC call as "datapath switching", and where
> > > this could be handled (driver, net core, or userspace) is the core for the
> > > architectural design that I spent much time on.
> > > 
> > > Having said it, I don't expect or would desperately wait on one vendor to
> > > fix a legacy driver which wasn't quite motivated, then no work would be done
> > > on that.
> > Then that device can't be used with the mechanism in question.
> > Or if there are lots of drivers like this maybe someone will be
> > motivated enough to post a better implementation with a new
> > feature bit. It's not that I'm arguing against that.
> > 
> > But given the options of teaching management to play with
> > netlink API in response to guest actions, and with VCPU stopped,
> > and doing it all in host kernel drivers, I know I'll prefer host kernel
> > changes.
> We have some internal patches that leverage management to respond to various
> guest actions. If you're interested we can post them. The thing is no one
> would like to work on the libvirt changes, while internally we have our own
> orchestration software which is not libvirt. But if you think it's fine we
> can definitely share our QEMU patches while leaving out libvirt.
> 
> Thanks,
> -Siwei

Sure, why not.

The following is generally necessary for any virtio project to happen:
- guest patches
- qemu patches
- spec documentation

Some extras are sometimes a dependency, e.g. host kernel patches.


Typically at least two of these are enough for people to
be able to figure out how things work.




> > 
> > > If you'd go the way, please make sure Intel could change their
> > > driver first.
> > We'll see what happens with that. It's Sridhar from intel that implemented
> > the guest changes after all, so I expect he's motivated to make them
> > work well.
> > 
> > 
> > > >    Let's
> > > > assume drivers are fixed to do that. How does userspace know
> > > > that's the case? We might need some kind of attribute so
> > > > userspace can detect it.
> > > Where do you envision the new attribute could be at? Supposedly it'd be
> > > exposed by the kernel, which constitutes a new API or API changes.
> > > 
> > > 
> > > Thanks,
> > > -Siwei
> > People add e.g. new attributes in sysfs left and right.  It's unlikely
> > to be a matter of serious contention.
> > 
> > > > > > Question is how does userspace know driver isn't broken in this respect?
> > > > > > Let's add a "vf failover" flag somewhere so this can be probed?
> > > > > > 

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-04  3:59                                                                             ` Michael S. Tsirkin
@ 2018-12-05 16:18                                                                               ` Sameeh Jubran
  2018-12-05 17:18                                                                                 ` Michael S. Tsirkin
  2018-12-08  1:54                                                                                 ` si-wei liu
  0 siblings, 2 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-05 16:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	jesse.brandeburg, boris.ostrovsky

Hi all,

This is a followup on the discussion in the DPDK and Virtio monthly meeting.

Michael suggested that layer 2 tests should be created in order to
test the PF/VF behavior in different scenarios without using VMs at
all which should speed up the testing process.

The following "mausezahn" tool - which is part of netsniff-ng package
- can be used in order to generate layer 2 packets as follows:

mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"

The packets can be sniffed using tcpdump or netsniff-ng.
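
For example, a capture limited to frames addressed to the test MAC could
look like the sketch below. The helper name sniff_filter is made up for
illustration; tcpdump's -i, -e and -n options and the pcap "ether dst"
primitive are standard.

```shell
#!/bin/sh
# sniff_filter is a hypothetical helper: it only builds the pcap filter
# expression matching frames sent to the given destination MAC.
sniff_filter() {
    printf 'ether dst %s' "$1"
}

# Usage (needs root): watch the macvlan first, then repeat on the VF netdev
# to see which interface the frames actually reach.
#   tcpdump -i macvlan0 -e -n "$(sniff_filter 20:71:c6:2a:68:38)"
```

Running the same capture on both interfaces should show which filter
the NIC actually honours when the MAC is programmed twice.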

I am not completely sure what the setup should look like on the host,
but here is a script which creates a macvlan on the PF and sets its MAC
address to be the same as the VF MAC address. The script assumes that
SR-IOV is already configured and the VFs are present.

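[root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
#!/bin/sh
# Give a macvlan on the PF the same MAC address as one of its VFs.
# Assumes SR-IOV is already enabled and the VF exists.
MACVLAN_NAME=macvlan0
PF_NAME=enp59s0
VF_NUMBER=1
MAC_ADDR=20:71:c6:2a:68:38

echo "$PF_NAME vf status before setting mac"
ip link show dev "$PF_NAME"
# Program the VF's MAC via the PF.
ip link set "$PF_NAME" vf "$VF_NUMBER" mac "$MAC_ADDR"
# Create a macvlan on top of the PF with the same MAC.
ip link add link "$PF_NAME" "$MACVLAN_NAME" address "$MAC_ADDR" type macvlan
ip link set "$PF_NAME" up
echo "$PF_NAME vf status after setting mac"
ip link show dev "$PF_NAME"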

Please share your thoughts on how the different test scenarios should
go; I can customize the scripts further and host them somewhere.

On Tue, Dec 4, 2018 at 5:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote:
> > > I agree. But a single flag is not much of an extension. We don't even
> > > need it in netlink, can be anywhere in e.g. sysfs.
> > I think sysfs attribute is for exposing the capability, while you still need
> > to set up macvtap with some special mode via netlink. That way it doesn't
> > break current behavior, and when VF's MAC filter is added macvtap would need
> > to react to remove the filter from NIC. And add the one back when VF's MAC
> > is removed.
>
> All this will be up to the developers actually working on it. My
> understanding is that intel is going to just change the behaviour
> unconditionally, and it's already the case for Mellanox.
> That creates a critical mass large enough that maybe others
> just need to confirm.
>
> ...
>
>
> > > Meanwhile what's missing and was missing all along for the change you
> > > seem to be advocating for to get off the ground is people who
> > > are ready to actually send e.g. spec, guest driver, test patches.
> > Partly because it hadn't been converged to the best way to do it (even the
> > group ID mechanism with PCI bridge can address our need you don't seem to
> > think it is valuable). The in-kernel approach is fine at its appearance, but
> > I personally don't believe changing every legacy driver is the way to go.
> > It's the choice of implementation and what has been implemented in those
> > drivers today IMHO is nothing wrong.
>
> It's not a question of being wrong as such.
> A standard behaviour is clearly better than each driver doing its
> own thing which is the case now. As long as we ar standardizing,
> let's standardize on something that matches our needs?
> But I really see no problem with also supporting other options,
> as long as someone is prepared to actually put in the work.
>
>
> > >
> > > > >    Still this assumes just creating a VF
> > > > > doesn't yet program the on-card filter to cause packet drops.
> > > > Suppose this behavior is fixable in legacy Intel NIC, you would still need
> > > > to evacuate the filter programmed by macvtap previously when VF's filter
> > > > gets activated (typically when VF's netdev is netif_running() in a Linux
> > > > guest). That's what we and NetVSC call as "datapath switching", and where
> > > > this could be handled (driver, net core, or userspace) is the core for the
> > > > architectural design that I spent much time on.
> > > >
> > > > Having said it, I don't expect or would desperately wait on one vendor to
> > > > fix a legacy driver which wasn't quite motivated, then no work would be done
> > > > on that.
> > > Then that device can't be used with the mechanism in question.
> > > Or if there are lots of drivers like this maybe someone will be
> > > motivated enough to post a better implementation with a new
> > > feature bit. It's not that I'm arguing against that.
> > >
> > > But given the options of teaching management to play with
> > > netlink API in response to guest actions, and with VCPU stopped,
> > > and doing it all in host kernel drivers, I know I'll prefer host kernel
> > > changes.
> > We have some internal patches that leverage management to respond to various
> > guest actions. If you're interested we can post them. The thing is no one
> > would like to work on the libvirt changes, while internally we have our own
> > orchestration software which is not libvirt. But if you think it's fine we
> > can definitely share our QEMU patches while leaving out libvirt.
> >
> > Thanks,
> > -Siwei
>
> Sure, why not.
>
> The following is generally necessary for any virtio project to happen:
> - guest patches
> - qemu patches
> - spec documentation
>
> Some extras are sometimes a dependency, e.g. host kernel patches.
>
>
> Typically at least two of these are enough for people to
> be able to figure out how things work.
>
>
>
>
> > >
> > > > If you'd go the way, please make sure Intel could change their
> > > > driver first.
> > > We'll see what happens with that. It's Sridhar from intel that implemented
> > > the guest changes after all, so I expect he's motivated to make them
> > > work well.
> > >
> > >
> > > > >    Let's
> > > > > assume drivers are fixed to do that. How does userspace know
> > > > > that's the case? We might need some kind of attribute so
> > > > > userspace can detect it.
> > > > Where do you envision the new attribute could be at? Supposedly it'd be
> > > > exposed by the kernel, which constitutes a new API or API changes.
> > > >
> > > >
> > > > Thanks,
> > > > -Siwei
> > > People add e.g. new attributes in sysfs left and right.  It's unlikely
> > > to be a matter of serious contention.
> > >
> > > > > > > Question is how does userspace know driver isn't broken in this respect?
> > > > > > > Let's add a "vf failover" flag somewhere so this can be probed?
> > > > > > >



-- 
Respectfully,
Sameeh Jubran
Software Engineer @ Daynix.


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-05 16:18                                                                               ` Sameeh Jubran
@ 2018-12-05 17:18                                                                                 ` Michael S. Tsirkin
  2018-12-08  1:54                                                                                 ` si-wei liu
  1 sibling, 0 replies; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-12-05 17:18 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	jesse.brandeburg, boris.ostrovsky

On Wed, Dec 05, 2018 at 06:18:05PM +0200, Sameeh Jubran wrote:
> Hi all,
> 
> This is a followup on the discussion in the DPDK and Virtio monthly meeting.
> 
> Michael suggested that layer 2 tests should be created in order to
> test the PF/VF behavior in different scenarios without using VMs at
> all which should speed up the testing process.
> 
> The following "mausezahn" tool - which is part of netsniff-ng package
> - can be used in order to generate layer 2 packets as follows:
> 
> mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
> 
> The packets can be sniffed using tcpdump or netsniff-ng.
> 
> I am not completely sure how the setup should look like on the host,
> but here is a script which assigns macvlan to the PF and sets it's mac
> address to be the same as the VF mac address. The scripts assumes that
> the sriov is already configured and the vf are present.
> 
> [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> MACVLAN_NAME=macvlan0
> PF_NAME=enp59s0
> VF_NUMBER=1
> MAC_ADDR=20:71:c6:2a:68:38
> 
> echo "$PF_NAME vf status before setting mac"
> ip link show dev $PF_NAME
> ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> ip link set $PF_NAME up
> echo "$PF_NAME vf status after setting mac"
> ip link show dev $PF_NAME
> 
> Please share your thoughts on how the different test scenarios should
> go, I can customize the scripts further more and host them somewhere.

OK, so for starters we need code to send the packets (maybe
multiple ones with a counter so drops can be detected?)
and also to sniff and verify their arrival on either of
the two interfaces.

And then on top there would be all the different ways
to switch between the two interfaces back and forth
while this is going on.

The tool would ideally do this several times with each method and report
observed downtime.
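
A rough sketch of the drop-detection part, assuming the sender puts an
incrementing counter in each payload and the sniffer side has already
extracted those counters one per line. The helper name report_drops and
the input format are made up for illustration; nothing like it exists
in the thread's scripts.

```shell
#!/bin/sh
# Hypothetical helper: read the sequence counters seen by the sniffer
# (one integer per line) and report how many packets were lost and the
# longest run of consecutive losses.
report_drops() {
    sort -n | awk '
        NR == 1 { prev = $1; next }
        {
            gap = $1 - prev - 1          # counters missing between two arrivals
            if (gap > 0) {
                lost += gap
                if (gap > longest) longest = gap
            }
            prev = $1
        }
        END { printf "lost=%d longest_gap=%d\n", lost, longest }
    '
}

# Example: counters 4..6 and 9 never arrived.
printf '%s\n' 1 2 3 7 8 10 | report_drops
```

If the sender transmits at a fixed interval, longest_gap multiplied by
that interval gives an approximate upper bound on the downtime for one
switchover. Duplicates are ignored, and losses before the first or
after the last received counter are not counted.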


-- 
MST

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-05 16:18                                                                               ` Sameeh Jubran
  2018-12-05 17:18                                                                                 ` Michael S. Tsirkin
@ 2018-12-08  1:54                                                                                 ` si-wei liu
  2018-12-10 15:13                                                                                   ` Sameeh Jubran
  1 sibling, 1 reply; 85+ messages in thread
From: si-wei liu @ 2018-12-08  1:54 UTC (permalink / raw)
  To: Sameeh Jubran, Michael S. Tsirkin
  Cc: sridhar.samudrala, carolyn.wyborny, Siwei Liu, venu.busireddy,
	cohuck, virtio-dev, liran.alon, Yan Vugenfirer, jesse.brandeburg,
	boris.ostrovsky



On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> Hi all,
>
> This is a followup on the discussion in the DPDK and Virtio monthly meeting.
>
> Michael suggested that layer 2 tests should be created in order to
> test the PF/VF behavior in different scenarios without using VMs at
> all which should speed up the testing process.
>
> The following "mausezahn" tool - which is part of netsniff-ng package
> - can be used in order to generate layer 2 packets as follows:
>
> mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
>
> The packets can be sniffed using tcpdump or netsniff-ng.
Does tcpdump or netsniff-ng enable the NIC's promiscuous mode by default?
Try disabling it when you monitor/capture the L2 packets.

>
> I am not completely sure what the setup should look like on the host,
> but here is a script which assigns a macvlan to the PF and sets its MAC
> address to be the same as the VF MAC address. The script assumes that
> SR-IOV is already configured and the VFs are present.
>
> [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> MACVLAN_NAME=macvlan0
> PF_NAME=enp59s0
> VF_NUMBER=1
> MAC_ADDR=20:71:c6:2a:68:38
>
> echo "$PF_NAME vf status before setting mac"
> ip link show dev $PF_NAME
> ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> ip link set $PF_NAME up
> echo "$PF_NAME vf status after setting mac"
> ip link show dev $PF_NAME
>
> Please share your thoughts on how the different test scenarios should
> go. I can customize the scripts further and host them somewhere.
You can do something like below:

FAKE_VLAN=123
ip link set $MACVLAN_NAME up
ip link set $PF_NAME vf $VF_NUMBER vlan $FAKE_VLAN

Datapath now switched to macvlan0, which should get the L2 packets from 
over the wire.

ip link set $PF_NAME vf $VF_NUMBER vlan 0
ip link set $MACVLAN_NAME down

Datapath now switched back to VF. VF#1 should get packets.

For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
with unbinding the VF from its original driver and binding it to vfio-pci.
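
A minimal sketch of that unbind/rebind step through sysfs, assuming the
vfio-pci module is already loaded and the script runs as root. The
function names and the overridable SYSFS variable are made up here so
the logic can be tried against a fake directory tree first.

```shell
#!/bin/sh
# Sketch only: move a VF from its current host driver to vfio-pci.
# SYSFS is overridable so the logic can be exercised against a fake tree.
SYSFS=${SYSFS:-/sys}

# Print the PCI address of VF number $2 under PF netdev $1.
vf_pci_addr() {
    basename "$(readlink -f "$SYSFS/class/net/$1/device/virtfn$2")"
}

# Unbind the VF ($1 = PCI address) from whatever driver owns it,
# then ask the PCI core to probe it again with vfio-pci forced.
vf_to_vfio() {
    dev="$SYSFS/bus/pci/devices/$1"
    if [ -e "$dev/driver/unbind" ]; then
        echo "$1" > "$dev/driver/unbind"
    fi
    echo vfio-pci > "$dev/driver_override"
    echo "$1" > "$SYSFS/bus/pci/drivers_probe"
}

# Usage (as root):
#   vf_to_vfio "$(vf_pci_addr enp59s0 1)"
```

To go back, write an empty line to driver_override, unbind from
vfio-pci and probe again, so the original VF driver picks the device
up.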


Regards,
-Siwei

>
> On Tue, Dec 4, 2018 at 5:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote:
>>>> I agree. But a single flag is not much of an extension. We don't even
>>>> need it in netlink, can be anywhere in e.g. sysfs.
>>> I think sysfs attribute is for exposing the capability, while you still need
>>> to set up macvtap with some special mode via netlink. That way it doesn't
>>> break current behavior, and when VF's MAC filter is added macvtap would need
>>> to react to remove the filter from NIC. And add the one back when VF's MAC
>>> is removed.
>> All this will be up to the developers actually working on it. My
>> understanding is that intel is going to just change the behaviour
>> unconditionally, and it's already the case for Mellanox.
>> That creates a critical mass large enough that maybe others
>> just need to confirm.
>>
>> ...
>>
>>
>>>> Meanwhile what's missing and was missing all along for the change you
>>>> seem to be advocating for to get off the ground is people who
>>>> are ready to actually send e.g. spec, guest driver, test patches.
>>> Partly because it hadn't been converged to the best way to do it (even the
>>> group ID mechanism with PCI bridge can address our need you don't seem to
>>> think it is valuable). The in-kernel approach is fine at its appearance, but
>>> I personally don't believe changing every legacy driver is the way to go.
>>> It's the choice of implementation and what has been implemented in those
>>> drivers today IMHO is nothing wrong.
>> It's not a question of being wrong as such.
>> A standard behaviour is clearly better than each driver doing its
>> own thing which is the case now. As long as we are standardizing,
>> let's standardize on something that matches our needs?
>> But I really see no problem with also supporting other options,
>> as long as someone is prepared to actually put in the work.
>>
>>
>>>>>>     Still this assumes just creating a VF
>>>>>> doesn't yet program the on-card filter to cause packet drops.
>>>>> Suppose this behavior is fixable in legacy Intel NIC, you would still need
>>>>> to evacuate the filter programmed by macvtap previously when VF's filter
>>>>> gets activated (typically when VF's netdev is netif_running() in a Linux
>>>>> guest). That's what we and NetVSC call as "datapath switching", and where
>>>>> this could be handled (driver, net core, or userspace) is the core for the
>>>>> architectural design that I spent much time on.
>>>>>
>>>>> Having said it, I don't expect or would desperately wait on one vendor to
>>>>> fix a legacy driver which wasn't quite motivated, then no work would be done
>>>>> on that.
>>>> Then that device can't be used with the mechanism in question.
>>>> Or if there are lots of drivers like this maybe someone will be
>>>> motivated enough to post a better implementation with a new
>>>> feature bit. It's not that I'm arguing against that.
>>>>
>>>> But given the options of teaching management to play with
>>>> netlink API in response to guest actions, and with VCPU stopped,
>>>> and doing it all in host kernel drivers, I know I'll prefer host kernel
>>>> changes.
>>> We have some internal patches that leverage management to respond to various
>>> guest actions. If you're interested we can post them. The thing is no one
>>> would like to work on the libvirt changes, while internally we have our own
>>> orchestration software which is not libvirt. But if you think it's fine we
>>> can definitely share our QEMU patches while leaving out libvirt.
>>>
>>> Thanks,
>>> -Siwei
>> Sure, why not.
>>
>> The following is generally necessary for any virtio project to happen:
>> - guest patches
>> - qemu patches
>> - spec documentation
>>
>> Some extras are sometimes a dependency, e.g. host kernel patches.
>>
>>
>> Typically at least two of these are enough for people to
>> be able to figure out how things work.
>>
>>
>>
>>
>>>>> If you'd go the way, please make sure Intel could change their
>>>>> driver first.
>>>> We'll see what happens with that. It's Sridhar from intel that implemented
>>>> the guest changes after all, so I expect he's motivated to make them
>>>> work well.
>>>>
>>>>
>>>>>>     Let's
>>>>>> assume drivers are fixed to do that. How does userspace know
>>>>>> that's the case? We might need some kind of attribute so
>>>>>> userspace can detect it.
>>>>> Where do you envision the new attribute could be at? Supposedly it'd be
>>>>> exposed by the kernel, which constitutes a new API or API changes.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> -Siwei
>>>> People add e.g. new attributes in sysfs left and right.  It's unlikely
>>>> to be a matter of serious contention.
>>>>
>>>>>>>> Question is how does userspace know driver isn't broken in this respect?
>>>>>>>> Let's add a "vf failover" flag somewhere so this can be probed?
>>>>>>>>
>
>




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-08  1:54                                                                                 ` si-wei liu
@ 2018-12-10 15:13                                                                                   ` Sameeh Jubran
  2018-12-10 15:34                                                                                     ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-10 15:13 UTC (permalink / raw)
  To: si-wei.liu
  Cc: Michael S. Tsirkin, sridhar.samudrala, carolyn.wyborny,
	Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon,
	Yan Vugenfirer, jesse.brandeburg, boris.ostrovsky

On Sat, Dec 8, 2018 at 3:54 AM si-wei liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> > Hi all,
> >
> > This is a followup on the discussion in the DPDK and Virtio monthly meeting.
> >
> > Michael suggested that layer 2 tests should be created in order to
> > test the PF/VF behavior in different scenarios without using VMs at
> > all which should speed up the testing process.
> >
> > The following "mausezahn" tool - which is part of netsniff-ng package
> > - can be used in order to generate layer 2 packets as follows:
> >
> > mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
> >
> > The packets can be sniffed using tcpdump or netsniff-ng.
> Does tcpdump or netsniff-ng enable the NIC's promiscuous mode by default?
> Try disabling it when you monitor/capture the L2 packets.
netsniff-ng enables promiscuous mode by default, however the -M flag
can disable this.

>
> >
> > I am not completely sure what the setup should look like on the host,
> > but here is a script which assigns a macvlan to the PF and sets its MAC
> > address to be the same as the VF MAC address. The script assumes that
> > SR-IOV is already configured and the VFs are present.
> >
> > [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> > MACVLAN_NAME=macvlan0
> > PF_NAME=enp59s0
> > VF_NUMBER=1
> > MAC_ADDR=20:71:c6:2a:68:38
> >
> > echo "$PF_NAME vf status before setting mac"
> > ip link show dev $PF_NAME
> > ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> > ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> > ip link set $PF_NAME up
> > echo "$PF_NAME vf status after setting mac"
> > ip link show dev $PF_NAME
> >
> > Please share your thoughts on how the different test scenarios should
> > go. I can customize the scripts further and host them somewhere.
> You can do something like below:
>
> FAKE_VLAN=123
> ip link set $MACVLAN_NAME up
> ip link set $PF_NAME vf $VF_NUMBER vlan $FAKE_VLAN
>
> Datapath now switched to macvlan0, which should get the L2 packets from
> over the wire.
>
> ip link set $PF_NAME vf $VF_NUMBER vlan 0
> ip link set $MACVLAN_NAME down
>
> Datapath now switched back to VF. VF#1 should get packets.
>
> For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
> with unbinding the VF from its original driver and binding it to vfio-pci.
Yup.
>
>
> Regards,
> -Siwei
>


-- 
Respectfully,
Sameeh Jubran
Software Engineer @ Daynix.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-10 15:13                                                                                   ` Sameeh Jubran
@ 2018-12-10 15:34                                                                                     ` Sameeh Jubran
  2018-12-10 17:46                                                                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-10 15:34 UTC (permalink / raw)
  To: si-wei.liu
  Cc: Michael S. Tsirkin, sridhar.samudrala, carolyn.wyborny,
	Siwei Liu, venu.busireddy, cohuck, virtio-dev, liran.alon,
	Yan Vugenfirer, jesse.brandeburg, boris.ostrovsky

On Mon, Dec 10, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
>
> On Sat, Dec 8, 2018 at 3:54 AM si-wei liu <si-wei.liu@oracle.com> wrote:
> >
> >
> >
> > On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> > > Hi all,
> > >
> > > This is a followup on the discussion in the DPDK and Virtio monthly meeting.
> > >
> > > Michael suggested that layer 2 tests should be created in order to
> > > test the PF/VF behavior in different scenarios without using VMs at
> > > all which should speed up the testing process.
> > >
> > > The following "mausezahn" tool - which is part of netsniff-ng package
> > > - can be used in order to generate layer 2 packets as follows:
> > >
> > > mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
> > >
> > > The packets can be sniffed using tcpdump or netsniff-ng.
> > Does tcpdump or netsniff-ng enable the NIC's promiscuous mode by default?
> > Try disabling it when you monitor/capture the L2 packets.
> netsniff-ng enables promiscuous mode by default, however the -M flag
> can disable this.
>
> >
> > >
> > > I am not completely sure what the setup should look like on the host,
> > > but here is a script which assigns a macvlan to the PF and sets its MAC
> > > address to be the same as the VF MAC address. The script assumes that
> > > SR-IOV is already configured and the VFs are present.
> > >
> > > [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> > > MACVLAN_NAME=macvlan0
> > > PF_NAME=enp59s0
> > > VF_NUMBER=1
> > > MAC_ADDR=20:71:c6:2a:68:38
> > >
> > > echo "$PF_NAME vf status before setting mac"
> > > ip link show dev $PF_NAME
> > > ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> > > ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> > > ip link set $PF_NAME up
> > > echo "$PF_NAME vf status after setting mac"
> > > ip link show dev $PF_NAME
> > >
> > > Please share your thoughts on how the different test scenarios should
> > > go. I can customize the scripts further and host them somewhere.
> > You can do something like below:
> >
> > FAKE_VLAN=123
> > ip link set $MACVLAN_NAME up
> > ip link set $PF_NAME vf $VF_NUMBER vlan $FAKE_VLAN
> >
> > Datapath now switched to macvlan0, which should get the L2 packets from
> > over the wire.
> >
> > ip link set $PF_NAME vf $VF_NUMBER vlan 0
> > ip link set $MACVLAN_NAME down
> >
> > Datapath now switched back to VF. VF#1 should get packets.
> >
> > For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
> > with unbinding the VF from its original driver and binding it to vfio-pci.
> Yup.

The only issue I'm not sure how to deal with is how to listen for the
packets on the VF. How can I make sure that they are arriving
there?

> >
> >
> > Regards,
> > -Siwei
> >



-- 
Respectfully,
Sameeh Jubran
Software Engineer @ Daynix.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-10 15:34                                                                                     ` Sameeh Jubran
@ 2018-12-10 17:46                                                                                       ` Michael S. Tsirkin
  2018-12-11 15:50                                                                                         ` Sameeh Jubran
  0 siblings, 1 reply; 85+ messages in thread
From: Michael S. Tsirkin @ 2018-12-10 17:46 UTC (permalink / raw)
  To: Sameeh Jubran
  Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	jesse.brandeburg, boris.ostrovsky

On Mon, Dec 10, 2018 at 05:34:53PM +0200, Sameeh Jubran wrote:
> On Mon, Dec 10, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
> >
> > On Sat, Dec 8, 2018 at 3:54 AM si-wei liu <si-wei.liu@oracle.com> wrote:
> > >
> > >
> > >
> > > On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> > > > Hi all,
> > > >
> > > > This is a followup on the discussion in the DPDK and Virtio monthly meeting.
> > > >
> > > > Michael suggested that layer 2 tests should be created in order to
> > > > test the PF/VF behavior in different scenarios without using VMs at
> > > > all which should speed up the testing process.
> > > >
> > > > The following "mausezahn" tool - which is part of netsniff-ng package
> > > > - can be used in order to generate layer 2 packets as follows:
> > > >
> > > > mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
> > > >
> > > > The packets can be sniffed using tcpdump or netsniff-ng.
> > > Does tcpdump or netsniff-ng enable the NIC's promiscuous mode by default?
> > > Try disabling it when you monitor/capture the L2 packets.
> > netsniff-ng enables promiscuous mode by default, however the -M flag
> > can disable this.
> >
> > >
> > > >
> > > > I am not completely sure what the setup should look like on the host,
> > > > but here is a script which assigns a macvlan to the PF and sets its MAC
> > > > address to be the same as the VF MAC address. The script assumes that
> > > > SR-IOV is already configured and the VFs are present.
> > > >
> > > > [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> > > > MACVLAN_NAME=macvlan0
> > > > PF_NAME=enp59s0
> > > > VF_NUMBER=1
> > > > MAC_ADDR=20:71:c6:2a:68:38
> > > >
> > > > echo "$PF_NAME vf status before setting mac"
> > > > ip link show dev $PF_NAME
> > > > ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> > > > ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> > > > ip link set $PF_NAME up
> > > > echo "$PF_NAME vf status after setting mac"
> > > > ip link show dev $PF_NAME
> > > >
> > > > Please share your thoughts on how the different test scenarios should
> > > > go. I can customize the scripts further and host them somewhere.
> > > You can do something like below:
> > >
> > > FAKE_VLAN=123
> > > ip link set $MACVLAN_NAME up
> > > ip link set $PF_NAME vf $VF_NUMBER vlan $FAKE_VLAN
> > >
> > > Datapath now switched to macvlan0, which should get the L2 packets from
> > > over the wire.
> > >
> > > ip link set $PF_NAME vf $VF_NUMBER vlan 0
> > > ip link set $MACVLAN_NAME down
> > >
> > > Datapath now switched back to VF. VF#1 should get packets.
> > >
> > > For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
> > > with unbinding the VF from its original driver and binding it to vfio-pci.
> > Yup.
> 
> The only issue I'm not sure how to deal with is how to listen for the
> packets on the VF. How can I make sure that they are arriving
> there?

Using the --dev flag to bind to the VF device?
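
Roughly like this, assuming the VF is still bound to a host network
driver so that it has a netdev (once it is handed to vfio-pci there is
nothing to capture on from the host side). The vf_netdev helper and
the overridable SYSFS variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: find the host netdev that backs a VF so a sniffer can be
# attached to it. SYSFS is overridable for dry runs against a fake tree.
SYSFS=${SYSFS:-/sys}

# Print the netdev name of VF number $2 under PF netdev $1.
# Only works while the VF is bound to a host network driver.
vf_netdev() {
    ls "$SYSFS/class/net/$1/device/virtfn$2/net" | head -n 1
}

# Usage (as root):
#   VF_DEV=$(vf_netdev enp59s0 1)
#   netsniff-ng --dev "$VF_DEV" -M
#   # or with tcpdump (-p: no promiscuous mode, -e: print link-level header):
#   tcpdump -p -e -i "$VF_DEV" ether dst 20:71:c6:2a:68:38
```

Running a second capture on macvlan0 at the same time shows which of
the two interfaces receives each packet during a switchover.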


> > >
> > >
> > > Regards,
> > > -Siwei
> > >
> > > >
> > > > On Tue, Dec 4, 2018 at 5:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >> On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote:
> > > >>>> I agree. But a single flag is not much of an extension. We don't even
> > > >>>> need it in netlink, can be anywhere in e.g. sysfs.
> > > >>> I think sysfs attribute is for exposing the capability, while you still need
> > > >>> to set up macvtap with some special mode via netlink. That way it doesn't
> > > >>> break current behavior, and when VF's MAC filter is added macvtap would need
> > > >>> to react to remove the filter from NIC. And add the one back when VF's MAC
> > > >>> is removed.
> > > >> All this will be up to the developers actually working on it. My
> > > >> understanding is that intel is going to just change the behaviour
> > > >> unconditionally, and it's already the case for Mellanox.
> > > >> That creates a critical mass large enough that maybe others
> > > >> just need to confirm.
> > > >>
> > > >> ...
> > > >>
> > > >>
> > > >>>> Meanwhile what's missing and was missing all along for the change you
> > > >>>> seem to be advocating for to get off the ground is people who
> > > >>>> are ready to actually send e.g. spec, guest driver, test patches.
> > > >>> Partly because it hadn't been converged to the best way to do it (even the
> > > >>> group ID mechanism with PCI bridge can address our need you don't seem to
> > > >>> think it is valuable). The in-kernel approach is fine at its appearance, but
> > > >>> I personally don't believe changing every legacy driver is the way to go.
> > > >>> It's the choice of implementation and what has been implemented in those
> > > >>> drivers today IMHO is nothing wrong.
> > > >> It's not a question of being wrong as such.
> > > >> A standard behaviour is clearly better than each driver doing its
> > > >> own thing which is the case now. As long as we are standardizing,
> > > >> let's standardize on something that matches our needs?
> > > >> But I really see no problem with also supporting other options,
> > > >> as long as someone is prepared to actually put in the work.
> > > >>
> > > >>
> > > >>>>>>     Still this assumes just creating a VF
> > > >>>>>> doesn't yet program the on-card filter to cause packet drops.
> > > >>>>> Suppose this behavior is fixable in legacy Intel NIC, you would still need
> > > >>>>> to evacuate the filter programmed by macvtap previously when VF's filter
> > > >>>>> gets activated (typically when VF's netdev is netif_running() in a Linux
> > > >>>>> guest). That's what we and NetVSC call as "datapath switching", and where
> > > >>>>> this could be handled (driver, net core, or userspace) is the core for the
> > > >>>>> architectural design that I spent much time on.
> > > >>>>>
> > > >>>>> Having said it, I don't expect or would desperately wait on one vendor to
> > > >>>>> fix a legacy driver which wasn't quite motivated, then no work would be done
> > > >>>>> on that.
> > > >>>> Then that device can't be used with the mechanism in question.
> > > >>>> Or if there are lots of drivers like this maybe someone will be
> > > >>>> motivated enough to post a better implementation with a new
> > > >>>> feature bit. It's not that I'm arguing against that.
> > > >>>>
> > > >>>> But given the options of teaching management to play with
> > > >>>> netlink API in response to guest actions, and with VCPU stopped,
> > > >>>> and doing it all in host kernel drivers, I know I'll prefer host kernel
> > > >>>> changes.
> > > >>> We have some internal patches that leverage management to respond to various
> > > >>> guest actions. If you're interested we can post them. The thing is no one
> > > >>> would like to work on the libvirt changes, while internally we have our own
> > > >>> orchestration software which is not libvirt. But if you think it's fine we
> > > >>> can definitely share our QEMU patches while leaving out libvirt.
> > > >>>
> > > >>> Thanks,
> > > >>> -Siwei
> > > >> Sure, why not.
> > > >>
> > > >> The following is generally necessary for any virtio project to happen:
> > > >> - guest patches
> > > >> - qemu patches
> > > >> - spec documentation
> > > >>
> > > >> Some extras are sometimes a dependency, e.g. host kernel patches.
> > > >>
> > > >>
> > > >> Typically at least two of these are enough for people to
> > > >> be able to figure out how things work.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>>>> If you'd go the way, please make sure Intel could change their
> > > >>>>> driver first.
> > > >>>> We'll see what happens with that. It's Sridhar from intel that implemented
> > > >>>> the guest changes after all, so I expect he's motivated to make them
> > > >>>> work well.
> > > >>>>
> > > >>>>
> > > >>>>>>     Let's
> > > >>>>>> assume drivers are fixed to do that. How does userspace know
> > > >>>>>> that's the case? We might need some kind of attribute so
> > > >>>>>> userspace can detect it.
> > > >>>>> Where do you envision the new attribute could be at? Supposedly it'd be
> > > >>>>> exposed by the kernel, which constitutes a new API or API changes.
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> -Siwei
> > > >>>> People add e.g. new attributes in sysfs left and right.  It's unlikely
> > > >>>> to be a matter of serious contention.
> > > >>>>
> > > >>>>>>>> Question is how does userspace know driver isn't broken in this respect?
> > > >>>>>>>> Let's add a "vf failover" flag somewhere so this can be probed?
> > > >>>>>>>>
> > > >>>>
> > > >
> > > >
> > >
> >
> >
> > --
> > Respectfully,
> > Sameeh Jubran
> > Linkedin
> > Software Engineer @ Daynix.
> 
> 
> 
> -- 
> Respectfully,
> Sameeh Jubran
> Linkedin
> Software Engineer @ Daynix.


* Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature
  2018-12-10 17:46                                                                                       ` Michael S. Tsirkin
@ 2018-12-11 15:50                                                                                         ` Sameeh Jubran
  0 siblings, 0 replies; 85+ messages in thread
From: Sameeh Jubran @ 2018-12-11 15:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: si-wei.liu, sridhar.samudrala, carolyn.wyborny, Siwei Liu,
	venu.busireddy, cohuck, virtio-dev, liran.alon, Yan Vugenfirer,
	jesse.brandeburg, boris.ostrovsky

On Mon, Dec 10, 2018 at 7:46 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Dec 10, 2018 at 05:34:53PM +0200, Sameeh Jubran wrote:
> > On Mon, Dec 10, 2018 at 5:13 PM Sameeh Jubran <sameeh@daynix.com> wrote:
> > >
> > > On Sat, Dec 8, 2018 at 3:54 AM si-wei liu <si-wei.liu@oracle.com> wrote:
> > > >
> > > >
> > > >
> > > > On 12/05/2018 08:18 AM, Sameeh Jubran wrote:
> > > > > Hi all,
> > > > >
> > > > > This is a followup on the discussion in the DPDK and Virtio monthly meeting.
> > > > >
> > > > > Michael suggested that layer 2 tests should be created in order to
> > > > > test the PF/VF behavior in different scenarios without using VMs at
> > > > > all which should speed up the testing process.
> > > > >
> > > > > The following "mausezahn" tool - which is part of netsniff-ng package
> > > > > - can be used in order to generate layer 2 packets as follows:
> > > > >
> > > > > mausezahn enp59s0 -c 0 -a rand -b 20:71:c6:2a:68:38 "08 00 aa bb cc dd"
> > > > >
> > > > > The packets can be sniffed using tcpdump or netsniff-ng.
> > > > Does tcpdump or netsniff-ng enable NIC's promiscuous mode by default?
> > > > Try disabling it when you monitor/capture the L2 packets.
> > > netsniff-ng enables promiscuous mode by default, however the -M flag
> > > can disable this.
> > >
> > > >
> > > > >
> > > > > I am not completely sure what the setup should look like on the host,
> > > > > but here is a script which assigns a macvlan to the PF and sets its MAC
> > > > > address to be the same as the VF's MAC address. The script assumes that
> > > > > SR-IOV is already configured and the VFs are present.
> > > > >
> > > > > [root@wsfd-advnetlab10 ~]# cat go_macvlan.sh
> > > > > MACVLAN_NAME=macvlan0
> > > > > PF_NAME=enp59s0
> > > > > VF_NUMBER=1
> > > > > MAC_ADDR=20:71:c6:2a:68:38
> > > > >
> > > > > echo "$PF_NAME vf status before setting mac"
> > > > > ip link show dev $PF_NAME
> > > > > ip link set $PF_NAME vf $VF_NUMBER mac $MAC_ADDR
> > > > > ip li add link $PF_NAME $MACVLAN_NAME address $MAC_ADDR type macvlan
> > > > > ip link set $PF_NAME up
> > > > > echo "$PF_NAME vf status after setting mac"
> > > > > ip link show dev $PF_NAME
> > > > >
> > > > > Please share your thoughts on how the different test scenarios should
> > > > > go. I can customize the scripts further and host them somewhere.
> > > > You can do something like below:
> > > >
> > > > FAKE_VLAN=123
> > > > ip link set $MACVLAN_NAME up
> > > > ip link set $PF_NAME vf $VF_NUMBER vlan $FAKE_VLAN
> > > >
> > > > Datapath now switched to macvlan0, which should get the L2 packets from
> > > > over the wire.
> > > >
> > > > ip link set $PF_NAME vf $VF_NUMBER vlan 0
> > > > ip link set $MACVLAN_NAME down
> > > >
> > > > Datapath now switched back to VF. VF#1 should get packets.
> > > >
> > > > For a more accurate downtime test, replace 'ip link set vf .. vlan ...'
> > > > with unbinding the VF from the original driver and binding it to vfio-pci.
> > > Yup.
> >
> > The only issue I'm not sure how to deal with is how to listen for packets
> > on the VF. How can I make sure that they are arriving there?
>
> Using --dev flag to bind to the vf device?
Nope, this doesn't work, since there is no VF interface on the host.
I have also tried specifying the VF's device address, but that doesn't
seem to work either.
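
Since the VF has no netdev on the host, one possible check (a sketch only,
not something verified on this setup) is to read the per-VF counters that
`ip -s link show` prints for the PF, assuming the PF driver reports VF
statistics at all. The script below strings that together with the
switch-over steps quoted above. The variable defaults are taken from
go_macvlan.sh; the `run` helper and the `DRY_RUN` variable are made up for
illustration, and with the default `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Sketch only: names and defaults mirror go_macvlan.sh quoted above.
MACVLAN_NAME=${MACVLAN_NAME:-macvlan0}
PF_NAME=${PF_NAME:-enp59s0}
VF_NUMBER=${VF_NUMBER:-1}
FAKE_VLAN=${FAKE_VLAN:-123}
DRY_RUN=${DRY_RUN:-1}   # assumption: print commands unless DRY_RUN=0

# run: hypothetical helper; prints the command in dry-run mode,
# otherwise executes it (which needs root and a real PF/VF).
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Move the datapath to the macvlan by parking the VF on an unused VLAN.
switch_to_macvlan() {
    run ip link set "$MACVLAN_NAME" up
    run ip link set "$PF_NAME" vf "$VF_NUMBER" vlan "$FAKE_VLAN"
}

# Move the datapath back to the VF.
switch_to_vf() {
    run ip link set "$PF_NAME" vf "$VF_NUMBER" vlan 0
    run ip link set "$MACVLAN_NAME" down
}

# Show the PF, including per-VF RX/TX counters if the driver reports them.
show_vf_counters() {
    run ip -s link show dev "$PF_NAME"
}

switch_to_macvlan
show_vf_counters
switch_to_vf
show_vf_counters
```

Comparing the VF's RX packet counter between the two `show_vf_counters`
calls while mausezahn is sending should indicate whether traffic reaches
the VF after switching back. If the driver does not fill in VF statistics,
this will not help.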
>
>
> > > >
> > > >
> > > > Regards,
> > > > -Siwei
> > > >
> > > > >
> > > > > On Tue, Dec 4, 2018 at 5:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >> On Mon, Dec 03, 2018 at 06:09:19PM -0800, si-wei liu wrote:
> > > > >>>> I agree. But a single flag is not much of an extension. We don't even
> > > > >>>> need it in netlink, can be anywhere in e.g. sysfs.
> > > > >>> I think sysfs attribute is for exposing the capability, while you still need
> > > > >>> to set up macvtap with some special mode via netlink. That way it doesn't
> > > > >>> break current behavior, and when VF's MAC filter is added macvtap would need
> > > > >>> to react to remove the filter from NIC. And add the one back when VF's MAC
> > > > >>> is removed.
> > > > >> All this will be up to the developers actually working on it. My
> > > > >> understanding is that intel is going to just change the behaviour
> > > > >> unconditionally, and it's already the case for Mellanox.
> > > > >> That creates a critical mass large enough that maybe others
> > > > >> just need to confirm.
> > > > >>
> > > > >> ...
> > > > >>
> > > > >>
> > > > >>>> Meanwhile what's missing and was missing all along for the change you
> > > > >>>> seem to be advocating for to get off the ground is people who
> > > > >>>> are ready to actually send e.g. spec, guest driver, test patches.
> > > > >>> Partly because it hadn't been converged to the best way to do it (even the
> > > > >>> group ID mechanism with PCI bridge can address our need you don't seem to
> > > > >>> think it is valuable). The in-kernel approach is fine at its appearance, but
> > > > >>> I personally don't believe changing every legacy driver is the way to go.
> > > > >>> It's the choice of implementation and what has been implemented in those
> > > > >>> drivers today IMHO is nothing wrong.
> > > > >> It's not a question of being wrong as such.
> > > > >> A standard behaviour is clearly better than each driver doing its
> > > > >> own thing which is the case now. As long as we are standardizing,
> > > > >> let's standardize on something that matches our needs?
> > > > >> But I really see no problem with also supporting other options,
> > > > >> as long as someone is prepared to actually put in the work.
> > > > >>
> > > > >>
> > > > >>>>>>     Still this assumes just creating a VF
> > > > >>>>>> doesn't yet program the on-card filter to cause packet drops.
> > > > >>>>> Suppose this behavior is fixable in legacy Intel NIC, you would still need
> > > > >>>>> to evacuate the filter programmed by macvtap previously when VF's filter
> > > > >>>>> gets activated (typically when VF's netdev is netif_running() in a Linux
> > > > >>>>> guest). That's what we and NetVSC call as "datapath switching", and where
> > > > >>>>> this could be handled (driver, net core, or userspace) is the core for the
> > > > >>>>> architectural design that I spent much time on.
> > > > >>>>>
> > > > >>>>> Having said it, I don't expect or would desperately wait on one vendor to
> > > > >>>>> fix a legacy driver which wasn't quite motivated, then no work would be done
> > > > >>>>> on that.
> > > > >>>> Then that device can't be used with the mechanism in question.
> > > > >>>> Or if there are lots of drivers like this maybe someone will be
> > > > >>>> motivated enough to post a better implementation with a new
> > > > >>>> feature bit. It's not that I'm arguing against that.
> > > > >>>>
> > > > >>>> But given the options of teaching management to play with
> > > > >>>> netlink API in response to guest actions, and with VCPU stopped,
> > > > >>>> and doing it all in host kernel drivers, I know I'll prefer host kernel
> > > > >>>> changes.
> > > > >>> We have some internal patches that leverage management to respond to various
> > > > >>> guest actions. If you're interested we can post them. The thing is no one
> > > > >>> would like to work on the libvirt changes, while internally we have our own
> > > > >>> orchestration software which is not libvirt. But if you think it's fine we
> > > > >>> can definitely share our QEMU patches while leaving out libvirt.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> -Siwei
> > > > >> Sure, why not.
> > > > >>
> > > > >> The following is generally necessary for any virtio project to happen:
> > > > >> - guest patches
> > > > >> - qemu patches
> > > > >> - spec documentation
> > > > >>
> > > > >> Some extras are sometimes a dependency, e.g. host kernel patches.
> > > > >>
> > > > >>
> > > > >> Typically at least two of these are enough for people to
> > > > >> be able to figure out how things work.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>>>> If you'd go the way, please make sure Intel could change their
> > > > >>>>> driver first.
> > > > >>>> We'll see what happens with that. It's Sridhar from intel that implemented
> > > > >>>> the guest changes after all, so I expect he's motivated to make them
> > > > >>>> work well.
> > > > >>>>
> > > > >>>>
> > > > >>>>>>     Let's
> > > > >>>>>> assume drivers are fixed to do that. How does userspace know
> > > > >>>>>> that's the case? We might need some kind of attribute so
> > > > >>>>>> userspace can detect it.
> > > > >>>>> Where do you envision the new attribute could be at? Supposedly it'd be
> > > > >>>>> exposed by the kernel, which constitutes a new API or API changes.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> -Siwei
> > > > >>>> People add e.g. new attributes in sysfs left and right.  It's unlikely
> > > > >>>> to be a matter of serious contention.
> > > > >>>>
> > > > >>>>>>>> Question is how does userspace know driver isn't broken in this respect?
> > > > >>>>>>>> Let's add a "vf failover" flag somewhere so this can be probed?
> > > > >>>>>>>>
> > > > >>>>
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Respectfully,
> > > Sameeh Jubran
> > > Linkedin
> > > Software Engineer @ Daynix.
> >
> >
> >
> > --
> > Respectfully,
> > Sameeh Jubran
> > Linkedin
> > Software Engineer @ Daynix.



-- 
Respectfully,
Sameeh Jubran
Linkedin
Software Engineer @ Daynix.


end of thread, other threads:[~2018-12-11 15:51 UTC | newest]

Thread overview: 85+ messages
2018-08-15 18:49 [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature Sridhar Samudrala
2018-08-27  8:40 ` [virtio-dev] " Cornelia Huck
2018-08-27 12:34   ` Michael S. Tsirkin
2018-08-27 16:50     ` Samudrala, Sridhar
2018-08-28 12:13       ` Michael S. Tsirkin
2018-09-07 21:34 ` [virtio-dev] " Michael S. Tsirkin
2018-09-12 15:17   ` Samudrala, Sridhar
2018-09-12 15:22     ` Michael S. Tsirkin
2018-09-18 10:20       ` Cornelia Huck
2018-09-18 10:37         ` Sameeh Jubran
2018-09-18 13:25           ` Michael S. Tsirkin
2018-09-18 18:30             ` Siwei Liu
2018-09-18 18:39               ` Michael S. Tsirkin
2018-09-18 19:10                 ` Siwei Liu
2018-09-20  3:04                   ` Michael S. Tsirkin
2018-09-19  5:03             ` Samudrala, Sridhar
2018-09-20  5:51             ` Sameeh Jubran
2018-09-18 13:35         ` Michael S. Tsirkin
2018-09-18 15:13           ` Venu Busireddy
2018-09-18 15:31             ` Michael S. Tsirkin
2018-09-18 18:48               ` Siwei Liu
2018-09-20  3:11                 ` Michael S. Tsirkin
2018-09-20 23:57                   ` Siwei Liu
2018-09-21  2:23                     ` Michael S. Tsirkin
2018-09-21  2:34                       ` Michael S. Tsirkin
2018-09-27  0:18                       ` Siwei Liu
2018-09-27  7:17                         ` Sameeh Jubran
2018-09-27 16:17                           ` Michael S. Tsirkin
2018-09-27 17:23                             ` Samudrala, Sridhar
2018-09-27 23:45                               ` Michael S. Tsirkin
2018-09-30  9:17                               ` Sameeh Jubran
2018-09-30 13:50                                 ` Sameeh Jubran
2018-09-27 16:32                         ` Michael S. Tsirkin
2018-10-02  8:42                           ` Siwei Liu
2018-10-02 12:43                             ` Michael S. Tsirkin
2018-10-05  0:03                               ` Siwei Liu
2018-10-05  5:17                                 ` Samudrala, Sridhar
2018-10-10 14:40                                   ` Michael S. Tsirkin
2018-10-11  0:16                                     ` Samudrala, Sridhar
2018-10-05 19:18                                 ` Michael S. Tsirkin
2018-10-08 22:06                                   ` Sameeh Jubran
2018-10-10 14:43                                     ` Michael S. Tsirkin
2018-10-11  1:26                                   ` Siwei Liu
2018-10-18 23:20                                     ` Siwei Liu
2018-10-18 23:40                                       ` Michael S. Tsirkin
2018-10-19  3:45                                     ` Michael S. Tsirkin
2018-11-21 15:39                                       ` Sameeh Jubran
2018-11-21 18:41                                         ` Michael S. Tsirkin
2018-11-21 20:04                                           ` Sameeh Jubran
2018-11-21 23:51                                             ` Samudrala, Sridhar
2018-11-22 13:55                                               ` Sameeh Jubran
2018-11-22 18:27                                             ` Michael S. Tsirkin
2018-11-26 15:13                                               ` Sameeh Jubran
2018-11-26 15:43                                                 ` Sameeh Jubran
2018-11-26 20:22                                                   ` Samudrala, Sridhar
2018-11-27 11:24                                                     ` Sameeh Jubran
2018-11-28 17:08                                                     ` Michael S. Tsirkin
2018-11-28 17:31                                                       ` Samudrala, Sridhar
2018-11-28 17:35                                                         ` Michael S. Tsirkin
2018-11-28 18:39                                                           ` Samudrala, Sridhar
2018-11-28 18:51                                                             ` Michael S. Tsirkin
2018-11-29  6:29                                                               ` Samudrala, Sridhar
2018-11-28 20:06                                                             ` Michael S. Tsirkin
2018-11-28 20:28                                                               ` si-wei liu
2018-11-28 20:43                                                                 ` Michael S. Tsirkin
2018-11-28 20:47                                                                   ` si-wei liu
2018-11-29  1:15                                                                 ` Michael S. Tsirkin
2018-11-29  6:37                                                                   ` Samudrala, Sridhar
2018-11-29 20:14                                                                   ` si-wei liu
2018-11-29 21:17                                                                     ` Michael S. Tsirkin
2018-11-29 22:53                                                                       ` si-wei liu
2018-11-29 23:53                                                                         ` Samudrala, Sridhar
2018-11-30  0:24                                                                           ` si-wei liu
2018-11-30  3:08                                                                             ` Samudrala, Sridhar
2018-11-30  4:46                                                                               ` si-wei liu
2018-11-30  6:21                                                                         ` Michael S. Tsirkin
2018-12-04  2:09                                                                           ` si-wei liu
2018-12-04  3:59                                                                             ` Michael S. Tsirkin
2018-12-05 16:18                                                                               ` Sameeh Jubran
2018-12-05 17:18                                                                                 ` Michael S. Tsirkin
2018-12-08  1:54                                                                                 ` si-wei liu
2018-12-10 15:13                                                                                   ` Sameeh Jubran
2018-12-10 15:34                                                                                     ` Sameeh Jubran
2018-12-10 17:46                                                                                       ` Michael S. Tsirkin
2018-12-11 15:50                                                                                         ` Sameeh Jubran
