* virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
@ 2017-03-16 19:39 Gopakumar Choorakkot Edakkunni
  2017-03-17  2:06 ` Yuanhan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-16 19:39 UTC (permalink / raw)
  To: dev, Yuanhan Liu, huawei.xie

So the doc says we should call rte_eth_dev_close() *before* going down. And
I know that, especially with DPDK virtio-net in the guest + ovs-dpdk in the
host, OVS ends up getting stalled/stuck (!!) if I don't close the port
before starting it when the guest DPDK process comes back up.

Since not doing this properly can screw up the HOST OVS, and I
want to do everything possible to avoid that, I want to be 200% sure that I
call close even if my process gets a kill -9. So obviously the only way
of doing that is to close the port when the DPDK process comes back up and
*before* we init the port. rte_eth_dev_close() is not capable of doing that,
as it expects the port parameters to be initialized etc. before it can be
called. Any other suggestions on what can be done to close on restart
rather than close on going down? I thought I'd bounce this off the alias
before I add a version of close myself that can do this close-on-restart.

Rgds,
Gopa.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-16 19:39 virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd Gopakumar Choorakkot Edakkunni
@ 2017-03-17  2:06 ` Yuanhan Liu
  2017-03-17  2:48   ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Yuanhan Liu @ 2017-03-17  2:06 UTC (permalink / raw)
  To: Gopakumar Choorakkot Edakkunni; +Cc: dev

On Thu, Mar 16, 2017 at 12:39:16PM -0700, Gopakumar Choorakkot Edakkunni wrote:
> So the doc says we should call rte_eth_dev_close() *before* going down. And I
> know that, especially with DPDK virtio-net in the guest + ovs-dpdk in the host,
> OVS ends up getting stalled/stuck (!!) if I don't close the port before
> starting it when the guest DPDK process comes back up.

I'm assuming you were using an old version, something like dpdk v2.2?
IIRC, DPDK v16.04 should have fixed your issue.

> Since not doing this properly can screw up the HOST OVS, and I want
> to do everything possible to avoid that, I want to be 200% sure that I call
> close even if my process gets a kill -9. So obviously the only way of doing
> that is to close the port when the DPDK process comes back up and *before* we
> init the port. rte_eth_dev_close() is not capable of doing that, as it expects
> the port parameters to be initialized etc. before it can be called.

We do virtio reset before init, which is basically what rte_eth_dev_close()
mainly does. So I see no big issue here.

The stuck issue is due to the hugepage reset done by the guest DPDK
application, which leaves all virtio vring elements zeroed. The old vhost
doesn't handle that well and, as a result, gets stuck. Here are some
relevant commits:

    a436f53 vhost: avoid dead loop chain
    c687b0b vhost: check for ring descriptors overflow
    623bc47 vhost: do sanity check for ring descriptor length
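
To illustrate the kind of defensive checks those commit titles describe,
here is a minimal sketch. It is not the actual DPDK vhost code: struct desc
is a simplified stand-in for a vring descriptor (field names follow the
virtio spec), and chain_length() and its max_len parameter are hypothetical
names made up for this example. The idea is that the host never trusts
guest-written ring contents: it bounds the number of hops, range-checks
every index, and rejects implausible lengths.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_F_NEXT 1u  /* "another descriptor follows" flag, as in the virtio spec */

/* Simplified stand-in for a vring descriptor; not the DPDK definition. */
struct desc {
    uint64_t addr;   /* guest address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* DESC_F_NEXT if the chain continues */
    uint16_t next;   /* index of the next descriptor in the chain */
};

/*
 * Walk the descriptor chain starting at `head` in a ring of `ring_size`
 * entries. Returns the total byte length of the chain, or -1 if the chain
 * is malformed (index out of range, zero or oversized length, or a loop).
 */
int64_t chain_length(const struct desc *ring, uint16_t ring_size,
                     uint16_t head, uint32_t max_len)
{
    const struct desc *d;
    int64_t total = 0;
    uint16_t idx = head;
    uint16_t seen = 0;

    for (;;) {
        if (idx >= ring_size)      /* index points outside the ring */
            return -1;
        if (++seen > ring_size)    /* more hops than descriptors: a loop */
            return -1;

        d = &ring[idx];
        if (d->len == 0 || d->len > max_len)  /* implausible length */
            return -1;

        total += d->len;
        if (!(d->flags & DESC_F_NEXT))
            return total;          /* end of chain reached */
        idx = d->next;
    }
}
```

With checks like these, a ring that the guest has zeroed (every len is 0) is
rejected right away instead of being processed, and a chain whose next
fields point back at each other is cut off after ring_size hops.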

	--yliu


* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  2:06 ` Yuanhan Liu
@ 2017-03-17  2:48   ` Gopakumar Choorakkot Edakkunni
  2017-03-17  4:35     ` Yuanhan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-17  2:48 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Thanks a lot for the response, Yuanhan. I am using dpdk v16.07. So what you
are saying is that in 16.07, we don't really need to call
rte_eth_dev_close() on exit, because dpdk will ensure that it will do
virtio reset before init when it comes up, right?

Regarding the vhost commits you mentioned - do we still need those fixes if
we have the "virtio reset before init" mechanism? Or is that a separate
problem altogether (and hence we would need those fixes)?

Rgds,
Gopa.


* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  2:48   ` Gopakumar Choorakkot Edakkunni
@ 2017-03-17  4:35     ` Yuanhan Liu
  2017-03-17  4:56       ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Yuanhan Liu @ 2017-03-17  4:35 UTC (permalink / raw)
  To: Gopakumar Choorakkot Edakkunni; +Cc: dev

On Thu, Mar 16, 2017 at 07:48:28PM -0700, Gopakumar Choorakkot Edakkunni wrote:
> Thanks a lot for the response, Yuanhan. I am using dpdk v16.07. So what you are
> saying is that in 16.07, we don't really need to call rte_eth_dev_close() on
> exit,

It's not about "don't really need", it's more like "it's hard to". Just
think that it may crash at any time.

> because dpdk will ensure that it will do virtio reset before init when it
> comes up, right?

No, it just handles the abnormal case well when the guest APP restarts.

> Regarding the vhost commits you mentioned - do we still need those fixes if we
> have the "virtio reset before init" mechanism?

Yes, we still need them: just think that some malicious guest may also forge
data like that.

I'm a bit confused then. Have you actually hit any issue (like getting stuck)
with DPDK v16.07?

	--yliu


* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  4:35     ` Yuanhan Liu
@ 2017-03-17  4:56       ` Gopakumar Choorakkot Edakkunni
  2017-03-17  5:13         ` Yuanhan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-17  4:56 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Hi Yuanhan,

Thanks for the confirmation about not having to do anything special to
close the ports on dpdk going down or coming up.

As for the question of whether I hit any issue with OVS getting stuck - yes,
my guest process runs dpdk 16.07 as I mentioned earlier - and if I kill my
guest process, then OVS-dpdk on the host reports a stall! The
OVS-dpdk and QEMU versions I use are below. But maybe that is because
OVS is missing the fixes you mentioned?

~# ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.4.1
Compiled Nov 14 2016 06:53:31
# kvm --version
QEMU emulator version 2.2.0, Copyright (c) 2003-2008 Fabrice Bellard
~#


Rgds,
Gopa.


* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  4:56       ` Gopakumar Choorakkot Edakkunni
@ 2017-03-17  5:13         ` Yuanhan Liu
  2017-03-17  5:20           ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Yuanhan Liu @ 2017-03-17  5:13 UTC (permalink / raw)
  To: Gopakumar Choorakkot Edakkunni; +Cc: dev

On Thu, Mar 16, 2017 at 09:56:01PM -0700, Gopakumar Choorakkot Edakkunni wrote:
> Hi Yuanhan,
> 
> Thanks for the confirmation about not having to do anything special to close
> the ports on dpdk going down or coming up.
> 
> As for the question of whether I hit any issue with OVS getting stuck - yes, my
> guest process runs dpdk 16.07 as I mentioned earlier - and if I kill my guest
> process, then OVS-dpdk on the host reports a stall! The OVS-dpdk and
> QEMU versions I use are below. But maybe that is because OVS is missing
> the fixes you mentioned?

When I was saying dpdk version, I meant the DPDK version with OVS.

> ~# ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.4.1

And yes, the fixes are not included in the DPDK required for OVS 2.4.

	--yliu


* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  5:13         ` Yuanhan Liu
@ 2017-03-17  5:20           ` Gopakumar Choorakkot Edakkunni
  2017-03-17  5:24             ` Yuanhan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-17  5:20 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

>> When I was saying dpdk version, I meant the DPDK version with OVS.

Oh I see! My apologies for the misunderstanding. The DPDK version used by
the host OVS should be dpdk2.2; the guest process uses dpdk16.07. The OVS
process is not getting restarted; what is getting restarted is the guest
process using dpdk16.07. So do the clarifications you gave above about
virtio being reset-before-opened on guest restart still hold, or
do they need the HOST-side DPDK to be 16.04 or above?

>> And yes, the fixes are not included in the DPDK required for OVS 2.4.

Thanks for the info.

Rgds,
Gopa.


* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  5:20           ` Gopakumar Choorakkot Edakkunni
@ 2017-03-17  5:24             ` Yuanhan Liu
  2017-03-17  5:30               ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Yuanhan Liu @ 2017-03-17  5:24 UTC (permalink / raw)
  To: Gopakumar Choorakkot Edakkunni; +Cc: dev

On Thu, Mar 16, 2017 at 10:20:30PM -0700, Gopakumar Choorakkot Edakkunni wrote:
> >> When I was saying dpdk version, I meant the DPDK version with OVS.
> 
> Oh I see! My apologies for the misunderstanding. The DPDK version used by the
> host OVS should be dpdk2.2; the guest process uses dpdk16.07. The OVS process
> is not getting restarted; what is getting restarted is the guest process using
> dpdk16.07. So do the clarifications you gave above about virtio being
> reset-before-opened on guest restart still hold, or do they need the
> HOST-side DPDK to be 16.04 or above?

Yes, the HOST dpdk should be >= v16.04.

	--yliu

* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  5:24             ` Yuanhan Liu
@ 2017-03-17  5:30               ` Gopakumar Choorakkot Edakkunni
  2017-03-17  5:40                 ` Yuanhan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-17  5:30 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Thanks for the confirmation, glad I reached the person who knows the nuts
and bolts of virtio :-). So if the host is not in our control (i.e. if I am
just running as a VM on a host provided by a third-party vendor), is there any
workaround I can apply from the guest side to prevent problems on a guest
restart?

And if there are no workarounds at all and the host has to change, then instead
of asking the third-party vendor to do a wholesale upgrade to 16.04, are there
one or a few commits that can be added to the host ovs-dpdk to take care of
this guest-restart, virtio-reset-before-opening case?

Rgds,
Gopa.



* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  5:30               ` Gopakumar Choorakkot Edakkunni
@ 2017-03-17  5:40                 ` Yuanhan Liu
  2017-03-17  5:50                   ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Yuanhan Liu @ 2017-03-17  5:40 UTC (permalink / raw)
  To: Gopakumar Choorakkot Edakkunni; +Cc: dev

On Thu, Mar 16, 2017 at 10:30:09PM -0700, Gopakumar Choorakkot Edakkunni wrote:
> Thanks for the confirmation, glad I reached the person who knows the nuts and
> bolts of virtio :-). So if the host is not in our control (i.e. if I am just
> running as a VM on a host provided by a third-party vendor), is there any
> workaround I can apply from the guest side to prevent problems on a guest
> restart?

Not too much. You might want to hack the guest DPDK EAL memory initialization
part though, so that it does not reset the hugepage memory on start. But that's
so hacky that I would not recommend it!

> And if there are no workarounds at all and the host has to change, then
> instead of asking the third-party vendor to do a wholesale upgrade to 16.04,
> are there one or a few commits that can be added to the host ovs-dpdk to take
> care of this guest-restart, virtio-reset-before-opening case?

Yes, backporting the commits I have mentioned should be able to fix it.
But please note that I did some code refactoring before those fixes, so they
won't apply cleanly to DPDK v2.2.

And if you want to upgrade, I'd suggest upgrading to v16.11, which is an
LTS release.

	--yliu
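
For readers following the thread, here is a minimal sketch of the kind of
sanity checks the three vhost commit titles describe (descriptor index
overflow, descriptor length, and chain loops). It is not the DPDK vhost code:
struct desc, walk_chain(), DESC_F_NEXT and HDR_LEN are names made up for this
illustration, and the 12-byte header length is an assumption. The point is
only that a walker with these checks rejects an all-zero ring instead of
spinning on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_F_NEXT 1u   /* "chain continues" flag bit (illustrative name) */
#define HDR_LEN     12u  /* assumed virtio-net header size for this sketch */

/* Simplified descriptor, loosely modelled on a vring descriptor. */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/*
 * Walk one descriptor chain starting at 'head' in a ring of 'size' entries.
 * Returns the total byte length of the chain, or -1 if the chain is malformed.
 */
static int64_t walk_chain(const struct desc *ring, uint16_t size, uint16_t head)
{
    int64_t total = 0;
    uint16_t idx = head;
    uint32_t hops = 0;

    if (size == 0)
        return -1;

    for (;;) {
        if (idx >= size)        /* descriptor index outside the ring */
            return -1;
        if (++hops > size)      /* more hops than entries: the chain loops */
            return -1;
        if (hops == 1 && ring[idx].len < HDR_LEN)
            return -1;          /* first descriptor cannot hold the header */
        total += ring[idx].len;
        if (!(ring[idx].flags & DESC_F_NEXT))
            break;              /* end of chain */
        idx = ring[idx].next;
    }
    return total;
}
```

With a ring that has been zeroed (as happens when the guest resets its
hugepages), walk_chain() returns -1 on the first descriptor because its length
is 0; a chain whose 'next' fields point back at each other is rejected once
the hop count exceeds the ring size.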


* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  5:40                 ` Yuanhan Liu
@ 2017-03-17  5:50                   ` Gopakumar Choorakkot Edakkunni
  2017-03-18 21:32                     ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-17  5:50 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Thanks again Yuanhan, you are the true expert!!

Rgds,
Gopa.



* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-17  5:50                   ` Gopakumar Choorakkot Edakkunni
@ 2017-03-18 21:32                     ` Gopakumar Choorakkot Edakkunni
  2017-03-18 21:37                       ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-18 21:32 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Hi Yuan,

As a "hack"/"workaround", in rte_eal_init(), if I call vtpci_reset()
just before rte_eal_memory_init(), that should take care of the problem of
the guest zeroing out hugepages while the host is still using the vrings,
right? As of today vtpci_reset() is called in rte_eal_dev_init(), which comes
*after* rte_eal_memory_init().

Rgds,
Gopa.
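
The ordering argument in this question can be sketched with a toy model. None
of this is DPDK code: struct toy_dev and the toy_* functions are hypothetical
stand-ins, with toy_reset() playing the role of vtpci_reset() and
toy_zero_memory() the role of the hugepage clearing done during
rte_eal_memory_init(). The model also assumes that the host stops reading the
ring once the device has been reset, which is exactly the point the question
asks to have confirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4

/* Toy device: the "host" consumes the ring only while 'driver_ok' is set. */
struct toy_dev {
    bool     driver_ok;            /* stands in for the device being active */
    uint32_t ring_len[RING_SIZE];  /* stands in for descriptor lengths in hugepages */
    int      bad_reads;            /* zero-length descriptors the host has seen */
};

/* One host polling pass over the ring. */
static void toy_host_poll(struct toy_dev *d)
{
    if (!d->driver_ok)
        return;                    /* assumed: a reset device is left alone */
    for (int i = 0; i < RING_SIZE; i++)
        if (d->ring_len[i] == 0)
            d->bad_reads++;
}

static void toy_reset(struct toy_dev *d)
{
    d->driver_ok = false;
}

static void toy_zero_memory(struct toy_dev *d)
{
    memset(d->ring_len, 0, sizeof d->ring_len);
}

/*
 * Simulate a guest restart with the host polling between the two startup
 * steps. Returns how many zeroed descriptors the host observed.
 */
static int toy_restart(bool reset_first)
{
    struct toy_dev d = {
        .driver_ok = true,
        .ring_len  = { 64, 64, 64, 64 },
        .bad_reads = 0,
    };

    if (reset_first) {
        toy_reset(&d);
        toy_host_poll(&d);
        toy_zero_memory(&d);
    } else {
        toy_zero_memory(&d);
        toy_host_poll(&d);
        toy_reset(&d);
    }
    toy_host_poll(&d);
    return d.bad_reads;
}
```

In this model, toy_restart(false) (memory cleared first, as in the current
init order described above) lets the host observe every zeroed descriptor,
while toy_restart(true) (reset first) lets it observe none. Whether a real
vhost backend behaves like toy_host_poll() is an open assumption here.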

On Thu, Mar 16, 2017 at 10:50 PM, Gopakumar Choorakkot Edakkunni <
gopakumar.c.e@gmail.com> wrote:

> Thanks again Yuanhan, you are the true expert!!
>
> Rgds,
> Gopa.
>
> On Thu, Mar 16, 2017 at 10:40 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com
> > wrote:
>
>> On Thu, Mar 16, 2017 at 10:30:09PM -0700, Gopakumar Choorakkot Edakkunni
>> wrote:
>> > Thanks for the confirmation, glad I reached the person who knows the
>> nuts and
>> > bolts of virtio :-). So if the host is not in our control (ie if I am
>> just
>> > running as a VM on host provided by thirdparty vendor), is there any
>> workaround
>> > I can do from the guest side to prevent problems from happening on a
>> guest
>> > restart ?
>>
>> Not too much. You might want to hack the guest DPDK EAL memory initiation
>> part though, to not reset the hugepage memory on start. But that's too
>> hacky
>> that I will not recommend you to do so!
>>
>> > And if theres no workarounds at all and the host has to change, instead
>> of
>> > asking the third party vendor to do a wholesale upgrade to 16.04, is
>> there one/
>> > few commits that can be added to the host ovs-dpdk to take care of this
>> guest
>> > restart virtio-reset-before opening case ?
>>
>> Yes, backporting the commits I have mentioned should be able to fix it.
>> But please note that I did some code refactorings before those fixes: it
>> won't apply cleanly to DPDK v2.2.
>>
>> And if you want to upgrade, I'd suggest to upgrade to v16.11, which is
>> LTS release.
>>
>>         --yliu
>> >
>> > Rgds,
>> > Gopa.
>> >
>> > On Thu, Mar 16, 2017 at 10:24 PM, Yuanhan Liu <
>> yuanhan.liu@linux.intel.com>
>> > wrote:
>> >
>> >     On Thu, Mar 16, 2017 at 10:20:30PM -0700, Gopakumar Choorakkot
>> Edakkunni
>> >     wrote:
>> >     > >> When I was saying dpdk version, I meant the DPDK version with
>> OVS.
>> >     >
>> >     > Oh I see! My apologies for the misuderstanding. The dpdk version
>> used by
>> >     host
>> >     > ovs should be dpdk2.2, the guest process uses dpdk16.07. The OVS
>> process
>> >     is not
>> >     > getting restarted, what is getting restarted is the guest process
>> using
>> >     > dpdk16.07 - so the above clarifications you had about virtio being
>> >     > reset-before-opened on guest restart - does that still hold good
>> or does
>> >     that
>> >     > need the HOST side dpdk to be 16.04 or above ?
>> >
>> >     Yes, the HOST dpdk should be >= v16.04.
>> >
>> >             --yliu
>> >     >
>> >     > >> And yes, the fixes are not included in the DPDK required for
>> OVS 2.4.
>> >     >
>> >     > Thanks for the info.
>> >     >
>> >     > Rgds,
>> >     > Gopa.
>> >     >
>> >     > On Thu, Mar 16, 2017 at 10:13 PM, Yuanhan Liu <
>> >     yuanhan.liu@linux.intel.com>
>> >     > wrote:
>> >     >
>> >     >     On Thu, Mar 16, 2017 at 09:56:01PM -0700, Gopakumar Choorakkot
>> >     Edakkunni
>> >     >     wrote:
>> >     >     > Hi Yuanhan,
>> >     >     >
>> >     >     > Thanks for the confirmation about not having to do anything
>> special
>> >     to
>> >     >     close
>> >     >     > the ports on dpdk going down or coming up.
>> >     >     >
>> >     >     > As for the question about if I met any issue of ovs getting
>> stuck -
>> >     yes,
>> >     >     my
>> >     >     > guest process runs dpdk 16.07 as I mentioned earlier - and
>> if I
>> >     kill my
>> >     >     guest
>> >     >     > process, then the host OVS-dpdk on the host reports stall !
>> The
>> >     OVS-dpdk
>> >     >     and
>> >     >     > emu versions I use are as below. But maybe that is because
>> of the
>> >     ovs
>> >     >     missing
>> >     >     > the fixes you mentioned ?
>> >     >
>> >     >     When I was saying dpdk version, I meant the DPDK version with
>> OVS.
>> >     >
>> >     >     > ~# ovs-vswitchd --version
>> >     >     > ovs-vswitchd (Open vSwitch) 2.4.1
>> >     >
>> >     >     And yes, the fixes are not included in the DPDK required for
>> OVS 2.4.
>> >     >
>> >     >             --yliu
>> >     >
>> >     >     > Compiled Nov 14 2016 06:53:31
>> >     >     > # kvm --version
>> >     >     > QEMU emulator version 2.2.0, Copyright (c) 2003-2008 Fabrice
>> >     Bellard
>> >     >     > ~#
>> >     >     >
>> >     >     >
>> >     >     > Rgds,
>> >     >     > Gopa.
>> >     >     >
>> >     >     > On Thu, Mar 16, 2017 at 9:35 PM, Yuanhan Liu <
>> >     yuanhan.liu@linux.intel.com
>> >     >     >
>> >     >     > wrote:
>> >     >     >
>> >     >     >     On Thu, Mar 16, 2017 at 07:48:28PM -0700, Gopakumar
>> Choorakkot
>> >     >     Edakkunni
>> >     >     >     wrote:
>> >     >     >     > Thanks a lot for the response Yuanhan. I am using dpdk
>> >     v16.07. So
>> >     >     what
>> >     >     >     you are
>> >     >     >     > saying is that in 16.07, we dont really need to call
>> >     >     rte_eth_dev_close()
>> >     >     >     on
>> >     >     >     > exit,
>> >     >     >
>> >     >     >     It's not about "don't really need", it's more like
>> "it's hard
>> >     to".
>> >     >     Just
>> >     >     >     think that it may crash at any time.
>> >     >     >
>> >     >     >     > because dpdk will ensure that it will do virtio reset
>> before
>> >     init
>> >     >     when it
>> >     >     >     > comes up right ?
>> >     >     >
>> >     >     >     No, It just handles the abnormal case well when guest
>> APP
>> >     restarts.
>> >     >     >
>> >     >     >     > Regarding the vhost commits you mentioned - do we
>> still need
>> >     those
>> >     >     fixes
>> >     >     >     if we
>> >     >     >     > have the "virtio reset before init" mechanism ?
>> >     >     >
>> >     >     >     Yes, we still need them: just think some malicious
>> guest may
>> >     also
>> >     >     forge
>> >     >     >     data like that.
>> >     >     >
>> >     >     >     I'm a bit confused then. Have you actually met any
>> issue (like
>> >     got
>> >     >     stucked)
>> >     >     >     with DPDK v16.07?
>> >     >     >
>> >     >     >             --yliu
>> >     >     >
>> >     >     >     > Or that is a seperate problem
>> >     >     >     > altogether (and hence we would need those fixes) ?
>> >     >     >     >
>> >     >     >     > Rgds,
>> >     >     >     > Gopa.
>> >     >     >     >
>> >     >     >     > On Thu, Mar 16, 2017 at 7:06 PM, Yuanhan Liu <
>> >     >     yuanhan.liu@linux.intel.com
>> >     >     >     >
>> >     >     >     > wrote:
>> >     >     >     >
>> >     >     >     >     On Thu, Mar 16, 2017 at 12:39:16PM -0700,
>> Gopakumar
>> >     Choorakkot
>> >     >     >     Edakkunni
>> >     >     >     >     wrote:
>> >     >     >     >     > So the doc says we should call
>> rte_eth_dev_close()
>> >     *before*
>> >     >     going
>> >     >     >     down.
>> >     >     >     >     And I
>> >     >     >     >     > know that especially in dpdk-virtionet  in the
>> guest +
>> >     >     ovs-dpdk in
>> >     >     >     the
>> >     >     >     >     host,
>> >     >     >     >     > the ovs ends up getting stalled/stuck (!!) if I
>> dont
>> >     close
>> >     >     the port
>> >     >     >     >     before
>> >     >     >     >     > starting() it when the guest dpdk process comes
>> back
>> >     up.
>> >     >     >     >
>> >     >     >     >     I'm assuming you were using an old version,
>> something
>> >     like dpdk
>> >     >     v2.2?
>> >     >     >     >     IIRC, DPDK v16.04 should have fixed your issue.
>> >     >     >     >
>> >     >     >     >     > Considering that this not done properly can
>> screw up
>> >     the HOST
>> >     >     ovs,
>> >     >     >     and I
>> >     >     >     >     want
>> >     >     >     >     > to do everything possible to avoid that, I want
>> to be
>> >     200%
>> >     >     sure
>> >     >     >     that I
>> >     >     >     >     call
>> >     >     >     >     > close even if my process gets a kill -9 .. So
>> obviously
>> >     the
>> >     >     only
>> >     >     >     way of
>> >     >     >     >     doing
>> >     >     >     >     > that is to close the port when the dpdk process
>> comes
>> >     back up
>> >     >     and
>> >     >     >     >     *before* we
>> >     >     >     >     > init the port. rte_eth_dev_close() is not
>> capable of
>> >     doing
>> >     >     that as
>> >     >     >     it
>> >     >     >     >     expects
>> >     >     >     >     > the port parameters to be initialized etc..
>> before it
>> >     can be
>> >     >     >     called.
>> >     >     >     >
>> >     >     >     >     We do virtio reset before init, which is
>> basically what
>> >     >     >     rte_eth_dev_close()
>> >     >     >     >     mainly does. So I see no big issue here.
>> >     >     >     >
>> >     >     >     >     The stuck issue is due to hugepage reset by the
>> guest
>> >     DPDK
>> >     >     >     application,
>> >     >     >     >     leading all virtio vring elements being mem
>> zeroed. The
>> >     old
>> >     >     vhost
>> >     >     >     doesn't
>> >     >     >     >     handle it well, as a result, it got stuck. And
>> here are
>> >     some
>> >     >     relevant
>> >     >     >     >     commits:
>> >     >     >     >
>> >     >     >     >         a436f53 vhost: avoid dead loop chain
>> >     >     >     >         c687b0b vhost: check for ring descriptors
>> overflow
>> >     >     >     >         623bc47 vhost: do sanity check for ring
>> descriptor
>> >     length
>> >     >     >     >
>> >     >     >     >             --yliu
>> >     >     >     >
>> >     >     >     >     > Any other
>> >     >     >     >     > suggestions on what can be done to close on
>> restart
>> >     rather
>> >     >     than
>> >     >     >     close on
>> >     >     >     >     going
>> >     >     >     >     > down ? Thought of bouncing this by the alias
>> before I
>> >     add a
>> >     >     version
>> >     >     >     of
>> >     >     >     >     close
>> >     >     >     >     > myself that can do this close-on-restart
>> >     >     >     >
>> >     >     >     >
>> >     >     >
>> >     >     >
>> >     >
>> >     >
>> >
>> >
>>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-18 21:32                     ` Gopakumar Choorakkot Edakkunni
@ 2017-03-18 21:37                       ` Gopakumar Choorakkot Edakkunni
  2017-03-18 23:43                         ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-18 21:37 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

I mean vtpci_reset() is called from rte_eal_pci_probe(), which is the *last*
thing in rte_eal_init(), *after* hugepage init. So if I can somehow get
that done *before* hugepage init, maybe all will be well (I can't do
anything to fix the host side).
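
For reference, a minimal sketch of what a virtio reset amounts to at the
register level, as I understand the virtio spec for PCI devices: write 0 to
the device status register, then read it back until it reports 0. The
register is a plain pointer here so the snippet can be exercised without
hardware; virtio_status_reset() and max_polls are made-up names for
illustration, not DPDK APIs (the PMD's own helper is vtpci_reset()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a DPDK API: reset a virtio device through its
 * device status register. Writing 0 requests the reset; reading back 0
 * confirms it. Returns 0 on success, -1 if the register never reads 0
 * within max_polls reads. */
static int virtio_status_reset(volatile uint8_t *status_reg, unsigned max_polls)
{
    assert(status_reg != NULL);

    *status_reg = 0;                 /* request the reset */
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status_reg == 0)        /* device reports the reset is done */
            return 0;
    }
    return -1;
}
```

The point of doing this early would be that the host stops looking at the
old rings before the guest clears the memory they live in.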

Rgds,
Gopa.

On Sat, Mar 18, 2017 at 2:32 PM, Gopakumar Choorakkot Edakkunni <
gopakumar.c.e@gmail.com> wrote:

> Hi Yuan,
>
> As a "hack"/"workaround", in rte_eal_init(), if I can call vtpci_reset()
> just before rte_eal_memory_init(), that should take care of the problem of
> host zeroing out hugepages right ? As of today vtpci_reset() is called in
> rte_eal_dev_init() which comes *after* rte_eal_memory_init()
>
> Rgds,
> Gopa.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-18 21:37                       ` Gopakumar Choorakkot Edakkunni
@ 2017-03-18 23:43                         ` Gopakumar Choorakkot Edakkunni
  2017-03-22  5:32                           ` Gopakumar Choorakkot Edakkunni
  0 siblings, 1 reply; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-18 23:43 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

I ended up implementing a mechanism to do the equivalent of a vtpci_reset()
as soon as the dpdk-app dies and just before it comes back up. I am
"hoping" that is sufficient to let the host know that the virtio rings
etc. are unconfigured, so that when the dpdk app comes up again in the guest
and does hugepage init etc., it should in theory not confuse the host ovs?
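
One way such a mechanism could be structured (a sketch only, assuming a
small parent process that supervises the DPDK application; supervise(),
app_fn and reset_fn are hypothetical names, and the stubs at the bottom
exist only to make the snippet self-contained). The parent survives even a
kill -9 of the child, so the reset hook always runs between the death of
one instance and the start of the next:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*app_fn)(void);     /* stands in for the DPDK application */
typedef void (*reset_fn)(void);  /* stands in for the device reset */

/* Run app in a child process max_runs times. Each time the child exits or
 * is killed, call reset_device() in the parent before the next start.
 * Returns the number of times the hook ran, or -1 if fork()/waitpid()
 * failed. */
static int supervise(app_fn app, reset_fn reset_device, int max_runs)
{
    int resets = 0;

    assert(app != NULL && reset_device != NULL);

    for (int run = 0; run < max_runs; run++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(app());            /* child: run the application */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;

        /* The child is gone, however it died; clean up before restarting. */
        reset_device();
        resets++;
    }
    return resets;
}

/* Demo stubs: an "application" that gets SIGKILLed, and a counting hook. */
static int reset_count;
static void count_reset(void) { reset_count++; }
static int killed_app(void) { kill(getpid(), SIGKILL); return 0; }
```

In a real setup reset_device() would have to reach the virtio device
itself, which this sketch does not attempt.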

Rgds,
Gopa.

On Sat, Mar 18, 2017 at 2:37 PM, Gopakumar Choorakkot Edakkunni <
gopakumar.c.e@gmail.com> wrote:

> I mean vtpci_reset is called from rte_eal_pci_probe() which is the *last*
> thing in rte_eal_init(), *after* hugepage init, so if I can somehow get
> that done *before* hugepage init maybe all will be well (because I cant do
> anything to fix the host side)
>
> Rgds,
> Gopa.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
  2017-03-18 23:43                         ` Gopakumar Choorakkot Edakkunni
@ 2017-03-22  5:32                           ` Gopakumar Choorakkot Edakkunni
  0 siblings, 0 replies; 15+ messages in thread
From: Gopakumar Choorakkot Edakkunni @ 2017-03-22  5:32 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Also Yuanhan, your suggestion about the hugepage mapping / clearing memory
was great. I tried a test where I just wrote random values into the
entire hugepage area, and that successfully crashed the ovs on the host :).
So that's a good general test to ensure that the guest doesn't mess up the
host! Thanks again for your suggestions.
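
To illustrate why garbage in the rings can hurt a host that trusts them,
here is a sketch of the kind of validation the commit titles mentioned
earlier in the thread suggest (dead loop chain, descriptor overflow). This
is not the DPDK vhost code; walk_desc_chain() is a hypothetical name, and
only the descriptor layout and the NEXT flag come from the virtio spec:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT 1  /* descriptor continues via the 'next' field */

/* Split-ring descriptor layout from the virtio specification. */
struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Hypothetical host-side check: follow a chain starting at 'head' and
 * return its length in descriptors, or -1 if the chain leaves the table
 * or takes more hops than the table has entries (i.e. it loops). */
static int walk_desc_chain(const struct vring_desc *table, uint16_t size,
                           uint16_t head)
{
    uint16_t idx = head;
    int seen = 0;

    assert(table != NULL);

    for (;;) {
        if (idx >= size)
            return -1;           /* index outside the ring */
        if (++seen > size)
            return -1;           /* more hops than descriptors: a loop */
        if (!(table[idx].flags & VRING_DESC_F_NEXT))
            return seen;         /* end of chain */
        idx = table[idx].next;
    }
}
```

With checks like these, random or zeroed descriptors are rejected instead
of sending the host into an endless walk or an out-of-range access.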

Rgds,
Gopa.

On Sat, Mar 18, 2017 at 4:43 PM, Gopakumar Choorakkot Edakkunni <
gopakumar.c.e@gmail.com> wrote:

> I ended up implementing a mechanism to do the equivalent of a
> vtpci_reset() as soon as the dpdk-app dies and just before it comes back
> up. I am "hoping" that is sufficient to let the host know that the virtio
> rings etc. are unconfigured, so that when the dpdk app comes up again in
> the guest and does hugepage init etc., it in theory should not confuse the
> host ovs?
>
> Rgds,
> Gopa.
>
> On Sat, Mar 18, 2017 at 2:37 PM, Gopakumar Choorakkot Edakkunni <
> gopakumar.c.e@gmail.com> wrote:
>
>> I mean vtpci_reset() is called from rte_eal_pci_probe(), which is the
>> *last* thing in rte_eal_init(), *after* hugepage init, so if I can somehow
>> get that done *before* hugepage init, maybe all will be well (because I
>> can't do anything to fix the host side).
>>
>> Rgds,
>> Gopa.
>>
>> On Sat, Mar 18, 2017 at 2:32 PM, Gopakumar Choorakkot Edakkunni <
>> gopakumar.c.e@gmail.com> wrote:
>>
>>> Hi Yuan,
>>>
>>> As a "hack"/"workaround", in rte_eal_init(), if I can call vtpci_reset()
>>> just before rte_eal_memory_init(), that should take care of the problem of
>>> host zeroing out hugepages right ? As of today vtpci_reset() is called in
>>> rte_eal_dev_init() which comes *after* rte_eal_memory_init()
>>>
>>> Rgds,
>>> Gopa.
>>>
>>> On Thu, Mar 16, 2017 at 10:50 PM, Gopakumar Choorakkot Edakkunni <
>>> gopakumar.c.e@gmail.com> wrote:
>>>
>>>> Thanks again Yuanhan, you are the true expert!!
>>>>
>>>> Rgds,
>>>> Gopa.
>>>>
>>>> On Thu, Mar 16, 2017 at 10:40 PM, Yuanhan Liu <
>>>> yuanhan.liu@linux.intel.com> wrote:
>>>>
>>>>> On Thu, Mar 16, 2017 at 10:30:09PM -0700, Gopakumar Choorakkot
>>>>> Edakkunni wrote:
>>>>> > Thanks for the confirmation, glad I reached the person who knows the
>>>>> nuts and
>>>>> > bolts of virtio :-). So if the host is not in our control (ie if I
>>>>> am just
>>>>> > running as a VM on host provided by thirdparty vendor), is there any
>>>>> workaround
>>>>> > I can do from the guest side to prevent problems from happening on a
>>>>> guest
>>>>> > restart ?
>>>>>
>>>>> Not too much. You might want to hack the guest DPDK EAL memory
>>>>> initiation
>>>>> part though, to not reset the hugepage memory on start. But that's too
>>>>> hacky
>>>>> that I will not recommend you to do so!
>>>>>
>>>>> > And if theres no workarounds at all and the host has to change,
>>>>> instead of
>>>>> > asking the third party vendor to do a wholesale upgrade to 16.04, is
>>>>> there one/
>>>>> > few commits that can be added to the host ovs-dpdk to take care of
>>>>> this guest
>>>>> > restart virtio-reset-before opening case ?
>>>>>
>>>>> Yes, backporting the commits I have mentioned should be able to fix it.
>>>>> But please note that I did some code refactorings before those fixes:
>>>>> it
>>>>> won't apply cleanly to DPDK v2.2.
>>>>>
>>>>> And if you want to upgrade, I'd suggest to upgrade to v16.11, which is
>>>>> LTS release.
>>>>>
>>>>>         --yliu
>>>>> >
>>>>> > Rgds,
>>>>> > Gopa.
>>>>> >
>>>>> > On Thu, Mar 16, 2017 at 10:24 PM, Yuanhan Liu <
>>>>> yuanhan.liu@linux.intel.com>
>>>>> > wrote:
>>>>> >
>>>>> >     On Thu, Mar 16, 2017 at 10:20:30PM -0700, Gopakumar Choorakkot
>>>>> Edakkunni
>>>>> >     wrote:
>>>>> >     > >> When I was saying dpdk version, I meant the DPDK version
>>>>> with OVS.
>>>>> >     >
>>>>> >     > Oh I see! My apologies for the misuderstanding. The dpdk
>>>>> version used by
>>>>> >     host
>>>>> >     > ovs should be dpdk2.2, the guest process uses dpdk16.07. The
>>>>> OVS process
>>>>> >     is not
>>>>> >     > getting restarted, what is getting restarted is the guest
>>>>> process using
>>>>> >     > dpdk16.07 - so the above clarifications you had about virtio
>>>>> being
>>>>> >     > reset-before-opened on guest restart - does that still hold
>>>>> good or does
>>>>> >     that
>>>>> >     > need the HOST side dpdk to be 16.04 or above ?
>>>>> >
>>>>> >     Yes, the HOST dpdk should be >= v16.04.
>>>>> >
>>>>> >             --yliu
>>>>> >     >
>>>>> >     > >> And yes, the fixes are not included in the DPDK required
>>>>> for OVS 2.4.
>>>>> >     >
>>>>> >     > Thanks for the info.
>>>>> >     >
>>>>> >     > Rgds,
>>>>> >     > Gopa.
>>>>> >     >
>>>>> >     > On Thu, Mar 16, 2017 at 10:13 PM, Yuanhan Liu <
>>>>> >     yuanhan.liu@linux.intel.com>
>>>>> >     > wrote:
>>>>> >     >
>>>>> >     >     On Thu, Mar 16, 2017 at 09:56:01PM -0700, Gopakumar
>>>>> Choorakkot
>>>>> >     Edakkunni
>>>>> >     >     wrote:
>>>>> >     >     > Hi Yuanhan,
>>>>> >     >     >
>>>>> >     >     > Thanks for the confirmation about not having to do
>>>>> anything special
>>>>> >     to
>>>>> >     >     close
>>>>> >     >     > the ports on dpdk going down or coming up.
>>>>> >     >     >
>>>>> >     >     > As for the question about if I met any issue of ovs
>>>>> getting stuck -
>>>>> >     yes,
>>>>> >     >     my
>>>>> >     >     > guest process runs dpdk 16.07 as I mentioned earlier -
>>>>> and if I
>>>>> >     kill my
>>>>> >     >     guest
>>>>> >     >     > process, then the host OVS-dpdk on the host reports
>>>>> stall ! The
>>>>> >     OVS-dpdk
>>>>> >     >     and
>>>>> >     >     > emu versions I use are as below. But maybe that is
>>>>> because of the
>>>>> >     ovs
>>>>> >     >     missing
>>>>> >     >     > the fixes you mentioned ?
>>>>> >     >
>>>>> >     >     When I was saying dpdk version, I meant the DPDK version
>>>>> with OVS.
>>>>> >     >
>>>>> >     >     > ~# ovs-vswitchd --version
>>>>> >     >     > ovs-vswitchd (Open vSwitch) 2.4.1
>>>>> >     >
>>>>> >     >     And yes, the fixes are not included in the DPDK required
>>>>> for OVS 2.4.
>>>>> >     >
>>>>> >     >             --yliu
>>>>> >     >
>>>>> >     >     > Compiled Nov 14 2016 06:53:31
>>>>> >     >     > # kvm --version
>>>>> >     >     > QEMU emulator version 2.2.0, Copyright (c) 2003-2008
>>>>> Fabrice
>>>>> >     Bellard
>>>>> >     >     > ~#
>>>>> >     >     >
>>>>> >     >     >
>>>>> >     >     > Rgds,
>>>>> >     >     > Gopa.
>>>>> >     >     >
>>>>> >     >     > On Thu, Mar 16, 2017 at 9:35 PM, Yuanhan Liu <
>>>>> >     yuanhan.liu@linux.intel.com
>>>>> >     >     >
>>>>> >     >     > wrote:
>>>>> >     >     >
>>>>> >     >     >     On Thu, Mar 16, 2017 at 07:48:28PM -0700, Gopakumar
>>>>> Choorakkot
>>>>> >     >     Edakkunni
>>>>> >     >     >     wrote:
>>>>> >     >     >     > Thanks a lot for the response Yuanhan. I am using
>>>>> dpdk
>>>>> >     v16.07. So
>>>>> >     >     what
>>>>> >     >     >     you are
>>>>> >     >     >     > saying is that in 16.07, we dont really need to
>>>>> call
>>>>> >     >     rte_eth_dev_close()
>>>>> >     >     >     on
>>>>> >     >     >     > exit,
>>>>> >     >     >
>>>>> >     >     >     It's not about "don't really need", it's more like
>>>>> "it's hard
>>>>> >     to".
>>>>> >     >     Just
>>>>> >     >     >     think that it may crash at any time.
>>>>> >     >     >
>>>>> >     >     >     > because dpdk will ensure that it will do virtio
>>>>> reset before
>>>>> >     init
>>>>> >     >     when it
>>>>> >     >     >     > comes up right ?
>>>>> >     >     >
>>>>> >     >     >     No, It just handles the abnormal case well when
>>>>> guest APP
>>>>> >     restarts.
>>>>> >     >     >
>>>>> >     >     >     > Regarding the vhost commits you mentioned - do we
>>>>> still need
>>>>> >     those
>>>>> >     >     fixes
>>>>> >     >     >     if we
>>>>> >     >     >     > have the "virtio reset before init" mechanism ?
>>>>> >     >     >
>>>>> >     >     >     Yes, we still need them: just think some malicious
>>>>> guest may
>>>>> >     also
>>>>> >     >     forge
>>>>> >     >     >     data like that.
>>>>> >     >     >
>>>>> >     >     >     I'm a bit confused then. Have you actually met any
>>>>> issue (like
>>>>> >     got
>>>>> >     >     stucked)
>>>>> >     >     >     with DPDK v16.07?
>>>>> >     >     >
>>>>> >     >     >             --yliu
>>>>> >     >     >
>>>>> >     >     >     > Or that is a seperate problem
>>>>> >     >     >     > altogether (and hence we would need those fixes) ?
>>>>> >     >     >     >
>>>>> >     >     >     > Rgds,
>>>>> >     >     >     > Gopa.
>>>>> >     >     >     >
>>>>> >     >     >     > On Thu, Mar 16, 2017 at 7:06 PM, Yuanhan Liu <
>>>>> >     >     yuanhan.liu@linux.intel.com
>>>>> >     >     >     >
>>>>> >     >     >     > wrote:
>>>>> >     >     >     >
>>>>> >     >     >     >     On Thu, Mar 16, 2017 at 12:39:16PM -0700,
>>>>> Gopakumar
>>>>> >     Choorakkot
>>>>> >     >     >     Edakkunni
>>>>> >     >     >     >     wrote:
>>>>> >     >     >     >     > So the doc says we should call
>>>>> rte_eth_dev_close()
>>>>> >     *before*
>>>>> >     >     going
>>>>> >     >     >     down.
>>>>> >     >     >     >     And I
>>>>> >     >     >     >     > know that especially in dpdk-virtionet  in
>>>>> the guest +
>>>>> >     >     ovs-dpdk in
>>>>> >     >     >     the
>>>>> >     >     >     >     host,
>>>>> >     >     >     >     > the ovs ends up getting stalled/stuck (!!)
>>>>> if I dont
>>>>> >     close
>>>>> >     >     the port
>>>>> >     >     >     >     before
>>>>> >     >     >     >     > starting() it when the guest dpdk process
>>>>> comes back
>>>>> >     up.
>>>>> >     >     >     >
>>>>> >     >     >     >     I'm assuming you were using an old version,
>>>>> something
>>>>> >     like dpdk
>>>>> >     >     v2.2?
>>>>> >     >     >     >     IIRC, DPDK v16.04 should have fixed your issue.
>>>>> >     >     >     >
>>>>> >     >     >     >     > Considering that this not done properly can
>>>>> screw up
>>>>> >     the HOST
>>>>> >     >     ovs,
>>>>> >     >     >     and I
>>>>> >     >     >     >     want
>>>>> >     >     >     >     > to do everything possible to avoid that, I
>>>>> want to be
>>>>> >     200%
>>>>> >     >     sure
>>>>> >     >     >     that I
>>>>> >     >     >     >     call
>>>>> >     >     >     >     > close even if my process gets a kill -9 ..
>>>>> So obviously
>>>>> >     the
>>>>> >     >     only
>>>>> >     >     >     way of
>>>>> >     >     >     >     doing
>>>>> >     >     >     >     > that is to close the port when the dpdk
>>>>> process comes
>>>>> >     back up
>>>>> >     >     and
>>>>> >     >     >     >     *before* we
>>>>> >     >     >     >     > init the port. rte_eth_dev_close() is not
>>>>> capable of
>>>>> >     doing
>>>>> >     >     that as
>>>>> >     >     >     it
>>>>> >     >     >     >     expects
>>>>> >     >     >     >     > the port parameters to be initialized etc..
>>>>> before it
>>>>> >     can be
>>>>> >     >     >     called.
>>>>> >     >     >     >
>>>>> >     >     >     >     We do virtio reset before init, which is
>>>>> basically what
>>>>> >     >     >     rte_eth_dev_close()
>>>>> >     >     >     >     mainly does. So I see no big issue here.
>>>>> >     >     >     >
>>>>> >     >     >     >     The stuck issue is due to hugepage reset by
>>>>> the guest
>>>>> >     DPDK
>>>>> >     >     >     application,
>>>>> >     >     >     >     leading all virtio vring elements being mem
>>>>> zeroed. The
>>>>> >     old
>>>>> >     >     vhost
>>>>> >     >     >     doesn't
>>>>> >     >     >     >     handle it well, as a result, it got stuck. And
>>>>> here are
>>>>> >     some
>>>>> >     >     relevant
>>>>> >     >     >     >     commits:
>>>>> >     >     >     >
>>>>> >     >     >     >         a436f53 vhost: avoid dead loop chain
>>>>> >     >     >     >         c687b0b vhost: check for ring descriptors
>>>>> overflow
>>>>> >     >     >     >         623bc47 vhost: do sanity check for ring
>>>>> descriptor
>>>>> >     length
>>>>> >     >     >     >
>>>>> >     >     >     >             --yliu
>>>>> >     >     >     >
>>>>> >     >     >     >     > Any other
>>>>> >     >     >     >     > suggestions on what can be done to close on
>>>>> restart
>>>>> >     rather
>>>>> >     >     than
>>>>> >     >     >     close on
>>>>> >     >     >     >     going
>>>>> >     >     >     >     > down ? Thought of bouncing this by the alias
>>>>> before I
>>>>> >     add a
>>>>> >     >     version
>>>>> >     >     >     of
>>>>> >     >     >     >     close
>>>>> >     >     >     >     > myself that can do this close-on-restart


end of thread, other threads:[~2017-03-22  5:32 UTC | newest]

Thread overview: 15+ messages
2017-03-16 19:39 virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd Gopakumar Choorakkot Edakkunni
2017-03-17  2:06 ` Yuanhan Liu
2017-03-17  2:48   ` Gopakumar Choorakkot Edakkunni
2017-03-17  4:35     ` Yuanhan Liu
2017-03-17  4:56       ` Gopakumar Choorakkot Edakkunni
2017-03-17  5:13         ` Yuanhan Liu
2017-03-17  5:20           ` Gopakumar Choorakkot Edakkunni
2017-03-17  5:24             ` Yuanhan Liu
2017-03-17  5:30               ` Gopakumar Choorakkot Edakkunni
2017-03-17  5:40                 ` Yuanhan Liu
2017-03-17  5:50                   ` Gopakumar Choorakkot Edakkunni
2017-03-18 21:32                     ` Gopakumar Choorakkot Edakkunni
2017-03-18 21:37                       ` Gopakumar Choorakkot Edakkunni
2017-03-18 23:43                         ` Gopakumar Choorakkot Edakkunni
2017-03-22  5:32                           ` Gopakumar Choorakkot Edakkunni
