From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?iso-8859-1?Q?Ga=EBtan?= Rivet <gaetan.rivet@6wind.com>
Subject: Re: [PATCH v2 3/4] net/failsafe: replace local
 sub-device with shared data
Date: Thu, 7 Mar 2019 12:50:38 +0100
Message-ID: <20190307115038.rwomp3jlcxikhj5y@bidouze.vm.6wind.com>
References: <1551779507-10857-1-git-send-email-rasland@mellanox.com>
 <172443910.dRk5910QrU@xps>
 <20190306104632.fia2r2sz5yajvkne@bidouze.vm.6wind.com>
 <6541291.VHOuPfexjb@xps>
 <AM6PR05MB59265B8B39776EE53753BE26C24C0@AM6PR05MB5926.eurprd05.prod.outlook.com>
 <20190307094757.56db3w74ftqkra7d@bidouze.vm.6wind.com>
 <AM6PR05MB59269CF2F1AE80B0635E3836C24C0@AM6PR05MB5926.eurprd05.prod.outlook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: Thomas Monjalon <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>
To: Raslan Darawsheh <rasland@mellanox.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-wm1-f66.google.com (mail-wm1-f66.google.com
 [209.85.128.66]) by dpdk.org (Postfix) with ESMTP id 394562B87
 for <dev@dpdk.org>; Thu,  7 Mar 2019 12:50:42 +0100 (CET)
Received: by mail-wm1-f66.google.com with SMTP id x7so8993854wmj.0
 for <dev@dpdk.org>; Thu, 07 Mar 2019 03:50:42 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <AM6PR05MB59269CF2F1AE80B0635E3836C24C0@AM6PR05MB5926.eurprd05.prod.outlook.com>
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Thu, Mar 07, 2019 at 11:34:42AM +0000, Raslan Darawsheh wrote:
> Hi,
> 
> > -----Original Message-----
> > From: Gaëtan Rivet <gaetan.rivet@6wind.com>
> > Sent: Thursday, March 7, 2019 11:48 AM
> > To: Raslan Darawsheh <rasland@mellanox.com>
> > Cc: Thomas Monjalon <thomas@monjalon.net>; dev@dpdk.org;
> > stephen@networkplumber.org
> > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/failsafe: replace local sub-device
> > with shared data
> > 
> > On Thu, Mar 07, 2019 at 08:43:01AM +0000, Raslan Darawsheh wrote:
> > > Hi,
> > >
> > > > -----Original Message-----
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > Sent: Wednesday, March 6, 2019 8:02 PM
> > > > To: Gaëtan Rivet <gaetan.rivet@6wind.com>; Raslan Darawsheh
> > > > <rasland@mellanox.com>
> > > > Cc: dev@dpdk.org; stephen@networkplumber.org
> > > > Subject: Re: [dpdk-dev] [PATCH v2 3/4] net/failsafe: replace local
> > > > sub-device with shared data
> > > >
> > > > 06/03/2019 11:46, Gaëtan Rivet:
> > > > > On Tue, Mar 05, 2019 at 06:58:04PM +0100, Thomas Monjalon wrote:
> > > > > > 05/03/2019 18:38, Gaëtan Rivet:
> > > > > > > What happens when a primary process closes a device before a
> > > > secondary?
> > > > > > > Is the secondary unable to stop / close its own then? Isn't
> > > > > > > there some missing uninit?
> > > > > >
> > > > > > Is the secondary process supposed to do any closing?
> > > > > > The device management should be done only by the primary process.
> > > > > >
> > > > > > Note: anyway all this hotplug related code should be dropped
> > > > > > from failsafe to be replaced by EAL hotplug management.
> > > > > >
> > > > >
> > > > > I don't know, I've never used secondary process.
> > > > > However, cursory reading the code of rte_eth_dev_close(), I don't
> > > > > see a guard against calling it from a secondary process?
> > > >
> > > > Yes indeed, there is no guard.
> > > > That's something not clear in DPDK, previously we were attaching
> > > > some vdevs in secondary only.
> > > >
> > > > > Reading code like
> > > > >
> > > > >    rte_free(dev->data->rx_queues);
> > > > >    dev->data->rx_queues = NULL;
> > > > >
> > > > > within makes me think the issue has been seen at least once, where
> > > > > shared data is freed multiple times, so I guessed some secondary
> > > > > processes were calling it. Maybe they are not meant to, but what
> > > > > prevents them from being badly written?
> > > > >
> > > > > Also, given rte_dev_remove IPC call to transfer the order to the
> > > > > primary, it seems that at least secondary processes are expected
> > > > > to call
> > > > > rte_dev_remove() at some point? So are they only authorized to
> > > > > call
> > > > > rte_dev_remove() (to manage hotplug), but not rte_eth_dev_close()?
> > > > > Is there a specific documentation detailing the design of
> > > > > secondary process and the related responsibilities in the lifetime of a
> > device?
> > > > > How are they synching their rte_eth_devices list if they are not
> > > > > calling rte_eth_dev_close(), ever?
> > > >
> > > > All these calls should be done in primary.
> > > > The IPC mechanism calls the attach/detach in secondary at EAL level.
> > > > The PMDs does the bridge between EAL device and ethdev port status.
> > > > But you are right, there can be a sync issue if closing an ethdev
> > > > port and not removing the EAL device.
> > > > This is a generic question about deciding whether we want all ethdev
> > > > ports to be synced in multi-process or not.
> > > >
> > > > In failsafe context, we are closing the EAL device and change the
> > > > state of the sub-device accordingly. So I think there is no issue.
> > > >
> > > > > > > This seems dangerous to me. Why not instead allocating a
> > > > > > > per-process slab of memory that would hold the relevant
> > > > > > > references and outlive the shared data (a per-process rte_eth_dev
> > private data...).
> > > > > >
> > > > > > Which data do you think should be allocated per process?
> > > > > >
> > > > > >
> > > > >
> > > > > [-------- SHARED SPACE --------------] [-- PER-PROCESS --------]
> > > > > +--------------------------------------------------------------+
> > > > > | +------------------+                +- rte_eth_devices[n] -+ |
> > > > > | |rte_eth_dev_data  |<---------------+ data                 | | PRIMARY
> > > > > | |                  |   +dev_priv-+  |                      | |
> > > > > | |      dev_private +-->|         |  |                      | |
> > > > > | |              ... |   |         |  |                      | |
> > > > > | |          port_id |   |         |  |                      | |
> > > > > | |                  |   |         |  |                      | |
> > > > > | |                  |   |         |  |                      | |
> > > > > | |                  |   |         |  +----------------------+ |
> > > > > | |                  |   |         |  +- rte_eth_devices[n] -+ |
> > > > > | |                  |   |         |  |                      | | SECONDARY
> > > > > | |                  |   |         |  |                      | |
> > > > > | |                  |   |         |  |                      | |
> > > > > | |                  |   |         |  |                      | |
> > > > > | |                  |   +---------+  |                      | |
> > > > > | |                  |<---------------+ data                 | |
> > > > > | +------------------+                +----------------------+ |
> > > > > +--------------------------------------------------------------+
> > > > >
> > > > > Here port_id is used within fail-safe to get back to rte_eth_devices[n].
> > > > > This disappears once a device is closed, as all shared space is zeroed.
> > > > >
> > > > > This means that sometimes ETH(sdev) and PORT_ID(sdev) is correct,
> > > > > and at some point it is not anymore, once a sub-device has been closed.
> > > > > This seems dangerous.
> > > >
> > > > The state of the sub-device is changed.
> > > > I don't see any issue.
> > > >
> > > > > I was thinking initially that allocating a place where each sdev
> > > > > would store their rte_eth_devices / port_id back-reference could
> > > > > alleviate the issue, meaning that the fail-safe would not zero it
> > > > > on sdev_close(), and it would remain valid for the lifetime of a
> > > > > sub-device, so even when a sub-device is in DEV_PROBED state.
> > > > >
> > > > > But now that I think about it, it could probably be simpler:
> > > > > instead of using (ETH(sdev)->data->port_id) for the port_id of an
> > > > > sdev (meaning that it is dependent on the lifetime of the sdev,
> > > > > instead of the lifetime of the failsafe), the port-id itself
> > > > > should be stored in the sub_device structure. This structure will
> > > > > be available for the lifetime of the failsafe, and the port_id is correct
> > accross all processes.
> > > > >
> > > > > So PORT_ID(sdev) would be defined to something like
> > > > > (sdev->port_id), and
> > > > > ETH(sdev) would be (&rte_eth_devices[PORT_ID(sdev)]). It would
> > > > > remain correct even once the primary has closed the sub-device.
> > > > >
> > > > > What do you think? Do you agree that the current state is
> > > > > dangerous, and do you think the solution would alleviate the
> > > > > issue? Maybe the concern is unfounded and the only issue is within
> > fs_dev_remove().
> > > >
> > > > Yes it is only seen in fs_dev_remove().
> > > > I discussed about your proposal with Raslan, and we agree we could
> > > > change from sub_device.data to sub_device.port_id, it may be more
> > future-proof.
> > > >
> > > > I have only one doubt: look at the macro in this patch:
> > > >
> > > > #define ETH(sdev) \
> > > > 	((sdev)->data == NULL ? NULL : &rte_eth_devices[(sdev)->data-
> > > > >port_id])
> > > >
> > > > The NULL check cannot be done with a port id.
> > > > I think it was needed to manage one case. Raslan?
> > >
> > > That's right since we need it for fs_tx_unsafe, to add a protection for
> > plugged out devices during TX.
> > 
> > Ok, thanks for your insights Thomas and Raslan. Sorry about the rambling
> > above I needed to write down the stuff to think about it.
> > 
> > You can use RTE_MAX_ETHPORTS as a sentinel value for port_id, this way the
> > value is kept unsigned and there are several checks against this specific value
> > otherwise.
> > 
> > so ETH(sdev) could be
> > 
> >         (PORT_ID(sdev) >= RTE_MAX_ETHPORTS ? NULL :
> > &rte_eth_devices[PORT_ID(sdev)])
> > 
> But, this mean that you have to explicitly set the port id  to RTE_MAX_ETH_PORTS in the sdev after fs_dev_remove and you shouldn't rely on the port ID anymore.
> 

Yes, once a sub-device has completely finished its removal (from the
fail-safe PoV), the fail-safe marks the sub-device as not used anymore.
This seems correct.

If the fail-safe used the sdev->data->port_id instead, it would return
0, which is a valid port_id that is probably still used by another port.

> > --
> 
> > Gaëtan Rivet
> > 6WIND
> 
> Kindest regards
> Raslan Darawsheh

-- 
Gaëtan Rivet
6WIND