netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ertman, David M" <david.m.ertman@intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"nhorman@redhat.com" <nhorman@redhat.com>,
	"sassmann@redhat.com" <sassmann@redhat.com>,
	"Nguyen, Anthony L" <anthony.l.nguyen@intel.com>,
	"Bowers, AndrewX" <andrewx.bowers@intel.com>
Subject: RE: [RFC PATCH v4 02/25] ice: Create and register virtual bus for RDMA
Date: Thu, 20 Feb 2020 18:48:04 +0000	[thread overview]
Message-ID: <DM6PR11MB2841C1643C0414031941D191DD130@DM6PR11MB2841.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20200214203932.GY31668@ziepe.ca>

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, February 14, 2020 12:40 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: davem@davemloft.net; gregkh@linuxfoundation.org; Ertman, David M
> <david.m.ertman@intel.com>; netdev@vger.kernel.org; linux-
> rdma@vger.kernel.org; nhorman@redhat.com; sassmann@redhat.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Bowers, AndrewX
> <andrewx.bowers@intel.com>
> Subject: Re: [RFC PATCH v4 02/25] ice: Create and register virtual bus for
> RDMA
> 
> On Wed, Feb 12, 2020 at 11:14:01AM -0800, Jeff Kirsher wrote:
> > +/**
> > + * ice_init_peer_devices - initializes peer devices
> > + * @pf: ptr to ice_pf
> > + *
> > + * This function initializes peer devices on the virtual bus.
> > + */
> > +int ice_init_peer_devices(struct ice_pf *pf)
> > +{
> > +	struct ice_vsi *vsi = pf->vsi[0];
> > +	struct pci_dev *pdev = pf->pdev;
> > +	struct device *dev = &pdev->dev;
> > +	int status = 0;
> > +	int i;
> > +
> > +	/* Reserve vector resources */
> > +	status = ice_reserve_peer_qvector(pf);
> > +	if (status < 0) {
> > +		dev_err(dev, "failed to reserve vectors for peer drivers\n");
> > +		return status;
> > +	}
> > +	for (i = 0; i < ARRAY_SIZE(ice_peers); i++) {
> > +		struct ice_peer_dev_int *peer_dev_int;
> > +		struct ice_peer_drv_int *peer_drv_int;
> > +		struct iidc_qos_params *qos_info;
> > +		struct iidc_virtbus_object *vbo;
> > +		struct msix_entry *entry = NULL;
> > +		struct iidc_peer_dev *peer_dev;
> > +		struct virtbus_device *vdev;
> > +		int j;
> > +
> > +		/* structure layout needed for container_of's looks like:
> > +		 * ice_peer_dev_int (internal only ice peer superstruct)
> > +		 * |--> iidc_peer_dev
> > +		 * |--> *ice_peer_drv_int
> > +		 *
> > +		 * iidc_virtbus_object (container_of parent for vdev)
> > +		 * |--> virtbus_device
> > +		 * |--> *iidc_peer_dev (pointer from internal struct)
> > +		 *
> > +		 * ice_peer_drv_int (internal only peer_drv struct)
> > +		 */
> > +		peer_dev_int = devm_kzalloc(dev, sizeof(*peer_dev_int),
> > +					    GFP_KERNEL);
> > +		if (!peer_dev_int)
> > +			return -ENOMEM;
> > +
> > +		vbo = kzalloc(sizeof(*vbo), GFP_KERNEL);
> > +		if (!vbo) {
> > +			devm_kfree(dev, peer_dev_int);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		peer_drv_int = devm_kzalloc(dev, sizeof(*peer_drv_int),
> > +					    GFP_KERNEL);
> 
> To me, this looks like a lifetime mess. All these devm allocations
> against the parent object are being referenced through the vbo with a
> different kref lifetime. The whole thing has very unclear semantics
> who should be cleaning up on error

Will cover this at the end after addressing your following points =) 

In my reply, I am going to refer to the kernel object that is registering the
virtbus_device(s) as KO_device and the kernel object that is registering
the virtbus_driver(s) as KO_driver.

> 
> > +		if (!peer_drv_int) {
> > +			devm_kfree(dev, peer_dev_int);
> > +			kfree(vbo);
> 
> ie here we free two things

At this point in the init flow for KO_device, there has only been kallocs done,
no device has been registered with virtbus.  So, only memory cleanup is
required.

> 
> > +			return -ENOMEM;
> > +		}
> > +
> > +		pf->peers[i] = peer_dev_int;
> > +		vbo->peer_dev = &peer_dev_int->peer_dev;
> > +		peer_dev_int->peer_drv_int = peer_drv_int;
> > +		peer_dev_int->peer_dev.vdev = &vbo->vdev;
> > +
> > +		/* Initialize driver values */
> > +		for (j = 0; j < IIDC_EVENT_NBITS; j++)
> > +			bitmap_zero(peer_drv_int->current_events[j].type,
> > +				    IIDC_EVENT_NBITS);
> > +
> > +		mutex_init(&peer_dev_int->peer_dev_state_mutex);
> > +
> > +		peer_dev = &peer_dev_int->peer_dev;
> > +		peer_dev->peer_ops = NULL;
> > +		peer_dev->hw_addr = (u8 __iomem *)pf->hw.hw_addr;
> > +		peer_dev->peer_dev_id = ice_peers[i].id;
> > +		peer_dev->pf_vsi_num = vsi->vsi_num;
> > +		peer_dev->netdev = vsi->netdev;
> > +
> > +		peer_dev_int->ice_peer_wq =
> > +			alloc_ordered_workqueue("ice_peer_wq_%d",
> WQ_UNBOUND,
> > +						i);
> > +		if (!peer_dev_int->ice_peer_wq)
> > +			return -ENOMEM;
> 
> Here we free nothing

This is a miss on my part.  At this point we should keep consistent and free the memory
that has been allocated as we unwind.  

> 
> > +
> > +		peer_dev->pdev = pdev;
> > +		qos_info = &peer_dev->initial_qos_info;
> > +
> > +		/* setup qos_info fields with defaults */
> > +		qos_info->num_apps = 0;
> > +		qos_info->num_tc = 1;
> > +
> > +		for (j = 0; j < IIDC_MAX_USER_PRIORITY; j++)
> > +			qos_info->up2tc[j] = 0;
> > +
> > +		qos_info->tc_info[0].rel_bw = 100;
> > +		for (j = 1; j < IEEE_8021QAZ_MAX_TCS; j++)
> > +			qos_info->tc_info[j].rel_bw = 0;
> > +
> > +		/* for DCB, override the qos_info defaults. */
> > +		ice_setup_dcb_qos_info(pf, qos_info);
> > +
> > +		/* make sure peer specific resources such as msix_count and
> > +		 * msix_entries are initialized
> > +		 */
> > +		switch (ice_peers[i].id) {
> > +		case IIDC_PEER_RDMA_ID:
> > +			if (test_bit(ICE_FLAG_IWARP_ENA, pf->flags)) {
> > +				peer_dev->msix_count = pf-
> >num_rdma_msix;
> > +				entry = &pf->msix_entries[pf-
> >rdma_base_vector];
> > +			}
> > +			break;
> > +		default:
> > +			break;
> > +		}
> > +
> > +		peer_dev->msix_entries = entry;
> > +		ice_peer_state_change(peer_dev_int,
> ICE_PEER_DEV_STATE_INIT,
> > +				      false);
> > +
> > +		vdev = &vbo->vdev;
> > +		vdev->name = ice_peers[i].name;
> > +		vdev->release = ice_peer_vdev_release;
> > +		vdev->dev.parent = &pdev->dev;
> > +
> > +		status = virtbus_dev_register(vdev);
> > +		if (status) {
> > +			virtbus_dev_unregister(vdev);
> > +			vdev = NULL;
> 
> Here we double unregister and free nothing.
> 
> You need to go through all of this really carefully and make some kind
> of sane lifetime model and fix all the error unwinding :(

Thanks for catching this.  A failure in virtbus_register_device()  does
*not* require a call virtbus_unregister_device.  The failure path for the
register function handles this.  Also, we need to remain consistent with freeing
on unwind.

> 
> Why doesn't the release() function of vbo trigger the free of all this
> peer related stuff?
> 
> Use a sane design model of splitting into functions to allocate single
> peices of memory, goto error unwind each function, and build things up
> properly.
> 
> Jason

I am going to add this to the documentation to record the following information.

The KO_device is responsible for allocating the memory for the virtbus_device
and keeping it viable for the lifetime of the KO_device.  KO_device will call
virtbus_register_device to start using the virtbus_device, and KO_device is
responslble for calling virtbus_unregister_device either on KO_device's exit
path (remove/shutdown) or when it is done using the virtbus subsystem.

The KO_driver is responsible for allocating the memory for the virtbus_driver
and keeping it viable for the lifetime of the KO_driver. KO_driver will call
virtbus_register_driver to start using the virtbus_driver, and KO_driver is
responsible for calling virtbus_unregister_driver either on KO_driver's exit
path (remove/shutdown) or when it is done using the virtbus subsystem.

The premise is that the KO_device and KO_driver can load and unload multiple
times and they can reconnect to each other through the virtbus on each
occurrence of their reloads.  So one example of a flow looks like the following:

- KO_device loads (order of KO_device and KO_driver loading is irrelevant)
- KO_device allocates memory for virtbus_device(s) it expects to use and
        any backing memory it is going to use to interact with KO_driver.
- KO_device performs virtbus_register_device() which is the *only* place
        a device_initialize() is performed for virtbus_device.

- KO_driver loads
- KO_driver allocates memory for virtbus_driver(s) it expects to use and
        any backing memory it expects to use to interact with KO_device
- KO_driver performs virtbus_register_driver()

- virtbus matches virtbus_device and virtbus_driver and calls the
        virtbus_drivers's probe()

- KO_driver and KO_device interact with each other however they choose to do so.

- KO_device (for example) receives a call to its remove callback
- KO_device's unload path severs any interaction the KO_device and KO_driver
        were having - implementation dependant
- KO_device's unload path is required to perform a call to
        virtbus_unregister_device().  virtbus_unregister_device() is the *only*
        place a put_device() is performed.
- KO_device's unload path frees memory associated with the virtbus_device

- vitbus calls KO_drivers's .remove callback defined for the virtbus_driver

So, the lifespan of the virtbus_device is controlled by KO_device and the
lifespan of virtbus_driver is controlled by KO_driver.

It is required for the KO's to "allocate -> register -> unregister -> free"
virtbus objects.

-DaveE

  reply	other threads:[~2020-02-20 18:48 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-12 19:13 [RFC PATCH v4 00/25] Intel Wired LAN/RDMA Driver Updates 2020-02-11 Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 01/25] virtual-bus: Implementation of Virtual Bus Jeff Kirsher
2020-02-14 17:02   ` Greg KH
2020-02-14 20:34     ` Jason Gunthorpe
2020-02-14 20:43       ` Greg KH
2020-02-15  0:01         ` Jason Gunthorpe
2020-02-15  0:53           ` Greg KH
2020-02-14 20:45       ` Greg KH
2020-02-20 18:55         ` Ertman, David M
2020-02-20 19:27           ` Jason Gunthorpe
2020-02-14 21:22   ` Parav Pandit
2020-02-15  0:08   ` Jason Gunthorpe
2020-02-12 19:14 ` [RFC PATCH v4 02/25] ice: Create and register virtual bus for RDMA Jeff Kirsher
2020-02-14 20:39   ` Jason Gunthorpe
2020-02-20 18:48     ` Ertman, David M [this message]
2020-02-20 20:58       ` Jason Gunthorpe
2020-02-12 19:14 ` [RFC PATCH v4 03/25] ice: Complete RDMA peer registration Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 04/25] ice: Support resource allocation requests Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 05/25] ice: Enable event notifications Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 06/25] ice: Allow reset operations Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 07/25] ice: Pass through communications to VF Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 08/25] i40e: Move client header location Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 09/25] i40e: Register a virtbus device to provide RDMA Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 10/25] RDMA/irdma: Add driver framework definitions Jeff Kirsher
2020-02-14 22:13   ` Parav Pandit
2020-02-18 20:42     ` Saleem, Shiraz
2020-02-20 22:24       ` Parav Pandit
2020-02-20 23:06         ` Jason Gunthorpe
2020-02-21 17:01         ` Saleem, Shiraz
2020-02-21 17:23           ` Parav Pandit
2020-02-21 18:04             ` Jason Gunthorpe
2020-03-19 11:49               ` Martin Habets
2020-02-12 19:14 ` [RFC PATCH v4 11/25] RDMA/irdma: Implement device initialization definitions Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 12/25] RDMA/irdma: Implement HW Admin Queue OPs Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 13/25] RDMA/irdma: Add HMC backing store setup functions Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 14/25] RDMA/irdma: Add privileged UDA queue implementation Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 15/25] RDMA/irdma: Add QoS definitions Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 16/25] RDMA/irdma: Add connection manager Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 17/25] RDMA/irdma: Add PBLE resource manager Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 18/25] RDMA/irdma: Implement device supported verb APIs Jeff Kirsher
2020-02-14 14:54   ` Jason Gunthorpe
2020-02-14 15:49     ` Andrew Boyer
2020-02-14 16:45       ` Jason Gunthorpe
2020-02-18 20:43     ` Saleem, Shiraz
2020-02-12 19:14 ` [RFC PATCH v4 19/25] RDMA/irdma: Add RoCEv2 UD OP support Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 20/25] RDMA/irdma: Add user/kernel shared libraries Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 21/25] RDMA/irdma: Add miscellaneous utility definitions Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 22/25] RDMA/irdma: Add dynamic tracing for CM Jeff Kirsher
2020-02-14 14:53   ` Jason Gunthorpe
2020-02-18 20:43     ` Saleem, Shiraz
2020-02-12 19:14 ` [RFC PATCH v4 23/25] RDMA/irdma: Add ABI definitions Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 24/25] RDMA: Add irdma Kconfig/Makefile and remove i40iw Jeff Kirsher
2020-02-12 19:14 ` [RFC PATCH v4 25/25] RDMA/irdma: Update MAINTAINERS file Jeff Kirsher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR11MB2841C1643C0414031941D191DD130@DM6PR11MB2841.namprd11.prod.outlook.com \
    --to=david.m.ertman@intel.com \
    --cc=andrewx.bowers@intel.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@redhat.com \
    --cc=sassmann@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).