From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Guo, Jia" <jia.guo@intel.com>
Subject: Re: [PATCH v2 1/4] bus/pci: handle device hot unplug
Date: Tue, 26 Jun 2018 23:30:00 +0800
Message-ID: <3cc2fd5f-48b9-7d0b-e942-1716b0622462@intel.com>
References: <1498711073-42917-1-git-send-email-jia.guo@intel.com>
 <1529668268-7462-1-git-send-email-jia.guo@intel.com>
 <1529668268-7462-2-git-send-email-jia.guo@intel.com>
 <20180622125908.r5r7qvvndn4dp6ln@bidouze.vm.6wind.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Cc: stephen@networkplumber.org, bruce.richardson@intel.com,
 ferruh.yigit@intel.com, konstantin.ananyev@intel.com, jingjing.wu@intel.com,
 thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com,
 harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com,
 jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org,
 helin.zhang@intel.com
To: =?UTF-8?Q?Ga=c3=abtan_Rivet?= <gaetan.rivet@6wind.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by dpdk.org (Postfix) with ESMTP id D10AA1B5F1
 for <dev@dpdk.org>; Tue, 26 Jun 2018 17:30:28 +0200 (CEST)
In-Reply-To: <20180622125908.r5r7qvvndn4dp6ln@bidouze.vm.6wind.com>
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

hi, gaetan,

thanks for your review, see comment as bellow


On 6/22/2018 8:59 PM, Gaëtan Rivet wrote:
> Hi Jeff,
>
> Sorry, I followed this development from afar,
> I have a remark regarding this API, I think it can be made simpler.
> Details below.
>
> On Fri, Jun 22, 2018 at 07:51:05PM +0800, Jeff Guo wrote:
>> When a hardware device is removed physically or the software disables
>> it, the hot unplug occur. App need to call ether dev API to detach the
>> device, to unplug the device at the bus level and make access to the device
>> invalid. But the problem is that, the removal of the device from the
>> software lists is not going to be instantaneous, at this time if the data
>> path still read/write the device, it will cause MMIO error and result of
>> the app crash out. So a hot unplug handle mechanism need to guaranty app
>> will not crash out when hot unplug device.
>>
>> To handle device hot unplug is bus-specific behavior, this patch introduces
>> a bus ops so that each kind of bus can implement its own logic. Further,
>> this patch implements the ops for PCI bus: remap a dummy memory to avoid
>> bus read/write error.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v2->v1(v21):
>> refind commit log
>> ---
>>   drivers/bus/pci/pci_common.c            | 65 +++++++++++++++++++++++++++++++++
>>   drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++++++
>>   drivers/bus/pci/private.h               | 12 ++++++
>>   lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
>>   4 files changed, 126 insertions(+)
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 7215aae..74d9aa8 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>>   	return NULL;
>>   }
>>   
>> +static struct rte_pci_device *
>> +pci_find_device_by_addr(void *failure_addr)
>> +{
>> +	struct rte_pci_device *pdev = NULL;
>> +	int i;
>> +
>> +	FOREACH_DEVICE_ON_PCIBUS(pdev) {
>> +		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
>> +			if ((uint64_t)(uintptr_t)failure_addr >=
>> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
>> +			    (uint64_t)(uintptr_t)failure_addr <
>> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
>> +			    pdev->mem_resource[i].len) {
>> +				RTE_LOG(ERR, EAL, "Failure address "
>> +					"%16.16"PRIx64" belongs to "
>> +					"device %s!\n",
>> +					(uint64_t)(uintptr_t)failure_addr,
>> +					pdev->device.name);
>> +				return pdev;
>> +			}
>> +		}
>> +	}
>> +	return NULL;
>> +}
> You define here a new bus ops that takes either an rte_device or an
> arbitrary address as input.
>
> In the uev handler that would call this ops afterward, you similarly try
> to find either a bus using the device name, or then iterate over all
> buses and try to find one able to handle the error.
>
> This seems redundant and prone to ambiguity: should one check that the
> device address is actually linked with the provided address? If not, is
> it an improper call or a special case? This is unclear.
>
> Note: I haven't followed the previous discussion, maybe the
>        dual dev_addr + failure_addr is warranted in the API here,
>        if so why not. Otherwise it just seems redundant:
>        the dev addr will never be within a physical BAR mapping,
>        and for all buses / drivers not using physical mappings,
>        the addr is only meaningful as a cue to find an internal
>        resource.
>
> You can use the bus->find_device() to iterate over buses, and design
> your bus ops such that when provided with an addr, would do whatever it
> needs internally to find a relevant resource and either handle the
> error, or return that the error was not handled.

if bus->find_device() can make think more simpler, why not? i will check 
that and modify it.

> Something like that:
>
> /* new bus ops: */
> /* If generic error codes are defined as part of the API,
>     >0 should mean that the sigbus was not handled,
>     <0 that an error occured but that one should stop trying,
>     0 that everything is ok.
>   */
> int (*handle_sigbus)(void *addr);
>
> /* new rte_bus API: */
>
> static int
> bus_handle_sigbus(const struct rte_bus *bus,
>                    const void *addr)
> {
>          /* If additional error codes are defined as part of the API,
>             negative values should stop the iteration.
>             In this case, rte_errno would need to be set as well.
>           */
>          return !(bus->handle_sigbus && bus->handle_sigbus(addr) <= 0);
> }
>
> int
> rte_bus_sigbus_handler(void *addr)
> {
>          struct rte_bus *bus;
>          int old_errno = rte_errno;
>
>          rte_errno = 0;
>          bus = rte_bus_find(NULL, bus_handle_sigbus, addr);
>          if (bus == NULL) {
>                  /* ERROR: no bus could handle the error. */
>                  RTE_LOG(ERR, EAL, "No bus was able to handle the error");
>                  return -1;
>          } else if {rte_errno != 0) {
>                  /* ERROR: a generic sigbus handling error. */
>                  RTE_LOG(ERR, EAL, "Say what the error is");
>                  return -1;
>          }
>          rte_errno = old_errno;
>          return 0;
> }
>
> Which would afterward be implemented, for example in PCI bus:
>
> static rte_pci_device *
> pci_find_device_by_addr(void *addr)
> {
>          struct rte_pci_device *pdev;
>
>          FOREACH_DEVICE_ON_PCIBUS(pdev)
>                  if (&pdev->device == addr ||
>                      /* addr within mappings of pdev */)
>                          return pdev;
>          return NULL;
> }
>
> static int
> pci_handle_sigbus(void *addr)
> {
>          static rte_pci_device *pdev;
>
>          pdev = pci_find_device_by_addr(addr);
>          if (pdev == NULL)
>                  return -1;
>          /* Here, handle uio remap if needed. */
> }
>
> -------------------------
>
> - Leave the bus iteration and check within rte_bus. Centralize the call
>    in a tighter rte_bus API, do not use directly your OPS from your other
>    EAL facility.
>
> - You have left several error messages for signaling success (!), or
>    even simply that a bus could not handle a specific address. This is
>    bad. An error should only appear on error. Otherwise, all of this can
>    easily be traced using a debugger, so I don't think it's necessary
>    to leave it at a DEBUG level.
>
>    But in any case, remove all ERR-level messages for success.

make sense. i will check which debug message is no need at least.

> <...>
>
>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index eb9eded..6a5609f 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>   
>>   /**
>> + * Implementation a specific hot unplug handler, which is responsible
>> + * for handle the failure when hot unplug the device, guaranty the system
>> + * would not hung in the case.
>> + * @param dev
>> + *	Pointer of the device structure.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
>> +						void *dev_addr);
>> +
> I don't like the name of the OPS.
> The documentation evokes only "the failure".
> So is it a handle for any and all error possibly happening to a device?
> If so, where is the input to describe the error?
> If it is only meant to handle SIGBUS, because it is a very specific
> error state only meant to happen on certain parts of the bus (the queue
> mappings, if relevant), then it makes sense to only have an arbitrary
> address as context for handling.
>
> But then, it needs to be called as such. The expected failure to be
> handled should be explicit in the name of the ops, and the documentation
> should be more precise about what a bus developper should do with the
> input.

I agree with your point of let the name more explicit, but i think here 
maybe we should spit it into two ops, the one is hotplug_handler, the 
other is sigbus_handler, because there are
2 path that both, data path and control path, they are also need to call 
remap function when detect the hot remove event,even there are no sigbus 
happen.

>> +/**
>>    * Bus scan policies
>>    */
>>   enum rte_bus_scan_mode {
>> @@ -209,6 +223,8 @@ struct rte_bus {
>>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
>> +							device event */
> The new ops should be added at the end of the structure.

ok.

> Regards,