* [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
@ 2017-05-28 15:44 Jeff Guo
  2017-05-30  7:14 ` Gaëtan Rivet
                   ` (2 more replies)
  0 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2017-05-28 15:44 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu, bruce.richardson, konstantin.ananyev,
	yuanhan.liu, gaetan.rivet
  Cc: dev, jia.guo

For the HW hotplug feature, the device removal event from 6WIND has already gone upstream, as listed below.

Dependency: the “add device removal event” patch set:
http://dpdk.org/dev/patchwork/patch/23693/
[dpdk-dev,v2,1/5] ethdev: introduce device removal event
http://dpdk.org/dev/patchwork/patch/23694/
[dpdk-dev,v2,2/5] net/mlx4: device removal event support
http://dpdk.org/dev/patchwork/patch/23695/
[dpdk-dev,v2,3/5] app/testpmd: generic event handler
http://dpdk.org/dev/patchwork/patch/23696/
[dpdk-dev,v2,4/5] app/testpmd: request link status interrupt
http://dpdk.org/dev/patchwork/patch/23697/
[dpdk-dev,v2,5/5] app/testpmd: request device removal interrupt

From these patches, we can see that a new event type, “RTE_ETH_EVENT_INTR_RMV”, has been added to ethdev and implemented in the mlx4 driver, and that testpmd is used both for testing and as a practical example of how to use the event. The process is as follows: the mlx4 driver registers an interrupt callback with the rte eal interrupt source; when the rte epoll loop detects IBV_EVENT_DEVICE_FATAL, which identifies a device removal, it calls back into the driver’s interrupt handler, which in turn calls back into the user application, such as testpmd, to detach the pci device.

So far, apart from mlx4, drivers such as i40e that get no removal interrupt from the hw cannot monitor device removal. To extend the hot plug feature to all drivers, something must be done to detect the remove event at the kernel level and offer a new interrupt source to userland. The idea is described below.

Taking i40e as an example: the hw provides no rmv interrupt, but we can monitor the uio file descriptor to catch the device remove event, as shown below.

The uevent received from FD monitoring:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366
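
On the wire, the kernel delivers such a uevent as one netlink datagram in which the "action@devpath" header and each KEY=value pair are NUL-terminated strings; the line breaks above are only for display. A minimal sketch of walking such a payload, bounded by the received length, could look like this. The names uevent_fields and uevent_parse are illustrative assumptions and are not part of this patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of interest; not part of the patch. */
struct uevent_fields {
	char action[32];
	char subsystem[32];
	char devpath[256];
};

/*
 * Walk a uevent payload made of consecutive NUL-terminated strings.
 * 'len' is the byte count returned by recv(); nothing past it is read.
 * Returns 0 if both ACTION and SUBSYSTEM were found, -1 otherwise.
 */
static int
uevent_parse(const char *buf, size_t len, struct uevent_fields *out)
{
	size_t pos = 0;

	memset(out, 0, sizeof(*out));
	while (pos < len) {
		const char *s = buf + pos;
		const char *end = memchr(s, '\0', len - pos);
		/* length of this string, clamped to the received data */
		size_t n = end ? (size_t)(end - s) : len - pos;

		if (n >= 7 && !strncmp(s, "ACTION=", 7))
			snprintf(out->action, sizeof(out->action),
				 "%.*s", (int)(n - 7), s + 7);
		else if (n >= 8 && !strncmp(s, "DEVPATH=", 8))
			snprintf(out->devpath, sizeof(out->devpath),
				 "%.*s", (int)(n - 8), s + 8);
		else if (n >= 10 && !strncmp(s, "SUBSYSTEM=", 10))
			snprintf(out->subsystem, sizeof(out->subsystem),
				 "%.*s", (int)(n - 10), s + 10);

		pos += n + 1; /* skip the string and its terminator */
	}

	return (out->action[0] && out->subsystem[0]) ? 0 : -1;
}
```

Passing the length explicitly keeps the walk inside the received data even if the last string is not terminated.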

First, to monitor uio events, we create a netlink socket, defined as “hotplug_fd”, and add it to the epoll fd list, so that the rte eal can epoll the uevents from hotplug_fd together with the hw interrupt events.
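
As a rough sketch of this step, assuming only Linux and libc: open a non-blocking NETLINK_KOBJECT_UEVENT socket and register it on an epoll instance. The helper name uevent_epoll_open is hypothetical. The sketch subscribes to netlink group 1 (kernel uevents) and leaves nl_pid at 0 so that the kernel assigns the port id; both are choices made for this example and differ from the values used in the patch.

```c
#define _GNU_SOURCE
#include <linux/netlink.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a uevent netlink socket and add it to a new epoll instance.
 * Returns the epoll fd and stores the socket in *nl_fd on success.
 * Returns -1 on failure and leaves *nl_fd untouched.
 */
static int
uevent_epoll_open(int *nl_fd)
{
	struct sockaddr_nl addr;
	struct epoll_event ev;
	int fd, epfd;

	fd = socket(AF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;    /* let the kernel pick a unique port id */
	addr.nl_groups = 1; /* group 1: uevents broadcast by the kernel */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0) {
		close(fd);
		return -1;
	}

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		close(fd);
		return -1;
	}

	*nl_fd = fd;
	return epfd;
}
```

The caller would then use epoll_wait() on the returned fd and recv() on the socket whenever it becomes readable.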

Second, to read out the monitored uevents, we need to add a uevent API in the rte layer. We plan to add two functions, rte_uevent_connect and rte_get_uevent. Any driver's interrupt handler can use these APIs to enable uevent monitoring, read the uevent type, and handle it accordingly, for example by detaching the device on a remove event.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c                     |  15 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 146 ++++++++++++++++++++-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  32 +++++
 3 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4c49673..0336f82 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -66,6 +66,8 @@
 #include "i40e_pf.h"
 #include "i40e_regs.h"
 
+extern int hotplug_fd;
+
 #define ETH_I40E_FLOATING_VEB_ARG	"enable_floating_veb"
 #define ETH_I40E_FLOATING_VEB_LIST_ARG	"floating_veb_list"
 
@@ -5808,8 +5810,21 @@ i40e_dev_interrupt_handler(void *param)
 {
 	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
 	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct rte_uevent event;
 	uint32_t icr0;
 
+	/* check device  uevent */
+	while (rte_get_uevent(hotplug_fd, &event) > 0) {
+		if (event.subsystem == 1) {
+			if (event.action == RTE_UEVENT_ADD) {
+				//do nothing here
+			} else if (event.action == RTE_UEVENT_REMOVE) {
+				_rte_eth_dev_callback_process(dev,
+					RTE_ETH_EVENT_INTR_RMV, NULL);
+			}
+		}
+	}
+
 	/* Disable interrupt */
 	i40e_pf_disable_irq0(hw);
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 2e3bd12..873ab5f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -65,6 +65,10 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+
 #include "eal_private.h"
 #include "eal_vfio.h"
 #include "eal_thread.h"
@@ -74,6 +78,11 @@
 
 static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */
 
+#define RTE_UEVENT_MSG_LEN 4096
+#define RTE_UEVENT_SUBSYSTEM_UIO 1
+
+int hotplug_fd = -1;
+
 /**
  * union for pipe fds.
  */
@@ -669,10 +678,13 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			RTE_SET_USED(r);
 			return -1;
 		}
+
 		rte_spinlock_lock(&intr_lock);
 		TAILQ_FOREACH(src, &intr_sources, next)
-			if (src->intr_handle.fd ==
-					events[n].data.fd)
+			if ((src->intr_handle.fd ==
+					events[n].data.fd) ||
+				(hotplug_fd ==
+					events[n].data.fd))
 				break;
 		if (src == NULL){
 			rte_spinlock_unlock(&intr_lock);
@@ -858,7 +870,24 @@ eal_intr_thread_main(__rte_unused void *arg)
 			}
 			else
 				numfds++;
+
+			/**
+			 * add device uevent file descriptor
+			 * into wait list for hot plug.
+			 */
+			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+			ev.data.fd = hotplug_fd;
+			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
+					hotplug_fd, &ev) < 0){
+				rte_panic("Error adding hotplug_fd %d epoll_ctl, %s\n",
+					hotplug_fd, strerror(errno));
+			}
+			else
+				numfds++;
+
 		}
+
+
 		rte_spinlock_unlock(&intr_lock);
 		/* serve the interrupt */
 		eal_intr_handle_interrupts(pfd, numfds);
@@ -877,6 +906,9 @@ rte_eal_intr_init(void)
 	int ret = 0, ret_1 = 0;
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
 
+	/* connect to monitor device uevent  */
+	rte_uevent_connect();
+
 	/* init the global interrupt source head */
 	TAILQ_INIT(&intr_sources);
 
@@ -1255,3 +1287,113 @@ rte_intr_cap_multiple(struct rte_intr_handle *intr_handle)
 
 	return 0;
 }
+
+int
+rte_uevent_connect(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int netlink_fd;
+	int size = 64 * 1024;
+	int nonblock = 1;
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = getpid();
+	addr.nl_groups = 0xffffffff;
+
+	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
+	if (netlink_fd < 0)
+		return -1;
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+		"ioctl(FIONBIO) failed\n");
+		close(netlink_fd);
+		return -1;
+	}
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		close(netlink_fd);
+		return -1;
+	}
+
+	hotplug_fd = netlink_fd;
+
+	return netlink_fd;
+}
+
+static int
+parse_event(const char *buf, struct rte_uevent *event)
+{
+	char action[RTE_UEVENT_MSG_LEN];
+	char subsystem[RTE_UEVENT_MSG_LEN];
+	char dev_path[RTE_UEVENT_MSG_LEN];
+
+	memset(action, 0, RTE_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_UEVENT_MSG_LEN);
+
+	while (*buf) {
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		}
+		while (*buf++)
+			;
+	}
+
+	if (!strncmp(subsystem, "uio", 3)) {
+
+		event->subsystem = RTE_UEVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3)) {
+			event->action = RTE_UEVENT_ADD;
+		}
+		if (!strncmp(action, "remove", 6)) {
+			event->action = RTE_UEVENT_REMOVE;
+		}
+		return 1;
+	}
+
+	return -1;
+}
+
+int
+rte_get_uevent(int fd, struct rte_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_uevent));
+	memset(buf, 0, RTE_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret > 0) {
+		return parse_event(buf, uevent);
+	}
+
+	if (ret < 0) {
+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
+			return 0;
+		} else {
+			RTE_LOG(ERR, EAL,
+			"Socket read error(%d): %s\n",
+			errno, strerror(errno));
+		}
+	}
+
+	/* connection closed */
+	if (ret == 0) {
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6daffeb..d32ba01 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -99,6 +99,16 @@ struct rte_intr_handle {
 	int *intr_vec;                 /**< intr vector number array */
 };
 
+enum rte_uevent_action {
+	RTE_UEVENT_ADD = 0,		/**< uevent type of device add */
+	RTE_UEVENT_REMOVE = 1,	/**< uevent type of device remove*/
+};
+
+struct rte_uevent {
+	enum rte_uevent_action action;	/**< uevent action type */
+	int subsystem;				/**< subsystem id */
+};
+
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
 
 /**
@@ -236,4 +246,26 @@ rte_intr_allow_others(struct rte_intr_handle *intr_handle);
 int
 rte_intr_cap_multiple(struct rte_intr_handle *intr_handle);
 
+/**
+ * It read out the uevent from the specific file descriptor.
+ *
+ * @param fd
+ *   The fd which the uevent  associated to
+ * @param uevent
+ *   Pointer to the uevent which read from the monitoring fd.
+ * @return
+ *   - On success, one.
+ *   - On failure, zeor or a negative value.
+ */
+int
+rte_get_uevent(int fd, struct rte_uevent *uevent);
+
+/**
+ * Connect to the device uevent file descriptor.
+ * @return
+ *   return the connected uevent fd.
+ */
+int
+rte_uevent_connect(void);
+
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
-- 
2.7.4


* Re: [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
  2017-05-28 15:44 [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver Jeff Guo
@ 2017-05-30  7:14 ` Gaëtan Rivet
  2017-06-07  7:40   ` Wu, Jingjing
  2017-06-07  7:27 ` Wu, Jingjing
  2017-06-28 11:07 ` [PATCH v2 1/2] eal: add uevent api for hot plug Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2017-05-30  7:14 UTC (permalink / raw)
  To: Jeff Guo
  Cc: helin.zhang, jingjing.wu, bruce.richardson, konstantin.ananyev,
	yuanhan.liu, dev

Hi Jeff,

A few comments below:

On Sun, May 28, 2017 at 11:44:40PM +0800, Jeff Guo wrote:
>For the HW hotplug feature, the device removal event from 6WIND has already gone upstream, as listed below.
>
>Dependency: the “add device removal event” patch set:
>http://dpdk.org/dev/patchwork/patch/23693/
>[dpdk-dev,v2,1/5] ethdev: introduce device removal event
>http://dpdk.org/dev/patchwork/patch/23694/
>[dpdk-dev,v2,2/5] net/mlx4: device removal event support
>http://dpdk.org/dev/patchwork/patch/23695/
>[dpdk-dev,v2,3/5] app/testpmd: generic event handler
>http://dpdk.org/dev/patchwork/patch/23696/
>[dpdk-dev,v2,4/5] app/testpmd: request link status interrupt
>http://dpdk.org/dev/patchwork/patch/23697/
>[dpdk-dev,v2,5/5] app/testpmd: request device removal interrupt
>
>From these patches, we can see that a new event type, “RTE_ETH_EVENT_INTR_RMV”, has been added to ethdev and implemented in the mlx4 driver, and that testpmd is used both for testing and as a practical example of how to use the event. The process is as follows: the mlx4 driver registers an interrupt callback with the rte eal interrupt source; when the rte epoll loop detects IBV_EVENT_DEVICE_FATAL, which identifies a device removal, it calls back into the driver’s interrupt handler, which in turn calls back into the user application, such as testpmd, to detach the pci device.
>
>So far, apart from mlx4, drivers such as i40e that get no removal interrupt from the hw cannot monitor device removal. To extend the hot plug feature to all drivers, something must be done to detect the remove event at the kernel level and offer a new interrupt source to userland. The idea is described below.
>

It's nice that this event is extended to other drivers.

>Taking i40e as an example: the hw provides no rmv interrupt, but we can monitor the uio file descriptor to catch the device remove event, as shown below.
>
>The uevent received from FD monitoring:
>remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
>ACTION=remove
>DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
>SUBSYSTEM=uio
>MAJOR=243
>MINOR=2
>DEVNAME=uio2
>SEQNUM=11366
>
>First, to monitor uio events, we create a netlink socket, defined as “hotplug_fd”, and add it to the epoll fd list, so that the rte eal can epoll the uevents from hotplug_fd together with the hw interrupt events.
>
>Second, to read out the monitored uevents, we need to add a uevent API in the rte layer. We plan to add two functions, rte_uevent_connect and rte_get_uevent. Any driver's interrupt handler can use these APIs to enable uevent monitoring, read the uevent type, and handle it accordingly, for example by detaching the device on a remove event.
>

I find having a generic uevent API interesting.

However, all specifics pertaining to UIO use (hotplug_fd, subsystem
enum) should stay in UIO specific code (eal_pci_uio.c?).

I am currently moving the PCI bus out of the EAL. EAL subsystems should
not rely on PCI specifics, as they won't be available afterward.

It should also allow you to clean up your API. Exposing hotplug_fd and
requiring PMDs to link it can be avoided and should result in a simpler
API.
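
To illustrate what such an API could look like, here is a rough sketch in which drivers register a callback per device and never see the fd. All names (hp_register, hp_dispatch, hp_count_cb) are hypothetical; this is neither existing EAL code nor a concrete proposal, only one possible shape.

```c
#include <stddef.h>
#include <string.h>

#define HP_MAX_HANDLERS 16

enum hp_action { HP_ADD, HP_REMOVE };
typedef void (*hp_cb_t)(enum hp_action action, void *arg);

/* One registration: device name (e.g. a PCI address) plus callback. */
struct hp_handler {
	char dev[16];
	hp_cb_t cb;
	void *arg;
};

static struct hp_handler hp_tbl[HP_MAX_HANDLERS];
static size_t hp_cnt;

/* Called by a driver at probe time; returns 0 on success, -1 on failure. */
static int
hp_register(const char *dev, hp_cb_t cb, void *arg)
{
	size_t n = strlen(dev);

	if (cb == NULL || hp_cnt == HP_MAX_HANDLERS ||
	    n >= sizeof(hp_tbl[0].dev))
		return -1;
	memcpy(hp_tbl[hp_cnt].dev, dev, n + 1);
	hp_tbl[hp_cnt].cb = cb;
	hp_tbl[hp_cnt].arg = arg;
	hp_cnt++;
	return 0;
}

/*
 * Called by the event loop after parsing a uevent.
 * Returns the number of callbacks that were invoked.
 */
static int
hp_dispatch(const char *dev, enum hp_action action)
{
	size_t i;
	int called = 0;

	for (i = 0; i < hp_cnt; i++) {
		if (strcmp(hp_tbl[i].dev, dev) == 0) {
			hp_tbl[i].cb(action, hp_tbl[i].arg);
			called++;
		}
	}
	return called;
}

/* Example callback: counts remove events in the int that arg points to. */
static void
hp_count_cb(enum hp_action action, void *arg)
{
	if (action == HP_REMOVE)
		(*(int *)arg)++;
}
```

With this shape the socket stays private to the code that owns it, and only the matching device's callback runs.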


-- 
Gaëtan Rivet
6WIND


* Re: [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
  2017-05-28 15:44 [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver Jeff Guo
  2017-05-30  7:14 ` Gaëtan Rivet
@ 2017-06-07  7:27 ` Wu, Jingjing
  2017-06-28 11:07 ` [PATCH v2 1/2] eal: add uevent api for hot plug Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-07  7:27 UTC (permalink / raw)
  To: Guo, Jia, Zhang, Helin, Richardson, Bruce, Ananyev, Konstantin,
	Liu, Yuanhan, gaetan.rivet
  Cc: dev



> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>  drivers/net/i40e/i40e_ethdev.c                     |  15 +++
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 146 ++++++++++++++++++++-
>  .../linuxapp/eal/include/exec-env/rte_interrupts.h |  32 +++++
>  3 files changed, 191 insertions(+), 2 deletions(-)
>
It would be better to split the patch into two sub-patches: one for the EAL change, the other for enabling it in the driver.

> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 4c49673..0336f82 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -66,6 +66,8 @@
>  #include "i40e_pf.h"
>  #include "i40e_regs.h"
> 
> +extern int hotplug_fd;
> +
>  #define ETH_I40E_FLOATING_VEB_ARG	"enable_floating_veb"
>  #define ETH_I40E_FLOATING_VEB_LIST_ARG	"floating_veb_list"
> 
> @@ -5808,8 +5810,21 @@ i40e_dev_interrupt_handler(void *param)
>  {
>  	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>  	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +	struct rte_uevent event;
>  	uint32_t icr0;
> 
> +	/* check device  uevent */
> +	while (rte_get_uevent(hotplug_fd, &event) > 0) {
The hotplug_fd is used as the uevent fd for all devices, right? But in the i40e driver, how do you tell whether an event is for this dev?
I saw that you check the src in eal_intr_process_interrupts, but you didn't assign the fd in the i40e device's intr_handle.
Does that mean every driver's callback will be triggered?
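
For example, the DEVPATH in the uevent already carries the PCI address, so a handler could compare it against its own device. A minimal sketch, assuming the path has the ".../<pci address>/uio/uioN" layout shown in the cover letter; devpath_to_pci is a hypothetical helper, not an existing function:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the path component that precedes "/uio/" in a uevent DEVPATH
 * into 'out'. For uio devices bound to a PCI function this component
 * is the PCI address. Returns 0 on success, -1 if the path has no
 * "/uio/" part or 'out' is too small.
 */
static int
devpath_to_pci(const char *devpath, char *out, size_t out_len)
{
	const char *tail = strstr(devpath, "/uio/");
	const char *start;
	size_t n;

	if (tail == NULL || tail == devpath)
		return -1;

	/* walk back to the '/' that starts the preceding component */
	start = tail;
	while (start > devpath && start[-1] != '/')
		start--;

	n = (size_t)(tail - start);
	if (n == 0 || n >= out_len)
		return -1;

	memcpy(out, start, n);
	out[n] = '\0';
	return 0;
}
```

The driver would then only raise RTE_ETH_EVENT_INTR_RMV when the extracted address matches the address of its own device.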

> +		if (event.subsystem == 1) {
What does the 1 mean?
> +			if (event.action == RTE_UEVENT_ADD) {
> +				//do nothing here
Do you plan to do anything here later, such as RTE_ETH_EVENT_INTR_NEW? If not, please remove it.
And the {} can be omitted.

> +			} else if (event.action == RTE_UEVENT_REMOVE) {
> +				_rte_eth_dev_callback_process(dev,
> +					RTE_ETH_EVENT_INTR_RMV, NULL);
> +			}
> +		}


> +
> +	/* connection closed */
> +	if (ret == 0) {
> +		return -1;
> +	}
The {} can be omitted. There are several of these in this patch; please check.


> +/**
> + * It read out the uevent from the specific file descriptor.
> + *
> + * @param fd
> + *   The fd which the uevent  associated to
> + * @param uevent
> + *   Pointer to the uevent which read from the monitoring fd.
> + * @return
> + *   - On success, one.
> + *   - On failure, zeor or a negative value.
Zeor -> zero
Generally speaking, we use a negative value to indicate failure and 0 to indicate success, except when the result has more than two possible outcomes (success, failure).

> +int
> +rte_get_uevent(int fd, struct rte_uevent *uevent);
> +
> +/**
> + * Connect to the device uevent file descriptor.
> + * @return
> + *   return the connected uevent fd.
> + */
Any return code for failure?

Thanks
Jingjing


* Re: [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
  2017-05-30  7:14 ` Gaëtan Rivet
@ 2017-06-07  7:40   ` Wu, Jingjing
  2017-06-15 21:22     ` Gaëtan Rivet
  2017-06-29 17:27     ` Stephen Hemminger
  0 siblings, 2 replies; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-07  7:40 UTC (permalink / raw)
  To: Gaëtan Rivet, Guo, Jia
  Cc: Zhang, Helin, Richardson, Bruce, Ananyev, Konstantin, Liu, Yuanhan, dev

> >
> >Second, to read out the monitored uevents, we need to add a uevent API in the rte layer. We
> >plan to add two functions, rte_uevent_connect and rte_get_uevent. Any driver's interrupt
> >handler can use these APIs to enable uevent monitoring, read the uevent type, and handle it
> >accordingly, for example by detaching the device on a remove event.
> >
> 
> I find having a generic uevent API interesting.
> 
> However, all specifics pertaining to UIO use (hotplug_fd, subsystem
> enum) should stay in UIO specific code (eal_pci_uio.c?).
>
Yes, but it can also be considered an interrupt mechanism, right?

> I am currently moving the PCI bus out of the EAL. EAL subsystems should
> not rely on PCI specifics, as they won't be available afterward.

The interrupt handling will be kept in EAL, right?

> It should also allow you to clean up your API. Exposing hotplug_fd and
> requiring PMDs to link it can be avoided and should result in a simpler
> API.
I didn't get the idea. Why would it result in a simpler API? Is there a patch that could help
me understand?

Thanks
Jingjing


* Re: [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
  2017-06-07  7:40   ` Wu, Jingjing
@ 2017-06-15 21:22     ` Gaëtan Rivet
  2017-06-21  2:50       ` Guo, Jia
  2017-06-29 17:27     ` Stephen Hemminger
  1 sibling, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2017-06-15 21:22 UTC (permalink / raw)
  To: Wu, Jingjing
  Cc: Guo, Jia, Zhang, Helin, Richardson, Bruce, Ananyev, Konstantin,
	Liu, Yuanhan, dev

Hi Jingjing,

On Wed, Jun 07, 2017 at 07:40:37AM +0000, Wu, Jingjing wrote:
> > >
> > >Second, to read out the monitored uevents, we need to add a uevent API in the rte layer. We
> > >plan to add two functions, rte_uevent_connect and rte_get_uevent. Any driver's interrupt
> > >handler can use these APIs to enable uevent monitoring, read the uevent type, and handle it
> > >accordingly, for example by detaching the device on a remove event.
> > >
> > 
> > I find having a generic uevent API interesting.
> > 
> > However, all specifics pertaining to UIO use (hotplug_fd, subsystem
> > enum) should stay in UIO specific code (eal_pci_uio.c?).
> >
> > Yes, but it can also be considered an interrupt mechanism, right?
> 

Sure.

> > I am currently moving the PCI bus out of the EAL. EAL subsystems should
> > not rely on PCI specifics, as they won't be available afterward.
> 
> > The interrupt handling will be kept in EAL, right?
> 

Ah yes, I was actually mistaken and thought more UIO parts would be
moving.

> > It should also allow you to clean up your API. Exposing hotplug_fd and
> > requiring PMDs to link it can be avoided and should result in a simpler
> > API.
> > I didn't get the idea. Why would it result in a simpler API? Is there a patch that could help
> > me understand?
> 

How do you demux the hotplug_fd for several drivers / devices?

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
  2017-06-15 21:22     ` Gaëtan Rivet
@ 2017-06-21  2:50       ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-06-21  2:50 UTC (permalink / raw)
  To: Gaëtan Rivet, Wu, Jingjing
  Cc: Zhang, Helin, Richardson, Bruce, Ananyev, Konstantin, Liu, Yuanhan, dev

Hi Gaëtan,


On 6/16/2017 5:22 AM, Gaëtan Rivet wrote:
> Hi Jingjing,
>
> On Wed, Jun 07, 2017 at 07:40:37AM +0000, Wu, Jingjing wrote:
>>>> Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte
>>> layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler
>>> could use these API to enable the uevent monitoring, and read out the uevent type , then
>>> corresponding to handle these uevent, such as detach the device when get the remove type.
>>> I find having a generic uevent API interesting.
>>>
>>> However, all specifics pertaining to UIO use (hotplug_fd, subsystem
>>> enum) should stay in UIO specific code (eal_pci_uio.c?).
>>>
>> Yes, but it can be also considered as interrupt mechanism, right?
>>
> Sure.
>
>>> I am currently moving the PCI bus out of the EAL. EAL subsystems should
>>> not rely on PCI specifics, as they won't be available afterward.
>> Will the interrupt handling be kept in EAL, right?
>>
> Ah yes, I was actually mistaken and thought more UIO parts would be
> moving.
So I assume that the interrupt handling will still be kept in EAL, and
that this would not affect my adding the uevent support to the EAL interrupt part,
right? If there is any other dependency, please let me know.
>>> It should also allow you to clean up your API. Exposing hotplug_fd and
>>> requiring PMDs to link it can be avoided and should result in a simpler
>>> API.
>> Didn't get the idea. Why it will result in a simpler API? Is there any patch help
>> Me to understand?
>>
> How do you demux the hotplug_fd for several drivers / device?
It is related to the dual port/device/driver problem. I think there
should be some mapping of the dev_path there. I will refine that part to
handle it in v2. Thanks.
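
For illustration only, the dev_path mapping mentioned above could look roughly
like the sketch below. It is not taken from any version of the patch: the
helper name is hypothetical, and it assumes that the uevent DEVPATH carries
the PCI address as one whole path component in the usual sysfs spelling
(for example .../0000:05:00.0/uio/uio0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: report whether a uevent
 * DEVPATH refers to the PCI device at domain:bus:devid.function.
 */
static bool
uevent_devpath_matches_pci(const char *dev_path, unsigned int domain,
			   unsigned int bus, unsigned int devid,
			   unsigned int function)
{
	char bdf[48];
	const char *hit;
	size_t len;

	assert(dev_path != NULL);

	/* sysfs spells PCI addresses as DDDD:BB:DD.F in lowercase hex. */
	snprintf(bdf, sizeof(bdf), "%04x:%02x:%02x.%x",
		 domain, bus, devid, function);
	len = strlen(bdf);

	/* Accept the address only when it is a whole path component. */
	for (hit = strstr(dev_path, bdf); hit != NULL;
	     hit = strstr(hit + 1, bdf)) {
		bool starts = (hit == dev_path) || (hit[-1] == '/');
		bool ends = (hit[len] == '/') || (hit[len] == '\0');

		if (starts && ends)
			return true;
	}
	return false;
}
```

With a check of this kind, every device could receive the same broadcast
uevent and simply ignore the ones whose DEVPATH belongs to another port.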

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-05-28 15:44 [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver Jeff Guo
  2017-05-30  7:14 ` Gaëtan Rivet
  2017-06-07  7:27 ` Wu, Jingjing
@ 2017-06-28 11:07 ` Jeff Guo
  2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
                     ` (2 more replies)
  2 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2017-06-28 11:07 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu; +Cc: dev, jia.guo

From: "Guo, Jia" <jia.guo@intel.com>

This patch adds a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring,
and adds two uevent APIs to the rte eal interrupt code,
"rte_uevent_connect" and "rte_uevent_get", so that any driver
can use them to monitor and read out uevents and then
handle them accordingly, for example by detaching or attaching
the device.

Signed-off-by: Guo, Jia <jia.guo@intel.com>
---
v2->v1: remove the global variable hotplug_fd and add uevent_fd
        to rte_intr_handle so that each pci device maintains its own fd,
	which fixes the dual device fd issue. fix some typos.
---
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 143 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |  12 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  34 +++++
 4 files changed, 192 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci_uio.c b/lib/librte_eal/common/eal_common_pci_uio.c
index 367a681..5b62f70 100644
--- a/lib/librte_eal/common/eal_common_pci_uio.c
+++ b/lib/librte_eal/common/eal_common_pci_uio.c
@@ -117,6 +117,7 @@
 
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.uio_cfg_fd = -1;
+	dev->intr_handle.uevent_fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 
 	/* secondary processes - use already recorded details */
@@ -227,7 +228,10 @@
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
 	}
-
+	if (dev->intr_handle.uevent_fd >= 0) {
+		close(dev->intr_handle.uevent_fd);
+		dev->intr_handle.uevent_fd = -1;
+	}
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 2e3bd12..d596522 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -65,6 +65,10 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+
 #include "eal_private.h"
 #include "eal_vfio.h"
 #include "eal_thread.h"
@@ -74,6 +78,9 @@
 
 static RTE_DEFINE_PER_LCORE(int, _epfd) = -1; /**< epoll fd per thread */
 
+#define RTE_UEVENT_MSG_LEN 4096
+#define RTE_UEVENT_SUBSYSTEM_UIO 1
+
 /**
  * union for pipe fds.
  */
@@ -669,10 +676,13 @@ struct rte_intr_source {
 			RTE_SET_USED(r);
 			return -1;
 		}
+
 		rte_spinlock_lock(&intr_lock);
 		TAILQ_FOREACH(src, &intr_sources, next)
-			if (src->intr_handle.fd ==
-					events[n].data.fd)
+			if ((src->intr_handle.fd ==
+					events[n].data.fd) ||
+				(src->intr_handle.uevent_fd ==
+					events[n].data.fd))
 				break;
 		if (src == NULL){
 			rte_spinlock_unlock(&intr_lock);
@@ -858,7 +868,24 @@ static __attribute__((noreturn)) void *
 			}
 			else
 				numfds++;
+
+			/**
+			 * add device uevent file descriptor
+			 * into wait list for uevent monitoring.
+			 */
+			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+			ev.data.fd = src->intr_handle.uevent_fd;
+			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
+					src->intr_handle.uevent_fd, &ev) < 0){
+				rte_panic("Error adding uevent_fd %d epoll_ctl"
+					", %s\n",
+					src->intr_handle.uevent_fd,
+					strerror(errno));
+			} else
+				numfds++;
 		}
+
+
 		rte_spinlock_unlock(&intr_lock);
 		/* serve the interrupt */
 		eal_intr_handle_interrupts(pfd, numfds);
@@ -1255,3 +1282,115 @@ static __attribute__((noreturn)) void *
 
 	return 0;
 }
+
+int
+rte_uevent_connect(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int netlink_fd = -1;
+	int size = 64 * 1024;
+	int nonblock = 1;
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
+	if (netlink_fd < 0)
+		return -1;
+
+	RTE_LOG(ERR, EAL,
+	"netlink_fd is %d\n", netlink_fd);
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+		"ioctl(FIONBIO) failed\n");
+		close(netlink_fd);
+		return -1;
+	}
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		close(netlink_fd);
+		return -1;
+	}
+
+	return netlink_fd;
+}
+
+static int
+parse_event(const char *buf, struct rte_uevent *event)
+{
+	char action[RTE_UEVENT_MSG_LEN];
+	char subsystem[RTE_UEVENT_MSG_LEN];
+	char dev_path[RTE_UEVENT_MSG_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_UEVENT_MSG_LEN);
+
+	while (*buf && i < RTE_UEVENT_MSG_LEN) {
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		}
+		while (*buf++)
+			i++;
+		while (*buf == '\0') {
+			buf++;
+			i++;
+		}
+	}
+
+	if (!strncmp(subsystem, "uio", 3)) {
+
+		event->subsystem = RTE_UEVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3))
+			event->action = RTE_UEVENT_ADD;
+		if (!strncmp(action, "remove", 6))
+			event->action = RTE_UEVENT_REMOVE;
+		return 0;
+	}
+
+	return -1;
+}
+
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_uevent));
+	memset(buf, 0, RTE_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret > 0)
+		return parse_event(buf, uevent);
+
+	if (ret < 0) {
+		if (errno == EAGAIN || errno == EWOULDBLOCK) {
+			return 0;
+		} else {
+			RTE_LOG(ERR, EAL,
+			"Socket read error(%d): %s\n",
+			errno, strerror(errno));
+		}
+	}
+
+	/* connection closed */
+	if (ret == 0)
+		return -1;
+
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index fa10329..2fead82 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -231,6 +231,10 @@
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
 	}
+	if (dev->intr_handle.uevent_fd >= 0) {
+		close(dev->intr_handle.uevent_fd);
+		dev->intr_handle.uevent_fd = -1;
+	}
 	if (dev->intr_handle.fd >= 0) {
 		close(dev->intr_handle.fd);
 		dev->intr_handle.fd = -1;
@@ -245,6 +249,7 @@
 	char dirname[PATH_MAX];
 	char cfgname[PATH_MAX];
 	char devname[PATH_MAX]; /* contains the /dev/uioX */
+	char uevtname[PATH_MAX];
 	int uio_num;
 	struct rte_pci_addr *loc;
 
@@ -276,6 +281,13 @@
 		goto error;
 	}
 
+	dev->intr_handle.uevent_fd = rte_uevent_connect();
+	if (dev->intr_handle.uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			uevtname, strerror(errno));
+		goto error;
+	}
+
 	if (dev->kdrv == RTE_KDRV_IGB_UIO)
 		dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
 	else {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6daffeb..bd1780d 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -90,6 +90,7 @@ struct rte_intr_handle {
 					for uio_pci_generic */
 	};
 	int fd;	 /**< interrupt event file descriptor */
+	int uevent_fd;	 /**< uevent file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 	uint32_t max_intr;             /**< max interrupt requested */
 	uint32_t nb_efd;               /**< number of available efd(event fd) */
@@ -99,6 +100,16 @@ struct rte_intr_handle {
 	int *intr_vec;                 /**< intr vector number array */
 };
 
+enum rte_uevent_action {
+	RTE_UEVENT_ADD = 0,		/**< uevent type of device add */
+	RTE_UEVENT_REMOVE = 1,	/**< uevent type of device remove*/
+};
+
+struct rte_uevent {
+	enum rte_uevent_action action;	/**< uevent action type */
+	int subsystem;				/**< subsystem id */
+};
+
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
 
 /**
@@ -236,4 +247,27 @@ struct rte_intr_handle {
 int
 rte_intr_cap_multiple(struct rte_intr_handle *intr_handle);
 
+/**
+ * It read out the uevent from the specific file descriptor.
+ *
+ * @param fd
+ *   The fd which the uevent associated to
+ * @param uevent
+ *   Pointer to the uevent which read from the monitoring fd.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent);
+
+/**
+ * Connect to the device uevent file descriptor.
+ * @return
+ *   - On success, the connected uevent fd.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_connect(void);
+
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-28 11:07 ` [PATCH v2 1/2] eal: add uevent api for hot plug Jeff Guo
@ 2017-06-28 11:07   ` Jeff Guo
  2017-06-29  1:41     ` Wu, Jingjing
                       ` (3 more replies)
  2017-06-29  2:25   ` [PATCH v2 1/2] eal: add uevent api for hot plug Wu, Jingjing
  2017-07-04 23:45   ` Thomas Monjalon
  2 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2017-06-28 11:07 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu; +Cc: dev, jia.guo

From: "Guo, Jia" <jia.guo@intel.com>

This patch enables the hot plug feature in i40e by monitoring the
hot plug uevent of the device. When a remove event is received, the app
callback function is called to handle the detach process.

Signed-off-by: Guo, Jia <jia.guo@intel.com>
---
v2->v1: remove unused part for current stage.
---
 drivers/net/i40e/i40e_ethdev.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4ee1113..122187e 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
+
 	/*
 	 * Add an ethertype filter to drop all flow control frames transmitted
 	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
@@ -5832,11 +5833,28 @@ struct i40e_vsi *
 {
 	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
 	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct rte_uevent event;
 	uint32_t icr0;
+	struct rte_pci_device *pci_dev;
+	struct rte_intr_handle *intr_handle;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	intr_handle = &pci_dev->intr_handle;
 
 	/* Disable interrupt */
 	i40e_pf_disable_irq0(hw);
 
+	/* check device uevent */
+	if (rte_uevent_get(intr_handle->uevent_fd, &event) > 0) {
+		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
+			if (event.action == RTE_UEVENT_REMOVE) {
+				_rte_eth_dev_callback_process(dev,
+					RTE_ETH_EVENT_INTR_RMV, NULL);
+			}
+		}
+		goto done;
+	}
+
 	/* read out interrupt causes */
 	icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
@ 2017-06-29  1:41     ` Wu, Jingjing
  2017-06-29  4:31       ` Guo, Jia
  2017-06-29  3:34     ` Stephen Hemminger
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-29  1:41 UTC (permalink / raw)
  To: Guo, Jia, Zhang, Helin; +Cc: dev



> -----Original Message-----
> From: Guo, Jia
> Sent: Wednesday, June 28, 2017 7:07 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>
> Subject: [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
> 
> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch enable the hot plug feature in i40e, by monitoring the hot plug
> uevent of the device. When remove event got, call the app callback function to
> handle the detach process.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>
> ---
> v2->v1: remove unused part for current stage.
> ---
>  drivers/net/i40e/i40e_ethdev.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 4ee1113..122187e 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw
> *hw)
> 
>  	/* enable uio intr after callback register */
>  	rte_intr_enable(intr_handle);
> +
>  	/*
>  	 * Add an ethertype filter to drop all flow control frames transmitted
>  	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
> @@ -5832,11 +5833,28 @@ struct i40e_vsi *  {
>  	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>  	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> +	struct rte_uevent event;
>  	uint32_t icr0;
> +	struct rte_pci_device *pci_dev;
> +	struct rte_intr_handle *intr_handle;
> +
> +	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	intr_handle = &pci_dev->intr_handle;
> 
>  	/* Disable interrupt */
>  	i40e_pf_disable_irq0(hw);
> 
> +	/* check device uevent */
> +	if (rte_uevent_get(intr_handle->uevent_fd, &event) > 0) {

You declare rte_uevent_get like this:

+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent);


But here you check whether it is > 0?

> +		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
> +			if (event.action == RTE_UEVENT_REMOVE) {
> +				_rte_eth_dev_callback_process(dev,
> +					RTE_ETH_EVENT_INTR_RMV, NULL);
> +			}
> +		}
> +		goto done;

I think when the remove happens, there is no need to goto done; you can just return.
> +	}
> +
>  	/* read out interrupt causes */
>  	icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-06-28 11:07 ` [PATCH v2 1/2] eal: add uevent api for hot plug Jeff Guo
  2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
@ 2017-06-29  2:25   ` Wu, Jingjing
  2017-06-29  4:29     ` Guo, Jia
  2017-07-04 23:45   ` Thomas Monjalon
  2 siblings, 1 reply; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-29  2:25 UTC (permalink / raw)
  To: Guo, Jia, Zhang, Helin; +Cc: dev

> +
> +int
> +rte_uevent_connect(void)
> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +	int netlink_fd = -1;
> +	int size = 64 * 1024;
> +	int nonblock = 1;
> +	memset(&addr, 0, sizeof(addr));
> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;
> +
> +	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM,
> NETLINK_KOBJECT_UEVENT);
> +	if (netlink_fd < 0)
> +		return -1;
> +
> +	RTE_LOG(ERR, EAL,
> +	"netlink_fd is %d\n", netlink_fd);
Is this an ERR log?

> +
> +	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size,
> +sizeof(size));
> +
> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, EAL,
> +		"ioctl(FIONBIO) failed\n");
> +		close(netlink_fd);
> +		return -1;
> +	}
> +
> +	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
> +		close(netlink_fd);
> +		return -1;
> +	}
> +
> +	return netlink_fd;
> +}
> +
> +static int
> +parse_event(const char *buf, struct rte_uevent *event) {
> +	char action[RTE_UEVENT_MSG_LEN];
> +	char subsystem[RTE_UEVENT_MSG_LEN];
> +	char dev_path[RTE_UEVENT_MSG_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, RTE_UEVENT_MSG_LEN);
> +	memset(subsystem, 0, RTE_UEVENT_MSG_LEN);
> +	memset(dev_path, 0, RTE_UEVENT_MSG_LEN);
> +
> +	while (*buf && i < RTE_UEVENT_MSG_LEN) {
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
> +			buf += 8;
> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		}
> +		while (*buf++)
> +			i++;
> +		while (*buf == '\0') {
> +			buf++;
> +			i++;
> +		}
Does this keep incrementing (++) until the end? The logic looks wrong. Please check carefully.

> +	}
> +
> +	if (!strncmp(subsystem, "uio", 3)) {
> +
> +		event->subsystem = RTE_UEVENT_SUBSYSTEM_UIO;
> +		if (!strncmp(action, "add", 3))
> +			event->action = RTE_UEVENT_ADD;
> +		if (!strncmp(action, "remove", 6))
> +			event->action = RTE_UEVENT_REMOVE;
> +		return 0;
> +	}
> +
> +	return -1;
> +}
> +
> +int
> +rte_uevent_get(int fd, struct rte_uevent *uevent) {
> +	int ret;
> +	char buf[RTE_UEVENT_MSG_LEN];
> +
> +	memset(uevent, 0, sizeof(struct rte_uevent));
> +	memset(buf, 0, RTE_UEVENT_MSG_LEN);
> +
> +	ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
> +	if (ret > 0)
> +		return parse_event(buf, uevent);
> +
> +	if (ret < 0) {
Meaningless check.
> +		if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +			return 0;
Return -1? The function is declared so that 0 means success.

> +		} else {
> +			RTE_LOG(ERR, EAL,
> +			"Socket read error(%d): %s\n",
> +			errno, strerror(errno));
Why not return?

> +		}
> +	}
> +
> +	/* connection closed */
> +	if (ret == 0)
> +		return -1;
> +
> +	return 0;
> +}
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> index fa10329..2fead82 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> @@ -231,6 +231,10 @@
>  		close(dev->intr_handle.uio_cfg_fd);
>  		dev->intr_handle.uio_cfg_fd = -1;
>  	}
> +	if (dev->intr_handle.uevent_fd >= 0) {
> +		close(dev->intr_handle.uevent_fd);
> +		dev->intr_handle.uevent_fd = -1;
> +	}
>  	if (dev->intr_handle.fd >= 0) {
>  		close(dev->intr_handle.fd);
>  		dev->intr_handle.fd = -1;
> @@ -245,6 +249,7 @@
>  	char dirname[PATH_MAX];
>  	char cfgname[PATH_MAX];
>  	char devname[PATH_MAX]; /* contains the /dev/uioX */
> +	char uevtname[PATH_MAX];
>  	int uio_num;
>  	struct rte_pci_addr *loc;
> 
> @@ -276,6 +281,13 @@
>  		goto error;
>  	}
> 
> +	dev->intr_handle.uevent_fd = rte_uevent_connect();
> +	if (dev->intr_handle.uevent_fd < 0) {
> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +			uevtname, strerror(errno));
> +		goto error;
It seems uevtname is never assigned. Do we need it? And the log should probably say
"cannot connect the event fd", right? And even if the event fd fails to be created,
should that block the process?


Thanks
Jingjing
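
For illustration only, one return convention that would avoid the ambiguity
raised above is sketched below. It is an assumption, not the convention the
patch or any later revision uses, and the function name is hypothetical:
a positive value means a message was read, zero means nothing is pending,
and a negative errno value means a real failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical return convention, not the one in the patch:
 *   > 0  number of bytes stored in buf (one message was read)
 *   0    nothing pending right now (or an empty datagram)
 *   < 0  negative errno value on a real failure
 */
static int
uevent_recv_nonblock(int fd, char *buf, size_t len)
{
	ssize_t ret;

	assert(buf != NULL && len > 1);

	/* Leave room for the terminating NUL added below. */
	ret = recv(fd, buf, len - 1, MSG_DONTWAIT);
	if (ret < 0) {
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 0;
		return -errno;
	}
	buf[ret] = '\0';
	return (int)ret;
}
```

A caller could then test for "> 0" before parsing, which would make a check
like the one in the i40e handler consistent with the documented contract.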


^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
  2017-06-29  1:41     ` Wu, Jingjing
@ 2017-06-29  3:34     ` Stephen Hemminger
  2017-06-29  4:48       ` Wu, Jingjing
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
  2017-06-29  5:01     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2017-06-29  3:34 UTC (permalink / raw)
  To: Jeff Guo; +Cc: helin.zhang, jingjing.wu, dev

On Wed, 28 Jun 2017 19:07:24 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch enable the hot plug feature in i40e, by monitoring the
> hot plug uevent of the device. When remove event got, call the app
> callback function to handle the detach process.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>
> ---

Hot plug is good and needed.

But it needs to be done in a generic fashion in the bus layer.
There is nothing about uevents that is unique to i40e or even Intel
devices. Plus the way hotplug is handled is OS specific, so this isn't going
to work well on BSD.

Sorry if I sound like a broken record but there has been a repeated pattern
of Intel developers putting their head down (or in the sand) and creating
functionality inside the device driver.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-06-29  2:25   ` [PATCH v2 1/2] eal: add uevent api for hot plug Wu, Jingjing
@ 2017-06-29  4:29     ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-06-29  4:29 UTC (permalink / raw)
  To: Wu, Jingjing, Zhang, Helin; +Cc: dev

The buf contains a lot of consecutive '\0' characters, which is why I need that check. Anyway, I will refine that and the other return value issues in v3.

Thanks, Jingjing.

Best regards,
Jeff Guo
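
For illustration only, a bounded walk over the NUL-separated records could
look roughly like the sketch below. It is not the v3 code: the struct, the
field sizes and the function names are assumptions. It relies on the byte
count reported by recv() instead of scanning for runs of '\0', so it never
reads past the received data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative container; the names and sizes are assumptions. */
struct uevent_fields {
	char action[32];
	char subsystem[32];
	char dev_path[256];
};

/* True when the record starts with key (for example "ACTION="). */
static int
record_has_key(const char *rec, size_t rec_len, const char *key)
{
	size_t key_len = strlen(key);

	return rec_len >= key_len && memcmp(rec, key, key_len) == 0;
}

/*
 * Walk a buffer of NUL-separated "KEY=value" records, bounded by the
 * number of bytes received, and copy out the fields of interest.
 */
static void
uevent_parse_fields(const char *buf, size_t len, struct uevent_fields *out)
{
	size_t pos = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (pos < len) {
		const char *rec = buf + pos;
		const char *end = memchr(rec, '\0', len - pos);
		/* An unterminated last record ends at the buffer end. */
		size_t rec_len = end ? (size_t)(end - rec) : len - pos;

		if (record_has_key(rec, rec_len, "ACTION="))
			snprintf(out->action, sizeof(out->action), "%.*s",
				 (int)(rec_len - 7), rec + 7);
		else if (record_has_key(rec, rec_len, "DEVPATH="))
			snprintf(out->dev_path, sizeof(out->dev_path), "%.*s",
				 (int)(rec_len - 8), rec + 8);
		else if (record_has_key(rec, rec_len, "SUBSYSTEM="))
			snprintf(out->subsystem, sizeof(out->subsystem),
				 "%.*s", (int)(rec_len - 10), rec + 10);

		/* Empty records (consecutive NULs) advance by one byte. */
		pos += rec_len + 1;
	}
}
```

The small fixed-size fields also avoid placing three 4096-byte arrays on the
stack; values longer than a field are truncated by snprintf.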



^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  1:41     ` Wu, Jingjing
@ 2017-06-29  4:31       ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-06-29  4:31 UTC (permalink / raw)
  To: Wu, Jingjing, Zhang, Helin; +Cc: dev

Yes, if a remove uevent is received, the handler can return directly to avoid invalid I/O. But if another uevent such as add or change is received, it must go to done to keep interrupt processing working on the device. I will refine this part, thanks.

Best regards,
Jeff Guo
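
For illustration only, the control flow described above can be written as a
small decision function. All names below are hypothetical; only the split
between returning on remove and going to done on other uevents comes from
this thread, and the comment about what done does (re-enabling the interrupt)
is an assumption about the existing handler.

```c
#include <assert.h>

/* Hypothetical names, used only to illustrate the decision. */
enum uevent_kind {
	UEVENT_KIND_NONE,	/* no uevent was pending */
	UEVENT_KIND_ADD,
	UEVENT_KIND_CHANGE,
	UEVENT_KIND_REMOVE,
};

enum handler_next {
	HANDLER_RETURN_NOW,	/* device is gone: touch no registers */
	HANDLER_GOTO_DONE,	/* skip cause handling, re-enable interrupt */
	HANDLER_READ_CAUSES,	/* ordinary interrupt: read ICR0 as before */
};

/* Map the uevent that was read (if any) to the handler's next step. */
static enum handler_next
handler_next_step(enum uevent_kind kind)
{
	switch (kind) {
	case UEVENT_KIND_REMOVE:
		return HANDLER_RETURN_NOW;
	case UEVENT_KIND_ADD:
	case UEVENT_KIND_CHANGE:
		return HANDLER_GOTO_DONE;
	default:
		/* Anything else is expected to mean "no uevent". */
		assert(kind == UEVENT_KIND_NONE);
		return HANDLER_READ_CAUSES;
	}
}
```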


^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v3 0/2] add uevent api for hot plug
  2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
  2017-06-29  1:41     ` Wu, Jingjing
  2017-06-29  3:34     ` Stephen Hemminger
@ 2017-06-29  4:37     ` Jeff Guo
  2017-06-29  4:37       ` [PATCH v3 1/2] eal: " Jeff Guo
                         ` (23 more replies)
  2017-06-29  5:01     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
  3 siblings, 24 replies; 494+ messages in thread
From: Jeff Guo @ 2017-06-29  4:37 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu; +Cc: dev, jia.guo

From: "Guo, Jia" <jia.guo@intel.com>

This patch set adds a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring,
and adds two uevent APIs to the rte eal interrupt layer:
“rte_uevent_connect” and “rte_uevent_get”. The patch set uses i40e
as an example: the driver can use these APIs to monitor and read
out uevents, then handle them accordingly, for example by
detaching or attaching the device.
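
As a rough illustration of the "read out the uevent" step, the sketch below
parses a kernel uevent payload, assuming the usual layout of NUL-separated
KEY=VALUE records as delivered on a NETLINK_KOBJECT_UEVENT socket. All
`sketch_*` names and the `rec_is` helper are hypothetical and are not part of
this patch set; the parser takes the received length explicitly so it never
reads past the data returned by recv().

```c
#include <stddef.h>
#include <string.h>

enum sketch_action { SKETCH_NONE = 0, SKETCH_ADD, SKETCH_REMOVE };

struct sketch_uevent {
	enum sketch_action action;	/* parsed ACTION= value */
	int is_uio;			/* 1 when SUBSYSTEM=uio was seen */
};

/* Exact match of one record against a literal such as "ACTION=add". */
static int
rec_is(const char *rec, size_t rec_len, const char *want)
{
	return rec_len == strlen(want) && memcmp(rec, want, rec_len) == 0;
}

/* Walk the NUL-separated records inside buf[0..len).
 * Returns 0 for a uio event and -1 otherwise. */
static int
sketch_parse_uevent(const char *buf, size_t len, struct sketch_uevent *ev)
{
	size_t i = 0;

	memset(ev, 0, sizeof(*ev));
	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);
		size_t rec_len = end ? (size_t)(end - rec) : len - i;

		if (rec_is(rec, rec_len, "ACTION=add"))
			ev->action = SKETCH_ADD;
		else if (rec_is(rec, rec_len, "ACTION=remove"))
			ev->action = SKETCH_REMOVE;
		else if (rec_is(rec, rec_len, "SUBSYSTEM=uio"))
			ev->is_uio = 1;

		i += rec_len + 1;	/* skip the record and its terminator */
	}
	return ev->is_uio ? 0 : -1;
}
```

A caller would pass the byte count returned by recv() as `len`, so a short
message is never scanned beyond its end.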

Guo, Jia (2):
  eal: add uevent api for hot plug
  net/i40e: add hot plug monitor in i40e

 drivers/net/i40e/i40e_ethdev.c                     |  19 +++
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 136 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |   6 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  37 ++++++
 5 files changed, 201 insertions(+), 3 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v3 1/2] eal: add uevent api for hot plug
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
@ 2017-06-29  4:37       ` Jeff Guo
  2017-06-30  3:38         ` Wu, Jingjing
  2017-06-29  4:37       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
                         ` (22 subsequent siblings)
  23 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2017-06-29  4:37 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu; +Cc: dev, jia.guo

From: "Guo, Jia" <jia.guo@intel.com>

This patch adds a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring,
and adds two uevent APIs to the rte eal interrupt layer,
“rte_uevent_connect” and “rte_uevent_get”, so that all drivers
can use these APIs to monitor and read out uevents, then
handle them accordingly, for example by detaching or attaching
the device.

Signed-off-by: Guo, Jia <jia.guo@intel.com>
---
v3->v2: refine some return error
	refine the string searching logic to avoid memory issue
---
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 136 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |   6 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  37 ++++++
 4 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci_uio.c b/lib/librte_eal/common/eal_common_pci_uio.c
index 367a681..5b62f70 100644
--- a/lib/librte_eal/common/eal_common_pci_uio.c
+++ b/lib/librte_eal/common/eal_common_pci_uio.c
@@ -117,6 +117,7 @@
 
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.uio_cfg_fd = -1;
+	dev->intr_handle.uevent_fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 
 	/* secondary processes - use already recorded details */
@@ -227,7 +228,10 @@
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
 	}
-
+	if (dev->intr_handle.uevent_fd >= 0) {
+		close(dev->intr_handle.uevent_fd);
+		dev->intr_handle.uevent_fd = -1;
+	}
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 2e3bd12..2c4a3fb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -65,6 +65,10 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+
 #include "eal_private.h"
 #include "eal_vfio.h"
 #include "eal_thread.h"
@@ -669,10 +673,13 @@ struct rte_intr_source {
 			RTE_SET_USED(r);
 			return -1;
 		}
+
 		rte_spinlock_lock(&intr_lock);
 		TAILQ_FOREACH(src, &intr_sources, next)
-			if (src->intr_handle.fd ==
-					events[n].data.fd)
+			if ((src->intr_handle.fd ==
+					events[n].data.fd) ||
+				(src->intr_handle.uevent_fd ==
+					events[n].data.fd))
 				break;
 		if (src == NULL){
 			rte_spinlock_unlock(&intr_lock);
@@ -858,7 +865,24 @@ static __attribute__((noreturn)) void *
 			}
 			else
 				numfds++;
+
+			/**
+			 * add device uevent file descriptor
+			 * into wait list for uevent monitoring.
+			 */
+			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+			ev.data.fd = src->intr_handle.uevent_fd;
+			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
+					src->intr_handle.uevent_fd, &ev) < 0){
+				rte_panic("Error adding uevent_fd %d epoll_ctl"
+					", %s\n",
+					src->intr_handle.uevent_fd,
+					strerror(errno));
+			} else
+				numfds++;
 		}
+
+
 		rte_spinlock_unlock(&intr_lock);
 		/* serve the interrupt */
 		eal_intr_handle_interrupts(pfd, numfds);
@@ -1255,3 +1279,111 @@ static __attribute__((noreturn)) void *
 
 	return 0;
 }
+
+int
+rte_uevent_connect(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int netlink_fd = -1;
+	int size = 64 * 1024;
+	int nonblock = 1;
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
+	if (netlink_fd < 0)
+		return -1;
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+		"ioctl(FIONBIO) failed\n");
+		close(netlink_fd);
+		return -1;
+	}
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		close(netlink_fd);
+		return -1;
+	}
+
+	return netlink_fd;
+}
+
+static int
+parse_event(const char *buf, struct rte_uevent *event)
+{
+	char action[RTE_UEVENT_MSG_LEN];
+	char subsystem[RTE_UEVENT_MSG_LEN];
+	char dev_path[RTE_UEVENT_MSG_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_UEVENT_MSG_LEN);
+
+	while (i < RTE_UEVENT_MSG_LEN) {
+		for (; i < RTE_UEVENT_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		}
+		for (; i < RTE_UEVENT_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "uio", 3)) {
+
+		event->subsystem = RTE_UEVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3))
+			event->action = RTE_UEVENT_ADD;
+		if (!strncmp(action, "remove", 6))
+			event->action = RTE_UEVENT_REMOVE;
+		return 0;
+	}
+
+	return -1;
+}
+
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_uevent));
+	memset(buf, 0, RTE_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret > 0)
+		return parse_event(buf, uevent);
+	else if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else
+		/* connection closed */
+		return -1;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index fa10329..eae9cd5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -231,6 +231,10 @@
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
 	}
+	if (dev->intr_handle.uevent_fd >= 0) {
+		close(dev->intr_handle.uevent_fd);
+		dev->intr_handle.uevent_fd = -1;
+	}
 	if (dev->intr_handle.fd >= 0) {
 		close(dev->intr_handle.fd);
 		dev->intr_handle.fd = -1;
@@ -276,6 +280,8 @@
 		goto error;
 	}
 
+	dev->intr_handle.uevent_fd = rte_uevent_connect();
+
 	if (dev->kdrv == RTE_KDRV_IGB_UIO)
 		dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
 	else {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6daffeb..0b31a22 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -90,6 +90,7 @@ struct rte_intr_handle {
 					for uio_pci_generic */
 	};
 	int fd;	 /**< interrupt event file descriptor */
+	int uevent_fd;	 /**< uevent file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 	uint32_t max_intr;             /**< max interrupt requested */
 	uint32_t nb_efd;               /**< number of available efd(event fd) */
@@ -99,6 +100,19 @@ struct rte_intr_handle {
 	int *intr_vec;                 /**< intr vector number array */
 };
 
+#define RTE_UEVENT_MSG_LEN 4096
+#define RTE_UEVENT_SUBSYSTEM_UIO 1
+
+enum rte_uevent_action {
+	RTE_UEVENT_ADD = 0,		/**< uevent type of device add */
+	RTE_UEVENT_REMOVE = 1,	/**< uevent type of device remove*/
+};
+
+struct rte_uevent {
+	enum rte_uevent_action action;	/**< uevent action type */
+	int subsystem;				/**< subsystem id */
+};
+
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
 
 /**
@@ -236,4 +250,27 @@ struct rte_intr_handle {
 int
 rte_intr_cap_multiple(struct rte_intr_handle *intr_handle);
 
+/**
+ * It read out the uevent from the specific file descriptor.
+ *
+ * @param fd
+ *   The fd which the uevent associated to
+ * @param uevent
+ *   Pointer to the uevent which read from the monitoring fd.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent);
+
+/**
+ * Connect to the device uevent file descriptor.
+ * @return
+ *   - On success, the connected uevent fd.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_connect(void);
+
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
  2017-06-29  4:37       ` [PATCH v3 1/2] eal: " Jeff Guo
@ 2017-06-29  4:37       ` Jeff Guo
  2017-06-30  3:38         ` Wu, Jingjing
  2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
                         ` (21 subsequent siblings)
  23 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2017-06-29  4:37 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu; +Cc: dev, jia.guo

From: "Guo, Jia" <jia.guo@intel.com>

This patch enables the hot plug feature in i40e by monitoring the
hot plug uevents of the device. When a remove event is received, the
app callback function is called to handle the detach process.

Signed-off-by: Guo, Jia <jia.guo@intel.com>
---
v3->v2: refine the return handling when the device is removed
---
 drivers/net/i40e/i40e_ethdev.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4ee1113..67ffc14 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
+
 	/*
 	 * Add an ethertype filter to drop all flow control frames transmitted
 	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
@@ -5832,11 +5833,29 @@ struct i40e_vsi *
 {
 	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
 	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct rte_uevent event;
 	uint32_t icr0;
+	struct rte_pci_device *pci_dev;
+	struct rte_intr_handle *intr_handle;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	intr_handle = &pci_dev->intr_handle;
 
 	/* Disable interrupt */
 	i40e_pf_disable_irq0(hw);
 
+	/* check device uevent */
+	if (rte_uevent_get(intr_handle->uevent_fd, &event) == 0) {
+		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
+			if (event.action == RTE_UEVENT_REMOVE) {
+				_rte_eth_dev_callback_process(dev,
+					RTE_ETH_EVENT_INTR_RMV, NULL);
+				return;
+			}
+		}
+		goto done;
+	}
+
 	/* read out interrupt causes */
 	icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  3:34     ` Stephen Hemminger
@ 2017-06-29  4:48       ` Wu, Jingjing
  2017-06-29  7:47         ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-29  4:48 UTC (permalink / raw)
  To: Stephen Hemminger, Guo, Jia
  Cc: Zhang, Helin, dev, Chang, Cunyin, Liang, Cunming



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, June 29, 2017 11:35 AM
> To: Guo, Jia <jia.guo@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
> 
> On Wed, 28 Jun 2017 19:07:24 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
> 
> > From: "Guo, Jia" <jia.guo@intel.com>
> >
> > This patch enable the hot plug feature in i40e, by monitoring the hot
> > plug uevent of the device. When remove event got, call the app
> > callback function to handle the detach process.
> >
> > Signed-off-by: Guo, Jia <jia.guo@intel.com>
> > ---
> 
> Hot plug is good and needed.
> 
> But it needs to be done in a generic fashion in the bus layer.
> There is nothing about uevents that are unique to i40e or even Intel devices.
> Plus the way hotplug is handled is OS specific, so this isn't going to work well on
> BSD.
> 
This patch is not meant to fully support hot plug, and we know hot plug handling is OS specific.
This patch just provides a way to tell the DPDK user that a removal happened on this device (DPDK dev).

And the Mlx driver already supports that with patch
http://dpdk.org/dev/patchwork/patch/23695/

What GuoJia did is just make it possible for the application to process the event through the
interrupt callback mechanism.
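
A minimal sketch of that flow, assuming a simplified callback registry: the
application registers a callback, and the driver's interrupt handler invokes
it once a removal is detected. Every `sketch_*` name here is hypothetical and
only mirrors the idea; it is not the ethdev API.

```c
#include <stddef.h>

enum sketch_event { SKETCH_EVENT_RMV = 1 };

typedef void (*sketch_cb_fn)(int port_id, enum sketch_event ev, void *arg);

struct sketch_dev {
	int port_id;
	sketch_cb_fn cb;	/* application callback, NULL if none */
	void *cb_arg;		/* opaque argument handed back to the callback */
};

/* Application side: register interest in device events. */
static void
sketch_callback_register(struct sketch_dev *dev, sketch_cb_fn cb, void *arg)
{
	dev->cb = cb;
	dev->cb_arg = arg;
}

/* Driver side: called from the interrupt handler once a removal is seen. */
static void
sketch_notify_removal(struct sketch_dev *dev)
{
	if (dev->cb != NULL)
		dev->cb(dev->port_id, SKETCH_EVENT_RMV, dev->cb_arg);
}

/* Example application callback: record which port was removed. */
static void
sketch_on_removal(int port_id, enum sketch_event ev, void *arg)
{
	if (ev == SKETCH_EVENT_RMV)
		*(int *)arg = port_id;
}
```

The driver only reports the event; what to do about it (for example,
detaching the port) stays in the application callback.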

> Sorry if I sound like a broken record but there has been a repeated pattern of
> Intel developers  putting their head down (or in the sand) and creating
> functionality inside device driver.
Sorry, I cannot agree.

Thanks
Jingjing

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v3 0/2] add uevent api for hot plug
  2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
                       ` (2 preceding siblings ...)
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
@ 2017-06-29  5:01     ` Jeff Guo
  2017-06-29  5:01       ` [PATCH v3 1/2] eal: " Jeff Guo
  2017-06-29  5:01       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
  3 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2017-06-29  5:01 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu
  Cc: dev, jia.guo, bruce.richardson, konstantin.ananyev, gaetan.rivet,
	thomas.monjalon, ferruh.yigit

From: "Guo, Jia" <jia.guo@intel.com>

This patch set adds a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring,
and adds two uevent APIs to the rte eal interrupt layer:
“rte_uevent_connect” and “rte_uevent_get”. The patch set uses i40e
as an example: the driver can use these APIs to monitor and read
out uevents, then handle them accordingly, for example by
detaching or attaching the device.

Guo, Jia (2):
  eal: add uevent api for hot plug
  net/i40e: add hot plug monitor in i40e

 drivers/net/i40e/i40e_ethdev.c                     |  19 +++
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 136 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |   6 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  37 ++++++
 5 files changed, 201 insertions(+), 3 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v3 1/2] eal: add uevent api for hot plug
  2017-06-29  5:01     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
@ 2017-06-29  5:01       ` Jeff Guo
  2017-07-04  7:15         ` Wu, Jingjing
  2017-09-03 15:49         ` [PATCH v4 0/2] add uevent monitor " Jeff Guo
  2017-06-29  5:01       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
  1 sibling, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2017-06-29  5:01 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu
  Cc: dev, jia.guo, bruce.richardson, konstantin.ananyev, gaetan.rivet,
	thomas.monjalon, ferruh.yigit

From: "Guo, Jia" <jia.guo@intel.com>

This patch adds a variable "uevent_fd" to the structure
"rte_intr_handle" to enable kernel object uevent monitoring,
and adds two uevent APIs to the rte eal interrupt layer,
“rte_uevent_connect” and “rte_uevent_get”, so that all drivers
can use these APIs to monitor and read out uevents, then
handle them accordingly, for example by detaching or attaching
the device.

Signed-off-by: Guo, Jia <jia.guo@intel.com>
---
v3->v2: refine some return error
	refine the string searching logic to avoid memory issue
---
 lib/librte_eal/common/eal_common_pci_uio.c         |   6 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 136 ++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |   6 +
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |  37 ++++++
 4 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci_uio.c b/lib/librte_eal/common/eal_common_pci_uio.c
index 367a681..5b62f70 100644
--- a/lib/librte_eal/common/eal_common_pci_uio.c
+++ b/lib/librte_eal/common/eal_common_pci_uio.c
@@ -117,6 +117,7 @@
 
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.uio_cfg_fd = -1;
+	dev->intr_handle.uevent_fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 
 	/* secondary processes - use already recorded details */
@@ -227,7 +228,10 @@
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
 	}
-
+	if (dev->intr_handle.uevent_fd >= 0) {
+		close(dev->intr_handle.uevent_fd);
+		dev->intr_handle.uevent_fd = -1;
+	}
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 2e3bd12..2c4a3fb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -65,6 +65,10 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+
 #include "eal_private.h"
 #include "eal_vfio.h"
 #include "eal_thread.h"
@@ -669,10 +673,13 @@ struct rte_intr_source {
 			RTE_SET_USED(r);
 			return -1;
 		}
+
 		rte_spinlock_lock(&intr_lock);
 		TAILQ_FOREACH(src, &intr_sources, next)
-			if (src->intr_handle.fd ==
-					events[n].data.fd)
+			if ((src->intr_handle.fd ==
+					events[n].data.fd) ||
+				(src->intr_handle.uevent_fd ==
+					events[n].data.fd))
 				break;
 		if (src == NULL){
 			rte_spinlock_unlock(&intr_lock);
@@ -858,7 +865,24 @@ static __attribute__((noreturn)) void *
 			}
 			else
 				numfds++;
+
+			/**
+			 * add device uevent file descriptor
+			 * into wait list for uevent monitoring.
+			 */
+			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+			ev.data.fd = src->intr_handle.uevent_fd;
+			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
+					src->intr_handle.uevent_fd, &ev) < 0){
+				rte_panic("Error adding uevent_fd %d epoll_ctl"
+					", %s\n",
+					src->intr_handle.uevent_fd,
+					strerror(errno));
+			} else
+				numfds++;
 		}
+
+
 		rte_spinlock_unlock(&intr_lock);
 		/* serve the interrupt */
 		eal_intr_handle_interrupts(pfd, numfds);
@@ -1255,3 +1279,111 @@ static __attribute__((noreturn)) void *
 
 	return 0;
 }
+
+int
+rte_uevent_connect(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int netlink_fd = -1;
+	int size = 64 * 1024;
+	int nonblock = 1;
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
+	if (netlink_fd < 0)
+		return -1;
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+		"ioctl(FIONBIO) failed\n");
+		close(netlink_fd);
+		return -1;
+	}
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		close(netlink_fd);
+		return -1;
+	}
+
+	return netlink_fd;
+}
+
+static int
+parse_event(const char *buf, struct rte_uevent *event)
+{
+	char action[RTE_UEVENT_MSG_LEN];
+	char subsystem[RTE_UEVENT_MSG_LEN];
+	char dev_path[RTE_UEVENT_MSG_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_UEVENT_MSG_LEN);
+
+	while (i < RTE_UEVENT_MSG_LEN) {
+		for (; i < RTE_UEVENT_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		}
+		for (; i < RTE_UEVENT_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "uio", 3)) {
+
+		event->subsystem = RTE_UEVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3))
+			event->action = RTE_UEVENT_ADD;
+		if (!strncmp(action, "remove", 6))
+			event->action = RTE_UEVENT_REMOVE;
+		return 0;
+	}
+
+	return -1;
+}
+
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_uevent));
+	memset(buf, 0, RTE_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret > 0)
+		return parse_event(buf, uevent);
+	else if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else
+		/* connection closed */
+		return -1;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index fa10329..eae9cd5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -231,6 +231,10 @@
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
 	}
+	if (dev->intr_handle.uevent_fd >= 0) {
+		close(dev->intr_handle.uevent_fd);
+		dev->intr_handle.uevent_fd = -1;
+	}
 	if (dev->intr_handle.fd >= 0) {
 		close(dev->intr_handle.fd);
 		dev->intr_handle.fd = -1;
@@ -276,6 +280,8 @@
 		goto error;
 	}
 
+	dev->intr_handle.uevent_fd = rte_uevent_connect();
+
 	if (dev->kdrv == RTE_KDRV_IGB_UIO)
 		dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
 	else {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6daffeb..0b31a22 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -90,6 +90,7 @@ struct rte_intr_handle {
 					for uio_pci_generic */
 	};
 	int fd;	 /**< interrupt event file descriptor */
+	int uevent_fd;	 /**< uevent file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 	uint32_t max_intr;             /**< max interrupt requested */
 	uint32_t nb_efd;               /**< number of available efd(event fd) */
@@ -99,6 +100,19 @@ struct rte_intr_handle {
 	int *intr_vec;                 /**< intr vector number array */
 };
 
+#define RTE_UEVENT_MSG_LEN 4096
+#define RTE_UEVENT_SUBSYSTEM_UIO 1
+
+enum rte_uevent_action {
+	RTE_UEVENT_ADD = 0,		/**< uevent type of device add */
+	RTE_UEVENT_REMOVE = 1,	/**< uevent type of device remove*/
+};
+
+struct rte_uevent {
+	enum rte_uevent_action action;	/**< uevent action type */
+	int subsystem;				/**< subsystem id */
+};
+
 #define RTE_EPOLL_PER_THREAD        -1  /**< to hint using per thread epfd */
 
 /**
@@ -236,4 +250,27 @@ struct rte_intr_handle {
 int
 rte_intr_cap_multiple(struct rte_intr_handle *intr_handle);
 
+/**
+ * It read out the uevent from the specific file descriptor.
+ *
+ * @param fd
+ *   The fd which the uevent associated to
+ * @param uevent
+ *   Pointer to the uevent which read from the monitoring fd.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_get(int fd, struct rte_uevent *uevent);
+
+/**
+ * Connect to the device uevent file descriptor.
+ * @return
+ *   - On success, the connected uevent fd.
+ *   - On failure, a negative value.
+ */
+int
+rte_uevent_connect(void);
+
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  5:01     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
  2017-06-29  5:01       ` [PATCH v3 1/2] eal: " Jeff Guo
@ 2017-06-29  5:01       ` Jeff Guo
  2017-07-04  7:15         ` Wu, Jingjing
  2017-07-07  7:56         ` Thomas Monjalon
  1 sibling, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2017-06-29  5:01 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu
  Cc: dev, jia.guo, bruce.richardson, konstantin.ananyev, gaetan.rivet,
	thomas.monjalon, ferruh.yigit

From: "Guo, Jia" <jia.guo@intel.com>

This patch enables the hot plug feature in i40e by monitoring the
hot plug uevents of the device. When a remove event is received, the
app callback function is called to handle the detach process.

Signed-off-by: Guo, Jia <jia.guo@intel.com>
---
v3->v2: refine the return handling when the device is removed
---
 drivers/net/i40e/i40e_ethdev.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4ee1113..67ffc14 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
 
 	/* enable uio intr after callback register */
 	rte_intr_enable(intr_handle);
+
 	/*
 	 * Add an ethertype filter to drop all flow control frames transmitted
 	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
@@ -5832,11 +5833,29 @@ struct i40e_vsi *
 {
 	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
 	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct rte_uevent event;
 	uint32_t icr0;
+	struct rte_pci_device *pci_dev;
+	struct rte_intr_handle *intr_handle;
+
+	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	intr_handle = &pci_dev->intr_handle;
 
 	/* Disable interrupt */
 	i40e_pf_disable_irq0(hw);
 
+	/* check device uevent */
+	if (rte_uevent_get(intr_handle->uevent_fd, &event) == 0) {
+		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
+			if (event.action == RTE_UEVENT_REMOVE) {
+				_rte_eth_dev_callback_process(dev,
+					RTE_ETH_EVENT_INTR_RMV, NULL);
+				return;
+			}
+		}
+		goto done;
+	}
+
 	/* read out interrupt causes */
 	icr0 = I40E_READ_REG(hw, I40E_PFINT_ICR0);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  4:48       ` Wu, Jingjing
@ 2017-06-29  7:47         ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-06-29  7:47 UTC (permalink / raw)
  To: Wu, Jingjing, Stephen Hemminger
  Cc: Zhang, Helin, dev, Chang, Cunyin, Liang, Cunming

I agree with Jingjing.

That patch is definitely not a generic hot plug solution. The uevent just gives an additional way to monitor the remove event even if the driver does not expose it as an interrupt. We know the mlx driver has already implemented the remove interrupt event in their InfiniBand-based driver framework, but other drivers may not have yet.
So uevents are not unique to i40e or other Intel NICs; the aim is just to let a wider variety of drivers that use the pci-uio framework use the common hot plug feature in DPDK.

Best regards,
Jeff Guo


-----Original Message-----
From: Wu, Jingjing 
Sent: Thursday, June 29, 2017 12:48 PM
To: Stephen Hemminger <stephen@networkplumber.org>; Guo, Jia <jia.guo@intel.com>
Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Chang, Cunyin <cunyin.chang@intel.com>; Liang, Cunming <cunming.liang@intel.com>
Subject: RE: [dpdk-dev] [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, June 29, 2017 11:35 AM
> To: Guo, Jia <jia.guo@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing 
> <jingjing.wu@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] net/i40e: add hot plug monitor 
> in i40e
> 
> On Wed, 28 Jun 2017 19:07:24 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
> 
> > From: "Guo, Jia" <jia.guo@intel.com>
> >
> > This patch enable the hot plug feature in i40e, by monitoring the 
> > hot plug uevent of the device. When remove event got, call the app 
> > callback function to handle the detach process.
> >
> > Signed-off-by: Guo, Jia <jia.guo@intel.com>
> > ---
> 
> Hot plug is good and needed.
> 
> But it needs to be done in a generic fashion in the bus layer.
> There is nothing about uevents that are unique to i40e or even Intel devices.
> Plus the way hotplug is handled is OS specific, so this isn't going to 
> work well on BSD.
> 
This patch is not meant to fully support hot plug, and we know hot plug handling is OS specific.
This patch just provides a way to tell the DPDK user that a removal happened on this device (DPDK dev).

And the Mlx driver already supports that with patch http://dpdk.org/dev/patchwork/patch/23695/

What GuoJia did is just make it possible for the application to process the event through the interrupt callback mechanism.

> Sorry if I sound like a broken record but there has been a repeated 
> pattern of Intel developers  putting their head down (or in the sand) 
> and creating functionality inside device driver.
Sorry, I cannot agree.

Thanks
Jingjing

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
  2017-06-07  7:40   ` Wu, Jingjing
  2017-06-15 21:22     ` Gaëtan Rivet
@ 2017-06-29 17:27     ` Stephen Hemminger
  2017-06-30  3:36       ` Wu, Jingjing
  1 sibling, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2017-06-29 17:27 UTC (permalink / raw)
  To: Wu, Jingjing
  Cc: Gaëtan Rivet, Guo, Jia, Zhang, Helin, Richardson, Bruce,
	Ananyev, Konstantin, Liu, Yuanhan, dev

On Wed, 7 Jun 2017 07:40:37 +0000
"Wu, Jingjing" <jingjing.wu@intel.com> wrote:

> > >
> > >Secondly, in order to read out the uevent that monitoring, we need to add uevent API in rte  
> > layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All driver interrupt handler
> > could use these API to enable the uevent monitoring, and read out the uevent type , then
> > corresponding to handle these uevent, such as detach the device when get the remove type.  
> > >  
> > 
> > I find having a generic uevent API interesting.
> > 
> > However, all specifics pertaining to UIO use (hotplug_fd, subsystem
> > enum) should stay in UIO specific code (eal_pci_uio.c?).
> >  
> Yes, but it can be also considered as interrupt mechanism, right?
> 
> > I am currently moving the PCI bus out of the EAL. EAL subsystems should
> > not rely on PCI specifics, as they won't be available afterward.  
> 
> Will the interrupt handling be kept in EAL, right?
> 
> > It should also allow you to clean up your API. Exposing hotplug_fd and
> > requiring PMDs to link it can be avoided and should result in a simpler
> > API.  

You were right; given the current model, this is the correct way to do it.
It would be good if the interrupt model stuff could be moved back into EAL
so that if a device is removed, no code needs to be added in the driver.

All the bus -> device -> interrupt state is visible, and EAL should
be able to unwind from there. I am thinking more of the Linux model, where
there is no need (in general) for hot plug specific code in each driver.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver.
  2017-06-29 17:27     ` Stephen Hemminger
@ 2017-06-30  3:36       ` Wu, Jingjing
  0 siblings, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-30  3:36 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Gaëtan Rivet, Guo, Jia, Zhang, Helin, Richardson, Bruce,
	Ananyev, Konstantin, Liu, Yuanhan, dev



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, June 30, 2017 1:28 AM
> To: Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Gaëtan Rivet <gaetan.rivet@6wind.com>; Guo, Jia <jia.guo@intel.com>;
> Zhang, Helin <helin.zhang@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Liu, Yuanhan <yuanhan.liu@intel.com>;
> dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] Add hot plug event in rte eal interrupt and
> inplement it in i40e driver.
> 
> On Wed, 7 Jun 2017 07:40:37 +0000
> "Wu, Jingjing" <jingjing.wu@intel.com> wrote:
> 
> > > >
> > > >Secondly, in order to read out the uevent that monitoring, we need
> > > >to add uevent API in rte
> > > layer. We plan add 2 , rte_uevent_connect and  rte_get_uevent. All
> > > driver interrupt handler could use these API to enable the uevent
> > > monitoring, and read out the uevent type , then corresponding to handle
> these uevent, such as detach the device when get the remove type.
> > > >
> > >
> > > I find having a generic uevent API interesting.
> > >
> > > However, all specifics pertaining to UIO use (hotplug_fd, subsystem
> > > enum) should stay in UIO specific code (eal_pci_uio.c?).
> > >
> > Yes, but it can be also considered as interrupt mechanism, right?
> >
> > > I am currently moving the PCI bus out of the EAL. EAL subsystems
> > > should not rely on PCI specifics, as they won't be available afterward.
> >
> > Will the interrupt handling be kept in EAL, right?
> >
> > > It should also allow you to clean up your API. Exposing hotplug_fd
> > > and requiring PMDs to link it can be avoided and should result in a
> > > simpler API.
> 
> You were right given the current model this is the correct way to do it.
Does that mean the approach in this RFC is reasonable given the current model?

> It would be good if the interrupt model stuff could be moved back into EAL so
> that if device is removed, no code in driver needs to be added.
> 
At least we still need to expose some interrupts/events through the drivers, such as LSC and Rx interrupts.

> All the bus -> device -> interrupt state is visible, and EAL should be able to
> unwind from there.  Thinking more of the Linux model where there is no need
> (in general) for hot plug specific code in each driver.

Yes, such a simpler API is what I would like to see too. But for now, reporting
the remove event this way is more economical.


Thanks
Jingjing 

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 1/2] eal: add uevent api for hot plug
  2017-06-29  4:37       ` [PATCH v3 1/2] eal: " Jeff Guo
@ 2017-06-30  3:38         ` Wu, Jingjing
  0 siblings, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-30  3:38 UTC (permalink / raw)
  To: Guo, Jia, Zhang, Helin; +Cc: dev



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, June 29, 2017 12:38 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>
> Subject: [PATCH v3 1/2] eal: add uevent api for hot plug
> 
> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch aim to add a variable "uevent_fd" in structure "rte_intr_handle" for
> enable kernel object uevent monitoring, and add some uevent API in rte eal
> interrupt, that is “rte_uevent_connect” and “rte_uevent_get”, so that all driver
> could use these API to monitor and read out the uevent, then corresponding to
> handle these uevent, such as detach or attach the device.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>

Looks fine to me.

Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  4:37       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
@ 2017-06-30  3:38         ` Wu, Jingjing
  0 siblings, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2017-06-30  3:38 UTC (permalink / raw)
  To: Guo, Jia, Zhang, Helin; +Cc: dev



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, June 29, 2017 12:38 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>
> Subject: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
> 
> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch enable the hot plug feature in i40e, by monitoring the hot plug
> uevent of the device. When remove event got, call the app callback function to
> handle the detach process.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>

Acked-by: Jingjing Wu <jingjing.wu@intel.com>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  5:01       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
@ 2017-07-04  7:15         ` Wu, Jingjing
  2017-07-07  7:56         ` Thomas Monjalon
  1 sibling, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2017-07-04  7:15 UTC (permalink / raw)
  To: Guo, Jia, Zhang, Helin
  Cc: dev, Richardson, Bruce, Ananyev, Konstantin, gaetan.rivet,
	thomas.monjalon, Yigit, Ferruh



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, June 29, 2017 1:02 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com;
> thomas.monjalon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
> 
> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch enable the hot plug feature in i40e, by monitoring the hot plug
> uevent of the device. When remove event got, call the app callback function to
> handle the detach process.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>

Acked-by: Jingjing Wu <jingjing.wu@intel.com>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 1/2] eal: add uevent api for hot plug
  2017-06-29  5:01       ` [PATCH v3 1/2] eal: " Jeff Guo
@ 2017-07-04  7:15         ` Wu, Jingjing
  2017-09-03 15:49         ` [PATCH v4 0/2] add uevent monitor " Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2017-07-04  7:15 UTC (permalink / raw)
  To: Guo, Jia, Zhang, Helin
  Cc: dev, Richardson, Bruce, Ananyev, Konstantin, gaetan.rivet,
	thomas.monjalon, Yigit, Ferruh



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, June 29, 2017 1:02 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com;
> thomas.monjalon@6wind.com; Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: [PATCH v3 1/2] eal: add uevent api for hot plug
> 
> From: "Guo, Jia" <jia.guo@intel.com>
> 
> This patch aim to add a variable "uevent_fd" in structure "rte_intr_handle" for
> enable kernel object uevent monitoring, and add some uevent API in rte eal
> interrupt, that is “rte_uevent_connect” and “rte_uevent_get”, so that all driver
> could use these API to monitor and read out the uevent, then corresponding to
> handle these uevent, such as detach or attach the device.
> 
> Signed-off-by: Guo, Jia <jia.guo@intel.com>

Looks fine to me.

Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-06-28 11:07 ` [PATCH v2 1/2] eal: add uevent api for hot plug Jeff Guo
  2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
  2017-06-29  2:25   ` [PATCH v2 1/2] eal: add uevent api for hot plug Wu, Jingjing
@ 2017-07-04 23:45   ` Thomas Monjalon
  2017-07-05  3:02     ` Guo, Jia
  2 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2017-07-04 23:45 UTC (permalink / raw)
  To: Jeff Guo; +Cc: dev, helin.zhang, jingjing.wu

Hi,

This is an interesting step for hotplug in DPDK.

28/06/2017 13:07, Jeff Guo:
> +       netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

It is monitoring the whole system...

> +int
> +rte_uevent_get(int fd, struct rte_uevent *uevent)
> +{
> +       int ret;
> +       char buf[RTE_UEVENT_MSG_LEN];
> +
> +       memset(uevent, 0, sizeof(struct rte_uevent));
> +       memset(buf, 0, RTE_UEVENT_MSG_LEN);
> +
> +       ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);

... and it is read from this function called by one driver.
It cannot work without a global dispatch.

It must be a global mechanism, probably a service core.
The question is also to know whether it should be a mandatory
service in DPDK or an optional helper?
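
To illustrate why a shared dispatcher is needed: a NETLINK_KOBJECT_UEVENT
socket delivers events for every kobject in the system, each as one datagram
of NUL-separated strings ("action@devpath" followed by KEY=value pairs), so
any reader has to parse each message and match it against its own device.
The sketch below is not the code from the patch; struct sketch_uevent and
sketch_uevent_parse are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not struct rte_uevent from the patch. */
enum sketch_action {
	SKETCH_ACT_UNKNOWN = 0,
	SKETCH_ACT_ADD,
	SKETCH_ACT_REMOVE,
};

struct sketch_uevent {
	enum sketch_action action;
	const char *subsystem;	/* points into the caller's buffer, or NULL */
	const char *devpath;	/* points into the caller's buffer, or NULL */
};

/*
 * Parse one kernel uevent datagram of len bytes.
 * Returns 0 if an ACTION key was found, -1 otherwise.
 */
static int
sketch_uevent_parse(const char *buf, size_t len, struct sketch_uevent *ev)
{
	size_t off = 0;
	int found = 0;

	memset(ev, 0, sizeof(*ev));
	while (off < len) {
		const char *s = buf + off;
		const char *end = memchr(s, '\0', len - off);

		if (end == NULL)
			break;	/* trailing string is not terminated: ignore it */

		if (strncmp(s, "ACTION=", 7) == 0) {
			found = 1;
			if (strcmp(s + 7, "add") == 0)
				ev->action = SKETCH_ACT_ADD;
			else if (strcmp(s + 7, "remove") == 0)
				ev->action = SKETCH_ACT_REMOVE;
		} else if (strncmp(s, "SUBSYSTEM=", 10) == 0) {
			ev->subsystem = s + 10;
		} else if (strncmp(s, "DEVPATH=", 8) == 0) {
			ev->devpath = s + 8;
		}
		off += (size_t)(end - s) + 1;
	}
	return found ? 0 : -1;
}
```

In a real reader the buffer would come from recv() on the netlink socket, and
the parsed devpath would be compared against the sysfs path of the device
before any detach is attempted.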

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-07-04 23:45   ` Thomas Monjalon
@ 2017-07-05  3:02     ` Guo, Jia
  2017-07-05  7:32       ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2017-07-05  3:02 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, helin.zhang, jingjing.wu

hi, thomas


On 7/5/2017 7:45 AM, Thomas Monjalon wrote:
> Hi,
>
> This is an interesting step for hotplug in DPDK.
>
> 28/06/2017 13:07, Jeff Guo:
>> +       netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
> It is monitoring the whole system...
>
>> +int
>> +rte_uevent_get(int fd, struct rte_uevent *uevent)
>> +{
>> +       int ret;
>> +       char buf[RTE_UEVENT_MSG_LEN];
>> +
>> +       memset(uevent, 0, sizeof(struct rte_uevent));
>> +       memset(buf, 0, RTE_UEVENT_MSG_LEN);
>> +
>> +       ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
> ... and it is read from this function called by one driver.
> It cannot work without a global dispatch.
rte_uevent_connect is called from pci_uio_alloc_resource, so a separate
socket is created for each uio device. So I think each driver can still
use it in isolation.
> It must be a global mechanism, probably a service core.
> The question is also to know whether it should be a mandatory
> service in DPDK or an optional helper?
A global mechanism would be good, but so far all of us, including the mlx
driver, handle the hot plug event in the driver through the callback
registered by the app. A better global mechanism could be tried in the
future, but for now this works for all pci uio devices.
Moreover, if a pci uio device is to use hot plug, I think it might be
mandatory.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-07-05  3:02     ` Guo, Jia
@ 2017-07-05  7:32       ` Thomas Monjalon
  2017-07-05  9:04         ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2017-07-05  7:32 UTC (permalink / raw)
  To: Guo, Jia; +Cc: dev, helin.zhang, jingjing.wu

05/07/2017 05:02, Guo, Jia:
> hi, thomas
> 
> 
> On 7/5/2017 7:45 AM, Thomas Monjalon wrote:
> > Hi,
> >
> > This is an interesting step for hotplug in DPDK.
> >
> > 28/06/2017 13:07, Jeff Guo:
> >> +       netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
> > It is monitoring the whole system...
> >
> >> +int
> >> +rte_uevent_get(int fd, struct rte_uevent *uevent)
> >> +{
> >> +       int ret;
> >> +       char buf[RTE_UEVENT_MSG_LEN];
> >> +
> >> +       memset(uevent, 0, sizeof(struct rte_uevent));
> >> +       memset(buf, 0, RTE_UEVENT_MSG_LEN);
> >> +
> >> +       ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
> > ... and it is read from this function called by one driver.
> > It cannot work without a global dispatch.
> the rte_uevent-connect is called in func of pci_uio_alloc_resource, so 
> each socket is created by  by each uio device. so i think that would not 
> affect each driver isolate to use it.

Ah OK, I missed it.

> > It must be a global mechanism, probably a service core.
> > The question is also to know whether it should be a mandatory
> > service in DPDK or an optional helper?
> a global mechanism would be good, but so far, include mlx driver, we all 
> handle the hot plug event in driver by app's registered callback. maybe 
> a better global would be try in the future. but now is would work for all
> pci uio device.

mlx drivers have a special connection to the kernel through the associated
mlx kernel drivers. That's why the PMDs handle the events in a specific way.

You are adding event handling for UIO.
We now also need it for VFIO.

I am wondering how it could be better integrated in the bus layer.

> and more, if in pci uio device to use hot plug , i think it might be 
> mandatory.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-07-05  7:32       ` Thomas Monjalon
@ 2017-07-05  9:04         ` Guo, Jia
  2017-08-22 14:56           ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2017-07-05  9:04 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, helin.zhang, jingjing.wu



On 7/5/2017 3:32 PM, Thomas Monjalon wrote:
> 05/07/2017 05:02, Guo, Jia:
>> hi, thomas
>>
>>
>> On 7/5/2017 7:45 AM, Thomas Monjalon wrote:
>>> Hi,
>>>
>>> This is an interesting step for hotplug in DPDK.
>>>
>>> 28/06/2017 13:07, Jeff Guo:
>>>> +       netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
>>> It is monitoring the whole system...
>>>
>>>> +int
>>>> +rte_uevent_get(int fd, struct rte_uevent *uevent)
>>>> +{
>>>> +       int ret;
>>>> +       char buf[RTE_UEVENT_MSG_LEN];
>>>> +
>>>> +       memset(uevent, 0, sizeof(struct rte_uevent));
>>>> +       memset(buf, 0, RTE_UEVENT_MSG_LEN);
>>>> +
>>>> +       ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
>>> ... and it is read from this function called by one driver.
>>> It cannot work without a global dispatch.
>> the rte_uevent-connect is called in func of pci_uio_alloc_resource, so
>> each socket is created by  by each uio device. so i think that would not
>> affect each driver isolate to use it.
> Ah OK, I missed it.
>
>>> It must be a global mechanism, probably a service core.
>>> The question is also to know whether it should be a mandatory
>>> service in DPDK or an optional helper?
>> a global mechanism would be good, but so far, include mlx driver, we all
>> handle the hot plug event in driver by app's registered callback. maybe
>> a better global would be try in the future. but now is would work for all
>> pci uio device.
> mlx drivers have a special connection to the kernel through the associated
> mlx kernel drivers. That's why the PMD handle the events in a specific way.
>
> You are adding event handling for UIO.
> Now we need also VFIO.
>
> I am wondering how it could be better integrated in the bus layer.
Absolutely, hot plug for VFIO will be requested even more for live
migration, and we plan to add it as the next step, once the uio hot plug
feature integration is complete. So, could I expect an ack if there are
no other concerns about the uio uevent here? Thanks.
>> and more, if in pci uio device to use hot plug , i think it might be
>> mandatory.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-06-29  5:01       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
  2017-07-04  7:15         ` Wu, Jingjing
@ 2017-07-07  7:56         ` Thomas Monjalon
  2017-07-07 10:17           ` Thomas Monjalon
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2017-07-07  7:56 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, helin.zhang, jingjing.wu, bruce.richardson,
	konstantin.ananyev, gaetan.rivet, thomas.monjalon, ferruh.yigit

29/06/2017 07:01, Jeff Guo:
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
>  
>  	/* enable uio intr after callback register */
>  	rte_intr_enable(intr_handle);
> +
>  	/*
>  	 * Add an ethertype filter to drop all flow control frames transmitted
>  	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
> @@ -5832,11 +5833,29 @@ struct i40e_vsi *
>  {
>  	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>  	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +	struct rte_uevent event;
>  	uint32_t icr0;
> +	struct rte_pci_device *pci_dev;
> +	struct rte_intr_handle *intr_handle;
> +
> +	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	intr_handle = &pci_dev->intr_handle;
>  
>  	/* Disable interrupt */
>  	i40e_pf_disable_irq0(hw);
>  
> +	/* check device uevent */
> +	if (rte_uevent_get(intr_handle->uevent_fd, &event) == 0) {
> +		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
> +			if (event.action == RTE_UEVENT_REMOVE) {
> +				_rte_eth_dev_callback_process(dev,
> +					RTE_ETH_EVENT_INTR_RMV, NULL);
> +				return;
> +			}
> +		}
> +		goto done;
> +	}

There is nothing specific to i40e in this patch.
It seems wrong to add such generic code in every driver.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-07-07  7:56         ` Thomas Monjalon
@ 2017-07-07 10:17           ` Thomas Monjalon
  2017-07-07 14:08             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2017-07-07 10:17 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, helin.zhang, jingjing.wu, bruce.richardson,
	konstantin.ananyev, gaetan.rivet, thomas.monjalon, ferruh.yigit

07/07/2017 09:56, Thomas Monjalon:
> 29/06/2017 07:01, Jeff Guo:
> > --- a/drivers/net/i40e/i40e_ethdev.c
> > +++ b/drivers/net/i40e/i40e_ethdev.c
> > @@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
> >  
> >  	/* enable uio intr after callback register */
> >  	rte_intr_enable(intr_handle);
> > +
> >  	/*
> >  	 * Add an ethertype filter to drop all flow control frames transmitted
> >  	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
> > @@ -5832,11 +5833,29 @@ struct i40e_vsi *
> >  {
> >  	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
> >  	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +	struct rte_uevent event;
> >  	uint32_t icr0;
> > +	struct rte_pci_device *pci_dev;
> > +	struct rte_intr_handle *intr_handle;
> > +
> > +	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> > +	intr_handle = &pci_dev->intr_handle;
> >  
> >  	/* Disable interrupt */
> >  	i40e_pf_disable_irq0(hw);
> >  
> > +	/* check device uevent */
> > +	if (rte_uevent_get(intr_handle->uevent_fd, &event) == 0) {
> > +		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
> > +			if (event.action == RTE_UEVENT_REMOVE) {
> > +				_rte_eth_dev_callback_process(dev,
> > +					RTE_ETH_EVENT_INTR_RMV, NULL);
> > +				return;
> > +			}
> > +		}
> > +		goto done;
> > +	}
> 
> There is nothing specific to i40e in this patch.
> It seems wrong to add such generic code in every drivers.

It should be managed at bus layer and not be specific to ethdev only.
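
As a rough illustration of what bus-level handling could look like: devices
register once with a common dispatcher, and a single reader of the uevent
socket fans remove events out to whichever registered device matches. All
names below (sketch_bus_register, sketch_bus_dispatch_remove,
sketch_count_cb) are hypothetical and do not exist in DPDK, and matching by
substring on the PCI address is a deliberate simplification.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEVS 8

/* Hypothetical callback type: invoked when a registered device is removed. */
typedef void (*sketch_remove_cb)(const char *pci_addr, void *arg);

struct sketch_dev_entry {
	const char *pci_addr;	/* e.g. "0000:00:03.0" */
	sketch_remove_cb cb;
	void *arg;
};

static struct sketch_dev_entry sketch_devs[SKETCH_MAX_DEVS];
static size_t sketch_nb_devs;

/* Register a device with the common dispatcher; returns 0, or -1 if full. */
static int
sketch_bus_register(const char *pci_addr, sketch_remove_cb cb, void *arg)
{
	if (sketch_nb_devs == SKETCH_MAX_DEVS)
		return -1;
	sketch_devs[sketch_nb_devs].pci_addr = pci_addr;
	sketch_devs[sketch_nb_devs].cb = cb;
	sketch_devs[sketch_nb_devs].arg = arg;
	sketch_nb_devs++;
	return 0;
}

/*
 * Called by the single uevent reader for each "remove" event.
 * Invokes the callback of every registered device whose PCI address
 * appears in the sysfs devpath; returns the number of callbacks invoked.
 */
static int
sketch_bus_dispatch_remove(const char *devpath)
{
	int hits = 0;
	size_t i;

	for (i = 0; i < sketch_nb_devs; i++) {
		if (strstr(devpath, sketch_devs[i].pci_addr) != NULL) {
			sketch_devs[i].cb(sketch_devs[i].pci_addr,
					  sketch_devs[i].arg);
			hits++;
		}
	}
	return hits;
}

/* Example callback: counts removals through the int pointed to by arg. */
static void
sketch_count_cb(const char *pci_addr, void *arg)
{
	(void)pci_addr;
	(*(int *)arg)++;
}
```

With this shape, no driver needs to read the socket itself; a driver or an
application only supplies a callback, and the dispatcher is not tied to
ethdev.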

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-07-07 10:17           ` Thomas Monjalon
@ 2017-07-07 14:08             ` Guo, Jia
  2017-07-09 22:35               ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2017-07-07 14:08 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, helin.zhang, jingjing.wu, bruce.richardson,
	konstantin.ananyev, gaetan.rivet, thomas.monjalon, ferruh.yigit



On 7/7/2017 6:17 PM, Thomas Monjalon wrote:
> 07/07/2017 09:56, Thomas Monjalon:
>> 29/06/2017 07:01, Jeff Guo:
>>> --- a/drivers/net/i40e/i40e_ethdev.c
>>> +++ b/drivers/net/i40e/i40e_ethdev.c
>>> @@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
>>>   
>>>   	/* enable uio intr after callback register */
>>>   	rte_intr_enable(intr_handle);
>>> +
>>>   	/*
>>>   	 * Add an ethertype filter to drop all flow control frames transmitted
>>>   	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
>>> @@ -5832,11 +5833,29 @@ struct i40e_vsi *
>>>   {
>>>   	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>>>   	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>> +	struct rte_uevent event;
>>>   	uint32_t icr0;
>>> +	struct rte_pci_device *pci_dev;
>>> +	struct rte_intr_handle *intr_handle;
>>> +
>>> +	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
>>> +	intr_handle = &pci_dev->intr_handle;
>>>   
>>>   	/* Disable interrupt */
>>>   	i40e_pf_disable_irq0(hw);
>>>   
>>> +	/* check device uevent */
>>> +	if (rte_uevent_get(intr_handle->uevent_fd, &event) == 0) {
>>> +		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
>>> +			if (event.action == RTE_UEVENT_REMOVE) {
>>> +				_rte_eth_dev_callback_process(dev,
>>> +					RTE_ETH_EVENT_INTR_RMV, NULL);
>>> +				return;
>>> +			}
>>> +		}
>>> +		goto done;
>>> +	}
>> There is nothing specific to i40e in this patch.
>> It seems wrong to add such generic code in every drivers.
> It should be managed at bus layer and not be specific to ethdev only.
Managing everything at the bus layer is also what I would like to see,
but that is not economical at the moment, because the event needs to be
exposed to the driver so that it can use the app's callback to handle it
by detaching/attaching the device. The mlx driver also goes through this
path to show the usage of the RMV event; we just add uevent for pci uio
devices. Anyway, I think the uevent will be useful for future PF/VFIO
live migration. If there are no other concerns about the other patch,
which adds the uevent api in eal ([PATCH v3 1/2] eal: add uevent api for
hot plug), I suggest merging it first. Then we could go on to enhance it
with the 6wind hot plug feature.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-07-07 14:08             ` Guo, Jia
@ 2017-07-09 22:35               ` Thomas Monjalon
  2017-07-12  7:36                 ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2017-07-09 22:35 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, helin.zhang, jingjing.wu, bruce.richardson,
	konstantin.ananyev, gaetan.rivet, thomas.monjalon, ferruh.yigit

07/07/2017 16:08, Guo, Jia:
> 
> On 7/7/2017 6:17 PM, Thomas Monjalon wrote:
> > 07/07/2017 09:56, Thomas Monjalon:
> >> 29/06/2017 07:01, Jeff Guo:
> >>> --- a/drivers/net/i40e/i40e_ethdev.c
> >>> +++ b/drivers/net/i40e/i40e_ethdev.c
> >>> @@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
> >>>   
> >>>   	/* enable uio intr after callback register */
> >>>   	rte_intr_enable(intr_handle);
> >>> +
> >>>   	/*
> >>>   	 * Add an ethertype filter to drop all flow control frames transmitted
> >>>   	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
> >>> @@ -5832,11 +5833,29 @@ struct i40e_vsi *
> >>>   {
> >>>   	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
> >>>   	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>> +	struct rte_uevent event;
> >>>   	uint32_t icr0;
> >>> +	struct rte_pci_device *pci_dev;
> >>> +	struct rte_intr_handle *intr_handle;
> >>> +
> >>> +	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> >>> +	intr_handle = &pci_dev->intr_handle;
> >>>   
> >>>   	/* Disable interrupt */
> >>>   	i40e_pf_disable_irq0(hw);
> >>>   
> >>> +	/* check device uevent */
> >>> +	if (rte_uevent_get(intr_handle->uevent_fd, &event) == 0) {
> >>> +		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
> >>> +			if (event.action == RTE_UEVENT_REMOVE) {
> >>> +				_rte_eth_dev_callback_process(dev,
> >>> +					RTE_ETH_EVENT_INTR_RMV, NULL);
> >>> +				return;
> >>> +			}
> >>> +		}
> >>> +		goto done;
> >>> +	}
> >> There is nothing specific to i40e in this patch.
> >> It seems wrong to add such generic code in every drivers.
> > It should be managed at bus layer and not be specific to ethdev only.
>   if all could managed at bus layer might also be what i want to see, 
> but that would not so economical at currently. because of the event need 
> to exposure to driver to use app's callback to handle it by 
> detach/attach device. mlx driver also go through this path to show the 
> rmv event usege. we just add uevent for pci uio device. Anyway, i think 
> the uevent must be useful for future PF/VFIO live migration. if there 
> are not other concern about the other patch that added uevent api in 
> eal([PATCH v3 1/2] eal: add uevent api for hot plug), i suggest to merge 
> it at first. Then we could go next to enhancement it with the 6wind hot 
> plug feature.

Sorry, it looks wrong to apply half of this patchset, given that we are not
sure how the remaining part should be implemented.
Let's take time to find a better solution and try to gather more opinions.
It will be highlighted as one of the next priorities, after the bus rework
in progress.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e
  2017-07-09 22:35               ` Thomas Monjalon
@ 2017-07-12  7:36                 ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-07-12  7:36 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, helin.zhang, jingjing.wu, bruce.richardson,
	konstantin.ananyev, gaetan.rivet, thomas.monjalon, ferruh.yigit

On 7/10/2017 6:35 AM, Thomas Monjalon wrote:

> 07/07/2017 16:08, Guo, Jia:
>> On 7/7/2017 6:17 PM, Thomas Monjalon wrote:
>>> 07/07/2017 09:56, Thomas Monjalon:
>>>> 29/06/2017 07:01, Jeff Guo:
>>>>> --- a/drivers/net/i40e/i40e_ethdev.c
>>>>> +++ b/drivers/net/i40e/i40e_ethdev.c
>>>>> @@ -1283,6 +1283,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw)
>>>>>    
>>>>>    	/* enable uio intr after callback register */
>>>>>    	rte_intr_enable(intr_handle);
>>>>> +
>>>>>    	/*
>>>>>    	 * Add an ethertype filter to drop all flow control frames transmitted
>>>>>    	 * from VSIs. By doing so, we stop VF from sending out PAUSE or PFC
>>>>> @@ -5832,11 +5833,29 @@ struct i40e_vsi *
>>>>>    {
>>>>>    	struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
>>>>>    	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>> +	struct rte_uevent event;
>>>>>    	uint32_t icr0;
>>>>> +	struct rte_pci_device *pci_dev;
>>>>> +	struct rte_intr_handle *intr_handle;
>>>>> +
>>>>> +	pci_dev = RTE_ETH_DEV_TO_PCI(dev);
>>>>> +	intr_handle = &pci_dev->intr_handle;
>>>>>    
>>>>>    	/* Disable interrupt */
>>>>>    	i40e_pf_disable_irq0(hw);
>>>>>    
>>>>> +	/* check device uevent */
>>>>> +	if (rte_uevent_get(intr_handle->uevent_fd, &event) == 0) {
>>>>> +		if (event.subsystem == RTE_UEVENT_SUBSYSTEM_UIO) {
>>>>> +			if (event.action == RTE_UEVENT_REMOVE) {
>>>>> +				_rte_eth_dev_callback_process(dev,
>>>>> +					RTE_ETH_EVENT_INTR_RMV, NULL);
>>>>> +				return;
>>>>> +			}
>>>>> +		}
>>>>> +		goto done;
>>>>> +	}
>>>> There is nothing specific to i40e in this patch.
>>>> It seems wrong to add such generic code in every drivers.
>>> It should be managed at bus layer and not be specific to ethdev only.
>>    if all could managed at bus layer might also be what i want to see,
>> but that would not so economical at currently. because of the event need
>> to exposure to driver to use app's callback to handle it by
>> detach/attach device. mlx driver also go through this path to show the
>> rmv event usege. we just add uevent for pci uio device. Anyway, i think
>> the uevent must be useful for future PF/VFIO live migration. if there
>> are not other concern about the other patch that added uevent api in
>> eal([PATCH v3 1/2] eal: add uevent api for hot plug), i suggest to merge
>> it at first. Then we could go next to enhancement it with the 6wind hot
>> plug feature.
> Sorry it looks wrong to apply half of this patchset, given we are not
> sure how the remaining part should be implemented.
> Let's take time for a better solution and try to gather more opinions.
> It will be highlighted as one of the next priorities, after the bus rework
> in progress.
I somewhat agree with your concern; maybe we could find a better option
to handle that.
1) About the bus layer: as we know, the pci uio api was recently moved out
of the eal bus, but that did not apply to the eal interrupt code. So adding
a uio api there would not be affected, just like the existing api
"uio_intr_enable". I could rename the apis to "uio_uevent_get" and
"uio_uevent_connect" to identify them better. But they would still be in
the eal interrupt layer until we decide to modify that layer.
2) About the uevent callback handler: in the fail-safe driver solution, the
fail-safe driver registers the callback with the device, and the device
interrupt handler identifies the event and calls back into fail-safe, which
switches sub-devices to process the hot plug out case. So if the event
processing is not placed in the device (the mlx driver handles it in the
device driver), it may be placed in eal. To be generic for all pci uio
drivers, I could add a "uio_uevent_handler" api into the eal interrupt
code, but that would bring some ethdev processing into eal. Otherwise we
still need to add the handler in every device driver, as this patch does.
Any comments on that?
3) The patch set here is about uevent monitoring for the pci uio driver. The
hot plug remove event processing depends on the fail-safe driver; see
below for reference.
http://dpdk.org/dev/patchwork/patch/26842/

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-07-05  9:04         ` Guo, Jia
@ 2017-08-22 14:56           ` Guo, Jia
  2017-08-28 15:50             ` Gaëtan Rivet
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2017-08-22 14:56 UTC (permalink / raw)
  To: shreyansh.jain, jblunck, gaetan.rivet
  Cc: dev, Zhang, Helin, Wu, Jingjing, Thomas Monjalon, stephen,
	Richardson, Bruce, Jain, Deepak K, Liu, Yu Y

a. About the uevent mechanism

As we know, uevents come from kobjects on the kernel side: every kobject has its own uevent, and a sysfs folder identifies a kobject.
Bus components such as cpu, usb, pci, pci-express and virtio all have uevents. I agree that it would be best if uevent could be integrated in the bus layer.
I checked the kernel source code: the uevent-related code is in lib/kobject_uevent.c, it exists only on Linux and not on BSD, and it supports both uio and vfio.
So where should the dpdk uevent code be located? Four options come to mind, listed below, and I propose 2) and 4).

1) eal_bus: (but uevent things like the netlink socket and event polling are not related to bus behavior)
2) eal_dev: (just consider it like the kernel's udev, and create a new epoll and uevent handler)
3) add a new file eal_udev.c
4) eal_interrupt: (add it into the interrupt epoll and use the interrupt handler)
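
Option 4 amounts to adding the uevent file descriptor to an epoll set that is
already being polled for interrupts. A minimal Linux-only sketch of that
pattern follows, using plain epoll calls rather than the EAL interrupt
thread; the helper names sketch_epoll_add and sketch_epoll_poll_once are
hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for readability events on epfd; returns 0 or -1. */
static int
sketch_epoll_add(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Wait up to timeout_ms for one ready descriptor.
 * Returns the ready fd, or -1 if nothing became readable (or on error).
 */
static int
sketch_epoll_poll_once(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	return n == 1 ? ev.data.fd : -1;
}
```

The same two calls work for a netlink uevent socket as for any other
descriptor, so the polling loop would only need to tell the uevent fd apart
from the device interrupt fds when it dispatches.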

Shreyansh & jblunck & gaetan

Since you have recently worked on the pci bus layer and are experts on it, I want to ask whether you plan any other bus layer rework that would conflict with my proposal,
or that would require me to adapt it for compatibility with the next architecture. If you do, please let me know. Thanks.

b. About the pci uevent handler
I propose 2 options:
1) use a common interrupt handler for pci pmds, with which the app or the fail-safe pmd can register.
2) use a common uevent handler for pci pmds, with which the app or the fail-safe pmd can register.

Community, are there any comments on that?

Best regards,
Jeff Guo

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Guo, Jia
Sent: Wednesday, July 5, 2017 5:04 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/2] eal: add uevent api for hot plug



On 7/5/2017 3:32 PM, Thomas Monjalon wrote:
> 05/07/2017 05:02, Guo, Jia:
>> hi, thomas
>>
>>
>> On 7/5/2017 7:45 AM, Thomas Monjalon wrote:
>>> Hi,
>>>
>>> This is an interesting step for hotplug in DPDK.
>>>
>>> 28/06/2017 13:07, Jeff Guo:
>>>> +       netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, 
>>>> + NETLINK_KOBJECT_UEVENT);
>>> It is monitoring the whole system...
>>>
>>>> +int
>>>> +rte_uevent_get(int fd, struct rte_uevent *uevent) {
>>>> +       int ret;
>>>> +       char buf[RTE_UEVENT_MSG_LEN];
>>>> +
>>>> +       memset(uevent, 0, sizeof(struct rte_uevent));
>>>> +       memset(buf, 0, RTE_UEVENT_MSG_LEN);
>>>> +
>>>> +       ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
>>> ... and it is read from this function called by one driver.
>>> It cannot work without a global dispatch.
>> the rte_uevent-connect is called in pci_uio_alloc_resource, 
>> so each socket is created by each uio device. So I think each 
>> driver can use it in isolation without affecting the others.
> Ah OK, I missed it.
>
>>> It must be a global mechanism, probably a service core.
>>> The question is also to know whether it should be a mandatory 
>>> service in DPDK or an optional helper?
>> a global mechanism would be good, but so far, including the mlx driver, we 
>> all handle the hot plug event in the driver through the app's registered callback. 
>> Maybe a better global mechanism could be tried in the future, but for now this 
>> would work for all pci uio devices.
> mlx drivers have a special connection to the kernel through the 
> associated mlx kernel drivers. That's why the PMD handle the events in a specific way.
>
> You are adding event handling for UIO.
> Now we need also VFIO.
>
> I am wondering how it could be better integrated in the bus layer.
Absolutely, hot plug for VFIO will be requested more, for live migration, and we plan to add it in the next stage, once all of the uio hot plug feature integration is done. So, could I expect an ack if there are no other concerns about the uio uevent here? Thanks.
>> and more, if in pci uio device to use hot plug , i think it might be 
>> mandatory.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/2] eal: add uevent api for hot plug
  2017-08-22 14:56           ` Guo, Jia
@ 2017-08-28 15:50             ` Gaëtan Rivet
  0 siblings, 0 replies; 494+ messages in thread
From: Gaëtan Rivet @ 2017-08-28 15:50 UTC (permalink / raw)
  To: Guo, Jia
  Cc: shreyansh.jain, jblunck, dev, Zhang, Helin, Wu, Jingjing,
	Thomas Monjalon, stephen, Richardson, Bruce, Jain, Deepak K, Liu,
	Yu Y

Hi Jeff,

On Tue, Aug 22, 2017 at 02:56:04PM +0000, Guo, Jia wrote:
> a. About the uevent mechanism
>
> As we know, uevents come from kobjects on the kernel side: every kobject has its own uevent, and each sysfs folder identifies a kobject,
> such as cpu, usb, pci, pci-express and virtio; all of these bus components have uevents. I agree that uevent would work best if it could be integrated into the bus layer.
> I checked the kernel source: the uevent code is in lib/kobject_uevent.c, it is Linux-only (not available on BSD), and it supports both uio and vfio.
> So where should the DPDK uevent code live? Four options come to mind, listed below; I propose 2) and 4).
>
> 1) eal_bus: (but uevent is a netlink socket plus event polling, which is not related to bus behavior)
> 2) eal_dev: (treat it like udev on Linux, and create a new epoll and uevent handler)
> 3) add a new file, eal_udev.c
> 4) eal_interrupt: (add it to the interrupt epoll and use the interrupt handler)
>
> Shreyansh, jblunck and Gaëtan,
>
> Since you have recently worked on the pci bus layer and are experts on it, I want to ask whether you plan any other bus layer rework that would conflict with my proposal,
> or that would require me to modify it for compatibility with the next architecture. If so, please let me know. Thanks.
> 

Yes, some work is planned at least from my side.

I am moving the PCI bus out of the EAL:
http://dpdk.org/ml/archives/dev/2017-August/073512.html

The current proposal is not yet complete. Some filesystem functions
might need to be exposed.

However, it has no functional impact: functions may move, but they do
the same thing. But even so, the uevent_fd that you add within
eal_common_uio will probably have to be moved (eal_common_pci_uio.c is
going out of the EAL), nothing dramatic.

That's all I can think of from my PoV that might be in conflict with
your work.

> b. About the pci uevent handler
> I see 2 options:
> 1) use a common interrupt handler for pci pmds, which the app or the fail-safe pmd can register with.
> 2) use a common uevent handler for pci pmds, which the app or the fail-safe pmd can register with.
>
> Community, are there any comments on that?
> 

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v4 0/2] add uevent monitor for hot plug
  2017-06-29  5:01       ` [PATCH v3 1/2] eal: " Jeff Guo
  2017-07-04  7:15         ` Wu, Jingjing
@ 2017-09-03 15:49         ` Jeff Guo
  2017-09-03 15:49           ` [PATCH v4 1/2] eal: " Jeff Guo
  2017-09-03 15:49           ` [PATCH v4 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
  1 sibling, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2017-09-03 15:49 UTC (permalink / raw)
  To: stephen, bruce.richardson
  Cc: dev, gaetan.rivet, shreyansh.jain, jblunck, helin.zhang,
	ferruh.yigit, konstantin.ananyev, thomas, jingjing.wu, jia.guo

So far, for hot plug in dpdk, we already have the hot plug add/remove
api and the fail-safe driver to offload the fail-safe work from the app
user. But a general event api is still lacking, since the interrupt
events that hot plug relies on differ between devices and
drivers, such as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so in order to make the hot plug feature easy to use
for pci drivers, something must be done to detect the remove event at the
kernel level and offer a new line of interrupt to user land.

Based on the kernel's kobject uevent mechanism, we can monitor the hot
plug status of devices, not only uio/vfio devices on the pci bus but
also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a. The uevent message read from the monitored FD, which is the useful input:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. add a uevent monitoring mechanism:
add several general apis to enable uevent monitoring.

c. add a common uevent handler and a uevent failure handler:
device uevents should be handled at the bus or device layer, and memory read
and write failures on hot removal should be handled correctly before the detach.

d. show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show usage.
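
For illustration, a message like the one in a. can be searched for a given key as sketched below. This assumes the payload is the usual sequence of NUL-terminated KEY=VALUE records following the action@devpath header; uevent_field is a hypothetical helper and is not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up `key` in a uevent payload of `len` bytes made of
 * NUL-terminated records. Returns a pointer to the value inside
 * `buf`, or NULL if no "key=value" record is found.
 */
static const char *
uevent_field(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t pos = 0;

	while (pos < len) {
		const char *rec = buf + pos;
		const char *end = memchr(rec, '\0', len - pos);
		size_t rlen;

		if (end == NULL)
			break; /* unterminated trailing record: ignore it */
		rlen = (size_t)(end - rec);

		/* match "key=" exactly, so "DEV" does not match "DEVNAME" */
		if (rlen > klen && rec[klen] == '=' &&
		    strncmp(rec, key, klen) == 0)
			return rec + klen + 1;

		pos += rlen + 1;
	}
	return NULL;
}
```

A caller would pass the buffer filled by recv() together with the byte count it returned, then compare the ACTION and SUBSYSTEM values against "remove" and "uio".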

patchset history:
v4->v3:
move uevent monitor api from eal interrupt to eal device layer.
create uevent type and struct in eal device.
move uevent handler for each driver to eal layer.
add uevent failure handler to process signal fault issue.
add example for request and use uevent monitoring in testpmd. 

v3->v2:
refine some error return values
refine the string searching logic to avoid memory issues

v2->v1:
remove the global variable hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix the dual device fd issue. fix some typos.

Jeff Guo (2):
  eal: add uevent monitor for hot plug
  app/testpmd: use uevent to monitor hot removal

 app/test-pmd/testpmd.c                             |  62 ++++++
 lib/librte_eal/common/eal_common_dev.c             | 248 ++++++++++++++++++++-
 lib/librte_eal/common/eal_common_pci.c             |  20 ++
 lib/librte_eal/common/eal_private.h                |  14 ++
 lib/librte_eal/common/include/rte_dev.h            | 136 +++++++++++
 lib/librte_eal/common/include/rte_pci.h            |  17 ++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  28 ++-
 lib/librte_eal/linuxapp/eal/eal_pci_init.h         |   4 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |  62 ++++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   2 +-
 10 files changed, 586 insertions(+), 7 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v4 1/2] eal: add uevent monitor for hot plug
  2017-09-03 15:49         ` [PATCH v4 0/2] add uevent monitor " Jeff Guo
@ 2017-09-03 15:49           ` Jeff Guo
  2017-09-03 16:10             ` Stephen Hemminger
                               ` (3 more replies)
  2017-09-03 15:49           ` [PATCH v4 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
  1 sibling, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2017-09-03 15:49 UTC (permalink / raw)
  To: stephen, bruce.richardson
  Cc: dev, gaetan.rivet, shreyansh.jain, jblunck, helin.zhang,
	ferruh.yigit, konstantin.ananyev, thomas, jingjing.wu, jia.guo

This patch aims to add a general uevent mechanism in the eal device layer,
to enable hot plug monitoring for all kernel objects, so that users can use
these APIs to monitor and read out the device status info sent from the
kernel side and then handle it accordingly, for example by detaching or
attaching the device, and can even use it to do fail-safe work smoothly.
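
As an illustration of the register-then-dispatch flow this patch is after, a minimal single-threaded sketch follows. All names here (uev_cb_register, uev_cb_dispatch, count_cb and the uev_* types) are hypothetical and are not the rte_eal_uev_* API; the sketch also has no locking, whereas the patch guards its callback list with a spinlock.

```c
#include <stdlib.h>
#include <sys/queue.h>

enum uev_type { UEV_ADD, UEV_REMOVE };

typedef void (*uev_cb_fn)(enum uev_type type, void *arg);

struct uev_cb {
	TAILQ_ENTRY(uev_cb) next;
	uev_cb_fn fn;       /* function to call */
	void *arg;          /* opaque argument handed back to fn */
	enum uev_type type; /* event this callback is interested in */
};

TAILQ_HEAD(uev_cb_list, uev_cb);

/* Append a callback for one event type. Returns 0, or -1 on error. */
static int
uev_cb_register(struct uev_cb_list *list, enum uev_type type,
		uev_cb_fn fn, void *arg)
{
	struct uev_cb *cb;

	if (fn == NULL)
		return -1;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	cb->fn = fn;
	cb->arg = arg;
	cb->type = type;
	TAILQ_INSERT_TAIL(list, cb, next);
	return 0;
}

/* Call every callback registered for `type`; returns how many ran. */
static int
uev_cb_dispatch(struct uev_cb_list *list, enum uev_type type)
{
	struct uev_cb *cb;
	int n = 0;

	TAILQ_FOREACH(cb, list, next) {
		if (cb->type != type)
			continue;
		cb->fn(type, cb->arg);
		n++;
	}
	return n;
}

/* Example callback: counts events through an int passed as arg. */
static void
count_cb(enum uev_type type, void *arg)
{
	(void)type;
	++*(int *)arg;
}
```

An application would register something like count_cb for UEV_REMOVE after TAILQ_INIT() on the list, and the event source would call uev_cb_dispatch() once a remove uevent has been parsed.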

1) About uevent monitoring:
a: add uevent_fd in struct rte_intr_handle, and use the eal interrupt
   epoll loop to monitor the netlink socket behind uevent_fd.
b: add enum rte_eal_uevent_type and struct rte_eal_uevent.
c: add the APIs below in the rte eal device layer.
   rte_eal_uev_fd_new
   rte_eal_uev_enable
   rte_eal_uev_receive
   rte_eal_uev_callback_register
   rte_eal_uev_callback_unregister

2) About the uevent handler and failure handler, using pci uio as an
example, add the APIs below to process them:
   pci_uio_uev_handler
   pci_uio_remap_resource
   pci_map_private_resource

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
move uevent monitor api from eal interrupt to eal device layer.
create uevent type and struct in eal device.
move uevent handler for each driver to eal layer.
add uevent failure handler to process signal fault issue.
add example for request and use uevent monitoring in testpmd.
---
 lib/librte_eal/common/eal_common_dev.c             | 248 ++++++++++++++++++++-
 lib/librte_eal/common/eal_common_pci.c             |  20 ++
 lib/librte_eal/common/eal_private.h                |  14 ++
 lib/librte_eal/common/include/rte_dev.h            | 136 +++++++++++
 lib/librte_eal/common/include/rte_pci.h            |  17 ++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  28 ++-
 lib/librte_eal/linuxapp/eal/eal_pci_init.h         |   4 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |  62 ++++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   2 +-
 9 files changed, 524 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index f98302d..ae8f673 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -36,15 +36,41 @@
 #include <string.h>
 #include <inttypes.h>
 #include <sys/queue.h>
-
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+
+#include <rte_malloc.h>
 #include <rte_bus.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
 
 #include "eal_private.h"
 
+/* spinlock for uevent callbacks */
+static rte_spinlock_t rte_eal_uev_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_eal_uev_callback {
+	TAILQ_ENTRY(rte_eal_uev_callback) next; /**< Callbacks list */
+	rte_eal_uev_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Parameter for callback */
+	void *ret_param;                        /**< Return parameter */
+	enum rte_eal_uevent_type event;          /**< Interrupt event type */
+	uint32_t active;                        /**< Callback is executing */
+};
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -244,3 +270,223 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_eal_uev_fd_new(void)
+{
+
+	int netlink_fd = -1;
+
+	netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
+	if (netlink_fd < 0)
+		return -1;
+
+	return netlink_fd;
+}
+
+int
+rte_eal_uev_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+		"ioctl(FIONBIO) failed\n");
+		close(netlink_fd);
+		return -1;
+	}
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		close(netlink_fd);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+rte_eal_uev_parse(const char *buf, struct rte_eal_uevent *event)
+{
+	char action[RTE_EAL_UEVENT_MSG_LEN];
+	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
+	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
+
+	while (i < RTE_EAL_UEVENT_MSG_LEN) {
+		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		}
+		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if ((!strncmp(subsystem, "uio", 3)) ||
+		(!strncmp(subsystem, "pci", 3))) {
+		event->subsystem = RTE_EAL_UEVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3))
+			event->type = RTE_EAL_UEVENT_ADD;
+		if (!strncmp(action, "remove", 6))
+			event->type = RTE_EAL_UEVENT_REMOVE;
+		return 0;
+	}
+
+	return -1;
+}
+
+int
+rte_eal_uev_receive(int fd, struct rte_eal_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_eal_uevent));
+	memset(buf, 0, RTE_EAL_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret > 0)
+		return rte_eal_uev_parse(buf, uevent);
+	else if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else
+		/* connection closed */
+		return -1;
+}
+
+int
+rte_eal_uev_callback_register(struct rte_device *dev,
+			enum rte_eal_uevent_type event,
+			rte_eal_uev_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_eal_uev_callback *user_cb;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_eal_uev_cb_lock);
+
+	TAILQ_FOREACH(user_cb, &(dev->uev_cbs), next) {
+		if (user_cb->cb_fn == cb_fn &&
+			user_cb->cb_arg == cb_arg &&
+			user_cb->event == event) {
+			break;
+		}
+	}
+
+	/* create a new callback. */
+	if (user_cb == NULL) {
+		user_cb = rte_zmalloc("EAL_UEV_CALLBACK",
+					sizeof(struct rte_eal_uev_callback), 0);
+		if (user_cb != NULL) {
+			user_cb->cb_fn = cb_fn;
+			user_cb->cb_arg = cb_arg;
+			user_cb->event = event;
+			TAILQ_INSERT_TAIL(&(dev->uev_cbs), user_cb, next);
+		}
+	}
+
+	rte_spinlock_unlock(&rte_eal_uev_cb_lock);
+	return (user_cb == NULL) ? -ENOMEM : 0;
+}
+
+int
+rte_eal_uev_callback_unregister(struct rte_device *dev,
+			enum rte_eal_uevent_type event,
+			rte_eal_uev_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_eal_uev_callback *cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_eal_uev_cb_lock);
+
+	ret = 0;
+	for (cb = TAILQ_FIRST(&dev->uev_cbs); cb != NULL; cb = next) {
+
+		next = TAILQ_NEXT(cb, next);
+
+		if (cb->cb_fn != cb_fn || cb->event != event ||
+				(cb->cb_arg != (void *)-1 &&
+				cb->cb_arg != cb_arg))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (cb->active == 0) {
+			TAILQ_REMOVE(&(dev->uev_cbs), cb, next);
+			rte_free(cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_eal_uev_cb_lock);
+	return ret;
+}
+
+int
+_rte_eal_uev_callback_process(struct rte_device *dev,
+	enum rte_eal_uevent_type event, void *cb_arg, void *ret_param)
+{
+	struct rte_eal_uev_callback *cb_lst;
+	struct rte_eal_uev_callback dev_cb;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_eal_uev_cb_lock);
+	TAILQ_FOREACH(cb_lst, &(dev->uev_cbs), next) {
+		if (cb_lst->cb_fn == NULL || cb_lst->event != event)
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		if (cb_arg != NULL)
+			dev_cb.cb_arg = cb_arg;
+		if (ret_param != NULL)
+			dev_cb.ret_param = ret_param;
+
+		rte_spinlock_unlock(&rte_eal_uev_cb_lock);
+		rc = dev_cb.cb_fn(dev, dev_cb.event,
+				dev_cb.cb_arg, dev_cb.ret_param);
+		rte_spinlock_lock(&rte_eal_uev_cb_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&rte_eal_uev_cb_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38c..0d640bc 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -110,6 +110,26 @@ pci_name_set(struct rte_pci_device *dev)
 		dev->device.name = dev->name;
 }
 
+/* map a private resource from an address*/
+void *
+pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
+{
+	void *mapaddr;
+
+	mapaddr = mmap(requested_addr, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	if (mapaddr == MAP_FAILED) {
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): %s (%p)\n",
+			__func__, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno), mapaddr);
+	} else
+		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+
+	return mapaddr;
+}
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 597d82e..8f5e283 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -192,6 +192,8 @@ int pci_uio_map_resource(struct rte_pci_device *dev);
  */
 void pci_uio_unmap_resource(struct rte_pci_device *dev);
 
+void pci_uio_uev_handler(void *parm);
+
 /**
  * Allocate uio resource for PCI device
  *
@@ -222,6 +224,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the pci uio resource.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 5386d3a..656023e 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -52,6 +52,13 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_log.h>
 
+struct rte_device;
+
+struct rte_eal_uev_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_eal_uev_cb_list, rte_eal_uev_callback);
+
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -163,6 +170,8 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	/** User application callbacks for device uevent monitoring  */
+	struct rte_eal_uev_cb_list uev_cbs;
 };
 
 /**
@@ -246,6 +255,133 @@ int rte_eal_hotplug_add(const char *busname, const char *devname,
  */
 int rte_eal_hotplug_remove(const char *busname, const char *devname);
 
+#define RTE_EAL_UEVENT_MSG_LEN 4096
+#define RTE_EAL_UEVENT_SUBSYSTEM_UIO 1
+#define RTE_EAL_UEVENT_SUBSYSTEM_VFIO 2
+
+/**
+ * The uevent action type reported by the kernel for a device.
+ */
+enum rte_eal_uevent_type {
+	RTE_EAL_UEVENT_UNKNOWN,  /**< unknown event type */
+	RTE_EAL_UEVENT_ADD, /**< device added */
+	RTE_EAL_UEVENT_REMOVE,
+				/**< device removed */
+	RTE_EAL_UEVENT_CHANGE,
+			/**< device state changed */
+	RTE_EAL_UEVENT_MOVE,  /**< device moved or renamed */
+	RTE_EAL_UEVENT_ONLINE,   /**< device brought online */
+	RTE_EAL_UEVENT_OFFLINE, /**< device taken offline */
+	RTE_EAL_UEVENT_MAX       /**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_eal_uevent_type type;	/**< uevent action type */
+	int subsystem;				/**< subsystem id */
+};
+
+/**
+ * Create the device uevent file descriptor.
+ * @return
+ *   - On success, the device uevent fd.
+ *   - On failure, a negative value.
+ */
+int
+rte_eal_uev_fd_new(void);
+
+/**
+ * Bind the netlink socket to enable uevent receiving.
+ *
+ * @param fd
+ *   The fd which the uevent associated to
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eal_uev_enable(int fd);
+
+/**
+ * Read out the uevent from the specified file descriptor.
+ *
+ * @param fd
+ *   The fd which the uevent associated to
+ * @param uevent
+ *   Pointer to the uevent which read from the monitoring fd.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eal_uev_receive(int fd, struct rte_eal_uevent *uevent);
+
+typedef int (*rte_eal_uev_cb_fn)(struct rte_device *dev,
+		enum rte_eal_uevent_type event, void *cb_arg, void *ret_param);
+/**< user application callback to be registered for interrupts */
+
+/**
+ * Register a callback function for a specific device.
+ *
+ * @param dev
+ *  Pointer to struct rte_device.
+ * @param event
+ *  Uevent interested.
+ * @param cb_fn
+ *  User supplied callback function to be called.
+ * @param cb_arg
+ *  Pointer to the parameters for the registered callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eal_uev_callback_register(struct rte_device *dev,
+			enum rte_eal_uevent_type event,
+			rte_eal_uev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * Unregister a callback function for specific device.
+ *
+ * @param dev
+ *  Pointer to struct rte_device.
+ * @param event
+ *  Uevent interested.
+ * @param cb_fn
+ *  User supplied callback function to be called.
+ * @param cb_arg
+ *  Pointer to the parameters for the registered callback. -1 means to
+ *  remove all for the same callback address and same event.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_eal_uev_callback_unregister(struct rte_device *dev,
+			enum rte_eal_uevent_type event,
+		rte_eal_uev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param dev
+ *  Pointer to struct rte_device.
+ * @param event
+ *  rte device uevent type.
+ * @param cb_arg
+ *  callback parameter.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  int
+ */
+int _rte_eal_uev_callback_process(struct rte_device *dev,
+		enum rte_eal_uevent_type event, void *cb_arg, void *ret_param);
+
 /**
  * Device comparison function.
  *
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b12339..d3ced27 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -394,6 +394,23 @@ void rte_pci_unmap_device(struct rte_pci_device *dev);
 
 /**
  * @internal
+ * Map to a particular private resource.
+ *
+ * @param requested_addr
+ *      The starting address for the new mapping range.
+ * @param offset
+ *      The offset for the mapping range.
+ * @param size
+ *      The size for the mapping range.
+ * @return
+ *   - On success, the function returns a pointer to the mapped area.
+ *   - On error, the value MAP_FAILED is returned.
+ */
+void *pci_map_private_resource(void *requested_addr, off_t offset,
+		size_t size);
+
+/**
+ * @internal
  * Map a particular resource from a file.
  *
  * @param requested_addr
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 3e9ac41..06cbdab 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -670,11 +670,16 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			RTE_SET_USED(r);
 			return -1;
 		}
+
 		rte_spinlock_lock(&intr_lock);
-		TAILQ_FOREACH(src, &intr_sources, next)
+		TAILQ_FOREACH(src, &intr_sources, next) {
 			if (src->intr_handle.fd ==
 					events[n].data.fd)
 				break;
+			else if (src->intr_handle.uevent_fd ==
+					events[n].data.fd)
+				break;
+		}
 		if (src == NULL){
 			rte_spinlock_unlock(&intr_lock);
 			continue;
@@ -736,17 +741,13 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		rte_spinlock_lock(&intr_lock);
 
 		if (call) {
-
 			/* Finally, call all callbacks. */
 			TAILQ_FOREACH(cb, &src->callbacks, next) {
-
 				/* make a copy and unlock. */
 				active_cb = *cb;
 				rte_spinlock_unlock(&intr_lock);
-
 				/* call the actual callback */
 				active_cb.cb_fn(active_cb.cb_arg);
-
 				/*get the lock back. */
 				rte_spinlock_lock(&intr_lock);
 			}
@@ -859,7 +860,24 @@ eal_intr_thread_main(__rte_unused void *arg)
 			}
 			else
 				numfds++;
+
+			/**
+			 * add device uevent file descriptor
+			 * into wait list for uevent monitoring.
+			 */
+			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+			ev.data.fd = src->intr_handle.uevent_fd;
+			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
+					src->intr_handle.uevent_fd, &ev) < 0){
+				rte_panic("Error adding uevent_fd %d epoll_ctl"
+					", %s\n",
+					src->intr_handle.uevent_fd,
+					strerror(errno));
+			} else
+				numfds++;
 		}
+
+
 		rte_spinlock_unlock(&intr_lock);
 		/* serve the interrupt */
 		eal_intr_handle_interrupts(pfd, numfds);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index ae2980d..2f040b4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -52,10 +52,14 @@ void *pci_find_max_end_va(void);
 int pci_parse_one_sysfs_resource(char *line, size_t len, uint64_t *phys_addr,
 	uint64_t *end_addr, uint64_t *flags);
 
+void pci_uio_uev_handler(void *param);
 int pci_uio_alloc_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource **uio_res);
 void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
+
+int pci_uio_remap_resource(struct rte_pci_device *dev);
+
 int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 		struct mapped_pci_resource *uio_res, int map_idx);
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index fa10329..c85eb6c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -231,6 +231,10 @@ pci_uio_free_resource(struct rte_pci_device *dev,
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
 	}
+	if (dev->intr_handle.uevent_fd >= 0) {
+		close(dev->intr_handle.uevent_fd);
+		dev->intr_handle.uevent_fd = -1;
+	}
 	if (dev->intr_handle.fd >= 0) {
 		close(dev->intr_handle.fd);
 		dev->intr_handle.fd = -1;
@@ -239,6 +243,53 @@ pci_uio_free_resource(struct rte_pci_device *dev,
 }
 
 int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+		/* Map all BARs */
+		for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+			/* skip empty BAR */
+			phaddr = dev->mem_resource[i].phys_addr;
+			if (phaddr == 0)
+				continue;
+			map_address = pci_map_private_resource(dev->mem_resource[i].addr, 0,
+					(size_t)dev->mem_resource[i].len);
+			if (map_address == MAP_FAILED)
+				goto error;
+			memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
+			dev->mem_resource[i].addr = map_address;
+		}
+
+	return 0;
+error:
+	return -1;
+}
+
+void
+pci_uio_uev_handler(void *param)
+{
+
+	struct rte_pci_device *dev = (struct rte_pci_device *)param;
+	struct rte_eal_uevent event;
+	int ret;
+
+	/* check device uevent */
+	if (rte_eal_uev_receive(dev->intr_handle.uevent_fd, &event) == 0) {
+		if (event.subsystem == RTE_EAL_UEVENT_SUBSYSTEM_UIO) {
+			if (event.type == RTE_EAL_UEVENT_REMOVE) {
+				/*remap the resource to be fake before removal processing */
+				ret = pci_uio_remap_resource(dev);
+				if (!ret)
+					_rte_eal_uev_callback_process(&dev->device, RTE_EAL_UEVENT_REMOVE, NULL, NULL);
+			}
+		}
+	}
+}
+
+int
 pci_uio_alloc_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource **uio_res)
 {
@@ -246,6 +297,7 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
 	char cfgname[PATH_MAX];
 	char devname[PATH_MAX]; /* contains the /dev/uioX */
 	int uio_num;
+	struct rte_intr_handle *intr_handle;
 	struct rte_pci_addr *loc;
 
 	loc = &dev->addr;
@@ -276,6 +328,16 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
 		goto error;
 	}
 
+	dev->intr_handle.uevent_fd = rte_eal_uev_fd_new();
+	intr_handle = &dev->intr_handle;
+
+	rte_eal_uev_enable(intr_handle->uevent_fd);
+	TAILQ_INIT(&(dev->device.uev_cbs));
+
+	/* register callback func to eal lib */
+	rte_intr_callback_register(intr_handle,
+				   pci_uio_uev_handler, dev);
+
 	if (dev->kdrv == RTE_KDRV_IGB_UIO)
 		dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
 	else {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6daffeb..8c7fce4 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -90,6 +90,7 @@ struct rte_intr_handle {
 					for uio_pci_generic */
 	};
 	int fd;	 /**< interrupt event file descriptor */
+	int uevent_fd;	 /**< uevent file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 	uint32_t max_intr;             /**< max interrupt requested */
 	uint32_t nb_efd;               /**< number of available efd(event fd) */
@@ -235,5 +236,4 @@ rte_intr_allow_others(struct rte_intr_handle *intr_handle);
  */
 int
 rte_intr_cap_multiple(struct rte_intr_handle *intr_handle);
-
 #endif /* _RTE_LINUXAPP_INTERRUPTS_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v4 2/2] app/testpmd: use uevent to monitor hot removal
  2017-09-03 15:49         ` [PATCH v4 0/2] add uevent monitor " Jeff Guo
  2017-09-03 15:49           ` [PATCH v4 1/2] eal: " Jeff Guo
@ 2017-09-03 15:49           ` Jeff Guo
  2017-09-20  4:12             ` [PATCH v5 0/2] add uevent monitor for hot plug Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2017-09-03 15:49 UTC (permalink / raw)
  To: stephen, bruce.richardson
  Cc: dev, gaetan.rivet, shreyansh.jain, jblunck, helin.zhang,
	ferruh.yigit, konstantin.ananyev, thomas, jingjing.wu, jia.guo

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle a device hot removal event.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 app/test-pmd/testpmd.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9a36e66..b5ff7d7 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -393,6 +393,10 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(uint8_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(struct rte_device *dev,
+			      enum rte_eal_uevent_type type,
+			      void *param, void *ret_param);
+
 
 /*
  * Check if all the ports are started.
@@ -1413,6 +1417,7 @@ start_port(portid_t pid)
 	struct rte_port *port;
 	struct ether_addr mac_addr;
 	enum rte_eth_event_type event_type;
+	enum rte_eal_uevent_type uevent_type;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -1548,6 +1553,18 @@ start_port(portid_t pid)
 			}
 		}
 
+		for (uevent_type = RTE_EAL_UEVENT_UNKNOWN;
+		     uevent_type < RTE_EAL_UEVENT_MAX;
+		     uevent_type++) {
+			diag = rte_eal_uev_callback_register(&port->dev_info.pci_dev->device,uevent_type,
+							eth_uevent_callback, NULL);
+			if (diag) {
+				printf("Failed to setup uevent callback for uevent %d\n",
+					uevent_type);
+				return -1;
+			}
+		}
+
 		/* start port */
 		if (rte_eth_dev_start(pi) < 0) {
 			printf("Fail to start port %d\n", pi);
@@ -1842,6 +1859,16 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	struct rte_device *dev = (struct rte_device *)arg;
+	printf("removing device %s\n", dev->name);
+	if (rte_eal_dev_detach(dev))
+		RTE_LOG(ERR, USER1, "Failed to detach device %s\n",
+			dev->name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(uint8_t port_id, enum rte_eth_event_type type, void *param,
@@ -1883,6 +1910,41 @@ eth_event_callback(uint8_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(struct rte_device *dev, enum rte_eal_uevent_type type, void *param,
+		  void *ret_param)
+{
+	static const char * const event_desc[] = {
+		[RTE_EAL_UEVENT_UNKNOWN] = "Unknown",
+		[RTE_EAL_UEVENT_REMOVE] = "remove",
+	};
+
+	RTE_SET_USED(param);
+	RTE_SET_USED(ret_param);
+
+	if (type >= RTE_EAL_UEVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_EAL_UEVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+				rmv_uevent_callback, (void *)dev))
+			fprintf(stderr, "Could not set up deferred device removal\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(uint8_t port_id, struct rte_port *port)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v4 1/2] eal: add uevent monitor for hot plug
  2017-09-03 15:49           ` [PATCH v4 1/2] eal: " Jeff Guo
@ 2017-09-03 16:10             ` Stephen Hemminger
  2017-09-03 16:12             ` Stephen Hemminger
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Stephen Hemminger @ 2017-09-03 16:10 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, dev, gaetan.rivet, shreyansh.jain, jblunck,
	helin.zhang, ferruh.yigit, konstantin.ananyev, thomas,
	jingjing.wu

On Sun,  3 Sep 2017 23:49:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +int
> +rte_eal_uev_fd_new(void)
> +{
> +
> +	int netlink_fd = -1;
> +
> +	netlink_fd = socket(PF

Please don't use the "initialize everything" style of programming.

GCC has good detection and warning about uninitialized variables, and this style
defeats that.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v4 1/2] eal: add uevent monitor for hot plug
  2017-09-03 15:49           ` [PATCH v4 1/2] eal: " Jeff Guo
  2017-09-03 16:10             ` Stephen Hemminger
@ 2017-09-03 16:12             ` Stephen Hemminger
  2017-09-05  5:28               ` Guo, Jia
  2017-09-03 16:14             ` Stephen Hemminger
  2017-09-03 16:16             ` Stephen Hemminger
  3 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2017-09-03 16:12 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, dev, gaetan.rivet, shreyansh.jain, jblunck,
	helin.zhang, ferruh.yigit, konstantin.ananyev, thomas,
	jingjing.wu

On Sun,  3 Sep 2017 23:49:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +int
> +rte_eal_uev_enable(int netlink_fd)
> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +	int size = 64 * 1024;
> +	int nonblock = 1;
> +	memset(&addr, 0, sizeof(addr));

Blank line between declarations and code.
Also use C99-style initializations, not memset.

> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;

You don't need or want all events. Specify which group you want.

> +
> +	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));

Don't use SO_RCVBUFFORCE; that is only available as root. Just doing SO_RCVBUF is enough.
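
The three points above (designated initializer instead of memset, a single
multicast group instead of 0xffffffff, SO_RCVBUF instead of SO_RCVBUFFORCE)
could be combined roughly as follows. This is only a sketch: uev_socket_open
is a hypothetical name, not a function from the patch, and it assumes that
kernel uevents arrive on netlink multicast group 1.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): open a netlink socket
 * that only listens to kernel uevents.
 * Returns the socket fd on success, -1 on failure.
 */
int
uev_socket_open(void)
{
	/* C99 designated initializer; unnamed fields are zeroed. */
	struct sockaddr_nl addr = {
		.nl_family = AF_NETLINK,
		.nl_groups = 1, /* assumption: kernel uevent group only */
	};
	int size = 64 * 1024;
	int fd;

	fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	/* SO_RCVBUF needs no privileges; a failure here is not fatal. */
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The caller owns the returned fd and is expected to close() it when
monitoring stops.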

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v4 1/2] eal: add uevent monitor for hot plug
  2017-09-03 15:49           ` [PATCH v4 1/2] eal: " Jeff Guo
  2017-09-03 16:10             ` Stephen Hemminger
  2017-09-03 16:12             ` Stephen Hemminger
@ 2017-09-03 16:14             ` Stephen Hemminger
  2017-09-03 16:16             ` Stephen Hemminger
  3 siblings, 0 replies; 494+ messages in thread
From: Stephen Hemminger @ 2017-09-03 16:14 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, dev, gaetan.rivet, shreyansh.jain, jblunck,
	helin.zhang, ferruh.yigit, konstantin.ananyev, thomas,
	jingjing.wu

On Sun,  3 Sep 2017 23:49:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +	char buf[RTE_EAL_UEVENT_MSG_LEN];
> +
> +	memset(uevent, 0, sizeof(struct rte_eal_uevent));
> +	memset(buf, 0, RTE_EAL_UEVENT_MSG_LEN);

Please don't initialize everything all the time; you are even initializing receive data.

> +	ret = recv(fd, buf, RTE_EAL_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
> +	if (ret > 0)
> +		return rte_eal_uev_parse(buf, uevent);
> +	else if (ret < 0) {

else is unnecessary after return.

Checkpatch would have told you that if you ran it.

> +		RTE_LOG(ERR, EAL,
> +		"Socket read error(%d): %s\n",
> +		errno, strerror(er

Please indent arguments to match the first line:
	RTE_LOG(ERR, EAL,
		"Socket read error ...
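
For illustration, a receive helper that follows these comments (no
pre-zeroing of the receive buffer, no else after return, arguments aligned
under the first line) might look like the sketch below. uev_receive and
UEV_MSG_LEN are hypothetical names chosen for this example, and fprintf
stands in for RTE_LOG so that the snippet builds without DPDK.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define UEV_MSG_LEN 4096 /* hypothetical buffer size for this sketch */

/*
 * Hypothetical helper: non-blocking read of one message into buf.
 * Returns the number of bytes read, 0 if nothing is pending,
 * or a negative errno value on a real error.
 */
ssize_t
uev_receive(int fd, char *buf, size_t len)
{
	ssize_t ret;
	int err;

	/* No memset: recv() overwrites the bytes it reports. */
	ret = recv(fd, buf, len - 1, MSG_DONTWAIT);
	if (ret > 0) {
		buf[ret] = '\0';
		return ret;
	}
	if (ret == 0)
		return 0;

	err = errno;
	if (err == EAGAIN || err == EWOULDBLOCK)
		return 0;

	fprintf(stderr,
		"Socket read error(%d): %s\n",
		err, strerror(err));
	return -err;
}
```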

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v4 1/2] eal: add uevent monitor for hot plug
  2017-09-03 15:49           ` [PATCH v4 1/2] eal: " Jeff Guo
                               ` (2 preceding siblings ...)
  2017-09-03 16:14             ` Stephen Hemminger
@ 2017-09-03 16:16             ` Stephen Hemminger
  3 siblings, 0 replies; 494+ messages in thread
From: Stephen Hemminger @ 2017-09-03 16:16 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, dev, gaetan.rivet, shreyansh.jain, jblunck,
	helin.zhang, ferruh.yigit, konstantin.ananyev, thomas,
	jingjing.wu

On Sun,  3 Sep 2017 23:49:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +			/**
> +			 * add device uevent file descriptor
> +			 * into wait list for uevent monitoring.
> +			 */
> +			ev.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
> +			ev.data.fd = src->intr_handle.uevent_fd;
> +			if (epoll_ctl(pfd, EPOLL_CTL_ADD,
> +					src->intr_handle.uevent_fd, &ev) < 0){
> +				rte_panic("Error adding uevent_fd %d epoll_ctl"
> +					", %s\n",
> +					src->intr_handle.uevent_fd,
> +					strerror(errno));

Panicking a user application under load is not going to make people happy.
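
One possible alternative, sketched here under the assumption that the caller
can tolerate a device without uevent monitoring, is to log the failure and
return an error code instead of calling rte_panic. uev_epoll_add is a
hypothetical helper name, and fprintf stands in for RTE_LOG.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/*
 * Hypothetical helper: add fd to the epoll set epfd.
 * Returns 0 on success or a negative errno value on failure,
 * leaving the decision about what to do next to the caller.
 */
int
uev_epoll_add(int epfd, int fd)
{
	struct epoll_event ev = {
		.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP,
		.data.fd = fd,
	};
	int err;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		err = errno;
		fprintf(stderr,
			"Error adding uevent_fd %d epoll_ctl, %s\n",
			fd, strerror(err));
		return -err;
	}
	return 0;
}
```

The interrupt thread could then skip the failing source and keep serving
the remaining ones.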

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v4 1/2] eal: add uevent monitor for hot plug
  2017-09-03 16:12             ` Stephen Hemminger
@ 2017-09-05  5:28               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-09-05  5:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Richardson, Bruce, dev, gaetan.rivet, shreyansh.jain, jblunck,
	Zhang, Helin, Yigit, Ferruh, Ananyev, Konstantin, thomas, Wu,
	Jingjing

Thanks Stephen for your review and suggestions, I will incorporate them in the next version.

Best regards,
Jeff Guo

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v5 0/2] add uevent monitor for hot plug
  2017-09-20  4:12             ` [PATCH v5 0/2] add uevent monitor for hot plug Jeff Guo
@ 2017-09-19 18:44               ` Jan Blunck
  2017-09-20  6:51                 ` Guo, Jia
  2017-09-20  4:12               ` [PATCH v5 1/2] eal: " Jeff Guo
  2017-09-20  4:12               ` [PATCH v5 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Jan Blunck @ 2017-09-19 18:44 UTC (permalink / raw)
  To: Jeff Guo
  Cc: Stephen Hemminger, Bruce Richardson, Ferruh Yigit,
	Gaëtan Rivet, Ananyev, Konstantin, Shreyansh Jain,
	Jingjing Wu, dev, Thomas Monjalon, Helin Zhang

On Wed, Sep 20, 2017 at 6:12 AM, Jeff Guo <jia.guo@intel.com> wrote:
> So far, for hot plug in DPDK, we already have the hot plug add/remove
> API and the fail-safe driver to offload the fail-safe work from the
> application. But a general event API is still lacking, because the
> interrupt events related to hot plug differ between devices and
> drivers, such as mlx4, PCI drivers and others.
>
> Take the hot removal event as an example: not all PCI drivers expose a
> removal interrupt, so to make the hot plug feature easy to use for PCI
> drivers, something must be done to detect the removal event at the
> kernel level and offer a new interrupt line to user space.
>
> Based on the kernel's kobject uevent mechanism, we can monitor the hot
> plug status of devices, covering not only uio/vfio devices on the PCI
> bus but also others, such as cpu/usb/pci-express bus devices.
>
> The idea is as follows.
>

Jeff,

We already have libudev. Sorry for catching it that late but I don't
believe that replicating the uevent handling belongs in the DPDK. You
might want to look into a helper to find the corresponding rte_device
for a given devnode though.

Also the remap_device function should get removed. There is no
synchronization between the polling for the uevent and the rest of the
drivers. Therefore there is no guarantee that you can remap to "safe"
memory. You should fix the drivers instead.

Thanks,
Jan



^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v5 0/2] add uevent monitor for hot plug
  2017-09-03 15:49           ` [PATCH v4 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
@ 2017-09-20  4:12             ` Jeff Guo
  2017-09-19 18:44               ` Jan Blunck
                                 ` (2 more replies)
  0 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2017-09-20  4:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang

So far, for hot plug in DPDK, we already have the hot plug add/remove
API and the fail-safe driver to offload the fail-safe work from the
application. But a general event API is still lacking, because the
interrupt events related to hot plug differ between devices and
drivers, such as mlx4, PCI drivers and others.

Take the hot removal event as an example: not all PCI drivers expose a
removal interrupt, so to make the hot plug feature easy to use for PCI
drivers, something must be done to detect the removal event at the
kernel level and offer a new interrupt line to user space.

Based on the kernel's kobject uevent mechanism, we can monitor the hot
plug status of devices, covering not only uio/vfio devices on the PCI
bus but also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a.The uevent message read from the monitoring FD, which will be useful:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b.add uevent monitoring mechanism:
add several general APIs to enable uevent monitoring.

c.add common uevent handler and uevent failure handler:
device uevents should be handled at the bus or device layer, and memory read
and write failures during hot removal should be handled correctly before detaching.

d.show example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show usage.
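
To make item a. more concrete, the message shown there could be parsed along
the lines of the sketch below. It is not the parser from this patch set: the
uev_* names are hypothetical, and it assumes the netlink payload is a
sequence of NUL-terminated strings (the "action@devpath" header followed by
KEY=VALUE fields), which is how the fields above are delivered as far as I
know.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for this sketch, not the ones in the patch. */
enum uev_action { UEV_ACTION_UNKNOWN, UEV_ACTION_ADD, UEV_ACTION_REMOVE };

struct uev_info {
	enum uev_action action;
	const char *subsystem;	/* points into buf, or NULL */
	const char *devname;	/* points into buf, or NULL */
};

/*
 * Walk the NUL-separated fields in buf[0..len) and pick out the
 * action, subsystem and device name. Unknown fields are skipped.
 */
void
uev_parse(const char *buf, size_t len, struct uev_info *info)
{
	size_t i = 0;

	info->action = UEV_ACTION_UNKNOWN;
	info->subsystem = NULL;
	info->devname = NULL;

	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);

		if (end == NULL)
			break;	/* unterminated tail, stop parsing */

		if (strcmp(field, "ACTION=add") == 0)
			info->action = UEV_ACTION_ADD;
		else if (strcmp(field, "ACTION=remove") == 0)
			info->action = UEV_ACTION_REMOVE;
		else if (strncmp(field, "SUBSYSTEM=", 10) == 0)
			info->subsystem = field + 10;
		else if (strncmp(field, "DEVNAME=", 8) == 0)
			info->devname = field + 8;

		i += (size_t)(end - field) + 1;
	}
}
```

The returned pointers alias the receive buffer, so the caller has to copy
them if they must outlive it.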

patchset history:
v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the common eal device API, and distinguish between linux and bsd.
3.Add failure handler helper API in bus layer. Add function to find device by name.
4.Replace the individual fd bound to each device with a common fd that polls all devices.
5.Add registration of hot insertion monitoring and processing; add function to auto bind the driver before the user adds the device.
6.Refine some coding style and typo issues.
7.Add new callback to process hot insertion.

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove the global variable hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix the dual device fd issue.
2.fix some typos.


Jeff Guo (2):
  eal: add uevent monitor for hot plug
  app/testpmd: use uevent to monitor hot removal

 app/test-pmd/testpmd.c                             |  90 ++++++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 105 ++++++
 lib/librte_eal/common/eal_common_bus.c             |  31 ++
 lib/librte_eal/common/eal_common_dev.c             | 223 ++++++++++++-
 lib/librte_eal/common/eal_common_pci.c             |  69 +++-
 lib/librte_eal/common/eal_common_pci_uio.c         |  31 +-
 lib/librte_eal/common/eal_common_vdev.c            |  29 +-
 lib/librte_eal/common/eal_private.h                |  13 +-
 lib/librte_eal/common/include/rte_bus.h            |  52 +++
 lib/librte_eal/common/include/rte_dev.h            | 102 +++++-
 lib/librte_eal/common/include/rte_pci.h            |  26 ++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 351 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci.c              |  33 ++
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 105 ++++++
 16 files changed, 1318 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v5 1/2] eal: add uevent monitor for hot plug
  2017-09-20  4:12             ` [PATCH v5 0/2] add uevent monitor for hot plug Jeff Guo
  2017-09-19 18:44               ` Jan Blunck
@ 2017-09-20  4:12               ` Jeff Guo
  2017-09-20  4:12               ` [PATCH v5 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2017-09-20  4:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang

This patch aims to add a general uevent mechanism to the eal device layer
to enable hot plug monitoring of all kernel objects, so that users can use
these APIs to monitor and read out the device status info sent from the
kernel side and then handle it accordingly, such as by detaching or
attaching the device; it can also be used to do fail-safe work smoothly.

1) About uevent monitoring:
a: add one epoll instance to poll the netlink socket and monitor device
   uevents; add device_state to struct rte_device to track the device
   state machine.
b: add enum rte_eal_dev_event_type and struct rte_eal_uevent.
c: add the APIs below to the rte eal device common layer.
   rte_eal_dev_monitor_enable
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_bind_driver
   rte_dev_monitor_start
   rte_dev_monitor_stop

2) About the failure handler, using pci uio as an example:
   add pci_remap_device in the bus layer and the functions below to process it:
   rte_pci_remap_device
   pci_uio_remap_resource
   pci_map_private_resource
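
As a rough illustration of the register/process pattern behind the callback
APIs listed in 1) c, here is a minimal, self-contained sketch built on a
TAILQ. It is not the implementation in this patch: every dev_* name below is
hypothetical, and locking and unregistration are left out for brevity.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical event types for this sketch. */
enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef void (*dev_event_cb_fn)(enum dev_event_type type, void *arg);

struct dev_event_cb {
	TAILQ_ENTRY(dev_event_cb) next;
	enum dev_event_type type;	/* event this callback wants */
	dev_event_cb_fn fn;
	void *arg;
};

TAILQ_HEAD(dev_event_cb_list, dev_event_cb);

/* Register fn for one event type. Returns 0 on success, -1 on error. */
int
dev_callback_register(struct dev_event_cb_list *list,
		      enum dev_event_type type,
		      dev_event_cb_fn fn, void *arg)
{
	struct dev_event_cb *cb;

	if (fn == NULL)
		return -1;
	cb = malloc(sizeof(*cb));
	if (cb == NULL)
		return -1;
	cb->type = type;
	cb->fn = fn;
	cb->arg = arg;
	TAILQ_INSERT_TAIL(list, cb, next);
	return 0;
}

/* Invoke every callback registered for type; returns how many ran. */
int
dev_callback_process(struct dev_event_cb_list *list,
		     enum dev_event_type type)
{
	struct dev_event_cb *cb;
	int called = 0;

	TAILQ_FOREACH(cb, list, next) {
		if (cb->type != type)
			continue;
		cb->fn(type, cb->arg);
		called++;
	}
	return called;
}

/* Example callback: counts events in the int that arg points to. */
void
dev_event_count_cb(enum dev_event_type type, void *arg)
{
	(void)type;
	(*(int *)arg)++;
}
```

A per-device list like this is what lets an application such as testpmd be
notified of a removal without knowing which bus or driver detected it.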

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the common eal device API, and distinguish between linux and bsd.
3.Add failure handler helper API in bus layer. Add function to find device by name.
4.Replace the individual fd bound to each device with a common fd that polls all devices.
5.Add registration of hot insertion monitoring and processing; add function to auto bind the driver before the user adds the device.
6.Refine some coding style and typo issues.
---
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 105 ++++++
 lib/librte_eal/common/eal_common_bus.c             |  31 ++
 lib/librte_eal/common/eal_common_dev.c             | 223 ++++++++++++-
 lib/librte_eal/common/eal_common_pci.c             |  69 +++-
 lib/librte_eal/common/eal_common_pci_uio.c         |  31 +-
 lib/librte_eal/common/eal_common_vdev.c            |  29 +-
 lib/librte_eal/common/eal_private.h                |  13 +-
 lib/librte_eal/common/include/rte_bus.h            |  52 +++
 lib/librte_eal/common/include/rte_dev.h            | 102 +++++-
 lib/librte_eal/common/include/rte_pci.h            |  26 ++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 351 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci.c              |  33 ++
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 105 ++++++
 15 files changed, 1228 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..6ea9a74
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,64 @@
+/*-
+ *   Copyright(c) 2010-2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_monitor_start(void)
+{
+	return -1;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..7d2c3c3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,105 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+#define RTE_EAL_UEVENT_MSG_LEN 4096
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+
+/**
+ * The device event type.
+ */
+enum rte_eal_dev_event_type {
+	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
+	RTE_EAL_DEV_EVENT_REMOVE,
+					/**< device removing event */
+	RTE_EAL_DEV_EVENT_CHANGE,
+					/**< device status change event */
+	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
+	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
+	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
+	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_eal_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 08bec2d..e8ed396 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -50,8 +50,10 @@ rte_bus_register(struct rte_bus *bus)
 	RTE_VERIFY(bus->scan);
 	RTE_VERIFY(bus->probe);
 	RTE_VERIFY(bus->find_device);
+	RTE_VERIFY(bus->find_device_by_name);
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
+	RTE_VERIFY(bus->remap_device);
 
 	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
 	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
@@ -174,6 +176,15 @@ cmp_rte_device(const struct rte_device *dev1, const void *_dev2)
 }
 
 static int
+cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
+{
+	const char *dev_name2 = _dev_name2;
+
+	return strcmp(dev_name1, dev_name2);
+}
+
+
+static int
 bus_find_device(const struct rte_bus *bus, const void *_dev)
 {
 	struct rte_device *dev;
@@ -182,6 +193,26 @@ bus_find_device(const struct rte_bus *bus, const void *_dev)
 	return dev == NULL;
 }
 
+static struct rte_device *
+bus_find_device_by_name(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus->find_device_by_name(NULL, cmp_rte_device_name, _dev_name);
+	return dev;
+}
+
+
+struct rte_device *
+
+rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus_find_device_by_name(bus, _dev_name);
+	return dev;
+}
+
 struct rte_bus *
 rte_bus_find_by_device(const struct rte_device *dev)
 {
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index e251275..9ebdd6a 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -36,15 +36,40 @@
 #include <string.h>
 #include <inttypes.h>
 #include <sys/queue.h>
+#include <errno.h>
+#include <unistd.h>
+#include <fcntl.h>
 
 #include <rte_bus.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_eal_dev_callback {
+	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
+	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Parameter for callback */
+	void *ret_param;                        /**< Return parameter */
+	enum rte_eal_dev_event_type event;          /**< device event type */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+struct rte_eal_dev_cb_list uev_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -53,7 +78,6 @@ static int cmp_detached_dev_name(const struct rte_device *dev,
 	/* skip attached devices */
 	if (dev->driver != NULL)
 		return 1;
-
 	return strcmp(dev->name, name);
 }
 
@@ -244,3 +268,200 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_eal_dev_monitor_enable(void)
+{
+	int ret;
+
+	if (TAILQ_EMPTY(&uev_cbs))
+		TAILQ_INIT(&uev_cbs);
+
+	ret = rte_dev_monitor_start();
+	if (ret)
+		RTE_LOG(ERR, EAL, "Can not init device monitor\n");
+	return ret;
+}
+
+int
+rte_dev_callback_register(enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_eal_dev_callback *user_cb;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	TAILQ_FOREACH(user_cb, &uev_cbs, next) {
+		if (user_cb->cb_fn == cb_fn &&
+			user_cb->cb_arg == cb_arg &&
+			user_cb->event == event) {
+			break;
+		}
+	}
+
+	/* create a new callback. */
+	if (user_cb == NULL) {
+		/* allocate a new callback entity */
+		user_cb = rte_zmalloc("eal device event", sizeof(*user_cb), 0);
+		if (user_cb == NULL) {
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
+			return -ENOMEM;
+		}
+		user_cb->cb_fn = cb_fn;
+		user_cb->cb_arg = cb_arg;
+		user_cb->event = event;
+		TAILQ_INSERT_TAIL(&uev_cbs, user_cb, next);
+	}
+
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return 0;
+}
+
+int
+rte_dev_callback_unregister(enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_eal_dev_callback *cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	ret = 0;
+	for (cb = TAILQ_FIRST(&uev_cbs); cb != NULL; cb = next) {
+
+		next = TAILQ_NEXT(cb, next);
+
+		if (cb->cb_fn != cb_fn || cb->event != event ||
+				(cb->cb_arg != (void *)-1 &&
+				cb->cb_arg != cb_arg))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (cb->active == 0) {
+			TAILQ_REMOVE(&(uev_cbs), cb, next);
+			rte_free(cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(enum rte_eal_dev_event_type event,
+			void *cb_arg, void *ret_param)
+{
+	struct rte_eal_dev_callback *cb_lst;
+	struct rte_eal_dev_callback dev_cb;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+	TAILQ_FOREACH(cb_lst, &(uev_cbs), next) {
+		if (cb_lst->cb_fn == NULL || cb_lst->event != event)
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		if (cb_arg != NULL)
+			dev_cb.cb_arg = cb_arg;
+		if (ret_param != NULL)
+			dev_cb.ret_param = ret_param;
+
+		rte_spinlock_unlock(&rte_dev_cb_lock);
+		rc = dev_cb.cb_fn(dev_cb.event,
+				dev_cb.cb_arg, dev_cb.ret_param);
+		rte_spinlock_lock(&rte_dev_cb_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return rc;
+}
+
+int
+rte_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	char drv_bind_path[PATH_MAX];
+	char drv_override_path[PATH_MAX]; /* sysfs driver_override path */
+	int drv_override_fd = -1, drv_bind_fd = -1;
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	/* write the string itself; sizeof() here would be the pointer size */
+	if (write(drv_override_fd, drv_type, strlen(drv_type)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", drv_type, dev_name);
+		goto err;
+	}
+	close(drv_override_fd);
+	drv_override_fd = -1;
+
+	snprintf(drv_bind_path, sizeof(drv_bind_path),
+		"/sys/bus/pci/drivers/%s/bind", drv_type);
+
+	/* do the bind by writing the device name to the specific driver */
+	drv_bind_fd = open(drv_bind_path, O_APPEND | O_WRONLY);
+	if (drv_bind_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_bind_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_bind_fd, dev_name, strlen(dev_name)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write device %s "
+			"to driver %s\n", dev_name, drv_type);
+		goto err;
+	}
+	close(drv_bind_fd);
+	drv_bind_fd = -1;
+
+	/*
+	 * After binding, overwrite driver_override with an empty
+	 * string so that the device can later be bound to any other
+	 * driver. drv_override_path still holds the path built above.
+	 */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	/* a single NUL byte clears the override */
+	if (write(drv_override_fd, "\0", 1) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: cannot clear driver_override "
+			"of device %s\n", dev_name);
+		goto err;
+	}
+	close(drv_override_fd);
+	return 0;
+
+err:
+	if (drv_override_fd >= 0)
+		close(drv_override_fd);
+	if (drv_bind_fd >= 0)
+		close(drv_bind_fd);
+	return -1;
+}
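
For reference, a minimal standalone sketch of the sysfs write pattern that rte_dev_bind_driver() depends on. The byte count handed to write() has to be the string length; sizeof() applied to a "const char *" parameter only yields the pointer size. The helper name write_attr_string() is hypothetical and not part of this patch.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): write a NUL-terminated
 * string to a sysfs-style attribute file.
 * Returns 0 on success, -1 on error.
 */
static int
write_attr_string(const char *path, const char *value)
{
	size_t len = strlen(value); /* sizeof(value) would be the pointer size */
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, value, len);
	close(fd);

	/* treat a short write as a failure */
	return (n == (ssize_t)len) ? 0 : -1;
}
```

Under these assumptions a caller would use it as write_attr_string("/sys/bus/pci/devices/<BDF>/driver_override", "igb_uio"), where <BDF> stands for the PCI address of the device.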
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 52fd38c..439852b 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -110,6 +110,27 @@ pci_name_set(struct rte_pci_device *dev)
 		dev->device.name = dev->name;
 }
 
+/* map a private anonymous resource at the requested address */
+void *
+pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
+{
+	void *mapaddr;
+
+	mapaddr = mmap(requested_addr, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	if (mapaddr == MAP_FAILED) {
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
+			"%s (%p)\n",
+			__func__, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno), mapaddr);
+	} else
+		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+
+	return mapaddr;
+}
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
@@ -285,9 +306,10 @@ rte_pci_detach_dev(struct rte_pci_device *dev)
 	RTE_LOG(DEBUG, EAL, "  remove driver: %x:%x %s\n", dev->id.vendor_id,
 			dev->id.device_id, dr->driver.name);
 
-	if (dr->remove && (dr->remove(dev) < 0))
-		return -1;	/* negative value is an error */
-
+	if (dev->device.state != DEVICE_FAULT) {
+		if (dr->remove && (dr->remove(dev) < 0))
+			return -1;	/* negative value is an error */
+	}
 	/* clear driver structure */
 	dev->driver = NULL;
 
@@ -538,6 +560,7 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 			start = NULL; /* starting point found */
 			continue;
 		}
+
 		if (cmp(&dev->device, data) == 0)
 			return &dev->device;
 	}
@@ -545,6 +568,44 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+pci_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_pci_device *dev;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (start && &dev->device == start) {
+			start = NULL; /* starting point found */
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+
+	return NULL;
+}
+
+static int
+pci_remap_device(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices that use igb_uio */
+	ret = rte_pci_remap_device(pdev);
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to remap device %s\n",
+			dev->name);
+	return ret;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -569,9 +630,11 @@ struct rte_pci_bus rte_pci_bus = {
 		.scan = rte_pci_scan,
 		.probe = rte_pci_probe,
 		.find_device = pci_find_device,
+		.find_device_by_name = pci_find_device_by_name,
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.remap_device = pci_remap_device,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/eal_common_pci_uio.c b/lib/librte_eal/common/eal_common_pci_uio.c
index 367a681..cb12349 100644
--- a/lib/librte_eal/common/eal_common_pci_uio.c
+++ b/lib/librte_eal/common/eal_common_pci_uio.c
@@ -174,6 +174,34 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+	/* Map all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
+			continue;
+		map_address = pci_map_private_resource(
+				dev->mem_resource[i].addr, 0,
+				(size_t)dev->mem_resource[i].len);
+		if (map_address == MAP_FAILED)
+			goto error;
+		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
+		dev->mem_resource[i].addr = map_address;
+	}
+
+	return 0;
+error:
+	return -1;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
@@ -222,7 +250,8 @@ pci_uio_unmap_resource(struct rte_pci_device *dev)
 	rte_free(uio_res);
 
 	/* close fd if in primary process */
-	close(dev->intr_handle.fd);
+	if (dev->device.state != DEVICE_FAULT)
+		close(dev->intr_handle.fd);
 	if (dev->intr_handle.uio_cfg_fd >= 0) {
 		close(dev->intr_handle.uio_cfg_fd);
 		dev->intr_handle.uio_cfg_fd = -1;
diff --git a/lib/librte_eal/common/eal_common_vdev.c b/lib/librte_eal/common/eal_common_vdev.c
index f7e547a..ebed0b7 100644
--- a/lib/librte_eal/common/eal_common_vdev.c
+++ b/lib/librte_eal/common/eal_common_vdev.c
@@ -303,7 +303,7 @@ vdev_probe(void)
 
 static struct rte_device *
 vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
-		 const void *data)
+		const void *data)
 {
 	struct rte_vdev_device *dev;
 
@@ -318,6 +318,31 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+vdev_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_vdev_device *dev;
+
+	TAILQ_FOREACH(dev, &vdev_device_list, next) {
+		if (start && &dev->device == start) {
+			start = NULL;
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+	return NULL;
+}
+
+static int
+vdev_remap_device(struct rte_device *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
 static int
 vdev_plug(struct rte_device *dev)
 {
@@ -334,9 +359,11 @@ static struct rte_bus rte_vdev_bus = {
 	.scan = vdev_scan,
 	.probe = vdev_probe,
 	.find_device = vdev_find_device,
+	.find_device_by_name = vdev_find_device_by_name,
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
+	.remap_device = vdev_remap_device,
 };
 
 RTE_REGISTER_BUS(vdev, rte_vdev_bus);
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 597d82e..4da42be 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -222,6 +222,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI UIO resource.
+ *
+ * @param dev
+ *   Pointer to the rte_pci_device structure.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
@@ -354,5 +366,4 @@ bool rte_eal_using_phys_addrs(void);
  *   NULL if no bus is able to parse this device.
  */
 struct rte_bus *rte_bus_find_by_device_name(const char *str);
-
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index c79368d..3514014 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -107,6 +107,34 @@ typedef struct rte_device *
 			 const void *data);
 
 /**
+ * Device iterator to find a device on a bus by name.
+ *
+ * This function returns an rte_device if one of those held by the bus
+ * matches the data passed as parameter.
+ *
+ * If the comparison function returns zero this function should stop iterating
+ * over any more devices. To continue a search the device of a previous search
+ * can be passed via the start parameter.
+ *
+ * @param cmp
+ *	the device name comparison function.
+ *
+ * @param data
+ *	Data to compare each device against.
+ *
+ * @param start
+ *	starting point for the iteration
+ *
+ * @return
+ *	The first device matching the data, NULL if none exists.
+ */
+typedef struct rte_device *
+(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
+			 rte_dev_cmp_name_t cmp,
+			 const void *data);
+
+
+/**
  * Implementation specific probe function which is responsible for linking
  * devices on that bus with applicable drivers.
  *
@@ -153,6 +181,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific remap function which is responsible for remapping
+ * a device on that bus from its original shared memory resource to a private
+ * memory resource once the device has been removed.
+ *
+ * @param dev
+ *	Device pointer that was returned by a previous call to find_device.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -177,9 +219,12 @@ struct rte_bus {
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
 	rte_bus_find_device_t find_device; /**< Find a device on the bus */
+	rte_bus_find_device_by_name_t find_device_by_name;
+				     /**< Find a device on the bus by name */
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_remap_device_t remap_device;       /**< remap a device */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 };
 
@@ -276,6 +321,13 @@ struct rte_bus *rte_bus_find(const struct rte_bus *start, rte_bus_cmp_t cmp,
 struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
 
 /**
+ * Find a device on the given bus by its name.
+ */
+struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
+				const void *dev_name);
+
+
+/**
  * Find the registered bus for a given name.
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 5386d3a..9cea6bf 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -52,6 +52,15 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_log.h>
 
+#include <exec-env/rte_dev.h>
+
+typedef int (*rte_eal_dev_cb_fn)(enum rte_eal_dev_event_type event,
+					void *cb_arg, void *ret_param);
+
+struct rte_eal_dev_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -154,6 +163,13 @@ struct rte_driver {
 
 #define RTE_DEV_NAME_MAX_LEN (32)
 
+enum device_state {
+	DEVICE_UNDEFINED,
+	DEVICE_FAULT,
+	DEVICE_PARSED,
+	DEVICE_PROBED,
+};
+
 /**
  * A structure describing a generic device.
  */
@@ -163,6 +179,7 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	enum device_state state;  /**< Device state */
 };
 
 /**
@@ -267,6 +284,8 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
  */
 typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void *data);
 
+typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void *data);
+
 #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
 
 #define RTE_PMD_EXPORT_NAME(name, idx) \
@@ -312,4 +331,85 @@ __attribute__((used)) = str
 }
 #endif
 
-#endif /* _RTE_VDEV_H_ */
+/**
+ * Enable the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eal_dev_monitor_enable(void);
+/**
+ * It registers the callback for the specific event. Multiple
+ * callbacks can be registered at the same time.
+ * @param event
+ *  The device event type.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * It unregisters the callback according to the specified event.
+ *
+ * @param event
+ *  The event type corresponding to the callback.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  callback parameter; (void *)-1 removes all entries with the same cb_fn.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal use only. User
+ * application should not call it directly.
+ *
+ * @param event
+ *  The device event type.
+ * @param cb_arg
+ *  callback parameter.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(enum rte_eal_dev_event_type event,
+			void *cb_arg, void *ret_param);
+
+/**
+ * It can be used to bind a device to a specific type of driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param drv_type
+ *  The specific driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int
+rte_dev_bind_driver(const char *dev_name, const char *drv_type);
+
+#endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 8b12339..b33d53e 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -393,6 +393,32 @@ int rte_pci_map_device(struct rte_pci_device *dev);
 void rte_pci_unmap_device(struct rte_pci_device *dev);
 
 /**
+ * Remap this device
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ */
+int rte_pci_remap_device(struct rte_pci_device *dev);
+
+/**
+ * @internal
+ * Map to a particular private resource.
+ *
+ * @param requested_addr
+ *      The starting address for the new mapping range.
+ * @param offset
+ *      The offset for the mapping range.
+ * @param size
+ *      The size for the mapping range.
+ * @return
+ *   - On success, the function returns a pointer to the mapped area.
+ *   - On error, the value MAP_FAILED is returned.
+ */
+void *pci_map_private_resource(void *requested_addr, off_t offset,
+		size_t size);
+
+/**
  * @internal
  * Map a particular resource from a file.
  *
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 90bca4d..34cf8b0 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -73,6 +73,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
@@ -130,7 +131,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
 endif
 
-INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h
+INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h rte_dev.h
 
 SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
 	$(addprefix include/exec-env/,$(INC))
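
The uevent monitor added in eal_dev.c below receives kernel messages made of NUL-terminated "KEY=value" fields. A minimal standalone sketch of walking such a buffer, bounded by the number of bytes actually received, could look as follows. The struct uev_fields type and both function names are hypothetical and exist only in this sketch; they are not the patch's rte_eal_uevent or dev_uev_parse().

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch; not the patch's rte_eal_uevent. */
struct uev_fields {
	char action[32];
	char subsystem[32];
	char pci_slot_name[32];
};

/* Copy the value into dst if field starts with key; return 1 on a match. */
static int
take_value(const char *field, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(field, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", field + klen);
	return 1;
}

/*
 * Walk a uevent message of len bytes made of NUL-terminated fields.
 * The caller must guarantee buf[len] == '\0', for example by receiving
 * into a zeroed buffer that is one byte larger than the maximum read size.
 */
static void
parse_uevent(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *field = buf + i;

		if (!take_value(field, "ACTION=", out->action,
				sizeof(out->action)) &&
		    !take_value(field, "SUBSYSTEM=", out->subsystem,
				sizeof(out->subsystem)))
			take_value(field, "PCI_SLOT_NAME=", out->pci_slot_name,
				   sizeof(out->pci_slot_name));

		i += strlen(field) + 1; /* skip the field and its terminator */
	}
}
```

Storing the values inside a caller-owned struct avoids handing out pointers to buffers that live on the parser's stack.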
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9478d29
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,351 @@
+/*-
+ *   Copyright(c) 2010-2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+/* uev monitoring thread */
+static pthread_t uev_monitor_thread;
+
+bool udev_exit = true;
+
+bool no_request_thread = true;
+
+static void sig_handler(int signum)
+{
+	if (signum == SIGINT || signum == SIGTERM)
+		rte_dev_monitor_stop();
+}
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
+{
+	char action[RTE_EAL_UEVENT_MSG_LEN];
+	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
+	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
+	static char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN]; /* outlives call */
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
+
+	while (i < RTE_EAL_UEVENT_MSG_LEN) {
+		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			event->group = UEV_MONITOR_UDEV;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+		}
+		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = UEV_SUBSYSTEM_PCI;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_EAL_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_EAL_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+
+	return 0;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_eal_uevent));
+	memset(buf, 0, RTE_EAL_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	return dev_uev_parse(buf, uevent);
+}
+
+static int
+dev_uev_process(struct epoll_event *events, int nfds)
+{
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	struct rte_eal_uevent uevent;
+	int ret;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
+			(uevent.group == UEV_MONITOR_UDEV))
+			return 0;
+		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
+			bus = rte_bus_find_by_name("pci");
+			dev = rte_bus_find_device(bus, uevent.devname);
+			if (uevent.type == RTE_EAL_DEV_EVENT_REMOVE) {
+				if (!dev)
+					return 0;
+				dev->state = DEVICE_FAULT;
+				/**
+				 * remap the resource to be fake
+				 * before user's removal processing
+				 */
+				ret = bus->remap_device(dev);
+				if (!ret)
+					return(_rte_dev_callback_process(
+					  RTE_EAL_DEV_EVENT_REMOVE,
+					  NULL, NULL));
+			} else if (uevent.type == RTE_EAL_DEV_EVENT_ADD) {
+				if (dev == NULL) {
+					/**
+					 * bind the driver to the device
+					 * before user's add processing
+					 */
+					rte_dev_bind_driver(
+						uevent.devname,
+						"igb_uio");
+					return(_rte_dev_callback_process(
+					  RTE_EAL_DEV_EVENT_ADD,
+					  uevent.devname, NULL));
+				}
+			}
+		}
+	}
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the interrupts.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *  never return;
+ */
+static __attribute__((noreturn)) void *
+dev_uev_monitoring(__rte_unused void *arg)
+{
+	struct sigaction act;
+	sigset_t mask;
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep;
+
+	udev_exit = false;
+
+	/* set signal handlers */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_handler = sig_handler;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGINT, &act, NULL);
+	sigaction(SIGTERM, &act, NULL);
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGINT);
+	sigaddset(&mask, SIGTERM);
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!udev_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+}
+
+int
+rte_dev_monitor_start(void)
+{
+	int ret;
+
+	if (!no_request_thread)
+		return 0;
+	no_request_thread = false;
+
+	/* create the host thread to wait/handle the uevent from kernel */
+	ret = pthread_create(&uev_monitor_thread, NULL,
+		dev_uev_monitoring, NULL);
+	return ret;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	udev_exit = true;
+	no_request_thread = true;
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8951ce7..6438047 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -141,6 +141,39 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* Remap pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* try mapping the NIC resources using VFIO if it exists */
+	switch (dev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* nothing to do */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(dev);
+		}
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void *
 pci_find_max_end_va(void)
 {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..7d2c3c3
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,105 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+#define RTE_EAL_UEVENT_MSG_LEN 4096
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+
+/**
+ * The device event type.
+ */
+enum rte_eal_dev_event_type {
+	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
+	RTE_EAL_DEV_EVENT_REMOVE,
+					/**< device removing event */
+	RTE_EAL_DEV_EVENT_CHANGE,
+					/**< device status change event */
+	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
+	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
+	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
+	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_eal_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
-- 
2.7.4


* [PATCH v5 2/2] app/testpmd: use uevent to monitor hot removal
  2017-09-20  4:12             ` [PATCH v5 0/2] add uevent monitor for hot plug Jeff Guo
  2017-09-19 18:44               ` Jan Blunck
  2017-09-20  4:12               ` [PATCH v5 1/2] eal: " Jeff Guo
@ 2017-09-20  4:12               ` Jeff Guo
  2017-11-01 20:16                 ` [PATCH v6 0/2] add uevent monitor for hot plug Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2017-09-20  4:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang

Use testpmd as an example to show applications how to request and
use uevent monitoring to handle hot removal and hot insertion
events.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
add new callback to process hot insertion 
---
 app/test-pmd/testpmd.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index e097ee0..df2bb48 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -393,6 +393,9 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(uint8_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(enum rte_eal_dev_event_type type,
+			      void *param, void *ret_param);
+
 
 /*
  * Check if all the ports are started.
@@ -1413,6 +1416,7 @@ start_port(portid_t pid)
 	struct rte_port *port;
 	struct ether_addr mac_addr;
 	enum rte_eth_event_type event_type;
+	enum rte_eal_dev_event_type dev_event_type;
 
 	if (port_id_is_invalid(pid, ENABLED_WARN))
 		return 0;
@@ -1547,6 +1551,21 @@ start_port(portid_t pid)
 				return -1;
 			}
 		}
+		rte_eal_dev_monitor_enable();
+
+		for (dev_event_type = RTE_EAL_DEV_EVENT_UNKNOWN;
+		     dev_event_type < RTE_EAL_DEV_EVENT_MAX;
+		     dev_event_type++) {
+			diag = rte_dev_callback_register(dev_event_type,
+					eth_uevent_callback,
+					&pi);
+			if (diag) {
+				printf("Failed to setup uevent callback for"
+					" device event %d\n",
+					dev_event_type);
+				return -1;
+			}
+		}
 
 		/* start port */
 		if (rte_eth_dev_start(pi) < 0) {
@@ -1883,6 +1902,35 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	uint8_t port_id = *(uint8_t *)arg;
+
+	printf("removing device port_id:%u\n", port_id);
+
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	printf("adding device %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(uint8_t port_id, enum rte_eth_event_type type, void *param,
@@ -1924,6 +1972,48 @@ eth_event_callback(uint8_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(enum rte_eal_dev_event_type type, void *arg,
+		  void *ret_param)
+{
+	static const char * const event_desc[] = {
+		[RTE_EAL_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_EAL_DEV_EVENT_ADD] = "add",
+		[RTE_EAL_DEV_EVENT_REMOVE] = "remove",
+	};
+
+	RTE_SET_USED(ret_param);
+
+	if (type >= RTE_EAL_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_EAL_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(2000000,
+			add_uevent_callback, (void *)arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device insertion\n");
+		break;
+	case RTE_EAL_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, (void *)arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(uint8_t port_id, struct rte_port *port)
 {
-- 
2.7.4


* Re: [PATCH v5 0/2] add uevent monitor for hot plug
  2017-09-19 18:44               ` Jan Blunck
@ 2017-09-20  6:51                 ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-09-20  6:51 UTC (permalink / raw)
  To: Jan Blunck
  Cc: Stephen Hemminger, Bruce Richardson, Ferruh Yigit,
	Gaëtan Rivet, Ananyev, Konstantin, Shreyansh Jain,
	Jingjing Wu, dev, Thomas Monjalon, Helin Zhang

Hi Jan,


On 9/20/2017 2:44 AM, Jan Blunck wrote:
> On Wed, Sep 20, 2017 at 6:12 AM, Jeff Guo <jia.guo@intel.com> wrote:
>> So far, about hot plug in dpdk, we already have hot plug add/remove
>> api and fail-safe driver to offload the fail-safe work from the app
>> user. But there are still lack of a general event api, since the interrupt
>> event, which hot plug related with, is diversity between each device and
>> driver, such as mlx4, pci driver and others.
>>
>> Use the hot removal event for example, pci drivers not all exposure the
>> remove interrupt, so in order to make user to easy use the hot plug feature
>> for pci driver, something must be done to detect the remove event at the
>> kernel level and offer a new line of interrupt to the user land.
>>
>> Base on the uevent of kobject mechanism in kernel, we could use it to
>> benefit for monitoring the hot plug status of the device which not only
>> uio/vfio of pci bus devices, but also other, such as cpu/usb/pci-express
>> bus devices.
>>
>> The idea is comming as bellow.
>>
> Jeff,
>
> We already have libudev. Sorry for catching it that late but I don't
> believe that replicating the uevent handling belongs in the DPDK. You
> might want to look into a helper to find the corresponding rte_device
> for a given devnode though.
I think the kobject netlink message can be used by many applications,
such as a custom hotplug app or libudev, and of course by DPDK as well.
Handling it involves the differing behavior of the various drivers and
complex user-space device management, so to make hotplug easy for DPDK
users to develop, we want to offload the monitoring work and the
coherent processing into the EAL layer.
> Also the remap_device function should get removed. There is no
> synchronization between the polling for the uevent and the rest of the
> drivers. Therefore there is no guarantee that you can remap to "safe"
> memory. You should fix the drivers instead.
I agree that there is no synchronization guarantee. Maybe we could add a
sigaction handler or another synchronization condition in the drivers to
handle the bus error signal when an unsafe memory access occurs.
But I think that is a synchronization problem: it would not affect the
remap_device functionality, and it would not prevent a common bus-layer
memory control that the caller can use to remap to "safe" memory.
Am I right? Also, I would like to know whether there is any other
discussion about the remap_device API.
> Thanks,
> Jan
>
>> a.The uevent message form FD monitoring which will be useful.
>> remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
>> ACTION=remove
>> DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
>> SUBSYSTEM=uio
>> MAJOR=243
>> MINOR=2
>> DEVNAME=uio2
>> SEQNUM=11366
>>
>> b.add uevent monitoring machanism:
>> add several general api to enable uevent monitoring.
>>
>> c.add common uevent handler and uevent failure handler
>> uevent of device should be handler at bus or device layer, and the memory read
>> and write failure when hot removal should be handle correctly before detach behaviors.
>>
>> d.show example how to use uevent monitor
>> enable uevent monitoring in testpmd or fail-safe to show usage.
>>
>> patchset history:
>> v5->v4:
>> 1.Move uevent monitor epolling from eal interrupt to eal device layer.
>> 2.Redefine the eal device API for common, and distinguish between linux and bsd
>> 3.Add failure handler helper api in bus layer.Add function of find device by name.
>> 4.Replace of individual fd bind with single device, use a common fd to polling all device.
>> 5.Add to register hot insertion monitoring and process, add function to auto bind driver befor user add device
>> 6.Refine some coding style and typos issue
>> 7.add new callback to process hot insertion
>>
>> v4->v3:
>> 1.move uevent monitor api from eal interrupt to eal device layer.
>> 2.create uevent type and struct in eal device.
>> 3.move uevent handler for each driver to eal layer.
>> 4.add uevent failure handler to process signal fault issue.
>> 5.add example for request and use uevent monitoring in testpmd.
>>
>> v3->v2:
>> 1.refine some return error
>> 2.refine the string searching logic to avoid memory issue
>>
>> v2->v1:
>> 1.remove global variables of hotplug_fd, add uevent_fd
>> in rte_intr_handle to let each pci device self maintain it fd,
>> to fix dual device fd issue.
>> 2.refine some typo error.
>>
>>
>> Jeff Guo (2):
>>    eal: add uevent monitor for hot plug
>>    app/testpmd: use uevent to monitor hot removal
>>
>>   app/test-pmd/testpmd.c                             |  90 ++++++
>>   lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
>>   .../bsdapp/eal/include/exec-env/rte_dev.h          | 105 ++++++
>>   lib/librte_eal/common/eal_common_bus.c             |  31 ++
>>   lib/librte_eal/common/eal_common_dev.c             | 223 ++++++++++++-
>>   lib/librte_eal/common/eal_common_pci.c             |  69 +++-
>>   lib/librte_eal/common/eal_common_pci_uio.c         |  31 +-
>>   lib/librte_eal/common/eal_common_vdev.c            |  29 +-
>>   lib/librte_eal/common/eal_private.h                |  13 +-
>>   lib/librte_eal/common/include/rte_bus.h            |  52 +++
>>   lib/librte_eal/common/include/rte_dev.h            | 102 +++++-
>>   lib/librte_eal/common/include/rte_pci.h            |  26 ++
>>   lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
>>   lib/librte_eal/linuxapp/eal/eal_dev.c              | 351 +++++++++++++++++++++
>>   lib/librte_eal/linuxapp/eal/eal_pci.c              |  33 ++
>>   .../linuxapp/eal/include/exec-env/rte_dev.h        | 105 ++++++
>>   16 files changed, 1318 insertions(+), 9 deletions(-)
>>   create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>>   create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>>
>> --
>> 2.7.4
>>


* [PATCH v6 0/2] add uevent monitor for hot plug
  2017-09-20  4:12               ` [PATCH v5 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
@ 2017-11-01 20:16                 ` Jeff Guo
  2017-11-01 20:16                   ` [PATCH v6 1/2] eal: " Jeff Guo
                                     ` (2 more replies)
  0 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2017-11-01 20:16 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet, thomas
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, helin.zhang

So far, for hot plug in DPDK, we already have the hot plug add/remove
API and the fail-safe driver to offload the fail-safe work from the
application. But a general event API is still lacking, since the interrupt
events related to hot plug differ between devices and drivers, such as
mlx4, PCI drivers and others.

Take the hot removal event as an example: not all PCI drivers expose a
remove interrupt, so to make the hot plug feature easy to use for PCI
drivers, something must be done to detect the remove event at the
kernel level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor the
hot plug status of devices, not only uio/vfio devices on the PCI bus but
also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a. The uevent message from FD monitoring, which will be useful:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. Add a uevent monitoring mechanism:
add several general APIs to enable uevent monitoring.

c. Add a common uevent handler and a uevent failure handler:
uevents of a device should be handled at the bus or device layer, and memory
read and write failures on hot removal should be handled correctly before detaching.

d. Show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show usage.
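
As an illustration of item a, the payload shown above arrives as one
datagram: a header line followed by NUL-separated KEY=VALUE strings. The
sketch below is not the patch's parser. The names uev_info, uev_parse and
uev_pci_addr are hypothetical, and it assumes the PCI address is the last
DEVPATH component of the form dddd:bb:dd.f.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds for this sketch (not the patch's enum). */
enum uev_action { UEV_ACTION_UNKNOWN, UEV_ACTION_ADD, UEV_ACTION_REMOVE };

struct uev_info {
	enum uev_action action;
	const char *subsystem;	/* points into the caller's buffer, or NULL */
	const char *devpath;	/* points into the caller's buffer, or NULL */
};

/* Walk the NUL-separated strings of one uevent datagram. */
static int
uev_parse(const char *buf, size_t len, struct uev_info *out)
{
	size_t off = 0;

	memset(out, 0, sizeof(*out));
	while (off < len) {
		const char *s = buf + off;
		const char *end = memchr(s, '\0', len - off);

		if (end == NULL)
			return -1;	/* last string is not terminated */
		if (strncmp(s, "ACTION=", 7) == 0) {
			if (strcmp(s + 7, "add") == 0)
				out->action = UEV_ACTION_ADD;
			else if (strcmp(s + 7, "remove") == 0)
				out->action = UEV_ACTION_REMOVE;
		} else if (strncmp(s, "SUBSYSTEM=", 10) == 0) {
			out->subsystem = s + 10;
		} else if (strncmp(s, "DEVPATH=", 8) == 0) {
			out->devpath = s + 8;
		}
		/* the "action@devpath" header and other keys are skipped */
		off += (size_t)(end - s) + 1;
	}
	return out->devpath != NULL ? 0 : -1;
}

/* Copy the last path component shaped like "0000:84:00.2" into out. */
static int
uev_pci_addr(const char *devpath, char out[13])
{
	const char *p = devpath;
	int found = -1;

	while (*p != '\0') {
		const char *end;
		size_t n;

		while (*p == '/')
			p++;
		end = strchr(p, '/');
		n = end != NULL ? (size_t)(end - p) : strlen(p);
		if (n == 12 && p[4] == ':' && p[7] == ':' && p[10] == '.') {
			memcpy(out, p, 12);
			out[12] = '\0';
			found = 0;	/* keep scanning: the deepest match wins */
		}
		p += n;
	}
	return found;
}
```

With the sample message above, this would report a remove action on
subsystem "uio" and the address 0000:84:00.2, which is the kind of name a
bus-level lookup by device name could then consume.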

patchset history:
v6->v5:
1. Add a hot plug policy: in EAL, by default prepare the hot plug work for
all PCI devices, then let the app decide which devices need
hot plug.
2. Manage event callbacks per device.
3. Fix a system hang on igb_uio release.
4. Move the PCI part to bus/pci, based on the bus rework.
5. Add a hot plug policy in the app: show an example that uses a hotplug
list to decide which devices need hot plug.

v5->v4:
1. Move uevent monitor epolling from eal interrupt to eal device layer.
2. Redefine the eal device API as common, and distinguish between linux and bsd.
3. Add a failure handler helper API in the bus layer. Add a function to find a device by name.
4. Replace the individual fd bound to each device with a common fd that polls all devices.
5. Add registration and processing of hot insertion monitoring; add a function to auto-bind the driver before the user adds the device.
6. Refine some coding style and typo issues.
7. Add a new callback to process hot insertion.

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1. Remove the global variable hotplug_fd and add uevent_fd
to rte_intr_handle so that each pci device maintains its own fd,
to fix the dual device fd issue.
2. Fix some typos.

Jeff Guo (2):
  eal: add uevent monitor for hot plug
  app/testpmd: use uevent to monitor hotplug

 app/test-pmd/testpmd.c                             | 172 ++++++++++
 app/test-pmd/testpmd.h                             |   9 +
 drivers/bus/pci/bsd/pci.c                          |  23 ++
 drivers/bus/pci/linux/pci.c                        |  34 ++
 drivers/bus/pci/linux/pci_init.h                   |   1 +
 drivers/bus/pci/pci_common.c                       |  42 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |   9 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 105 ++++++
 lib/librte_eal/common/eal_common_bus.c             |  29 ++
 lib/librte_eal/common/eal_common_dev.c             | 222 +++++++++++++
 lib/librte_eal/common/eal_common_vdev.c            |  27 ++
 lib/librte_eal/common/include/rte_bus.h            |  51 +++
 lib/librte_eal/common/include/rte_dev.h            | 107 ++++++-
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 353 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 105 ++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 22 files changed, 1437 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

-- 
2.7.4


* [PATCH v6 1/2] eal: add uevent monitor for hot plug
  2017-11-01 20:16                 ` [PATCH v6 0/2] add uevent monitor for hot plug Jeff Guo
@ 2017-11-01 20:16                   ` Jeff Guo
  2017-11-01 21:36                     ` Stephen Hemminger
                                       ` (2 more replies)
  2017-11-01 20:16                   ` [PATCH v6 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2017-12-14  9:48                   ` [PATCH v6 0/2] add uevent monitor for hot plug Mordechay Haimovsky
  2 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2017-11-01 20:16 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet, thomas
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, helin.zhang

This patch aims to add a general uevent mechanism to the EAL device layer
to enable hot plug monitoring of all Linux kernel objects, so users can use
these APIs to monitor and read out the device status info sent from the kernel
side and then handle it accordingly, such as detaching or attaching the
device, and even use it to do fail-safe work smoothly.
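
For context, the kernel-side events come from a NETLINK_KOBJECT_UEVENT
socket. A minimal sketch of opening one follows. uev_socket_open is a
hypothetical helper, not the function this patch adds, and it assumes
multicast group 1 is the kernel group (group 2 carries udev's
re-broadcast, which the monitor ignores).

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/*
 * Open a non-blocking netlink socket subscribed to kernel uevents.
 * Returns the fd on success, -1 on failure (errno is set by the
 * failing call).
 */
static int
uev_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;	/* let the kernel pick a unique port id */
	addr.nl_groups = 1;	/* assumed: bit 0 selects the kernel group */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The returned fd can be added to an epoll set with EPOLLIN and read with
recv(), one datagram per uevent.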

1) About uevent monitoring:
a: add one epoll instance to poll the netlink socket and monitor the uevents
   of the device; add device_state to struct rte_device to identify the
   device state machine.
b: add enum rte_eal_dev_event_type and struct rte_eal_uevent.
c: add the APIs below to the rte eal device common layer.
   rte_eal_dev_monitor_enable
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_bind_driver
   rte_dev_monitor_start
   rte_dev_monitor_stop

2) About the failure handler, using pci uio as an example:
   add pci_remap_device in the bus layer and the functions below to process it:
   rte_pci_remap_device
   pci_uio_remap_resource
   pci_map_private_resource
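
The remap step can be pictured as replacing the BAR mapping with private
anonymous memory filled with 0xFF, so that late reads see the all-ones
pattern a missing PCI device typically returns instead of faulting. The
helper below, remap_as_all_ones, is a hypothetical stand-in for what
pci_uio_remap_resource does per BAR, and it assumes a page-aligned address.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with private
 * anonymous memory and fill it with 0xFF. MAP_FIXED atomically
 * discards the old mapping, so addr must be page aligned.
 * Returns addr on success, MAP_FAILED on error.
 */
static void *
remap_as_all_ones(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return MAP_FAILED;
	memset(p, 0xFF, len);
	return p;
}
```

This only changes what later accesses see. It does not synchronize with a
datapath thread that is in the middle of an access when the device goes
away.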

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v6->v5:
Add a hot plug policy: in EAL, by default prepare the hot plug work for
all PCI devices, then let the app decide which devices need
hot plug.
Manage event callbacks per device.
Fix a system hang on igb_uio release.
Move the PCI part to bus/pci, based on the bus rework.
---
 drivers/bus/pci/bsd/pci.c                          |  23 ++
 drivers/bus/pci/linux/pci.c                        |  34 ++
 drivers/bus/pci/linux/pci_init.h                   |   1 +
 drivers/bus/pci/pci_common.c                       |  42 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |   9 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 105 ++++++
 lib/librte_eal/common/eal_common_bus.c             |  29 ++
 lib/librte_eal/common/eal_common_dev.c             | 222 +++++++++++++
 lib/librte_eal/common/eal_common_vdev.c            |  27 ++
 lib/librte_eal/common/include/rte_bus.h            |  51 +++
 lib/librte_eal/common/include/rte_dev.h            | 107 ++++++-
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 353 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 105 ++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 20 files changed, 1256 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 39d65c6..bc78f28 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -127,6 +127,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* Remap pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(dev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void
 pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index cdf8106..3f8a246 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* Remap pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* nothing to do */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(dev);
+		}
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -386,6 +418,8 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		rte_pci_add_device(dev);
 	}
 
+	dev->device.state = DEVICE_PARSED;
+	TAILQ_INIT(&(dev->device.uev_cbs));
 	return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 99d7a2e..fa71f3c 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -58,6 +58,7 @@ int pci_uio_alloc_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource **uio_res);
 void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
+int pci_uio_remap_resource(struct rte_pci_device *dev);
 int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 		struct mapped_pci_resource *uio_res, int map_idx);
 
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 3e27779..170582d 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -283,6 +283,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		if (rc > 0)
 			/* positive value means driver doesn't support it */
 			continue;
+		dev->device.state = DEVICE_PROBED;
 		return 0;
 	}
 	return 1;
@@ -482,6 +483,7 @@ rte_pci_insert_device(struct rte_pci_device *exist_pci_dev,
 void
 rte_pci_remove_device(struct rte_pci_device *pci_dev)
 {
+	RTE_LOG(DEBUG, EAL, " rte_pci_remove_device for device list\n");
 	TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
 }
 
@@ -503,6 +505,44 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+pci_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_pci_device *dev;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (start && &dev->device == start) {
+			start = NULL; /* starting point found */
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+
+	return NULL;
+}
+
+static int
+pci_remap_device(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices that use igb_uio */
+	ret = rte_pci_remap_device(pdev);
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to remap device %s\n",
+			dev->name);
+	return ret;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -529,10 +569,12 @@ struct rte_pci_bus rte_pci_bus = {
 		.scan = rte_pci_scan,
 		.probe = rte_pci_probe,
 		.find_device = pci_find_device,
+		.find_device_by_name = pci_find_device_by_name,
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.remap_device = pci_remap_device,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index b58bcf5..bb91fbd 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -176,6 +176,34 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+	/* Map all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
+			continue;
+		map_address = pci_map_private_resource(
+				dev->mem_resource[i].addr, 0,
+				(size_t)dev->mem_resource[i].len);
+		if (map_address == MAP_FAILED)
+			goto error;
+		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
+		dev->mem_resource[i].addr = map_address;
+	}
+
+	return 0;
+error:
+	return -1;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2283f09..10baa1a 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -202,6 +202,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI UIO resource.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index c0b619f..56ec7a1 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -197,6 +197,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
 void rte_pci_unmap_device(struct rte_pci_device *dev);
 
 /**
+ * Remap this device
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ */
+int rte_pci_remap_device(struct rte_pci_device *dev);
+
+/**
  * Dump the content of the PCI bus.
  *
  * @param f
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..6ea9a74
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,64 @@
+/*-
+ *   Copyright(c) 2010-2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_monitor_start(void)
+{
+	return -1;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..7d2c3c3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,105 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+#define RTE_EAL_UEVENT_MSG_LEN 4096
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+
+/**
+ * The device event type.
+ */
+enum rte_eal_dev_event_type {
+	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
+	RTE_EAL_DEV_EVENT_REMOVE,
+					/**< device removing event */
+	RTE_EAL_DEV_EVENT_CHANGE,
+					/**< device status change event */
+	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
+	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
+	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
+	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_eal_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 3e022d5..bdb0e54 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -51,8 +51,10 @@ rte_bus_register(struct rte_bus *bus)
 	RTE_VERIFY(bus->scan);
 	RTE_VERIFY(bus->probe);
 	RTE_VERIFY(bus->find_device);
+	RTE_VERIFY(bus->find_device_by_name);
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
+	RTE_VERIFY(bus->remap_device);
 
 	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
 	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
@@ -170,6 +172,14 @@ cmp_rte_device(const struct rte_device *dev1, const void *_dev2)
 }
 
 static int
+cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
+{
+	const char *dev_name2 = _dev_name2;
+
+	return strcmp(dev_name1, dev_name2);
+}
+
+static int
 bus_find_device(const struct rte_bus *bus, const void *_dev)
 {
 	struct rte_device *dev;
@@ -178,6 +188,25 @@ bus_find_device(const struct rte_bus *bus, const void *_dev)
 	return dev == NULL;
 }
 
+static struct rte_device *
+bus_find_device_by_name(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus->find_device_by_name(NULL, cmp_rte_device_name, _dev_name);
+	return dev;
+}
+
+struct rte_device *
+
+rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus_find_device_by_name(bus, _dev_name);
+	return dev;
+}
+
 struct rte_bus *
 rte_bus_find_by_device(const struct rte_device *dev)
 {
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index e251275..ab0794b 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -36,15 +36,39 @@
 #include <string.h>
 #include <inttypes.h>
 #include <sys/queue.h>
+#include <unistd.h>
+#include <fcntl.h>
 
 #include <rte_bus.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_eal_dev_callback {
+	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
+	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Parameter for callback */
+	void *ret_param;                        /**< Return parameter */
+	enum rte_eal_dev_event_type event;      /**< device event type */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+static struct rte_eal_dev_callback *dev_add_cb;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -244,3 +268,201 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_eal_dev_monitor_enable(void)
+{
+	int ret;
+
+	ret = rte_dev_monitor_start();
+	if (ret)
+		RTE_LOG(ERR, EAL, "Can not init device monitor\n");
+	return ret;
+}
+
+int
+rte_dev_callback_register(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_eal_dev_callback *user_cb;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	if (TAILQ_EMPTY(&(device->uev_cbs)))
+		TAILQ_INIT(&(device->uev_cbs));
+
+	if (event == RTE_EAL_DEV_EVENT_ADD) {
+		user_cb = NULL;
+	} else {
+		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
+			if (user_cb->cb_fn == cb_fn &&
+				user_cb->cb_arg == cb_arg &&
+				user_cb->event == event) {
+				break;
+			}
+		}
+	}
+
+	/* create a new callback. */
+	if (user_cb == NULL) {
+		/* allocate a new device event callback entry */
+		user_cb = rte_zmalloc("eal device event", sizeof(*user_cb), 0);
+		if (user_cb == NULL) {
+			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			return -ENOMEM;
+		}
+		user_cb->cb_fn = cb_fn;
+		user_cb->cb_arg = cb_arg;
+		user_cb->event = event;
+		if (event == RTE_EAL_DEV_EVENT_ADD)
+			dev_add_cb = user_cb;
+		else
+			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb, next);
+	}
+
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return 0;
+}
+
+int
+rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_eal_dev_callback *cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	ret = 0;
+	if (event == RTE_EAL_DEV_EVENT_ADD) {
+		rte_free(dev_add_cb);
+		dev_add_cb = NULL;
+	} else {
+		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
+		      cb = next) {
+
+			next = TAILQ_NEXT(cb, next);
+
+			if (cb->cb_fn != cb_fn || cb->event != event ||
+					(cb->cb_arg != (void *)-1 &&
+					cb->cb_arg != cb_arg))
+				continue;
+
+			/*
+			 * if this callback is not executing right now,
+			 * then remove it.
+			 */
+			if (cb->active == 0) {
+				TAILQ_REMOVE(&(device->uev_cbs), cb, next);
+				rte_free(cb);
+			} else {
+				ret = -EAGAIN;
+			}
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			void *cb_arg, void *ret_param)
+{
+	struct rte_eal_dev_callback dev_cb;
+	struct rte_eal_dev_callback *cb_lst;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+	if (event == RTE_EAL_DEV_EVENT_ADD && dev_add_cb != NULL) {
+		if (cb_arg != NULL)
+			dev_add_cb->cb_arg = cb_arg;
+
+		if (ret_param != NULL)
+			dev_add_cb->ret_param = ret_param;
+
+		rte_spinlock_unlock(&rte_dev_cb_lock);
+		rc = dev_add_cb->cb_fn(dev_add_cb->event,
+				dev_add_cb->cb_arg, dev_add_cb->ret_param);
+		rte_spinlock_lock(&rte_dev_cb_lock);
+	} else if (event != RTE_EAL_DEV_EVENT_ADD) {
+		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
+			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
+				continue;
+			dev_cb = *cb_lst;
+			cb_lst->active = 1;
+			if (cb_arg != NULL)
+				dev_cb.cb_arg = cb_arg;
+			if (ret_param != NULL)
+				dev_cb.ret_param = ret_param;
+
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			rc = dev_cb.cb_fn(dev_cb.event,
+					dev_cb.cb_arg, dev_cb.ret_param);
+			rte_spinlock_lock(&rte_dev_cb_lock);
+			cb_lst->active = 0;
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return rc;
+}
+
+int
+rte_dev_bind_driver(const char *dev_name, const char *drv_type) {
+	char drv_bind_path[1024];
+	char drv_override_path[1024]; /* sysfs driver_override path */
+	int drv_override_fd = -1;
+	int drv_bind_fd = -1;
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_override_fd, drv_type, strlen(drv_type)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", drv_type, dev_name);
+		goto err;
+	}
+
+	close(drv_override_fd);
+	drv_override_fd = -1;
+
+	snprintf(drv_bind_path, sizeof(drv_bind_path),
+		"/sys/bus/pci/drivers/%s/bind", drv_type);
+
+	/* do the bind by writing the device name to the driver's bind file */
+	drv_bind_fd = open(drv_bind_path, O_WRONLY | O_APPEND);
+	if (drv_bind_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_bind_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_bind_fd, dev_name, strlen(dev_name)) < 0)
+		goto err;
+	close(drv_bind_fd);
+	return 0;
+err:
+	if (drv_override_fd >= 0)
+		close(drv_override_fd);
+	if (drv_bind_fd >= 0)
+		close(drv_bind_fd);
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_vdev.c b/lib/librte_eal/common/eal_common_vdev.c
index f7e547a..2ad517c 100644
--- a/lib/librte_eal/common/eal_common_vdev.c
+++ b/lib/librte_eal/common/eal_common_vdev.c
@@ -318,6 +318,31 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+vdev_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_vdev_device *dev;
+
+	TAILQ_FOREACH(dev, &vdev_device_list, next) {
+		if (start && &dev->device == start) {
+			start = NULL;
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+	return NULL;
+}
+
+static int
+vdev_remap_device(struct rte_device *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
 static int
 vdev_plug(struct rte_device *dev)
 {
@@ -334,9 +359,11 @@ static struct rte_bus rte_vdev_bus = {
 	.scan = vdev_scan,
 	.probe = vdev_probe,
 	.find_device = vdev_find_device,
+	.find_device_by_name = vdev_find_device_by_name,
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
+	.remap_device = vdev_remap_device,
 };
 
 RTE_REGISTER_BUS(vdev, rte_vdev_bus);
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..f1ca670 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -122,6 +122,34 @@ typedef struct rte_device *
 			 const void *data);
 
 /**
+ * Device iterator to find a device on a bus.
+ *
+ * This function returns an rte_device if one of those held by the bus
+ * matches the data passed as parameter.
+ *
+ * If the comparison function returns zero this function should stop iterating
+ * over any more devices. To continue a search the device of a previous search
+ * can be passed via the start parameter.
+ *
+ * @param cmp
+ *	the device name comparison function.
+ *
+ * @param data
+ *	Data to compare each device against.
+ *
+ * @param start
+ *	starting point for the iteration
+ *
+ * @return
+ *	The first device matching the data, NULL if none exists.
+ */
+typedef struct rte_device *
+(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
+			 rte_dev_cmp_name_t cmp,
+			 const void *data);
+
+
+/**
  * Implementation specific probe function which is responsible for linking
  * devices on that bus with applicable drivers.
  *
@@ -168,6 +196,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific remap function which is responsible for remapping
+ * devices on that bus from the original shared memory resource to a private
+ * memory resource once the device has been removed.
+ *
+ * @param dev
+ *	Device pointer that was returned by a previous call to find_device.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -206,9 +248,12 @@ struct rte_bus {
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
 	rte_bus_find_device_t find_device; /**< Find a device on the bus */
+	rte_bus_find_device_by_name_t find_device_by_name;
+				     /**< Find a device on the bus by name */
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_remap_device_t remap_device;       /**< remap a device */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
@@ -306,6 +351,12 @@ struct rte_bus *rte_bus_find(const struct rte_bus *start, rte_bus_cmp_t cmp,
 struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
 
 /**
+ * Find a device on the given bus by its name.
+ */
+struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
+				const void *dev_name);
+
+/**
  * Find the registered bus for a given name.
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 4c4ac7e..7307679 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -52,6 +52,15 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_log.h>
 
+#include <exec-env/rte_dev.h>
+
+typedef int (*rte_eal_dev_cb_fn)(enum rte_eal_dev_event_type event,
+					void *cb_arg, void *ret_param);
+
+struct rte_eal_dev_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -158,6 +167,13 @@ struct rte_driver {
  */
 #define RTE_DEV_NAME_MAX_LEN 64
 
+enum device_state {
+	DEVICE_UNDEFINED,
+	DEVICE_FAULT,
+	DEVICE_PARSED,
+	DEVICE_PROBED,
+};
+
 /**
  * A structure describing a generic device.
  */
@@ -167,6 +183,9 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	enum device_state state;  /**< Device state */
+	/** User application callbacks for device event */
+	struct rte_eal_dev_cb_list uev_cbs;
 };
 
 /**
@@ -271,6 +290,8 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
  */
 typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void *data);
 
+typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void *data);
+
 #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
 
 #define RTE_PMD_EXPORT_NAME(name, idx) \
@@ -316,4 +337,88 @@ __attribute__((used)) = str
 }
 #endif
 
-#endif /* _RTE_VDEV_H_ */
+/**
+ * Enable the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eal_dev_monitor_enable(void);
+/**
+ * It registers the callback for the specific event. Multiple
+ * callbacks can be registered at the same time.
+ * @param event
+ *  The device event type.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * It unregisters the callback according to the specified event.
+ *
+ * @param event
+ *  The event type corresponding to the callback.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered callbacks which have the same callback address.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal use only. User
+ * application should not call it directly.
+ *
+ * @param event
+ *  The device event type.
+ * @param cb_arg
+ *  callback parameter.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			void *cb_arg, void *ret_param);
+
+/**
+ * It can be used to bind a device to a specific type of driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param drv_type
+ *  The specific driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int
+rte_dev_bind_driver(const char *dev_name, const char *drv_type);
+
+#endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 1d3a42d..e57da21 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -67,6 +67,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
@@ -139,7 +140,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
 endif
 
-INC := rte_kni_common.h
+INC := rte_kni_common.h rte_dev.h
 
 SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
 	$(addprefix include/exec-env/,$(INC))
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..7410643
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,353 @@
+/*-
+ *   Copyright(c) 2010-2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+/* uev monitoring thread */
+static pthread_t uev_monitor_thread;
+
+bool udev_exit = true;
+
+bool no_request_thread = true;
+
+static void sig_handler(int signum)
+{
+	if (signum == SIGINT || signum == SIGTERM)
+		rte_dev_monitor_stop();
+}
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
+{
+	char action[RTE_EAL_UEVENT_MSG_LEN];
+	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
+	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
+	static char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
+
+	while (i < RTE_EAL_UEVENT_MSG_LEN) {
+		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			event->group = UEV_MONITOR_UDEV;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+		}
+		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = UEV_SUBSYSTEM_PCI;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_EAL_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_EAL_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+
+	return 0;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEVENT_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_eal_uevent));
+	memset(buf, 0, RTE_EAL_UEVENT_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEVENT_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	return dev_uev_parse(buf, uevent);
+}
+
+static int
+dev_uev_process(struct epoll_event *events, int nfds)
+{
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	struct rte_eal_uevent uevent;
+	int ret;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
+			(uevent.group == UEV_MONITOR_UDEV))
+			return 0;
+
+		/* by default, handle all PCI devices being hot-plugged */
+		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
+			bus = rte_bus_find_by_name("pci");
+			dev = rte_bus_find_device(bus, uevent.devname);
+			if (uevent.type == RTE_EAL_DEV_EVENT_REMOVE) {
+				if ((!dev) || dev->state == DEVICE_UNDEFINED)
+					return 0;
+				dev->state = DEVICE_FAULT;
+				/**
+				 * remap the resource to be fake
+				 * before user's removal processing
+				 */
+				ret = bus->remap_device(dev);
+				if (!ret)
+					return(_rte_dev_callback_process(dev,
+					  RTE_EAL_DEV_EVENT_REMOVE,
+					  NULL, NULL));
+			} else if (uevent.type == RTE_EAL_DEV_EVENT_ADD) {
+				if (dev == NULL) {
+					/**
+					 * bind the driver to the device
+					 * before user's add processing
+					 */
+					rte_dev_bind_driver(
+						uevent.devname,
+						"igb_uio");
+					return(_rte_dev_callback_process(NULL,
+					  RTE_EAL_DEV_EVENT_ADD,
+					  uevent.devname, NULL));
+				}
+			}
+		}
+	}
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the interrupts.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *  never return;
+ */
+static __attribute__((noreturn)) void *
+dev_uev_monitoring(__rte_unused void *arg)
+{
+	struct sigaction act;
+	sigset_t mask;
+	int netlink_fd;
+	struct epoll_event ep_kernel;
+	int fd_ep;
+
+	udev_exit = false;
+
+	/* set signal handlers */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_handler = sig_handler;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGINT, &act, NULL);
+	sigaction(SIGTERM, &act, NULL);
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGINT);
+	sigaddset(&mask, SIGTERM);
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!udev_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+}
+
+int
+rte_dev_monitor_start(void)
+{
+	int ret;
+
+	if (!no_request_thread)
+		return 0;
+	no_request_thread = false;
+
+	/* create the host thread to wait/handle the uevent from kernel */
+	ret = pthread_create(&uev_monitor_thread, NULL,
+		dev_uev_monitoring, NULL);
+	return ret;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	udev_exit = true;
+	no_request_thread = true;
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..7d2c3c3
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,105 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+#define RTE_EAL_UEVENT_MSG_LEN 4096
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+
+/**
+ * The device event type.
+ */
+enum rte_eal_dev_event_type {
+	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
+	RTE_EAL_DEV_EVENT_REMOVE,
+					/**< device removing event */
+	RTE_EAL_DEV_EVENT_CHANGE,
+					/**< device status change event */
+	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
+	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
+	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
+	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_eal_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index fd320d8..ba10a59 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check whether the device has already been removed before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
+		pr_info("The device has been removed\n");
+		return -1;
+	}
+
 	/* disable interrupts */
 	igbuio_pci_disable_interrupts(udev);
 
diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
index 1307a18..0d37ce8 100644
--- a/lib/librte_pci/rte_pci.c
+++ b/lib/librte_pci/rte_pci.c
@@ -180,6 +180,26 @@ pci_addr_parse(const char *str, struct rte_pci_addr *addr)
 	return -1;
 }
 
+/* map a private resource from an address*/
+void *
+pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
+{
+	void *mapaddr;
+
+	mapaddr = mmap(requested_addr, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	if (mapaddr == MAP_FAILED) {
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
+			"%s (%p)\n",
+			__func__, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno), mapaddr);
+	} else
+		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+
+	return mapaddr;
+}
 
 /* map a particular resource from a file */
 void *
diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
index ea0897c..c5392fe 100644
--- a/lib/librte_pci/rte_pci.h
+++ b/lib/librte_pci/rte_pci.h
@@ -243,6 +243,23 @@ int pci_addr_cmp(const struct rte_pci_addr *addr,
 int pci_addr_parse(const char *str, struct rte_pci_addr *addr);
 
 /**
+ * @internal
+ * Map to a particular private resource.
+ *
+ * @param requested_addr
+ *      The starting address for the new mapping range.
+ * @param offset
+ *      The offset for the mapping range.
+ * @param size
+ *      The size for the mapping range.
+ * @return
+ *   - On success, the function returns a pointer to the mapped area.
+ *   - On error, the value MAP_FAILED is returned.
+ */
+void *pci_map_private_resource(void *requested_addr, off_t offset,
+		size_t size);
+
+/**
  * Map a particular resource from a file.
  *
  * @param requested_addr
-- 
2.7.4


* [PATCH v6 2/2] app/testpmd: use uevent to monitor hotplug
  2017-11-01 20:16                 ` [PATCH v6 0/2] add uevent monitor for hot plug Jeff Guo
  2017-11-01 20:16                   ` [PATCH v6 1/2] eal: " Jeff Guo
@ 2017-11-01 20:16                   ` Jeff Guo
  2018-01-03  1:42                     ` [PATCH v7 0/2] add uevent monitor for hot plug Jeff Guo
  2017-12-14  9:48                   ` [PATCH v6 0/2] add uevent monitor for hot plug Mordechay Haimovsky
  2 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2017-11-01 20:16 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet, thomas
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, helin.zhang

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle hot removal and hot insertion
events.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v6->v5:
add a hot plug policy in the app, with an example that uses a hotplug list
to decide which devices need to be hot plugged.
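
The hotplug list described above can be sketched in plain C as follows. This
is only an illustration, not the testpmd code from the patch: the names
hp_request, hp_event, hp_list_add and hp_list_contains are hypothetical, and
the sketch uses calloc() rather than rte_zmalloc() so that it builds without
DPDK. Unlike the patch, it stores its own copy of the device name, in case the
caller's buffer does not outlive the list entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for enum rte_eal_dev_event_type. */
enum hp_event { HP_EVENT_ADD, HP_EVENT_REMOVE };

struct hp_request {
	TAILQ_ENTRY(hp_request) next;
	char *dev_name;		/* private copy, owned by the list entry */
	enum hp_event event;
};

TAILQ_HEAD(hp_request_list, hp_request);

/* Append a request; returns 0 on success, -1 if memory is exhausted. */
static int
hp_list_add(struct hp_request_list *list, const char *dev_name,
	    enum hp_event event)
{
	struct hp_request *req;
	size_t len;

	assert(list != NULL && dev_name != NULL);

	req = calloc(1, sizeof(*req));
	if (req == NULL)
		return -1;

	len = strlen(dev_name) + 1;
	req->dev_name = malloc(len);
	if (req->dev_name == NULL) {
		free(req);
		return -1;
	}
	memcpy(req->dev_name, dev_name, len);
	req->event = event;

	TAILQ_INSERT_TAIL(list, req, next);
	return 0;
}

/* Return 1 if a request for dev_name is queued, 0 otherwise. */
static int
hp_list_contains(const struct hp_request_list *list, const char *dev_name)
{
	const struct hp_request *req;

	TAILQ_FOREACH(req, list, next) {
		if (strcmp(req->dev_name, dev_name) == 0)
			return 1;
	}
	return 0;
}
```

The list head must be initialised with TAILQ_INIT() before the first call,
as the patch does in main().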
---
 app/test-pmd/testpmd.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 181 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9ae5b1f..ac75617 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -402,6 +402,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -409,6 +411,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(enum rte_eal_dev_event_type type,
+			      void *param, void *ret_param);
+static int eth_uevent_callback_register(portid_t pid);
+static int in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_eal_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1758,6 +1767,31 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t pid) {
+	int diag;
+	struct rte_eth_dev *dev;
+	enum rte_eal_dev_event_type dev_event_type;
+
+	/* register the uevent callback */
+	dev = &rte_eth_devices[pid];
+	for (dev_event_type = RTE_EAL_DEV_EVENT_ADD;
+		 dev_event_type < RTE_EAL_DEV_EVENT_CHANGE;
+		 dev_event_type++) {
+		diag = rte_dev_callback_register(dev->device, dev_event_type,
+			eth_uevent_callback,
+			(void *)(intptr_t)pid);
+		if (diag) {
+			printf("Failed to setup uevent callback for"
+				" device event %d\n",
+				dev_event_type);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1774,6 +1808,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1785,6 +1821,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_EAL_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1811,6 +1849,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_EAL_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1834,6 +1875,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1918,6 +1962,43 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -1960,6 +2041,88 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 }
 
 static int
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return 1;
+
+	return 0;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_eal_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(enum rte_eal_dev_event_type type, void *arg,
+		  void *ret_param)
+{
+	static const char * const event_desc[] = {
+		[RTE_EAL_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_EAL_DEV_EVENT_ADD] = "add",
+		[RTE_EAL_DEV_EVENT_REMOVE] = "remove",
+	};
+	static char *device_name;
+
+	RTE_SET_USED(ret_param);
+
+	if (type >= RTE_EAL_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_EAL_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_EAL_DEV_EVENT_ADD:
+		device_name = malloc(strlen((const char *)arg) + 1);
+		strcpy(device_name, arg);
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
 	uint16_t i;
@@ -2439,6 +2602,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_eal_dev_monitor_enable();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_EAL_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 1639d27..9a1088b 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -92,6 +92,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;                /* request device name */
+	enum rte_eal_dev_event_type event;      /**< device event type */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4


* Re: [PATCH v6 1/2] eal: add uevent monitor for hot plug
  2017-11-01 20:16                   ` [PATCH v6 1/2] eal: " Jeff Guo
@ 2017-11-01 21:36                     ` Stephen Hemminger
  2017-11-01 21:41                     ` Stephen Hemminger
  2017-12-25 18:06                     ` Stephen Hemminger
  2 siblings, 0 replies; 494+ messages in thread
From: Stephen Hemminger @ 2017-11-01 21:36 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, gaetan.rivet, thomas,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	helin.zhang

On Thu,  2 Nov 2017 04:16:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

>  
> +/* Map pci device */
> +int
> +rte_pci_remap_device(struct rte_pci_device *dev)
> +{
> +	int ret = -1;

Please don't always initialize variables. It is unnecessary, and with
modern compilers a bad habit since it defeats the uninitialized variable
warning message which is useful to detect buggy code paths.
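
To illustrate the point, here is a minimal sketch with hypothetical names
(kdrv_kind, remap_for_kdrv); it is not the function from the patch. Because
ret has no initializer, a compiler with uninitialized-variable warnings
enabled (for example GCC's -Wmaybe-uninitialized) can flag any switch branch
that forgets to assign it, instead of silently returning the default.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel driver kinds in the patch. */
enum kdrv_kind { KDRV_UIO, KDRV_OTHER };

static int
remap_for_kdrv(enum kdrv_kind kdrv)
{
	int ret;	/* deliberately left uninitialized */

	assert(kdrv == KDRV_UIO || kdrv == KDRV_OTHER);

	switch (kdrv) {
	case KDRV_UIO:
		ret = 0;	/* stands in for the real remap call */
		break;
	default:
		ret = 1;	/* not managed by a supported driver */
		break;
	}

	return ret;
}
```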


> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	switch (dev->kdrv) {
> +	case RTE_KDRV_NIC_UIO:
> +		ret = pci_uio_remap_resource(dev);
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = 1;
> +		break;
> +	}
> +
> +	return ret;
> +}


* Re: [PATCH v6 1/2] eal: add uevent monitor for hot plug
  2017-11-01 20:16                   ` [PATCH v6 1/2] eal: " Jeff Guo
  2017-11-01 21:36                     ` Stephen Hemminger
@ 2017-11-01 21:41                     ` Stephen Hemminger
  2017-11-08  5:39                       ` Guo, Jia
  2017-12-25  8:30                       ` Guo, Jia
  2017-12-25 18:06                     ` Stephen Hemminger
  2 siblings, 2 replies; 494+ messages in thread
From: Stephen Hemminger @ 2017-11-01 21:41 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, gaetan.rivet, thomas,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	helin.zhang

On Thu,  2 Nov 2017 04:16:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +
> +static int
> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
> +{
> +	char action[RTE_EAL_UEVENT_MSG_LEN];
> +	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
> +	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
> +	char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
> +
> +	while (i < RTE_EAL_UEVENT_MSG_LEN) {

Might be simpler, safer, clearer to use rte_strsplit here.

And then have a table of fields rather than open coding the parsing.
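
A minimal sketch of what such a table of fields could look like, under stated
assumptions: the struct uev_fields, the uev_keys table and uev_parse_fields()
are hypothetical names, and the field length cap is arbitrary. Since the
records in a kernel uevent message are separated by NUL bytes, the sketch
walks them directly with memchr() instead of calling rte_strsplit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_FIELD_LEN 64	/* hypothetical cap on one field value */

/* Hypothetical parsed record; not struct rte_eal_uevent from the patch. */
struct uev_fields {
	char action[UEV_FIELD_LEN];
	char devpath[UEV_FIELD_LEN];
	char subsystem[UEV_FIELD_LEN];
	char pci_slot_name[UEV_FIELD_LEN];
};

/* One table row per key of interest: the key text and its destination. */
static const struct {
	const char *key;
	size_t offset;
} uev_keys[] = {
	{ "ACTION=",        offsetof(struct uev_fields, action) },
	{ "DEVPATH=",       offsetof(struct uev_fields, devpath) },
	{ "SUBSYSTEM=",     offsetof(struct uev_fields, subsystem) },
	{ "PCI_SLOT_NAME=", offsetof(struct uev_fields, pci_slot_name) },
};

/* Walk the NUL-separated records in buf[0..len); return fields matched. */
static int
uev_parse_fields(const char *buf, size_t len, struct uev_fields *out)
{
	size_t pos = 0;
	int matched = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (pos < len) {
		const char *rec = buf + pos;
		const char *end = memchr(rec, '\0', len - pos);
		size_t rec_len = end ? (size_t)(end - rec) : len - pos;
		size_t k;

		for (k = 0; k < sizeof(uev_keys) / sizeof(uev_keys[0]); k++) {
			size_t key_len = strlen(uev_keys[k].key);

			if (rec_len >= key_len &&
			    strncmp(rec, uev_keys[k].key, key_len) == 0) {
				char *dst = (char *)out + uev_keys[k].offset;

				/* copy the value, truncating if too long */
				snprintf(dst, UEV_FIELD_LEN, "%.*s",
					 (int)(rec_len - key_len),
					 rec + key_len);
				matched++;
				break;
			}
		}
		pos += rec_len + 1;	/* skip the record and its NUL */
	}

	return matched;
}
```

Adding another key then means adding one table row rather than another
else-if branch, and every copy is bounded by the same field size.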


> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		if (!strncmp(buf, "libudev", 7)) {
> +			buf += 7;
> +			i += 7;
> +			event->group = UEV_MONITOR_UDEV;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
> +			buf += 8;
> +			i += 8;
> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
> +		}
> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = UEV_SUBSYSTEM_PCI;
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_EAL_DEV_EVENT_ADD;
> +	if (!strncmp(action, "remove", 6))
> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
> +	event->devname = pci_slot_name;
> +
> +	return 0;

Function always returns 0, why is it not void?


* Re: [PATCH v6 1/2] eal: add uevent monitor for hot plug
  2017-11-01 21:41                     ` Stephen Hemminger
@ 2017-11-08  5:39                       ` Guo, Jia
  2017-12-25  8:30                       ` Guo, Jia
  1 sibling, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-11-08  5:39 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Richardson, Bruce, Yigit, Ferruh, gaetan.rivet, thomas, Ananyev,
	Konstantin, jblunck, shreyansh.jain, Wu, Jingjing, dev, Zhang,
	Helin

Thanks, Stephen, for the careful review. I will collect the other comments and refine this in the next version.

Best regards,
Jeff Guo

-----Original Message-----
From: Stephen Hemminger [mailto:stephen@networkplumber.org] 
Sent: Thursday, November 2, 2017 5:42 AM
To: Guo, Jia <jia.guo@intel.com>
Cc: Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; thomas@monjalon.net; Ananyev, Konstantin <konstantin.ananyev@intel.com>; jblunck@infradead.org; shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
Subject: Re: [PATCH v6 1/2] eal: add uevent monitor for hot plug

On Thu,  2 Nov 2017 04:16:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +
> +static int
> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event) {
> +	char action[RTE_EAL_UEVENT_MSG_LEN];
> +	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
> +	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
> +	char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
> +
> +	while (i < RTE_EAL_UEVENT_MSG_LEN) {

Might be simpler, safer, clearer to use rte_strsplit here.

And then have a table of fields rather than open coding the parsing.


> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		if (!strncmp(buf, "libudev", 7)) {
> +			buf += 7;
> +			i += 7;
> +			event->group = UEV_MONITOR_UDEV;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
> +			buf += 8;
> +			i += 8;
> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
> +		}
> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = UEV_SUBSYSTEM_PCI;
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_EAL_DEV_EVENT_ADD;
> +	if (!strncmp(action, "remove", 6))
> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
> +	event->devname = pci_slot_name;
> +
> +	return 0;

Function always returns 0, why is it not void?


* Re: [PATCH v6 0/2] add uevent monitor for hot plug
  2017-11-01 20:16                 ` [PATCH v6 0/2] add uevent monitor for hot plug Jeff Guo
  2017-11-01 20:16                   ` [PATCH v6 1/2] eal: " Jeff Guo
  2017-11-01 20:16                   ` [PATCH v6 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2017-12-14  9:48                   ` Mordechay Haimovsky
  2017-12-14 10:21                     ` Gaëtan Rivet
  2 siblings, 1 reply; 494+ messages in thread
From: Mordechay Haimovsky @ 2017-12-14  9:48 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	Thomas Monjalon
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	helin.zhang

Hello,
 I would like to apply this patch in order to review it.

Trying to apply it on 17.11 (and latest) fails due to missing lib/librte_eal/common/eal_common_vdev.c             
Trying to apply it on 17.08.1 fails on missing drivers/bus/pci/bsd/pci.c file

So, on which DPDK version should I apply it?
Or are there other patches I have to apply first in order to use this patch?

Thanks
  Moti H.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Wednesday, November 1, 2017 10:17 PM
> To: stephen@networkplumber.org; bruce.richardson@intel.com;
> ferruh.yigit@intel.com; gaetan.rivet@6wind.com; Thomas Monjalon
> <thomas@monjalon.net>
> Cc: konstantin.ananyev@intel.com; jblunck@infradead.org;
> shreyansh.jain@nxp.com; jingjing.wu@intel.com; dev@dpdk.org;
> jia.guo@intel.com; helin.zhang@intel.com
> Subject: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
> 
> So far, about hot plug in dpdk, we already have hot plug add/remove api and
> fail-safe driver to offload the fail-safe work from the app user. But there are
> still lack of a general event api, since the interrupt event, which hot plug
> related with, is diversity between each device and driver, such as mlx4, pci
> driver and others.
> 
> Use the hot removal event for example, pci drivers not all exposure the
> remove interrupt, so in order to make user to easy use the hot plug feature
> for pci driver, something must be done to detect the remove event at the
> kernel level and offer a new line of interrupt to the user land.
> 
> Base on the uevent of kobject mechanism in kernel, we could use it to
> benefit for monitoring the hot plug status of the device which not only
> uio/vfio of pci bus devices, but also other, such as cpu/usb/pci-express bus
> devices.
> 
> The idea is as follows.
> 
> a. The uevent message obtained from FD monitoring, which will be useful:
> remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:8
> 4:00.2/uio/uio2
> ACTION=remove
> DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:
> 84:00.2/uio/uio2
> SUBSYSTEM=uio
> MAJOR=243
> MINOR=2
> DEVNAME=uio2
> SEQNUM=11366
> 
> b. add uevent monitoring mechanism:
> add several general APIs to enable uevent monitoring.
> 
> c. add common uevent handler and uevent failure handler:
> the uevent of a device should be handled at the bus or device layer, and
> memory read and write failures during hot removal should be handled
> correctly before the detach behaviors.
> 
> d. show an example of how to use the uevent monitor:
> enable uevent monitoring in testpmd or fail-safe to show usage.
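
The kernel uevent stream described in the steps above is delivered over a
netlink socket. A minimal, Linux-only sketch of opening such a socket follows;
the function name uev_socket_open() and the macro UEV_KERNEL_GROUP are
hypothetical and are not taken from the patch, whose own helpers are
dev_monitor_fd_new() and dev_monitor_enable().

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Multicast group 1 carries uevents sent by the kernel itself. */
#define UEV_KERNEL_GROUP 1

/* Open a non-blocking uevent socket; returns the fd, or -1 with errno set. */
static int
uev_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;			/* let the kernel pick an id */
	addr.nl_groups = UEV_KERNEL_GROUP;

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		int saved_errno = errno;

		close(fd);
		errno = saved_errno;
		return -1;
	}

	return fd;
}
```

The returned descriptor can then be added to an epoll set and read with
recv(), each message holding NUL-separated records such as ACTION=remove.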
> 
> patchset history:
> v6->v5:
> 1.add a hot plug policy: by default, EAL prepares the hot plug work for all
> PCI devices, then lets the app decide which devices need to be hot plugged.
> 2.modify to manage event callbacks in each device.
> 3.fix a system hang issue on igb_uio release.
> 4.move the PCI part to the PCI bus driver, based on the bus rework.
> 5.add a hot plug policy in the app, with an example that uses a hotplug list
> to decide which devices need to be hot plugged.
> 
> v5->v4:
> 1.Move uevent monitor epolling from eal interrupt to eal device layer.
> 2.Redefine the eal device API for common, and distinguish between linux
> and bsd.
> 3.Add failure handler helper api in bus layer. Add function to find
> device by name.
> 4.Replace the individual fd bound to a single device with a common fd that
> polls all devices.
> 5.Add registration of hot insertion monitoring and processing, and add a
> function to auto bind the driver before the user adds a device.
> 6.Refine some coding style and typo issues.
> 7.Add new callback to process hot insertion.
> 
> v4->v3:
> 1.move uevent monitor api from eal interrupt to eal device layer.
> 2.create uevent type and struct in eal device.
> 3.move uevent handler for each driver to eal layer.
> 4.add uevent failure handler to process signal fault issue.
> 5.add example for request and use uevent monitoring in testpmd.
> 
> v3->v2:
> 1.refine some return error
> 2.refine the string searching logic to avoid memory issue
> 
> v2->v1:
> 1.remove global variables of hotplug_fd, add uevent_fd in rte_intr_handle to
> let each pci device self maintain it fd, to fix dual device fd issue.
> 2.refine some typo error.
> 
> Jeff Guo (2):
>   eal: add uevent monitor for hot plug
>   app/testpmd: use uevent to monitor hotplug
> 
>  app/test-pmd/testpmd.c                             | 172 ++++++++++
>  app/test-pmd/testpmd.h                             |   9 +
>  drivers/bus/pci/bsd/pci.c                          |  23 ++
>  drivers/bus/pci/linux/pci.c                        |  34 ++
>  drivers/bus/pci/linux/pci_init.h                   |   1 +
>  drivers/bus/pci/pci_common.c                       |  42 +++
>  drivers/bus/pci/pci_common_uio.c                   |  28 ++
>  drivers/bus/pci/private.h                          |  12 +
>  drivers/bus/pci/rte_bus_pci.h                      |   9 +
>  lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
>  .../bsdapp/eal/include/exec-env/rte_dev.h          | 105 ++++++
>  lib/librte_eal/common/eal_common_bus.c             |  29 ++
>  lib/librte_eal/common/eal_common_dev.c             | 222 +++++++++++++
>  lib/librte_eal/common/eal_common_vdev.c            |  27 ++
>  lib/librte_eal/common/include/rte_bus.h            |  51 +++
>  lib/librte_eal/common/include/rte_dev.h            | 107 ++++++-
>  lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
>  lib/librte_eal/linuxapp/eal/eal_dev.c              | 353 +++++++++++++++++++++
>  .../linuxapp/eal/include/exec-env/rte_dev.h        | 105 ++++++
>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
>  lib/librte_pci/rte_pci.c                           |  20 ++
>  lib/librte_pci/rte_pci.h                           |  17 +
>  22 files changed, 1437 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>  create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>  create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
> 
> --
> 2.7.4


* Re: [PATCH v6 0/2] add uevent monitor for hot plug
  2017-12-14  9:48                   ` [PATCH v6 0/2] add uevent monitor for hot plug Mordechay Haimovsky
@ 2017-12-14 10:21                     ` Gaëtan Rivet
  2017-12-22  0:16                       ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2017-12-14 10:21 UTC (permalink / raw)
  To: Mordechay Haimovsky; +Cc: Jeff Guo, dev

Hello Moti,

On Thu, Dec 14, 2017 at 09:48:23AM +0000, Mordechay Haimovsky wrote:
> Hello,
>  I would like to apply this patch in order to review it.
> 

In absence of answer from Jeff,

Those two paths were modified during the 17.08 release: both pci and
vdev buses have been moved to drivers/bus.

> Trying to apply it on 17.11 (and latest) fails due to missing lib/librte_eal/common/eal_common_vdev.c             
> Trying to apply it on 17.08.1 fails on missing drivers/bus/pci/bsd/pci.c file
> 

Only the pci bus move was integrated by Jeff to this version of the udev
monitor. The vdev bus move however came later and should be rebased
upon.

> So, on what DPDK version should I apply it ? 
> Or maybe there is a bunch of other  patches I have to apply in order to use this patch ?
> 

You should apply it on 17.11 IMO.
Either you take upon yourself to make it work with the new tree, or
wait for Jeff to send a new version.

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 0/2] add uevent monitor for hot plug
  2017-12-14 10:21                     ` Gaëtan Rivet
@ 2017-12-22  0:16                       ` Guo, Jia
  2017-12-24 15:12                         ` Mordechay Haimovsky
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2017-12-22  0:16 UTC (permalink / raw)
  To: Gaëtan Rivet, Mordechay Haimovsky; +Cc: dev

Hello Moti, and sorry for the late reply. As Gaëtan said, the tree has changed since this version was posted, so I will send a new version for you all to review and test further.

Best regards,
Jeff Guo

-----Original Message-----
From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com] 
Sent: Thursday, December 14, 2017 6:21 PM
To: Mordechay Haimovsky <motih@mellanox.com>
Cc: Guo, Jia <jia.guo@intel.com>; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug

Hello Moti,

On Thu, Dec 14, 2017 at 09:48:23AM +0000, Mordechay Haimovsky wrote:
> Hello,
>  I would like to apply this patch in order to review it.
> 

In absence of answer from Jeff,

Those two paths were modified during the 17.08 release: both pci and vdev buses have been moved to drivers/bus.

> Trying to apply it on 17.11 (and latest) fails due to missing lib/librte_eal/common/eal_common_vdev.c             
> Trying to apply it on 17.08.1 fails on missing 
> drivers/bus/pci/bsd/pci.c file
> 

Only the pci bus move was integrated by Jeff to this version of the udev monitor. The vdev bus move however came later and should be rebased upon.

> So, on what DPDK version should I apply it ? 
> Or maybe there is a bunch of other  patches I have to apply in order to use this patch ?
> 

You should apply it on 17.11 IMO.
Either you take upon yourself to make it work with the new tree, or wait for Jeff to send a new version.

--
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 0/2] add uevent monitor for hot plug
  2017-12-22  0:16                       ` Guo, Jia
@ 2017-12-24 15:12                         ` Mordechay Haimovsky
  2018-01-02  9:43                           ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Mordechay Haimovsky @ 2017-12-24 15:12 UTC (permalink / raw)
  To: Guo, Jia, Gaëtan Rivet; +Cc: dev

Thanks Jeff,
 Do you have an estimate of when these patches will be ready?

Moti H.

> -----Original Message-----
> From: Guo, Jia [mailto:jia.guo@intel.com]
> Sent: Friday, December 22, 2017 2:16 AM
> To: Gaëtan Rivet <gaetan.rivet@6wind.com>; Mordechay Haimovsky
> <motih@mellanox.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
> 
> Moti, Hello and sorry for be reply late until now, definitely as gaetan said that
> there might be some change after the version, anyway I will create a new
> version to benefit you all to review and further test.
> 
> Best regards,
> Jeff Guo
> 
> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Thursday, December 14, 2017 6:21 PM
> To: Mordechay Haimovsky <motih@mellanox.com>
> Cc: Guo, Jia <jia.guo@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
> 
> Hello Moti,
> 
> On Thu, Dec 14, 2017 at 09:48:23AM +0000, Mordechay Haimovsky wrote:
> > Hello,
> >  I would like to apply this patch in order to review it.
> >
> 
> In absence of answer from Jeff,
> 
> Those two paths were modified during the 17.08 release: both pci and vdev
> buses have been moved to drivers/bus.
> 
> > Trying to apply it on 17.11 (and latest) fails due to missing
> lib/librte_eal/common/eal_common_vdev.c
> > Trying to apply it on 17.08.1 fails on missing
> > drivers/bus/pci/bsd/pci.c file
> >
> 
> Only the pci bus move was integrated by Jeff to this version of the udev
> monitor. The vdev bus move however came later and should be rebased
> upon.
> 
> > So, on what DPDK version should I apply it ?
> > Or maybe there is a bunch of other  patches I have to apply in order to use
> this patch ?
> >
> 
> You should apply it on 17.11 IMO.
> Either you take upon yourself to make it work with the new tree, or wait for
> Jeff to send a new version.
> 
> --
> Gaëtan Rivet
> 6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 1/2] eal: add uevent monitor for hot plug
  2017-11-01 21:41                     ` Stephen Hemminger
  2017-11-08  5:39                       ` Guo, Jia
@ 2017-12-25  8:30                       ` Guo, Jia
  1 sibling, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2017-12-25  8:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, gaetan.rivet, thomas,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	helin.zhang



On 11/2/2017 5:41 AM, Stephen Hemminger wrote:
> On Thu,  2 Nov 2017 04:16:44 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> +
>> +static int
>> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
>> +{
>> +	char action[RTE_EAL_UEVENT_MSG_LEN];
>> +	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
>> +	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
>> +	char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN];
>> +	int i = 0;
>> +
>> +	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +
>> +	while (i < RTE_EAL_UEVENT_MSG_LEN) {
> Might be simpler, safer, clearer to use rte_strsplit here.
>
> And then have a table of fields rather than open coding the parsing.
>
I think your point makes sense, but rte_strsplit is hard to use here,
because the tokens that need to be parsed are separated by '\0', and there
can even be several adjacent '\0' bytes in the buffer that comes from the
uevent message.
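As an illustration of the two points above (a table of fields, and tokens
separated by one or more '\0' bytes), here is a minimal sketch. The names
uev_field and uev_scan and the value size are hypothetical and do not come
from the patch; the caller is assumed to pass the number of valid bytes in
the buffer.

```c
#include <stdio.h>
#include <string.h>

#define UEV_VAL_LEN 64 /* hypothetical value size, not from the patch */

/* One entry per key of interest; 'val' receives the text after the key. */
struct uev_field {
	const char *key;
	char val[UEV_VAL_LEN];
};

/*
 * Walk a buffer of '\0'-separated "KEY=value" tokens. Runs of adjacent
 * '\0' bytes are skipped. Returns the number of tokens that matched a
 * table entry.
 */
static int
uev_scan(const char *buf, size_t len, struct uev_field *tbl, size_t n)
{
	size_t i = 0, k;
	int found = 0;

	while (i < len) {
		const char *end;
		size_t tok_len;

		if (buf[i] == '\0') { /* skip separator bytes */
			i++;
			continue;
		}
		/* token ends at the next '\0' or at the end of the buffer */
		end = memchr(buf + i, '\0', len - i);
		tok_len = end ? (size_t)(end - (buf + i)) : len - i;

		for (k = 0; k < n; k++) {
			size_t klen = strlen(tbl[k].key);

			if (tok_len >= klen &&
			    strncmp(buf + i, tbl[k].key, klen) == 0) {
				/* copy at most the rest of this token */
				snprintf(tbl[k].val, sizeof(tbl[k].val), "%.*s",
					 (int)(tok_len - klen), buf + i + klen);
				found++;
				break;
			}
		}
		i += tok_len;
	}
	return found;
}
```

Adding a new key then only means adding one row to the table passed in by
the caller.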
>> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
>> +			if (*buf)
>> +				break;
>> +			buf++;
>> +		}
>> +		if (!strncmp(buf, "libudev", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			event->group = UEV_MONITOR_UDEV;
>> +		}
>> +		if (!strncmp(buf, "ACTION=", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			snprintf(action, sizeof(action), "%s", buf);
>> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
>> +			buf += 8;
>> +			i += 8;
>> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
>> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +			buf += 10;
>> +			i += 10;
>> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +			buf += 14;
>> +			i += 14;
>> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +		}
>> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
>> +			if (*buf == '\0')
>> +				break;
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	if (!strncmp(subsystem, "pci", 3))
>> +		event->subsystem = UEV_SUBSYSTEM_PCI;
>> +	if (!strncmp(action, "add", 3))
>> +		event->type = RTE_EAL_DEV_EVENT_ADD;
>> +	if (!strncmp(action, "remove", 6))
>> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
>> +	event->devname = pci_slot_name;
>> +
>> +	return 0;
> Function always returns 0, why is it not void?

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 1/2] eal: add uevent monitor for hot plug
  2017-11-01 20:16                   ` [PATCH v6 1/2] eal: " Jeff Guo
  2017-11-01 21:36                     ` Stephen Hemminger
  2017-11-01 21:41                     ` Stephen Hemminger
@ 2017-12-25 18:06                     ` Stephen Hemminger
  2018-01-02  9:40                       ` Guo, Jia
  2 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2017-12-25 18:06 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, gaetan.rivet, thomas,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	helin.zhang

On Thu,  2 Nov 2017 04:16:44 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +int
> +rte_dev_bind_driver(const char *dev_name, const char *drv_type) {

Bracket left after declaration.


> +	snprintf(drv_override_path, sizeof(drv_override_path),
> +		"/sys/bus/pci/devices/%s/driver_override", dev_name);
> +
> +	/* specify the driver for a device by writing to driver_override */
> +	drv_override_fd = open(drv_override_path, O_WRONLY);
> +	if (drv_override_fd < 0) {
> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +			drv_override_path, strerror(errno));
> +		goto err;
> +	}


You should not have dev functions that assume PCI. Please split into common
and bus specific code.
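A rough sketch of the kind of split meant here, assuming a per-bus table of
operations. The names bus_ops, bus_table and dev_bind_driver are
hypothetical and only for illustration; the PCI callback is a stub where the
sysfs handling would live.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-bus operations; each bus supplies its own callback. */
struct bus_ops {
	const char *name;
	int (*bind_driver)(const char *dev_name, const char *drv_type);
};

/* Bus-specific part: the PCI sysfs work would go here (stubbed out). */
static int
pci_bind_driver(const char *dev_name, const char *drv_type)
{
	(void)dev_name;
	(void)drv_type;
	return 0;
}

static const struct bus_ops bus_table[] = {
	{ "pci", pci_bind_driver },
};

/* Common part: dispatches by bus name and knows nothing about PCI. */
static int
dev_bind_driver(const char *bus_name, const char *dev_name,
		const char *drv_type)
{
	size_t i;

	for (i = 0; i < sizeof(bus_table) / sizeof(bus_table[0]); i++) {
		if (strcmp(bus_table[i].name, bus_name) == 0)
			return bus_table[i].bind_driver(dev_name, drv_type);
	}
	return -1; /* no such bus registered */
}
```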



> +static int
> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
> +{
> +	char action[RTE_EAL_UEVENT_MSG_LEN];
> +	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
> +	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
> +	char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
> +	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
> +
> +	while (i < RTE_EAL_UEVENT_MSG_LEN) {
> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		if (!strncmp(buf, "libudev", 7)) {
> +			buf += 7;
> +			i += 7;
> +			event->group = UEV_MONITOR_UDEV;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);

Why snprintf rather than strncpy?


> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
> +			buf += 8;
> +			i += 8;
> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
> +		}
> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = UEV_SUBSYSTEM_PCI;
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_EAL_DEV_EVENT_ADD;
> +	if (!strncmp(action, "remove", 6))
> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
> +	event->devname = pci_slot_name;

Why do you need to first capture the strings, then set state variables?
Instead why not update event->xxx directly?
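Something along these lines, as a sketch only. The enum and struct names are
hypothetical stand-ins for the ones in the patch, and exact string matches
are used here instead of the prefix comparisons in the quoted code.

```c
#include <string.h>

/* Hypothetical stand-ins for the event types defined by the patch. */
enum uev_type { UEV_TYPE_NONE, UEV_TYPE_ADD, UEV_TYPE_REMOVE };
enum uev_subsys { UEV_SUBSYS_NONE, UEV_SUBSYS_PCI };

struct uev_event {
	enum uev_type type;
	enum uev_subsys subsystem;
};

/*
 * Apply one "KEY=value" token straight to the event, with no intermediate
 * string copies. Tokens that are not recognised are ignored.
 */
static void
uev_apply_token(struct uev_event *ev, const char *tok)
{
	if (strcmp(tok, "ACTION=add") == 0)
		ev->type = UEV_TYPE_ADD;
	else if (strcmp(tok, "ACTION=remove") == 0)
		ev->type = UEV_TYPE_REMOVE;
	else if (strcmp(tok, "SUBSYSTEM=pci") == 0)
		ev->subsystem = UEV_SUBSYS_PCI;
}
```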

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 1/2] eal: add uevent monitor for hot plug
  2017-12-25 18:06                     ` Stephen Hemminger
@ 2018-01-02  9:40                       ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-02  9:40 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, gaetan.rivet, thomas,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	helin.zhang



On 12/26/2017 2:06 AM, Stephen Hemminger wrote:
> On Thu,  2 Nov 2017 04:16:44 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> +int
>> +rte_dev_bind_driver(const char *dev_name, const char *drv_type) {
> Bracket left after declaration.
thanks.
>
>
>> +	snprintf(drv_override_path, sizeof(drv_override_path),
>> +		"/sys/bus/pci/devices/%s/driver_override", dev_name);
>> +
>> +	/* specify the driver for a device by writing to driver_override */
>> +	drv_override_fd = open(drv_override_path, O_WRONLY);
>> +	if (drv_override_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +			drv_override_path, strerror(errno));
>> +		goto err;
>> +	}
>
> You should not have dev functions that assume PCI. Please split into common
> and bus specific code.
>
>
Makes sense, I will move it into bus-specific code.
>> +static int
>> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
>> +{
>> +	char action[RTE_EAL_UEVENT_MSG_LEN];
>> +	char subsystem[RTE_EAL_UEVENT_MSG_LEN];
>> +	char dev_path[RTE_EAL_UEVENT_MSG_LEN];
>> +	char pci_slot_name[RTE_EAL_UEVENT_MSG_LEN];
>> +	int i = 0;
>> +
>> +	memset(action, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(subsystem, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(dev_path, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +	memset(pci_slot_name, 0, RTE_EAL_UEVENT_MSG_LEN);
>> +
>> +	while (i < RTE_EAL_UEVENT_MSG_LEN) {
>> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
>> +			if (*buf)
>> +				break;
>> +			buf++;
>> +		}
>> +		if (!strncmp(buf, "libudev", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			event->group = UEV_MONITOR_UDEV;
>> +		}
>> +		if (!strncmp(buf, "ACTION=", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			snprintf(action, sizeof(action), "%s", buf);
> Why snprintf rather than strncpy?
>
With snprintf there is no need to write the '\0' manually, and the source
length is not known explicitly. If the concern is the cost of the snprintf
scan, I will constrain the destination buffer length.
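To make the termination difference concrete, here is a small sketch. The
helper names copy_strncpy and copy_snprintf are hypothetical; both assume
dst_len is greater than zero.

```c
#include <stdio.h>
#include <string.h>

/* strncpy variant: the terminator must be written by hand on truncation. */
static void
copy_strncpy(char *dst, size_t dst_len, const char *src)
{
	strncpy(dst, src, dst_len - 1);
	dst[dst_len - 1] = '\0'; /* strncpy adds none if src is too long */
}

/* snprintf variant: the result is always terminated when dst_len > 0. */
static void
copy_snprintf(char *dst, size_t dst_len, const char *src)
{
	snprintf(dst, dst_len, "%s", src);
}
```

Both produce the same truncated, terminated string; the strncpy version just
needs the extra line to guarantee it.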
>> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
>> +			buf += 8;
>> +			i += 8;
>> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
>> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +			buf += 10;
>> +			i += 10;
>> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +			buf += 14;
>> +			i += 14;
>> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +		}
>> +		for (; i < RTE_EAL_UEVENT_MSG_LEN; i++) {
>> +			if (*buf == '\0')
>> +				break;
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	if (!strncmp(subsystem, "pci", 3))
>> +		event->subsystem = UEV_SUBSYSTEM_PCI;
>> +	if (!strncmp(action, "add", 3))
>> +		event->type = RTE_EAL_DEV_EVENT_ADD;
>> +	if (!strncmp(action, "remove", 6))
>> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
>> +	event->devname = pci_slot_name;
> Why do you need to first capture the strings, then set state variables?
> Instead why not update event->xxx directly?
I think it is easier to read and maintain when this is done outside the loop.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 0/2] add uevent monitor for hot plug
  2017-12-24 15:12                         ` Mordechay Haimovsky
@ 2018-01-02  9:43                           ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-02  9:43 UTC (permalink / raw)
  To: Mordechay Haimovsky, Gaëtan Rivet; +Cc: dev

Hi Moti,

Please see the v7 patch set, thanks.

On 12/24/2017 11:12 PM, Mordechay Haimovsky wrote:
> Thanks Jeff,
>   Do you have an estimation on when will these patches be ready ?
>
> Moti H.
>
>> -----Original Message-----
>> From: Guo, Jia [mailto:jia.guo@intel.com]
>> Sent: Friday, December 22, 2017 2:16 AM
>> To: Gaëtan Rivet <gaetan.rivet@6wind.com>; Mordechay Haimovsky
>> <motih@mellanox.com>
>> Cc: dev@dpdk.org
>> Subject: RE: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
>>
>> Moti, Hello and sorry for be reply late until now, definitely as gaetan said that
>> there might be some change after the version, anyway I will create a new
>> version to benefit you all to review and further test.
>>
>> Best regards,
>> Jeff Guo
>>
>> -----Original Message-----
>> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
>> Sent: Thursday, December 14, 2017 6:21 PM
>> To: Mordechay Haimovsky <motih@mellanox.com>
>> Cc: Guo, Jia <jia.guo@intel.com>; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6 0/2] add uevent monitor for hot plug
>>
>> Hello Moti,
>>
>> On Thu, Dec 14, 2017 at 09:48:23AM +0000, Mordechay Haimovsky wrote:
>>> Hello,
>>>   I would like to apply this patch in order to review it.
>>>
>> In absence of answer from Jeff,
>>
>> Those two paths were modified during the 17.08 release: both pci and vdev
>> buses have been moved to drivers/bus.
>>
>>> Trying to apply it on 17.11 (and latest) fails due to missing
>> lib/librte_eal/common/eal_common_vdev.c
>>> Trying to apply it on 17.08.1 fails on missing
>>> drivers/bus/pci/bsd/pci.c file
>>>
>> Only the pci bus move was integrated by Jeff to this version of the udev
>> monitor. The vdev bus move however came later and should be rebased
>> upon.
>>
>>> So, on what DPDK version should I apply it ?
>>> Or maybe there is a bunch of other  patches I have to apply in order to use
>> this patch ?
>> You should apply it on 17.11 IMO.
>> Either you take upon yourself to make it work with the new tree, or wait for
>> Jeff to send a new version.
>>
>> --
>> Gaëtan Rivet
>> 6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-03  1:42                       ` [PATCH v7 1/2] eal: " Jeff Guo
@ 2018-01-02 17:02                         ` Matan Azrad
  2018-01-08  5:26                           ` Guo, Jia
  2018-01-08  6:05                           ` Guo, Jia
  2018-01-09  0:39                         ` Thomas Monjalon
  1 sibling, 2 replies; 494+ messages in thread
From: Matan Azrad @ 2018-01-02 17:02 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	Thomas Monjalon, helin.zhang, Mordechay Haimovsky

Hi Jeff

Maybe I'm touching on previous discussions, but please see some comments/questions.

From: Jeff Guo:
> This patch aim to add a general uevent mechanism in eal device layer,
> to enable all linux kernel object hot plug monitoring, so user could use these
> APIs to monitor and read out the device status info that sent from the kernel
> side, then corresponding to handle it, such as detach or attach the
> device, and even benefit to use it to do smoothly fail safe work.
> 
> 1) About uevent monitoring:
> a: add one epolling to poll the netlink socket, to monitor the uevent of
>    the device, add device_state in struct of rte_device, to identify the
>    device state machine.
> b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
> c: add below API in rte eal device common layer.
>    rte_eal_dev_monitor_enable
>    rte_dev_callback_register
>    rte_dev_callback_unregister
>    _rte_dev_callback_process
>    rte_dev_monitor_start
>    rte_dev_monitor_stop
> 
> 2) About failure handler, use pci uio for example,
>    add pci_remap_device in bus layer and below function to process it:
>    rte_pci_remap_device
>    pci_uio_remap_resource
>    pci_map_private_resource
>    add rte_pci_dev_bind_driver to bind pci device with explicit driver.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v7->v6:
> a.modify vdev part according to the vdev rework
> b.re-define and split the func into common and bus specific code
> c.fix some incorrect issue.
> d.fix the system hung after send packet issue.
> ---
>  drivers/bus/pci/bsd/pci.c                          |  30 ++
>  drivers/bus/pci/linux/pci.c                        |  87 +++++
>  drivers/bus/pci/linux/pci_init.h                   |   1 +
>  drivers/bus/pci/pci_common.c                       |  43 +++
>  drivers/bus/pci/pci_common_uio.c                   |  28 ++
>  drivers/bus/pci/private.h                          |  12 +
>  drivers/bus/pci/rte_bus_pci.h                      |  25 ++
>  drivers/bus/vdev/vdev.c                            |  36 +++
>  lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
>  .../bsdapp/eal/include/exec-env/rte_dev.h          | 106 ++++++
>  lib/librte_eal/common/eal_common_bus.c             |  30 ++
>  lib/librte_eal/common/eal_common_dev.c             | 169 ++++++++++
>  lib/librte_eal/common/include/rte_bus.h            |  69 ++++
>  lib/librte_eal/common/include/rte_dev.h            |  89 ++++++
>  lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
>  lib/librte_eal/linuxapp/eal/eal_alarm.c            |   5 +
>  lib/librte_eal/linuxapp/eal/eal_dev.c              | 356 +++++++++++++++++++++
>  .../linuxapp/eal/include/exec-env/rte_dev.h        | 106 ++++++
>  lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
>  lib/librte_pci/rte_pci.c                           |  20 ++
>  lib/librte_pci/rte_pci.h                           |  17 +
>  21 files changed, 1301 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>  create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>  create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
> 
> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> index b8e2178..d58dbf6 100644
> --- a/drivers/bus/pci/bsd/pci.c
> +++ b/drivers/bus/pci/bsd/pci.c
> @@ -126,6 +126,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>  	}
>  }
> 
> +/* re-map pci device */
> +int
> +rte_pci_remap_device(struct rte_pci_device *dev)
> +{
> +	int ret;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	switch (dev->kdrv) {
> +	case RTE_KDRV_NIC_UIO:
> +		ret = pci_uio_remap_resource(dev);
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = 1;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  void
>  pci_uio_free_resource(struct rte_pci_device *dev,
>  		struct mapped_pci_resource *uio_res)
> @@ -678,3 +701,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
> 
>  	return ret;
>  }
> +
> +int
> +rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
> +{
> +	return -1;
> +}
> +
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index 5da6728..792fd2c 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>  	}
>  }
> 
> +/* Map pci device */
> +int
> +rte_pci_remap_device(struct rte_pci_device *dev)
> +{
> +	int ret = -1;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	switch (dev->kdrv) {
> +	case RTE_KDRV_VFIO:
> +#ifdef VFIO_PRESENT
> +		/* no thing to do */
> +#endif
> +		break;
> +	case RTE_KDRV_IGB_UIO:
> +	case RTE_KDRV_UIO_GENERIC:
> +		if (rte_eal_using_phys_addrs()) {
> +			/* map resources for devices that use uio */
> +			ret = pci_uio_remap_resource(dev);
> +		}
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = 1;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>  void *
>  pci_find_max_end_va(void)
>  {
> @@ -386,6 +418,8 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
>  		rte_pci_add_device(dev);
>  	}
> 
> +	dev->device.state = DEVICE_PARSED;
> +	TAILQ_INIT(&(dev->device.uev_cbs));
>  	return 0;
>  }
> 
> @@ -854,3 +888,56 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
> 
>  	return ret;
>  }
> +
> +int
> +rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
> +{
> +	char drv_bind_path[1024];
> +	char drv_override_path[1024]; /* contains the /dev/uioX */
> +	int drv_override_fd;
> +	int drv_bind_fd;
> +
> +	RTE_SET_USED(drv_type);
> +
> +	snprintf(drv_override_path, sizeof(drv_override_path),
> +		"/sys/bus/pci/devices/%s/driver_override", dev_name);
> +
> +	/* specify the driver for a device by writing to driver_override */
> +	drv_override_fd = open(drv_override_path, O_WRONLY);
> +	if (drv_override_fd < 0) {
> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +			drv_override_path, strerror(errno));
> +		goto err;
> +	}
> +
> +	if (write(drv_override_fd, drv_type, sizeof(drv_type)) < 0) {
> +		RTE_LOG(ERR, EAL,
> +			"Error: bind failed - Cannot write "
> +			"driver %s to device %s\n", drv_type, dev_name);
> +		goto err;
> +	}
> +
> +	close(drv_override_fd);
> +
> +	snprintf(drv_bind_path, sizeof(drv_bind_path),
> +		"/sys/bus/pci/drivers/%s/bind", drv_type);
> +
> +	/* do the bind by writing device to the specific driver  */
> +	drv_bind_fd = open(drv_bind_path, O_WRONLY | O_APPEND);
> +	if (drv_bind_fd < 0) {
> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +			drv_bind_path, strerror(errno));
> +		goto err;
> +	}
> +
> +	if (write(drv_bind_fd, dev_name, sizeof(dev_name)) < 0)
> +		goto err;
> +
> +	close(drv_bind_fd);
> +	return 0;
> +err:
> +	close(drv_override_fd);
> +	close(drv_bind_fd);
> +	return -1;
> +}
> +
> diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
> index f342c47..5838402 100644
> --- a/drivers/bus/pci/linux/pci_init.h
> +++ b/drivers/bus/pci/linux/pci_init.h
> @@ -58,6 +58,7 @@ int pci_uio_alloc_resource(struct rte_pci_device *dev,
>  		struct mapped_pci_resource **uio_res);
>  void pci_uio_free_resource(struct rte_pci_device *dev,
>  		struct mapped_pci_resource *uio_res);
> +int pci_uio_remap_resource(struct rte_pci_device *dev);
>  int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
>  		struct mapped_pci_resource *uio_res, int map_idx);
> 
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 104fdf9..5417b32 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -282,6 +282,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
>  		if (rc > 0)
>  			/* positive value means driver doesn't support it */
>  			continue;
> +		dev->device.state = DEVICE_PROBED;
>  		return 0;
>  	}
>  	return 1;
> @@ -481,6 +482,7 @@ rte_pci_insert_device(struct rte_pci_device *exist_pci_dev,
>  void
>  rte_pci_remove_device(struct rte_pci_device *pci_dev)
>  {
> +	RTE_LOG(DEBUG, EAL, " rte_pci_remove_device for device list\n");
>  	TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
>  }
> 
> @@ -502,6 +504,44 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>  	return NULL;
>  }
> 
> +static struct rte_device *
> +pci_find_device_by_name(const struct rte_device *start,
> +		rte_dev_cmp_name_t cmp_name,
> +		const void *data)
> +{
> +	struct rte_pci_device *dev;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
> +		if (start && &dev->device == start) {
> +			start = NULL; /* starting point found */
> +			continue;
> +		}
> +		if (cmp_name(dev->device.name, data) == 0)
> +			return &dev->device;
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +pci_remap_device(struct rte_device *dev)
> +{
> +	struct rte_pci_device *pdev;
> +	int ret;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	pdev = RTE_DEV_TO_PCI(dev);
> +
> +	/* remap resources for devices that use igb_uio */
> +	ret = rte_pci_remap_device(pdev);
> +	if (ret != 0)
> +		RTE_LOG(ERR, EAL, "failed to remap device %s",
> +			dev->name);
> +	return ret;
> +}
> +
>  static int
>  pci_plug(struct rte_device *dev)
>  {
> @@ -528,10 +568,13 @@ struct rte_pci_bus rte_pci_bus = {
>  		.scan = rte_pci_scan,
>  		.probe = rte_pci_probe,
>  		.find_device = pci_find_device,
> +		.find_device_by_name = pci_find_device_by_name,
>  		.plug = pci_plug,
>  		.unplug = pci_unplug,
>  		.parse = pci_parse,
>  		.get_iommu_class = rte_pci_get_iommu_class,
> +		.remap_device = pci_remap_device,
> +		.bind_driver = rte_pci_dev_bind_driver,
>  	},
>  	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>  	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
> index 0671131..8cb4009 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -176,6 +176,34 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>  	}
>  }
> 
> +/* remap the PCI resource of a PCI device in private virtual memory */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev)
> +{
> +	int i;
> +	uint64_t phaddr;
> +	void *map_address;
> +
> +	/* Map all BARs */
> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
> +		/* skip empty BAR */
> +		phaddr = dev->mem_resource[i].phys_addr;
> +		if (phaddr == 0)
> +			continue;
> +		map_address = pci_map_private_resource(
> +				dev->mem_resource[i].addr, 0,
> +				(size_t)dev->mem_resource[i].len);
> +		if (map_address == MAP_FAILED)
> +			goto error;
> +		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
> +		dev->mem_resource[i].addr = map_address;
> +	}
> +
> +	return 0;
> +error:
> +	return -1;
> +}
> +
>  static struct mapped_pci_resource *
>  pci_uio_find_resource(struct rte_pci_device *dev)
>  {
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 2283f09..10baa1a 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -202,6 +202,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
>  		struct mapped_pci_resource *uio_res);
> 
>  /**
> + * remap the pci uio resource..
> + *
> + * @param dev
> + *   Point to the struct rte pci device.
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev);
> +
> +/**
>   * Map device memory to uio resource
>   *
>   * This function is private to EAL.
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index d4a2996..1662f3b 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -52,6 +52,8 @@ extern "C" {
>  #include <sys/queue.h>
>  #include <stdint.h>
>  #include <inttypes.h>
> +#include <unistd.h>
> +#include <fcntl.h>
> 
>  #include <rte_debug.h>
>  #include <rte_interrupts.h>
> @@ -197,6 +199,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
>  void rte_pci_unmap_device(struct rte_pci_device *dev);
> 
>  /**
> + * Remap this device
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use
> + */
> +int rte_pci_remap_device(struct rte_pci_device *dev);
> +
> +/**
>   * Dump the content of the PCI bus.
>   *
>   * @param f
> @@ -333,6 +344,20 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
>  void rte_pci_ioport_write(struct rte_pci_ioport *p,
>  		const void *data, size_t len, off_t offset);
> 
> +/**
> + * It can be used to bind a device to a specific type of driver.
> + *
> + * @param dev_name
> + *  The device name.
> + * @param drv_type
> + *  The specific driver's type.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
> index fd7736d..773f6e0 100644
> --- a/drivers/bus/vdev/vdev.c
> +++ b/drivers/bus/vdev/vdev.c
> @@ -323,6 +323,39 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>  	return NULL;
>  }
> 
> +static struct rte_device *
> +vdev_find_device_by_name(const struct rte_device *start,
> +		rte_dev_cmp_name_t cmp_name,
> +		const void *data)
> +{
> +	struct rte_vdev_device *dev;
> +
> +	TAILQ_FOREACH(dev, &vdev_device_list, next) {
> +		if (start && &dev->device == start) {
> +			start = NULL;
> +			continue;
> +		}
> +		if (cmp_name(dev->device.name, data) == 0)
> +			return &dev->device;
> +	}
> +	return NULL;
> +}
> +
> +static int
> +vdev_remap_device(struct rte_device *dev)
> +{
> +	RTE_SET_USED(dev);
> +	return 0;
> +}
> +
> +static int
> +vdev_bind_driver(const char *dev_name, const char *drv_type)
> +{
> +	RTE_SET_USED(dev_name);
> +	RTE_SET_USED(drv_type);
> +	return 0;
> +}
> +
>  static int
>  vdev_plug(struct rte_device *dev)
>  {
> @@ -339,9 +372,12 @@ static struct rte_bus rte_vdev_bus = {
>  	.scan = vdev_scan,
>  	.probe = vdev_probe,
>  	.find_device = vdev_find_device,
> +	.find_device_by_name = vdev_find_device_by_name,
>  	.plug = vdev_plug,
>  	.unplug = vdev_unplug,
>  	.parse = vdev_parse,
> +	.remap_device = vdev_remap_device,
> +	.bind_driver = vdev_bind_driver,
>  };
> 
>  RTE_REGISTER_BUS(vdev, rte_vdev_bus);
> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..6ea9a74
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> @@ -0,0 +1,64 @@
> +/*-
> + *   Copyright(c) 2010-2017 Intel Corporation.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <inttypes.h>
> +#include <sys/queue.h>
> +#include <sys/signalfd.h>
> +#include <sys/ioctl.h>
> +#include <sys/socket.h>
> +#include <linux/netlink.h>
> +#include <sys/epoll.h>
> +#include <unistd.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_bus.h>
> +#include <rte_dev.h>
> +#include <rte_devargs.h>
> +#include <rte_debug.h>
> +#include <rte_log.h>
> +
> +#include "eal_thread.h"
> +
> +int
> +rte_dev_monitor_start(void)
> +{
> +	return -1;
> +}
> +
> +int
> +rte_dev_monitor_stop(void)
> +{
> +	return -1;
> +}
> diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
> b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
> new file mode 100644
> index 0000000..6a6feb5
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
> @@ -0,0 +1,106 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +#ifndef _RTE_DEV_H_
> +#error "don't include this file directly, please include generic <rte_dev.h>"
> +#endif
> +
> +#ifndef _RTE_LINUXAPP_DEV_H_
> +#define _RTE_LINUXAPP_DEV_H_
> +
> +#include <stdio.h>
> +
> +#include <rte_dev.h>
> +
> +#define RTE_EAL_UEV_MSG_LEN 4096
> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128
> +
> +enum uev_subsystem {
> +	UEV_SUBSYSTEM_UIO,
> +	UEV_SUBSYSTEM_VFIO,
> +	UEV_SUBSYSTEM_PCI,
> +	UEV_SUBSYSTEM_MAX
> +};
> +
> +enum uev_monitor_netlink_group {
> +	UEV_MONITOR_KERNEL,
> +	UEV_MONITOR_UDEV,
> +};
> +
> +/**
> + * The device event type.
> + */
> +enum rte_eal_dev_event_type {
> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
> +	RTE_EAL_DEV_EVENT_REMOVE,
> +					/**< device removing event */
> +	RTE_EAL_DEV_EVENT_CHANGE,
> +					/**< device status change event */
> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move
> event */
> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum
> */
> +};
> +
> +struct rte_eal_uevent {
> +	enum rte_eal_dev_event_type type;	/**< device event type */
> +	int subsystem;				/**< subsystem id */
> +	char *devname;				/**< device name */
> +	enum uev_monitor_netlink_group group;	/**< device netlink
> group */
> +};
> +
> +/**
> + * Start the device uevent monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_monitor_start(void);
> +
> +/**
> + * Stop the device uevent monitoring .
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +
> +int
> +rte_dev_monitor_stop(void);
> +
> +#endif /* _RTE_LINUXAPP_DEV_H_ */
> diff --git a/lib/librte_eal/common/eal_common_bus.c
> b/lib/librte_eal/common/eal_common_bus.c
> index 3e022d5..b7219c9 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -51,8 +51,11 @@ rte_bus_register(struct rte_bus *bus)
>  	RTE_VERIFY(bus->scan);
>  	RTE_VERIFY(bus->probe);
>  	RTE_VERIFY(bus->find_device);
> +	RTE_VERIFY(bus->find_device_by_name);
>  	/* Buses supporting driver plug also require unplug. */
>  	RTE_VERIFY(!bus->plug || bus->unplug);
> +	RTE_VERIFY(bus->remap_device);
> +	RTE_VERIFY(bus->bind_driver);
> 
>  	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
>  	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
> @@ -170,6 +173,14 @@ cmp_rte_device(const struct rte_device *dev1,
> const void *_dev2)
>  }
> 
>  static int
> +cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
> +{
> +	const char *dev_name2 = _dev_name2;
> +
> +	return strcmp(dev_name1, dev_name2);
> +}
> +
> +static int
>  bus_find_device(const struct rte_bus *bus, const void *_dev)
>  {
>  	struct rte_device *dev;
> @@ -178,6 +189,25 @@ bus_find_device(const struct rte_bus *bus, const
> void *_dev)
>  	return dev == NULL;
>  }
> 
> +static struct rte_device *
> +bus_find_device_by_name(const struct rte_bus *bus, const void
> *_dev_name)
> +{
> +	struct rte_device *dev;
> +
> +	dev = bus->find_device_by_name(NULL, cmp_rte_device_name,
> _dev_name);
> +	return dev;
> +}
> +
> +struct rte_device *
> +
> +rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
> +{
> +	struct rte_device *dev;
> +
> +	dev = bus_find_device_by_name(bus, _dev_name);
> +	return dev;
> +}
> +
>  struct rte_bus *
>  rte_bus_find_by_device(const struct rte_device *dev)
>  {
> diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> index dda8f58..47909e8 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -42,9 +42,31 @@
>  #include <rte_devargs.h>
>  #include <rte_debug.h>
>  #include <rte_log.h>
> +#include <rte_spinlock.h>
> +#include <rte_malloc.h>
> 
>  #include "eal_private.h"
> 
> +/* spinlock for device callbacks */
> +static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
> +
> +/**
> + * The user application callback description.
> + *
> + * It contains callback address to be registered by user application,
> + * the pointer to the parameters for callback, and the event type.
> + */
> +struct rte_eal_dev_callback {
> +	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
> +	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
> +	void *cb_arg;                           /**< Parameter for callback */
> +	void *ret_param;                        /**< Return parameter */
> +	enum rte_eal_dev_event_type event;      /**< device event type */
> +	uint32_t active;                        /**< Callback is executing */
> +};
> +
> +static struct rte_eal_dev_callback *dev_add_cb;
> +
>  static int cmp_detached_dev_name(const struct rte_device *dev,
>  	const void *_name)
>  {
> @@ -234,3 +256,150 @@ int rte_eal_hotplug_remove(const char *busname,
> const char *devname)
>  	rte_eal_devargs_remove(busname, devname);
>  	return ret;
>  }
> +
> +int
> +rte_eal_dev_monitor_enable(void)
> +{
> +	int ret;
> +
> +	ret = rte_dev_monitor_start();
> +	if (ret)
> +		RTE_LOG(ERR, EAL, "Can not init device monitor\n");
> +	return ret;
> +}
> +
> +int
> +rte_dev_callback_register(struct rte_device *device,
> +			enum rte_eal_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
> +{
> +	struct rte_eal_dev_callback *user_cb;
> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +

What about checking that the device pointer is not NULL?

> +	rte_spinlock_lock(&rte_dev_cb_lock);
> +
> +	if (TAILQ_EMPTY(&(device->uev_cbs)))
> +		TAILQ_INIT(&(device->uev_cbs));
> +
> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
> +		user_cb = NULL;
> +	} else {
> +		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
> +			if (user_cb->cb_fn == cb_fn &&
> +				user_cb->cb_arg == cb_arg &&
> +				user_cb->event == event) {
> +				break;
> +			}
> +		}
> +	}
> +
> +	/* create a new callback. */
> +	if (user_cb == NULL) {
> +		/* allocate a new interrupt callback entity */
> +		user_cb = rte_zmalloc("eal device event",
> +					sizeof(*user_cb), 0);
> +		if (user_cb == NULL) {
> +			RTE_LOG(ERR, EAL, "Can not allocate memory\n");

Missing rte_spinlock_unlock before this return.
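
To illustrate the two points above (NULL check on the pointer argument, and releasing the lock on the allocation failure path), here is a minimal sketch. It is not the DPDK code: a pthread mutex stands in for rte_spinlock_t, calloc stands in for rte_zmalloc, and the names cb_register, cb_alloc and alloc_always_fails are hypothetical helpers used only for this example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for rte_dev_cb_lock (assumption: pthread mutex, not rte_spinlock). */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;

struct dev_cb {
	void (*fn)(void *);
	void *arg;
};

/* Hypothetical allocator hook so that the failure path can be exercised. */
static void *(*cb_alloc)(size_t, size_t) = calloc;

/* Hypothetical allocator that always fails, for demonstration only. */
static void *
alloc_always_fails(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

static int
cb_register(struct dev_cb **slot, void (*fn)(void *), void *arg)
{
	struct dev_cb *cb;
	int ret = 0;

	/* Validate the pointers before touching them or taking the lock. */
	if (slot == NULL || fn == NULL)
		return -EINVAL;

	pthread_mutex_lock(&cb_lock);
	cb = cb_alloc(1, sizeof(*cb));
	if (cb == NULL) {
		ret = -ENOMEM;
		goto unlock;	/* single exit path: the lock is always released */
	}
	cb->fn = fn;
	cb->arg = arg;
	*slot = cb;
unlock:
	pthread_mutex_unlock(&cb_lock);
	return ret;
}
```

With a single exit label, any error path added later will release the lock as well.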

> +			return -ENOMEM;
> +		}
> +		user_cb->cb_fn = cb_fn;
> +		user_cb->cb_arg = cb_arg;
> +		user_cb->event = event;
> +		if (event == RTE_EAL_DEV_EVENT_ADD)
> +			dev_add_cb = user_cb;

Can only one DPDK entity register for the ADD callback?

I suggest adding an option to register for all devices, maybe by using a dummy device that holds the "ALL_DEVICES" callbacks per event.
"All" means past, present and future devices; this way one callback can be called for all devices, and more than one DPDK entity can register for an ADD/NEW event.
What about NEW instead of ADD?

I also suggest adding the device pointer as a parameter to the callback (it will be managed by EAL).
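
A rough sketch of what I mean, under stated assumptions: struct device stands in for struct rte_device, the list all_dev_cbs plays the role of the "ALL_DEVICES" dummy device, and every name below (dev_event, dev_cb_fn, all_devices_cb_register, all_devices_cb_process, count_event) is hypothetical. Locking is left out to keep the example short.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/queue.h>

enum dev_event { DEV_EVENT_NEW, DEV_EVENT_REMOVE };

struct device { const char *name; };	/* stand-in for struct rte_device */

/* The callback receives the device pointer, which is owned by EAL. */
typedef void (*dev_cb_fn)(struct device *dev, enum dev_event ev, void *arg);

struct dev_cb {
	TAILQ_ENTRY(dev_cb) next;
	dev_cb_fn fn;
	void *arg;
	enum dev_event ev;
};

TAILQ_HEAD(dev_cb_list, dev_cb);

/* One list shared by all devices, so several entities can register. */
static struct dev_cb_list all_dev_cbs = TAILQ_HEAD_INITIALIZER(all_dev_cbs);

static int
all_devices_cb_register(enum dev_event ev, dev_cb_fn fn, void *arg)
{
	struct dev_cb *cb;

	if (fn == NULL)
		return -EINVAL;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -ENOMEM;
	cb->fn = fn;
	cb->arg = arg;
	cb->ev = ev;
	TAILQ_INSERT_TAIL(&all_dev_cbs, cb, next);
	return 0;
}

/* Call every callback registered for this event; return how many ran. */
static int
all_devices_cb_process(struct device *dev, enum dev_event ev)
{
	struct dev_cb *cb;
	int called = 0;

	TAILQ_FOREACH(cb, &all_dev_cbs, next) {
		if (cb->ev != ev)
			continue;
		cb->fn(dev, ev, cb->arg);
		called++;
	}
	return called;
}

/* Example callback: counts the events it sees through its argument. */
static void
count_event(struct device *dev, enum dev_event ev, void *arg)
{
	(void)dev;
	(void)ev;
	++*(int *)arg;
}
```

Two independent entities can then register count_event (or their own handlers) for DEV_EVENT_NEW, and both are called with the same device pointer when a new device shows up.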

> +		else
> +			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb,
> next);
> +	}
> +
> +	rte_spinlock_unlock(&rte_dev_cb_lock);
> +	return 0;
> +}
> +
> +int
> +rte_dev_callback_unregister(struct rte_device *device,
> +			enum rte_eal_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
> +{
> +	int ret;
> +	struct rte_eal_dev_callback *cb, *next;
> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&rte_dev_cb_lock);
> +
> +	ret = 0;
> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
> +		rte_free(dev_add_cb);
> +		dev_add_cb = NULL;
> +	} else {

Device NULL check?

> +		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
> +		      cb = next) {
> +
> +			next = TAILQ_NEXT(cb, next);
> +
> +			if (cb->cb_fn != cb_fn || cb->event != event ||
> +					(cb->cb_arg != (void *)-1 &&
> +					cb->cb_arg != cb_arg))
> +				continue;
> +
> +			/*
> +			 * if this callback is not executing right now,
> +			 * then remove it.
> +			 */
> +			if (cb->active == 0) {
> +				TAILQ_REMOVE(&(device->uev_cbs), cb,
> next);
> +				rte_free(cb);
> +			} else {
> +				ret = -EAGAIN;
> +			}
> +		}
> +	}
> +	rte_spinlock_unlock(&rte_dev_cb_lock);
> +	return ret;
> +}
> +
> +int
> +_rte_dev_callback_process(struct rte_device *device,
> +			enum rte_eal_dev_event_type event,
> +			void *cb_arg, void *ret_param)
> +{
> +	struct rte_eal_dev_callback dev_cb;
> +	struct rte_eal_dev_callback *cb_lst;
> +	int rc = 0;
> +
> +	rte_spinlock_lock(&rte_dev_cb_lock);
> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
> +		if (cb_arg != NULL)
> +			dev_add_cb->cb_arg = cb_arg;
> +
> +		if (ret_param != NULL)
> +			dev_add_cb->ret_param = ret_param;
> +
> +		rte_spinlock_unlock(&rte_dev_cb_lock);

Can't someone free it while it is running?
I suggest keeping the lock held.
Callbacks would then not be allowed to use this mechanism, to prevent deadlock.

> +		rc = dev_add_cb->cb_fn(dev_add_cb->event,
> +				dev_add_cb->cb_arg, dev_add_cb-
> >ret_param);
> +		rte_spinlock_lock(&rte_dev_cb_lock);
> +	} else {
> +		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
> +			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
> +				continue;
> +			dev_cb = *cb_lst;
> +			cb_lst->active = 1;
> +			if (cb_arg != NULL)
> +				dev_cb.cb_arg = cb_arg;
> +			if (ret_param != NULL)
> +				dev_cb.ret_param = ret_param;
> +
> +			rte_spinlock_unlock(&rte_dev_cb_lock);

The current active flag doesn't make this thread-safe; I suggest keeping the lock held.
Scenario:
	1. Thread A sees active = 0 in the unregister function.
	2. Context switch.
	3. Thread B starts the callback.
	4. Context switch.
	5. Thread A frees it.
	6. Context switch.
	7. Segfault in Thread B.
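
A minimal sketch of the "keep the lock held" approach, with stand-ins rather than the real DPDK types: a pthread mutex replaces rte_spinlock_t, and cb_entry, cb_add, cb_del, cb_run_all and bump are hypothetical names. Because the same lock is held for the whole walk in cb_run_all, cb_del cannot free an entry while its callback is executing, and no active flag is needed. The price is that a callback must not call cb_add or cb_del itself, since the mutex is not recursive and that would deadlock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

typedef void (*cb_fn)(void *arg);

struct cb_entry {
	TAILQ_ENTRY(cb_entry) next;
	cb_fn fn;
	void *arg;
};

TAILQ_HEAD(cb_list, cb_entry);
static struct cb_list cbs = TAILQ_HEAD_INITIALIZER(cbs);
static pthread_mutex_t cbs_lock = PTHREAD_MUTEX_INITIALIZER;

static int
cb_add(cb_fn fn, void *arg)
{
	struct cb_entry *cb;

	if (fn == NULL)
		return -EINVAL;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -ENOMEM;
	cb->fn = fn;
	cb->arg = arg;
	pthread_mutex_lock(&cbs_lock);
	TAILQ_INSERT_TAIL(&cbs, cb, next);
	pthread_mutex_unlock(&cbs_lock);
	return 0;
}

/* Remove and free matching entries; return the number removed. */
static int
cb_del(cb_fn fn, void *arg)
{
	struct cb_entry *cb, *tmp;
	int removed = 0;

	pthread_mutex_lock(&cbs_lock);
	for (cb = TAILQ_FIRST(&cbs); cb != NULL; cb = tmp) {
		tmp = TAILQ_NEXT(cb, next);
		if (cb->fn != fn || cb->arg != arg)
			continue;
		TAILQ_REMOVE(&cbs, cb, next);
		free(cb);
		removed++;
	}
	pthread_mutex_unlock(&cbs_lock);
	return removed;
}

/* Run all callbacks with the lock held, so none can be freed mid-call. */
static int
cb_run_all(void)
{
	struct cb_entry *cb;
	int called = 0;

	pthread_mutex_lock(&cbs_lock);
	TAILQ_FOREACH(cb, &cbs, next) {
		cb->fn(cb->arg);
		called++;
	}
	pthread_mutex_unlock(&cbs_lock);
	return called;
}

/* Example callback: increments the integer passed as argument. */
static void
bump(void *arg)
{
	++*(int *)arg;
}
```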

> +			rc = dev_cb.cb_fn(dev_cb.event,
> +					dev_cb.cb_arg, dev_cb.ret_param);
> +			rte_spinlock_lock(&rte_dev_cb_lock);
> +			cb_lst->active = 0;
> +		}
> +	}
> +	rte_spinlock_unlock(&rte_dev_cb_lock);
> +	return rc;
> +}
> diff --git a/lib/librte_eal/common/include/rte_bus.h
> b/lib/librte_eal/common/include/rte_bus.h
> index 6fb0834..6c4ae31 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -122,6 +122,34 @@ typedef struct rte_device *
>  			 const void *data);
> 
>  /**
> + * Device iterator to find a device on a bus.
> + *
> + * This function returns an rte_device if one of those held by the bus
> + * matches the data passed as parameter.
> + *
> + * If the comparison function returns zero this function should stop iterating
> + * over any more devices. To continue a search the device of a previous
> search
> + * can be passed via the start parameter.
> + *
> + * @param cmp
> + *	the device name comparison function.
> + *
> + * @param data
> + *	Data to compare each device against.
> + *
> + * @param start
> + *	starting point for the iteration
> + *
> + * @return
> + *	The first device matching the data, NULL if none exists.
> + */
> +typedef struct rte_device *
> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
> +			 rte_dev_cmp_name_t cmp,
> +			 const void *data);
> +
> +
> +/**
>   * Implementation specific probe function which is responsible for linking
>   * devices on that bus with applicable drivers.
>   *
> @@ -168,6 +196,37 @@ typedef int (*rte_bus_unplug_t)(struct rte_device
> *dev);
>  typedef int (*rte_bus_parse_t)(const char *name, void *addr);
> 
>  /**
> + * Implementation specific remap function which is responsible for
> remmaping
> + * devices on that bus from original share memory resource to a private
> memory
> + * resource for the sake of device has been removal.
> + *
> + * @param dev
> + *	Device pointer that was returned by a previous call to find_device.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
> +
> +/**
> + * Implementation specific bind driver function which is responsible for bind
> + * a explicit type of driver with a devices on that bus.
> + *
> + * @param dev_name
> + *	device textual description.
> + *
> + * @param drv_type
> + *	driver type textual description.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_bind_driver_t)(const char *dev_name,
> +				const char *drv_type);
> +
> +/**
>   * Bus scan policies
>   */
>  enum rte_bus_scan_mode {
> @@ -206,9 +265,13 @@ struct rte_bus {
>  	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
>  	rte_bus_probe_t probe;       /**< Probe devices on bus */
>  	rte_bus_find_device_t find_device; /**< Find a device on the bus */
> +	rte_bus_find_device_by_name_t find_device_by_name;
> +				     /**< Find a device on the bus */
>  	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver
> */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
> +	rte_bus_remap_device_t remap_device;       /**< remap a device */
> +	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device
> */
>  	struct rte_bus_conf conf;    /**< Bus configuration */
>  	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu
> class */
>  };
> @@ -306,6 +369,12 @@ struct rte_bus *rte_bus_find(const struct rte_bus
> *start, rte_bus_cmp_t cmp,
>  struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
> 
>  /**
> + * Find the registered bus for a particular device.
> + */
> +struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
> +				const void *dev_name);
> +
> +/**
>   * Find the registered bus for a given name.
>   */
>  struct rte_bus *rte_bus_find_by_name(const char *busname);
> diff --git a/lib/librte_eal/common/include/rte_dev.h
> b/lib/librte_eal/common/include/rte_dev.h
> index 9342e0c..19971d0 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -51,6 +51,15 @@ extern "C" {
> 
>  #include <rte_log.h>
> 
> +#include <exec-env/rte_dev.h>
> +
> +typedef int (*rte_eal_dev_cb_fn)(enum rte_eal_dev_event_type event,
> +					void *cb_arg, void *ret_param);
> +
> +struct rte_eal_dev_callback;
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
> +
>  __attribute__((format(printf, 2, 0)))
>  static inline void
>  rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
> @@ -157,6 +166,13 @@ struct rte_driver {
>   */
>  #define RTE_DEV_NAME_MAX_LEN 64
> 
> +enum device_state {
> +	DEVICE_UNDEFINED,
> +	DEVICE_FAULT,
> +	DEVICE_PARSED,
> +	DEVICE_PROBED,
> +};
> +
>  /**
>   * A structure describing a generic device.
>   */
> @@ -166,6 +182,9 @@ struct rte_device {
>  	const struct rte_driver *driver;/**< Associated driver */
>  	int numa_node;                /**< NUMA node connection */
>  	struct rte_devargs *devargs;  /**< Device user arguments */
> +	enum device_state state;  /**< Device state */
> +	/** User application callbacks for device event */
> +	struct rte_eal_dev_cb_list uev_cbs;
>  };
> 
>  /**
> @@ -248,6 +267,8 @@ int rte_eal_hotplug_remove(const char *busname,
> const char *devname);
>   */
>  typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void
> *data);
> 
> +typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void
> *data);
> +
>  #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
> 
>  #define RTE_PMD_EXPORT_NAME(name, idx) \
> @@ -293,4 +314,72 @@ __attribute__((used)) = str
>  }
>  #endif
> 
> +/**
> + * It enable the device event monitoring for a specific event.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_eal_dev_monitor_enable(void);
> +/**
> + * It registers the callback for the specific event. Multiple
> + * callbacks cal be registered at the same time.
> + * @param event
> + *  The device event type.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_register(struct rte_device *device,
> +			enum rte_eal_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
> +
> +/**
> + * It unregisters the callback according to the specified event.
> + *
> + * @param event
> + *  The event type which corresponding to the callback.
> + * @param cb_fn
> + *  callback address.
> + *  address of parameter for callback, (void *)-1 means to remove all
> + *  registered which has the same callback address.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_unregister(struct rte_device *device,
> +			enum rte_eal_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
> +
> +/**
> + * @internal Executes all the user application registered callbacks for
> + * the specific device. It is for DPDK internal user only. User
> + * application should not call it directly.
> + *
> + * @param event
> + *  The device event type.
> + * @param cb_arg
> + *  callback parameter.
> + * @param ret_param
> + *  To pass data back to user application.
> + *  This allows the user application to decide if a particular function
> + *  is permitted or not.
> + *
> + * @return
> + *  - On success, return zero.
> + *  - On failure, a negative value.
> + */
> +int
> +_rte_dev_callback_process(struct rte_device *device,
> +			enum rte_eal_dev_event_type event,
> +			void *cb_arg, void *ret_param);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile
> b/lib/librte_eal/linuxapp/eal/Makefile
> index 5a7b8b2..05a2437 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -67,6 +67,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) +=
> eal_lcore.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
> 
>  # from common dir
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
> @@ -120,7 +121,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
>  CFLAGS_eal_thread.o += -Wno-return-type
>  endif
> 
> -INC := rte_kni_common.h
> +INC := rte_kni_common.h rte_dev.h
> 
>  SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
>  	$(addprefix include/exec-env/,$(INC))
> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> index 8e4a775..29e73a7 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> @@ -209,6 +209,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>  	int count = 0;
>  	int err = 0;
>  	int executing;
> +	int ret;
> 
>  	if (!cb_fn) {
>  		rte_errno = EINVAL;
> @@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>  			}
>  			ap_prev = ap;
>  		}
> +
> +		ret |= rte_intr_callback_unregister(&intr_handle,
> +				eal_alarm_callback, NULL);
> +
>  		rte_spinlock_unlock(&alarm_list_lk);
>  	} while (executing != 0);
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..49fd0dc
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -0,0 +1,356 @@
> +/*-
> + *   Copyright(c) 2010-2017 Intel Corporation.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <inttypes.h>
> +#include <sys/queue.h>
> +#include <sys/signalfd.h>
> +#include <sys/ioctl.h>
> +#include <sys/socket.h>
> +#include <linux/netlink.h>
> +#include <sys/epoll.h>
> +#include <unistd.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_bus.h>
> +#include <rte_dev.h>
> +#include <rte_devargs.h>
> +#include <rte_debug.h>
> +#include <rte_log.h>
> +
> +#include "eal_thread.h"
> +
> +/* uev monitoring thread */
> +static pthread_t uev_monitor_thread;
> +
> +bool udev_exit = true;
> +
> +bool no_request_thread = true;
> +
> +static void sig_handler(int signum)
> +{
> +	if (signum == SIGINT || signum == SIGTERM)
> +		rte_dev_monitor_stop();
> +}
> +
> +static int
> +dev_monitor_fd_new(void)
> +{
> +
> +	int uevent_fd;
> +
> +	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
> +			SOCK_NONBLOCK,
> +			NETLINK_KOBJECT_UEVENT);
> +	if (uevent_fd < 0) {
> +		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
> +		return -1;
> +	}
> +	return uevent_fd;
> +}
> +
> +static int
> +dev_monitor_enable(int netlink_fd)
> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +	int size = 64 * 1024;
> +	int nonblock = 1;
> +
> +	memset(&addr, 0, sizeof(addr));
> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;
> +
> +	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
> +		RTE_LOG(ERR, EAL, "bind failed\n");
> +		goto err;
> +	}
> +
> +	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size,
> sizeof(size));
> +
> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
> +		goto err;
> +	}
> +	return 0;
> +err:
> +	close(netlink_fd);
> +	return -1;
> +}
> +
> +static void
> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
> +{
> +	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +
> +	while (i < RTE_EAL_UEV_MSG_LEN) {
> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		if (!strncmp(buf, "libudev", 7)) {
> +			buf += 7;
> +			i += 7;
> +			event->group = UEV_MONITOR_UDEV;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
> +			buf += 8;
> +			i += 8;
> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s",
> buf);
> +			event->devname = pci_slot_name;
> +		}
> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = UEV_SUBSYSTEM_PCI;
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_EAL_DEV_EVENT_ADD;
> +	if (!strncmp(action, "remove", 6))
> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
> +	event->devname = pci_slot_name;
> +}
> +
> +static int
> +dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
> +{
> +	int ret;
> +	char buf[RTE_EAL_UEV_MSG_LEN];
> +
> +	memset(uevent, 0, sizeof(struct rte_eal_uevent));
> +	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
> +
> +	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL,
> +		"Socket read error(%d): %s\n",
> +		errno, strerror(errno));
> +		return -1;
> +	} else if (ret == 0)
> +		/* connection closed */
> +		return -1;
> +
> +	dev_uev_parse(buf, uevent);
> +
> +	return 0;
> +}
> +
> +static int
> +dev_uev_process(struct epoll_event *events, int nfds)
> +{
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	struct rte_eal_uevent uevent;
> +	int ret;
> +	int i;
> +
> +	for (i = 0; i < nfds; i++) {
> +		/**
> +		 * check device uevent from kernel side, no need to check
> +		 * uevent from udev.
> +		 */
> +		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
> +			(uevent.group == UEV_MONITOR_UDEV))
> +			return 0;
> +
> +		/* default handle all pci devcie when is being hot plug */
> +		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
> +			bus = rte_bus_find_by_name("pci");
> +			dev = rte_bus_find_device(bus, uevent.devname);
> +			if (uevent.type == RTE_EAL_DEV_EVENT_REMOVE) {
> +
> +				if ((!dev) || dev->state ==
> DEVICE_UNDEFINED)
> +					return 0;
> +				dev->state = DEVICE_FAULT;
> +
> +				/**
> +				 * remap the resource to be fake
> +				 * before user's removal processing
> +				 */
> +				ret = bus->remap_device(dev);
> +				if (!ret)
> +
> 	return(_rte_dev_callback_process(dev,
> +					  RTE_EAL_DEV_EVENT_REMOVE,
> +					  NULL, NULL));

What is the reason for keeping this device in the EAL device list after the removal?
I suggest removing it (driver remove, bus remove and EAL remove) after the callbacks have run.
This way EAL can initiate all device removals.

> +			} else if (uevent.type == RTE_EAL_DEV_EVENT_ADD)
> {
> +				if (dev == NULL) {
> +					/**
> +					 * bind the driver to the device
> +					 * before user's add processing
> +					 */
> +					bus->bind_driver(
> +						uevent.devname,
> +						"igb_uio");
> +

Similar comment here:
EAL can initiate all device probe operations by adding the device and probing it here, before the callbacks run.
Then the device pointer can also be passed to the callbacks.

> 	return(_rte_dev_callback_process(NULL,
> +					  RTE_EAL_DEV_EVENT_ADD,
> +					  uevent.devname, NULL));
> +				}
> +			}
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
> + * It builds/rebuilds up the epoll file descriptor with all the
> + * file descriptors being waited on. Then handles the interrupts.
> + *
> + * @param arg
> + *  pointer. (unused)
> + *
> + * @return
> + *  never return;
> + */
> +static __attribute__((noreturn)) void *
> +dev_uev_monitoring(__rte_unused void *arg)
> +{
> +	struct sigaction act;
> +	sigset_t mask;
> +	int netlink_fd;
> +	struct epoll_event ep_kernel;
> +	int fd_ep;
> +
> +	udev_exit = false;
> +
> +	/* set signal handlers */
> +	memset(&act, 0x00, sizeof(struct sigaction));
> +	act.sa_handler = sig_handler;
> +	sigemptyset(&act.sa_mask);
> +	act.sa_flags = SA_RESTART;
> +	sigaction(SIGINT, &act, NULL);
> +	sigaction(SIGTERM, &act, NULL);
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGINT);
> +	sigaddset(&mask, SIGTERM);
> +	sigprocmask(SIG_UNBLOCK, &mask, NULL);
> +
> +	fd_ep = epoll_create1(EPOLL_CLOEXEC);
> +	if (fd_ep < 0) {
> +		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
> +		goto out;
> +	}
> +
> +	netlink_fd = dev_monitor_fd_new();
> +
> +	if (dev_monitor_enable(netlink_fd) < 0) {
> +		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
> +		goto out;
> +	}
> +
> +	memset(&ep_kernel, 0, sizeof(struct epoll_event));
> +	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
> +	ep_kernel.data.fd = netlink_fd;
> +	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
> +		&ep_kernel) < 0) {
> +		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
> +		goto out;
> +	}
> +
> +	while (!udev_exit) {
> +		int fdcount;
> +		struct epoll_event ev[1];
> +
> +		fdcount = epoll_wait(fd_ep, ev, 1, -1);
> +		if (fdcount < 0) {
> +			if (errno != EINTR)
> +				RTE_LOG(ERR, EAL, "error receiving uevent "
> +					"message: %m\n");
> +			continue;
> +		}
> +
> +		/* epoll_wait has at least one fd ready to read */
> +		if (dev_uev_process(ev, fdcount) < 0) {
> +			if (errno != EINTR)
> +				RTE_LOG(ERR, EAL, "error processing uevent "
> +					"message: %m\n");
> +		}
> +	}
> +out:
> +	if (fd_ep >= 0)
> +		close(fd_ep);
> +	if (netlink_fd >= 0)
> +		close(netlink_fd);
> +	rte_panic("uev monitoring fail\n");
> +}
> +
> +int
> +rte_dev_monitor_start(void)
> +{

Maybe also add an option to start it through a new EAL command line parameter?

> +	int ret;
> +
> +	if (!no_request_thread)
> +		return 0;
> +	no_request_thread = false;
> +
> +	/* create the host thread to wait/handle the uevent from kernel */
> +	ret = pthread_create(&uev_monitor_thread, NULL,
> +		dev_uev_monitoring, NULL);

Why open a new thread for hotplug?
Why not use the existing DPDK host thread through the alarm mechanism?

> +	return ret;
> +}
> +
> +int
> +rte_dev_monitor_stop(void)
> +{
> +	udev_exit = true;
> +	no_request_thread = true;
> +	return 0;
> +}
> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
> b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
> new file mode 100644
> index 0000000..6a6feb5
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
> @@ -0,0 +1,106 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_DEV_H_
> +#error "don't include this file directly, please include generic <rte_dev.h>"
> +#endif
> +
> +#ifndef _RTE_LINUXAPP_DEV_H_
> +#define _RTE_LINUXAPP_DEV_H_
> +
> +#include <stdio.h>
> +
> +#include <rte_dev.h>
> +
> +#define RTE_EAL_UEV_MSG_LEN 4096
> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128
> +
> +enum uev_subsystem {
> +	UEV_SUBSYSTEM_UIO,
> +	UEV_SUBSYSTEM_VFIO,
> +	UEV_SUBSYSTEM_PCI,
> +	UEV_SUBSYSTEM_MAX
> +};
> +
> +enum uev_monitor_netlink_group {
> +	UEV_MONITOR_KERNEL,
> +	UEV_MONITOR_UDEV,
> +};
> +
> +/**
> + * The device event type.
> + */
> +enum rte_eal_dev_event_type {
> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
> +	RTE_EAL_DEV_EVENT_REMOVE,
> +					/**< device removing event */
> +	RTE_EAL_DEV_EVENT_CHANGE,
> +					/**< device status change event */
> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
> +};
> +
> +struct rte_eal_uevent {
> +	enum rte_eal_dev_event_type type;	/**< device event type */
> +	int subsystem;				/**< subsystem id */
> +	char *devname;				/**< device name */
> +	enum uev_monitor_netlink_group group;	/**< device netlink group */
> +};
> +
> +/**
> + * Start the device uevent monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_monitor_start(void);
> +
> +/**
> + * Stop the device uevent monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +
> +int
> +rte_dev_monitor_stop(void);
> +
> +#endif /* _RTE_LINUXAPP_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> index a3a98c1..d0e07b4 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct
> inode *inode)
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
> 
> +	/* check whether the device was removed before release */
> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
> +		pr_info("The device has been removed\n");
> +		return -1;
> +	}
> +
>  	/* disable interrupts */
>  	igbuio_pci_disable_interrupts(udev);
> 
> diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
> index 0160fc1..feb5fd7 100644
> --- a/lib/librte_pci/rte_pci.c
> +++ b/lib/librte_pci/rte_pci.c
> @@ -172,6 +172,26 @@ rte_pci_addr_parse(const char *str, struct
> rte_pci_addr *addr)
>  	return -1;
>  }
> 
> +/* map a private resource from an address*/
> +void *
> +pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
> +{
> +	void *mapaddr;
> +
> +	mapaddr = mmap(requested_addr, size,
> +			   PROT_READ | PROT_WRITE,
> +			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> +	if (mapaddr == MAP_FAILED) {
> +		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
> +			"%s (%p)\n",
> +			__func__, requested_addr,
> +			(unsigned long)size, (unsigned long)offset,
> +			strerror(errno), mapaddr);
> +	} else
> +		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
> +
> +	return mapaddr;
> +}
> 
>  /* map a particular resource from a file */
>  void *
> diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
> index 4f2cd18..f6091a6 100644
> --- a/lib/librte_pci/rte_pci.h
> +++ b/lib/librte_pci/rte_pci.h
> @@ -227,6 +227,23 @@ int rte_pci_addr_cmp(const struct rte_pci_addr
> *addr,
>  int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
> 
>  /**
> + * @internal
> + * Map to a particular private resource.
> + *
> + * @param requested_addr
> + *      The starting address for the new mapping range.
> + * @param offset
> + *      The offset for the mapping range.
> + * @param size
> + *      The size for the mapping range.
> + * @return
> + *   - On success, the function returns a pointer to the mapped area.
> + *   - On error, the value MAP_FAILED is returned.
> + */
> +void *pci_map_private_resource(void *requested_addr, off_t offset,
> +		size_t size);
> +
> +/**
>   * Map a particular resource from a file.
>   *
>   * @param requested_addr
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v7 0/2] add uevent monitor for hot plug
  2017-11-01 20:16                   ` [PATCH v6 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-03  1:42                     ` Jeff Guo
  2018-01-03  1:42                       ` [PATCH v7 1/2] eal: " Jeff Guo
  2018-01-03  1:42                       ` [PATCH v7 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
  0 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-03  1:42 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

So far, hot plug support in DPDK consists of the hot plug add/remove
API and the fail-safe driver, which offloads the fail-safe work from the
application. What is still missing is a general event API, because the
interrupt events related to hot plug differ between devices and drivers,
such as mlx4, the pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose a
removal interrupt. To make the hot plug feature easy to use with pci
drivers, the removal event has to be detected at the kernel level and
delivered to user land as a new kind of interrupt.

The kernel's kobject uevent mechanism can be used to monitor the hot plug
status of devices. This covers not only uio/vfio devices on the pci bus
but also devices on other buses, such as cpu, usb and pci-express.
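
For readers unfamiliar with the mechanism, here is a minimal sketch of
subscribing to kernel uevents over netlink. It is an illustration only and
not the code in this patch set; the function name uevent_socket_open() is
hypothetical. It assumes a Linux host and relies on the fact that multicast
group 1 of NETLINK_KOBJECT_UEVENT carries the events emitted by the kernel.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Open a non-blocking netlink socket subscribed to kernel uevents.
 * Returns the file descriptor on success, -1 on failure.
 * (Hypothetical helper, not part of the patch.)
 */
static int
uevent_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;	/* let the kernel assign a unique port id */
	addr.nl_groups = 1;	/* multicast group 1: events from the kernel */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The returned fd can be added to an epoll set; each readable event then
yields one uevent message via recv().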

The idea is as follows.

a. The uevent message read from the monitoring fd carries the useful information:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. Add a uevent monitoring mechanism:
add several general APIs to enable uevent monitoring.

c. Add a common uevent handler and a uevent failure handler:
uevents of a device should be handled at the bus or device layer, and memory read
and write failures during hot removal should be handled correctly before the detach.

d. Show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show the usage.
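
To make the message format in (a) concrete, the sketch below walks a raw
uevent buffer. It assumes the usual kernel layout, a header record
("action@devpath") followed by NUL-separated KEY=value records, and it is
not the parser used in this patch set. The struct uev_fields and the
function uev_parse() are hypothetical names chosen for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Values of interest; each pointer refers into the caller's buffer. */
struct uev_fields {
	const char *action;
	const char *subsystem;
	const char *devname;
};

/*
 * Walk the NUL-separated records in buf[0..len) and record pointers to
 * the ACTION, SUBSYSTEM and DEVNAME values.
 * Returns 0 if ACTION was found, -1 otherwise.
 */
static int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t pos = 0;

	memset(out, 0, sizeof(*out));
	while (pos < len) {
		const char *rec = buf + pos;
		const char *end = memchr(rec, '\0', len - pos);

		if (end == NULL)
			break;	/* unterminated trailing record: ignore it */

		if (strncmp(rec, "ACTION=", 7) == 0)
			out->action = rec + 7;
		else if (strncmp(rec, "SUBSYSTEM=", 10) == 0)
			out->subsystem = rec + 10;
		else if (strncmp(rec, "DEVNAME=", 8) == 0)
			out->devname = rec + 8;

		pos += (size_t)(end - rec) + 1;
	}
	return out->action != NULL ? 0 : -1;
}
```

Scanning with memchr() bounded by the remaining length keeps the walk
inside the received buffer even if the last record is not terminated.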

patchset history:
v7->v6:
1.modify the vdev part according to the vdev rework
2.redefine the functions and split them into common and bus specific code
3.fix some issues.
4.fix the system hang after sending packets.

v6->v5:
1.add a hot plug policy: in eal, by default prepare the hot plug work for
all pci devices, then let the app decide which devices need to be
hot plugged.
2.manage the event callbacks per device.
3.fix a system hang when igb_uio is released.
4.move the pci part to bus-pci, based on the bus rework.
5.add a hot plug policy in the app: show an example that uses a hotplug list to
decide which devices need to be hot plugged.

v5->v4:
1.Move the uevent monitor epolling from eal interrupt to the eal device layer.
2.Redefine the eal device API as common code, and distinguish between linux and bsd.
3.Add a failure handler helper api in the bus layer. Add a function to find a device by name.
4.Replace the individual fd bound to each device with a common fd that polls all devices.
5.Add registration for hot insertion monitoring and processing; add a function to auto bind the driver before the user adds the device.
6.Fix some coding style and typo issues.
7.Add a new callback to process hot insertion.

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove the global variable hotplug_fd and add uevent_fd
to rte_intr_handle so that each pci device maintains its own fd,
to fix the dual device fd issue.
2.fix some typos.


Jeff Guo (2):
  eal: add uevent monitor for hot plug
  app/testpmd: use uevent to monitor hotplug

 app/test-pmd/testpmd.c                             | 178 +++++++++++
 app/test-pmd/testpmd.h                             |   9 +
 drivers/bus/pci/bsd/pci.c                          |  30 ++
 drivers/bus/pci/linux/pci.c                        |  87 +++++
 drivers/bus/pci/linux/pci_init.h                   |   1 +
 drivers/bus/pci/pci_common.c                       |  43 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  25 ++
 drivers/bus/vdev/vdev.c                            |  36 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 106 ++++++
 lib/librte_eal/common/eal_common_bus.c             |  30 ++
 lib/librte_eal/common/eal_common_dev.c             | 169 ++++++++++
 lib/librte_eal/common/include/rte_bus.h            |  69 ++++
 lib/librte_eal/common/include/rte_dev.h            |  89 ++++++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_alarm.c            |   5 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 356 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 106 ++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 23 files changed, 1488 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-03  1:42                     ` [PATCH v7 0/2] add uevent monitor for hot plug Jeff Guo
@ 2018-01-03  1:42                       ` Jeff Guo
  2018-01-02 17:02                         ` Matan Azrad
  2018-01-09  0:39                         ` Thomas Monjalon
  2018-01-03  1:42                       ` [PATCH v7 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
  1 sibling, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-03  1:42 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

This patch aims to add a general uevent mechanism to the eal device layer
to enable hot plug monitoring of all linux kernel objects. Users can use these
APIs to monitor and read the device status information sent from the kernel
side and then handle it accordingly, for example by detaching or attaching the
device, or to build smooth fail-safe handling on top of it.

1) About uevent monitoring:
a: add an epoll loop that polls the netlink socket to monitor the uevents of
   the device, and add device_state to struct rte_device to track the
   device state machine.
b: add enum rte_eal_dev_event_type and struct rte_eal_uevent.
c: add the APIs below to the rte eal device common layer.
   rte_eal_dev_monitor_enable
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_monitor_start
   rte_dev_monitor_stop

2) About the failure handler, using pci uio as an example:
   add pci_remap_device in the bus layer and the functions below to process it:
   rte_pci_remap_device
   pci_uio_remap_resource
   pci_map_private_resource
   add rte_pci_dev_bind_driver to bind a pci device to an explicit driver.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v7->v6:
a.modify the vdev part according to the vdev rework
b.redefine the functions and split them into common and bus specific code
c.fix some issues.
d.fix the system hang after sending packets.
---
 drivers/bus/pci/bsd/pci.c                          |  30 ++
 drivers/bus/pci/linux/pci.c                        |  87 +++++
 drivers/bus/pci/linux/pci_init.h                   |   1 +
 drivers/bus/pci/pci_common.c                       |  43 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  25 ++
 drivers/bus/vdev/vdev.c                            |  36 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          | 106 ++++++
 lib/librte_eal/common/eal_common_bus.c             |  30 ++
 lib/librte_eal/common/eal_common_dev.c             | 169 ++++++++++
 lib/librte_eal/common/include/rte_bus.h            |  69 ++++
 lib/librte_eal/common/include/rte_dev.h            |  89 ++++++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_alarm.c            |   5 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 356 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        | 106 ++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 21 files changed, 1301 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index b8e2178..d58dbf6 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -126,6 +126,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(dev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void
 pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res)
@@ -678,3 +701,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	return -1;
+}
+
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 5da6728..792fd2c 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* nothing to do */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(dev);
+		}
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -386,6 +418,8 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		rte_pci_add_device(dev);
 	}
 
+	dev->device.state = DEVICE_PARSED;
+	TAILQ_INIT(&(dev->device.uev_cbs));
 	return 0;
 }
 
@@ -854,3 +888,56 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	char drv_bind_path[1024];
+	char drv_override_path[1024]; /* sysfs driver_override path */
+	int drv_override_fd = -1;
+	int drv_bind_fd = -1;
+
+	RTE_SET_USED(drv_type);
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_override_fd, drv_type, strlen(drv_type)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", drv_type, dev_name);
+		goto err;
+	}
+
+	close(drv_override_fd);
+	drv_override_fd = -1;
+	snprintf(drv_bind_path, sizeof(drv_bind_path),
+		"/sys/bus/pci/drivers/%s/bind", drv_type);
+
+	/* do the bind by writing device to the specific driver  */
+	drv_bind_fd = open(drv_bind_path, O_WRONLY | O_APPEND);
+	if (drv_bind_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_bind_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_bind_fd, dev_name, strlen(dev_name)) < 0)
+		goto err;
+
+	close(drv_bind_fd);
+	return 0;
+err:
+	close(drv_override_fd);
+	close(drv_bind_fd);
+	return -1;
+}
+
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index f342c47..5838402 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -58,6 +58,7 @@ int pci_uio_alloc_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource **uio_res);
 void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
+int pci_uio_remap_resource(struct rte_pci_device *dev);
 int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 		struct mapped_pci_resource *uio_res, int map_idx);
 
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 104fdf9..5417b32 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -282,6 +282,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		if (rc > 0)
 			/* positive value means driver doesn't support it */
 			continue;
+		dev->device.state = DEVICE_PROBED;
 		return 0;
 	}
 	return 1;
@@ -481,6 +482,7 @@ rte_pci_insert_device(struct rte_pci_device *exist_pci_dev,
 void
 rte_pci_remove_device(struct rte_pci_device *pci_dev)
 {
+	RTE_LOG(DEBUG, EAL, " rte_pci_remove_device for device list\n");
 	TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
 }
 
@@ -502,6 +504,44 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+pci_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_pci_device *dev;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (start && &dev->device == start) {
+			start = NULL; /* starting point found */
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+
+	return NULL;
+}
+
+static int
+pci_remap_device(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices that use igb_uio */
+	ret = rte_pci_remap_device(pdev);
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to remap device %s\n",
+			dev->name);
+	return ret;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -528,10 +568,13 @@ struct rte_pci_bus rte_pci_bus = {
 		.scan = rte_pci_scan,
 		.probe = rte_pci_probe,
 		.find_device = pci_find_device,
+		.find_device_by_name = pci_find_device_by_name,
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.remap_device = pci_remap_device,
+		.bind_driver = rte_pci_dev_bind_driver,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 0671131..8cb4009 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -176,6 +176,34 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+	/* Map all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
+			continue;
+		map_address = pci_map_private_resource(
+				dev->mem_resource[i].addr, 0,
+				(size_t)dev->mem_resource[i].len);
+		if (map_address == MAP_FAILED)
+			goto error;
+		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
+		dev->mem_resource[i].addr = map_address;
+	}
+
+	return 0;
+error:
+	return -1;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2283f09..10baa1a 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -202,6 +202,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI UIO resource.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index d4a2996..1662f3b 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -52,6 +52,8 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+#include <unistd.h>
+#include <fcntl.h>
 
 #include <rte_debug.h>
 #include <rte_interrupts.h>
@@ -197,6 +199,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
 void rte_pci_unmap_device(struct rte_pci_device *dev);
 
 /**
+ * Remap this device
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ */
+int rte_pci_remap_device(struct rte_pci_device *dev);
+
+/**
  * Dump the content of the PCI bus.
  *
  * @param f
@@ -333,6 +344,20 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
 void rte_pci_ioport_write(struct rte_pci_ioport *p,
 		const void *data, size_t len, off_t offset);
 
+/**
+ * It can be used to bind a device to a specific type of driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param drv_type
+ *  The specific driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index fd7736d..773f6e0 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -323,6 +323,39 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+vdev_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_vdev_device *dev;
+
+	TAILQ_FOREACH(dev, &vdev_device_list, next) {
+		if (start && &dev->device == start) {
+			start = NULL;
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+	return NULL;
+}
+
+static int
+vdev_remap_device(struct rte_device *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
+static int
+vdev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	RTE_SET_USED(dev_name);
+	RTE_SET_USED(drv_type);
+	return 0;
+}
+
 static int
 vdev_plug(struct rte_device *dev)
 {
@@ -339,9 +372,12 @@ static struct rte_bus rte_vdev_bus = {
 	.scan = vdev_scan,
 	.probe = vdev_probe,
 	.find_device = vdev_find_device,
+	.find_device_by_name = vdev_find_device_by_name,
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
+	.remap_device = vdev_remap_device,
+	.bind_driver = vdev_bind_driver,
 };
 
 RTE_REGISTER_BUS(vdev, rte_vdev_bus);
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..6ea9a74
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,64 @@
+/*-
+ *   Copyright(c) 2010-2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_monitor_start(void)
+{
+	return -1;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..6a6feb5
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,106 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+
+/**
+ * The device event type.
+ */
+enum rte_eal_dev_event_type {
+	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
+	RTE_EAL_DEV_EVENT_REMOVE,
+					/**< device removing event */
+	RTE_EAL_DEV_EVENT_CHANGE,
+					/**< device status change event */
+	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
+	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
+	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
+	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_eal_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 3e022d5..b7219c9 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -51,8 +51,11 @@ rte_bus_register(struct rte_bus *bus)
 	RTE_VERIFY(bus->scan);
 	RTE_VERIFY(bus->probe);
 	RTE_VERIFY(bus->find_device);
+	RTE_VERIFY(bus->find_device_by_name);
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
+	RTE_VERIFY(bus->remap_device);
+	RTE_VERIFY(bus->bind_driver);
 
 	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
 	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
@@ -170,6 +173,14 @@ cmp_rte_device(const struct rte_device *dev1, const void *_dev2)
 }
 
 static int
+cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
+{
+	const char *dev_name2 = _dev_name2;
+
+	return strcmp(dev_name1, dev_name2);
+}
+
+static int
 bus_find_device(const struct rte_bus *bus, const void *_dev)
 {
 	struct rte_device *dev;
@@ -178,6 +189,25 @@ bus_find_device(const struct rte_bus *bus, const void *_dev)
 	return dev == NULL;
 }
 
+static struct rte_device *
+bus_find_device_by_name(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus->find_device_by_name(NULL, cmp_rte_device_name, _dev_name);
+	return dev;
+}
+
+/* Find a device on the given bus by name; returns NULL if none matches. */
+struct rte_device *
+rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus_find_device_by_name(bus, _dev_name);
+	return dev;
+}
+
 struct rte_bus *
 rte_bus_find_by_device(const struct rte_device *dev)
 {
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..47909e8 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,31 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_eal_dev_callback {
+	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
+	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Parameter for callback */
+	void *ret_param;                        /**< Return parameter */
+	enum rte_eal_dev_event_type event;      /**< device event type */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+static struct rte_eal_dev_callback *dev_add_cb;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +256,150 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_eal_dev_monitor_enable(void)
+{
+	int ret;
+
+	ret = rte_dev_monitor_start();
+	if (ret)
+		RTE_LOG(ERR, EAL, "Can not init device monitor\n");
+	return ret;
+}
+
+int
+rte_dev_callback_register(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_eal_dev_callback *user_cb;
+
+	if (!cb_fn || (device == NULL && event != RTE_EAL_DEV_EVENT_ADD))
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	if (device != NULL && TAILQ_EMPTY(&(device->uev_cbs)))
+		TAILQ_INIT(&(device->uev_cbs));
+
+	if (event == RTE_EAL_DEV_EVENT_ADD) {
+		user_cb = NULL;
+	} else {
+		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
+			if (user_cb->cb_fn == cb_fn &&
+				user_cb->cb_arg == cb_arg &&
+				user_cb->event == event) {
+				break;
+			}
+		}
+	}
+
+	/* No matching entry: allocate and fill in a new callback. */
+	if (user_cb == NULL) {
+		user_cb = rte_zmalloc("eal device event",
+					sizeof(*user_cb), 0);
+		if (user_cb == NULL) {
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
+			return -ENOMEM;
+		}
+		user_cb->cb_fn = cb_fn;
+		user_cb->cb_arg = cb_arg;
+		user_cb->event = event;
+		if (event == RTE_EAL_DEV_EVENT_ADD)
+			dev_add_cb = user_cb;
+		else
+			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb, next);
+	}
+
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return 0;
+}
+
+int
+rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_eal_dev_callback *cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	ret = 0;
+	if (event == RTE_EAL_DEV_EVENT_ADD) {
+		rte_free(dev_add_cb);
+		dev_add_cb = NULL;
+	} else {
+		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
+		      cb = next) {
+
+			next = TAILQ_NEXT(cb, next);
+
+			if (cb->cb_fn != cb_fn || cb->event != event ||
+					(cb->cb_arg != (void *)-1 &&
+					cb->cb_arg != cb_arg))
+				continue;
+
+			/*
+			 * if this callback is not executing right now,
+			 * then remove it.
+			 */
+			if (cb->active == 0) {
+				TAILQ_REMOVE(&(device->uev_cbs), cb, next);
+				rte_free(cb);
+			} else {
+				ret = -EAGAIN;
+			}
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			void *cb_arg, void *ret_param)
+{
+	struct rte_eal_dev_callback dev_cb;
+	struct rte_eal_dev_callback *cb_lst;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+	if (event == RTE_EAL_DEV_EVENT_ADD && dev_add_cb != NULL) {
+		dev_cb = *dev_add_cb;
+		if (cb_arg != NULL)
+			dev_cb.cb_arg = cb_arg;
+		if (ret_param != NULL)
+			dev_cb.ret_param = ret_param;
+
+		rte_spinlock_unlock(&rte_dev_cb_lock);
+		rc = dev_cb.cb_fn(dev_cb.event,
+				dev_cb.cb_arg, dev_cb.ret_param);
+		rte_spinlock_lock(&rte_dev_cb_lock);
+	} else if (event != RTE_EAL_DEV_EVENT_ADD && device != NULL) {
+		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
+			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
+				continue;
+			dev_cb = *cb_lst;
+			cb_lst->active = 1;
+			if (cb_arg != NULL)
+				dev_cb.cb_arg = cb_arg;
+			if (ret_param != NULL)
+				dev_cb.ret_param = ret_param;
+
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			rc = dev_cb.cb_fn(dev_cb.event,
+					dev_cb.cb_arg, dev_cb.ret_param);
+			rte_spinlock_lock(&rte_dev_cb_lock);
+			cb_lst->active = 0;
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..6c4ae31 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -122,6 +122,34 @@ typedef struct rte_device *
 			 const void *data);
 
 /**
+ * Device iterator to find a device on a bus by its name.
+ *
+ * This function returns an rte_device if the name of one of those held by
+ * the bus matches the data passed as parameter.
+ *
+ * If the comparison function returns zero this function should stop iterating
+ * over any more devices. To continue a search the device of a previous search
+ * can be passed via the start parameter.
+ *
+ * @param start
+ *	Starting point for the iteration.
+ *
+ * @param cmp
+ *	The device name comparison function.
+ *
+ * @param data
+ *	Data to compare each device name against.
+ *
+ * @return
+ *	The first device matching the data, NULL if none exists.
+ */
+typedef struct rte_device *
+(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
+			 rte_dev_cmp_name_t cmp,
+			 const void *data);
+
+
+/**
  * Implementation specific probe function which is responsible for linking
  * devices on that bus with applicable drivers.
  *
@@ -168,6 +196,37 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific remap function which is responsible for remapping
+ * devices on that bus from the original shared memory resource to a private
+ * memory resource once the device has been removed.
+ *
+ * @param dev
+ *	Device pointer that was returned by a previous call to find_device.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
+
+/**
+ * Implementation specific bind driver function which is responsible for
+ * binding an explicit type of driver to a device on that bus.
+ *
+ * @param dev_name
+ *	device textual description.
+ *
+ * @param drv_type
+ *	driver type textual description.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_bind_driver_t)(const char *dev_name,
+				const char *drv_type);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -206,9 +265,13 @@ struct rte_bus {
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
 	rte_bus_find_device_t find_device; /**< Find a device on the bus */
+	rte_bus_find_device_by_name_t find_device_by_name;
+				     /**< Find a device on the bus by name */
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_remap_device_t remap_device;       /**< remap a device */
+	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
@@ -306,6 +369,12 @@ struct rte_bus *rte_bus_find(const struct rte_bus *start, rte_bus_cmp_t cmp,
 struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
 
 /**
+ * Find a device on the given bus by its name.
+ */
+struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
+				const void *dev_name);
+
+/**
  * Find the registered bus for a given name.
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 9342e0c..19971d0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,15 @@ extern "C" {
 
 #include <rte_log.h>
 
+#include <exec-env/rte_dev.h>
+
+typedef int (*rte_eal_dev_cb_fn)(enum rte_eal_dev_event_type event,
+					void *cb_arg, void *ret_param);
+
+struct rte_eal_dev_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -157,6 +166,13 @@ struct rte_driver {
  */
 #define RTE_DEV_NAME_MAX_LEN 64
 
+enum device_state {
+	DEVICE_UNDEFINED,
+	DEVICE_FAULT,
+	DEVICE_PARSED,
+	DEVICE_PROBED,
+};
+
 /**
  * A structure describing a generic device.
  */
@@ -166,6 +182,9 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	enum device_state state;  /**< Device state */
+	/** User application callbacks for device event */
+	struct rte_eal_dev_cb_list uev_cbs;
 };
 
 /**
@@ -248,6 +267,8 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
  */
 typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void *data);
 
+typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void *data);
+
 #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
 
 #define RTE_PMD_EXPORT_NAME(name, idx) \
@@ -293,4 +314,72 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * Enable the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_eal_dev_monitor_enable(void);
+/**
+ * Register a callback for a specific device event. Multiple
+ * callbacks can be registered at the same time.
+ * @param device
+ *  The device the callback is attached to.
+ * @param event
+ *  The device event type.
+ * @param cb_fn
+ *  Callback address.
+ * @param cb_arg
+ *  Address of the parameter for the callback.
+ * @return
+ *  0 on success, a negative value on failure.
+ */
+int rte_dev_callback_register(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * Unregister a callback for the specified device event.
+ *
+ * @param event
+ *  The event type the callback was registered for.
+ * @param cb_fn
+ *  Callback address.
+ * @param cb_arg
+ *  Address of the callback parameter; (void *)-1 removes all
+ *  registered callbacks that have the same callback address.
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value (-EAGAIN if a callback is executing).
+ */
+int rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the callbacks that the user application registered
+ * for the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param event
+ *  The device event type.
+ * @param cb_arg
+ *  callback parameter.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_eal_dev_event_type event,
+			void *cb_arg, void *ret_param);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5a7b8b2..05a2437 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -67,6 +67,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
@@ -120,7 +121,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
 endif
 
-INC := rte_kni_common.h
+INC := rte_kni_common.h rte_dev.h
 
 SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
 	$(addprefix include/exec-env/,$(INC))
diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
index 8e4a775..29e73a7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
@@ -209,6 +209,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 	int count = 0;
 	int err = 0;
 	int executing;
+	int ret = 0;
 
 	if (!cb_fn) {
 		rte_errno = EINVAL;
@@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 			}
 			ap_prev = ap;
 		}
+
+		ret |= rte_intr_callback_unregister(&intr_handle,
+				eal_alarm_callback, NULL);
+
 		rte_spinlock_unlock(&alarm_list_lk);
 	} while (executing != 0);
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..49fd0dc
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,356 @@
+/*-
+ *   Copyright(c) 2010-2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <pthread.h>
+#include <sys/queue.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+/* uev monitoring thread */
+static pthread_t uev_monitor_thread;
+
+static volatile bool udev_exit = true;
+
+static volatile bool no_request_thread = true;
+
+static void sig_handler(int signum)
+{
+	if (signum == SIGINT || signum == SIGTERM)
+		rte_dev_monitor_stop();
+}
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static void
+dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	static char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN]; /* outlives call */
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			event->group = UEV_MONITOR_UDEV;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name),
+				"%s", buf);
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = UEV_SUBSYSTEM_PCI;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_EAL_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_EAL_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_eal_uevent));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
+static int
+dev_uev_process(struct epoll_event *events, int nfds)
+{
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	struct rte_eal_uevent uevent;
+	int ret;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
+			(uevent.group == UEV_MONITOR_UDEV))
+			return 0;
+
+		/* by default, handle all PCI devices being hot plugged */
+		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
+			bus = rte_bus_find_by_name("pci");
+			dev = rte_bus_find_device(bus, uevent.devname);
+			if (uevent.type == RTE_EAL_DEV_EVENT_REMOVE) {
+
+				if ((!dev) || dev->state == DEVICE_UNDEFINED)
+					return 0;
+				dev->state = DEVICE_FAULT;
+
+				/**
+				 * remap the resource to be fake
+				 * before user's removal processing
+				 */
+				ret = bus->remap_device(dev);
+				if (!ret)
+					return(_rte_dev_callback_process(dev,
+					  RTE_EAL_DEV_EVENT_REMOVE,
+					  NULL, NULL));
+			} else if (uevent.type == RTE_EAL_DEV_EVENT_ADD) {
+				if (dev == NULL) {
+					/**
+					 * bind the driver to the device
+					 * before user's add processing
+					 */
+					bus->bind_driver(
+						uevent.devname,
+						"igb_uio");
+					return(_rte_dev_callback_process(NULL,
+					  RTE_EAL_DEV_EVENT_ADD,
+					  uevent.devname, NULL));
+				}
+			}
+		}
+	}
+	return 0;
+}
+
+/**
+ * It sets up the epoll file descriptor with the netlink uevent socket
+ * being waited on, then waits for and handles the uevents.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *  never returns.
+ */
+static __attribute__((noreturn)) void *
+dev_uev_monitoring(__rte_unused void *arg)
+{
+	struct sigaction act;
+	sigset_t mask;
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep;
+
+	udev_exit = false;
+
+	/* set signal handlers */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_handler = sig_handler;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGINT, &act, NULL);
+	sigaction(SIGTERM, &act, NULL);
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGINT);
+	sigaddset(&mask, SIGTERM);
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!udev_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+}
+
+int
+rte_dev_monitor_start(void)
+{
+	int ret;
+
+	if (!no_request_thread)
+		return 0;
+	no_request_thread = false;
+
+	/* create the host thread to wait/handle the uevent from kernel */
+	ret = pthread_create(&uev_monitor_thread, NULL,
+		dev_uev_monitoring, NULL);
+	return -ret; /* pthread_create() returns a positive errno */
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	udev_exit = true;
+	no_request_thread = true;
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..6a6feb5
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,106 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+
+/**
+ * The device event type.
+ */
+enum rte_eal_dev_event_type {
+	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
+	RTE_EAL_DEV_EVENT_REMOVE,
+					/**< device removing event */
+	RTE_EAL_DEV_EVENT_CHANGE,
+					/**< device status change event */
+	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
+	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
+	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
+	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_eal_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index a3a98c1..d0e07b4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if the device has been removed before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
+		pr_info("The device has been removed\n");
+		return -1;
+	}
+
 	/* disable interrupts */
 	igbuio_pci_disable_interrupts(udev);
 
diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
index 0160fc1..feb5fd7 100644
--- a/lib/librte_pci/rte_pci.c
+++ b/lib/librte_pci/rte_pci.c
@@ -172,6 +172,26 @@ rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr)
 	return -1;
 }
 
+/* map a private resource from an address */
+void *
+pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
+{
+	void *mapaddr;
+
+	mapaddr = mmap(requested_addr, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	if (mapaddr == MAP_FAILED) {
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
+			"%s (%p)\n",
+			__func__, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno), mapaddr);
+	} else
+		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+
+	return mapaddr;
+}
 
 /* map a particular resource from a file */
 void *
diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
index 4f2cd18..f6091a6 100644
--- a/lib/librte_pci/rte_pci.h
+++ b/lib/librte_pci/rte_pci.h
@@ -227,6 +227,23 @@ int rte_pci_addr_cmp(const struct rte_pci_addr *addr,
 int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
 
 /**
+ * @internal
+ * Map to a particular private resource.
+ *
+ * @param requested_addr
+ *      The starting address for the new mapping range.
+ * @param offset
+ *      The offset for the mapping range.
+ * @param size
+ *      The size for the mapping range.
+ * @return
+ *   - On success, the function returns a pointer to the mapped area.
+ *   - On error, the value MAP_FAILED is returned.
+ */
+void *pci_map_private_resource(void *requested_addr, off_t offset,
+		size_t size);
+
+/**
  * Map a particular resource from a file.
  *
  * @param requested_addr
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread
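
For readers following the failure-handling path: pci_map_private_resource() in the patch above replaces an existing mapping with private anonymous memory, and pci_uio_remap_resource() in this series then fills that memory with 0xFF, presumably so that register reads after a hot removal see all ones, as reads from an absent PCI device typically do. A minimal standalone sketch of that idea, under those assumptions, is shown below. The helper name remap_as_all_ones is hypothetical and is not part of the patch or of DPDK.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Replace whatever is mapped at [addr, addr + len) with private anonymous
 * memory and fill it with 0xFF. addr must be page-aligned and the range
 * must already belong to the caller, because MAP_FIXED silently discards
 * any mapping that was there before.
 *
 * Returns the mapped address (equal to addr) on success, or MAP_FAILED
 * with errno set by mmap() on failure.
 */
static void *
remap_as_all_ones(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return MAP_FAILED;

	/* Every byte reads back as 0xFF from now on. */
	memset(p, 0xFF, len);
	return p;
}
```

A caller would pass the address and length of a mapping it owns, such as a BAR previously mapped through UIO. Because MAP_FIXED is destructive, the sketch is only safe on ranges the caller mapped itself.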

* [PATCH v7 2/2] app/testpmd: use uevent to monitor hotplug
  2018-01-03  1:42                     ` [PATCH v7 0/2] add uevent monitor for hot plug Jeff Guo
  2018-01-03  1:42                       ` [PATCH v7 1/2] eal: " Jeff Guo
@ 2018-01-03  1:42                       ` Jeff Guo
  2018-01-10  3:30                         ` [PATCH V8 0/3] add uevent mechanism in eal framework Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-03  1:42 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

Use testpmd as an example to show applications how to request and
use uevent monitoring to handle hot removal and hot insertion
events.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v7->v6:
fix the system hang after sending packets.
---
 app/test-pmd/testpmd.c | 178 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 187 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index c3ab448..97b4999 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -401,6 +401,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -408,6 +410,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(enum rte_eal_dev_event_type type,
+			      void *param, void *ret_param);
+static int eth_uevent_callback_register(portid_t pid);
+static int in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_eal_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1757,6 +1766,31 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t pid) {
+	int diag;
+	struct rte_eth_dev *dev;
+	enum rte_eal_dev_event_type dev_event_type;
+
+	/* register the uevent callback */
+	dev = &rte_eth_devices[pid];
+	for (dev_event_type = RTE_EAL_DEV_EVENT_ADD;
+		 dev_event_type < RTE_EAL_DEV_EVENT_CHANGE;
+		 dev_event_type++) {
+		diag = rte_dev_callback_register(dev->device, dev_event_type,
+			eth_uevent_callback,
+			(void *)(intptr_t)pid);
+		if (diag) {
+			printf("Failed to setup uevent callback for"
+				" device event %d\n",
+				dev_event_type);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1773,6 +1807,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1784,6 +1820,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_EAL_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1810,6 +1848,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_EAL_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1833,6 +1874,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1917,6 +1961,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -1959,6 +2046,88 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 }
 
 static int
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return 1;
+
+	return 0;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_eal_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s: cannot allocate memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(enum rte_eal_dev_event_type type, void *arg,
+		  void *ret_param)
+{
+	static const char * const event_desc[] = {
+		[RTE_EAL_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_EAL_DEV_EVENT_ADD] = "add",
+		[RTE_EAL_DEV_EVENT_REMOVE] = "remove",
+	};
+	static char *device_name;
+
+	RTE_SET_USED(ret_param);
+
+	if (type >= RTE_EAL_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_EAL_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_EAL_DEV_EVENT_ADD:
+		device_name = malloc(strlen((const char *)arg) + 1);
+		strcpy(device_name, arg);
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
 	uint16_t i;
@@ -2438,6 +2607,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_eal_dev_monitor_enable();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_EAL_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 1639d27..9a1088b 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -92,6 +92,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< next request in the list */
+	const char *dev_name;                /**< requested device name */
+	enum rte_eal_dev_event_type event;      /**< device event type */
+};
+
+/** @internal List of pending hotplug requests */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread
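
As background for the testpmd callbacks above: the kernel delivers each uevent on the kobject-uevent netlink socket as an "action@devpath" header followed by NUL-separated KEY=VALUE fields such as ACTION, SUBSYSTEM and, for PCI devices, PCI_SLOT_NAME. The sketch below shows one way such a buffer could be classified into add and remove events. It is an illustration under those assumptions, not the parser from eal_dev.c in this series, and the names ev_type, parsed_uevent and parse_uevent are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types, loosely mirroring rte_eal_dev_event_type. */
enum ev_type { EV_UNKNOWN, EV_ADD, EV_REMOVE };

struct parsed_uevent {
	enum ev_type type;
	const char *subsystem;	/* points into the buffer, or NULL */
	const char *slot;	/* PCI_SLOT_NAME value, or NULL */
};

#define KEY_LEN(k) (sizeof(k) - 1)

/*
 * Walk a uevent buffer of len bytes made of NUL-terminated fields and
 * pick out the fields of interest. An unterminated trailing field is
 * ignored rather than read past the end of the buffer.
 */
static void
parse_uevent(const char *buf, size_t len, struct parsed_uevent *out)
{
	size_t i = 0;

	out->type = EV_UNKNOWN;
	out->subsystem = NULL;
	out->slot = NULL;

	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);

		if (end == NULL)
			break;

		if (strcmp(field, "ACTION=add") == 0)
			out->type = EV_ADD;
		else if (strcmp(field, "ACTION=remove") == 0)
			out->type = EV_REMOVE;
		else if (strncmp(field, "SUBSYSTEM=",
				 KEY_LEN("SUBSYSTEM=")) == 0)
			out->subsystem = field + KEY_LEN("SUBSYSTEM=");
		else if (strncmp(field, "PCI_SLOT_NAME=",
				 KEY_LEN("PCI_SLOT_NAME=")) == 0)
			out->slot = field + KEY_LEN("PCI_SLOT_NAME=");

		i += (size_t)(end - field) + 1;
	}
}
```

The returned pointers alias the receive buffer, so a caller that defers handling (as testpmd does through rte_eal_alarm_set) would need to copy the strings before the buffer is reused.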

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-02 17:02                         ` Matan Azrad
@ 2018-01-08  5:26                           ` Guo, Jia
  2018-01-08  8:14                             ` Matan Azrad
  2018-01-08  6:05                           ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-01-08  5:26 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	Thomas Monjalon, helin.zhang, Mordechay Haimovsky

Thanks, Matan.


On 1/3/2018 1:02 AM, Matan Azrad wrote:
> Hi Jeff
>
> Maybe I'm touching on previous discussions, but please see some comments/questions.
>
> From: Jeff Guo:
>> This patch adds a general uevent mechanism to the EAL device layer
>> to enable hot plug monitoring of all Linux kernel objects. Users can use
>> these APIs to monitor and read the device status sent from the kernel
>> side and handle it accordingly, for example by detaching or attaching the
>> device, and can also use it to implement smooth fail-safe handling.
>>
>> 1) About uevent monitoring:
>> a: add an epoll loop on the netlink socket to monitor device uevents,
>>     and add device_state to struct rte_device to track the device
>>     state machine.
>> b: add enum rte_eal_dev_event_type and struct rte_eal_uevent.
>> c: add the following APIs to the EAL device common layer:
>>     rte_eal_dev_monitor_enable
>>     rte_dev_callback_register
>>     rte_dev_callback_unregister
>>     _rte_dev_callback_process
>>     rte_dev_monitor_start
>>     rte_dev_monitor_stop
>>
>> 2) About the failure handler, using pci uio as an example:
>>     add pci_remap_device in the bus layer, with the functions below to
>>     process it:
>>     rte_pci_remap_device
>>     pci_uio_remap_resource
>>     pci_map_private_resource
>>     add rte_pci_dev_bind_driver to bind a pci device to an explicit driver.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v7->v6:
>> a.modify vdev part according to the vdev rework
>> b.re-define and split the func into common and bus specific code
>> c.fix some incorrect issue.
>> d.fix the system hang after sending packets.
>> ---
>>   drivers/bus/pci/bsd/pci.c                          |  30 ++
>>   drivers/bus/pci/linux/pci.c                        |  87 +++++
>>   drivers/bus/pci/linux/pci_init.h                   |   1 +
>>   drivers/bus/pci/pci_common.c                       |  43 +++
>>   drivers/bus/pci/pci_common_uio.c                   |  28 ++
>>   drivers/bus/pci/private.h                          |  12 +
>>   drivers/bus/pci/rte_bus_pci.h                      |  25 ++
>>   drivers/bus/vdev/vdev.c                            |  36 +++
>>   lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
>>   .../bsdapp/eal/include/exec-env/rte_dev.h          | 106 ++++++
>>   lib/librte_eal/common/eal_common_bus.c             |  30 ++
>>   lib/librte_eal/common/eal_common_dev.c             | 169 ++++++++++
>>   lib/librte_eal/common/include/rte_bus.h            |  69 ++++
>>   lib/librte_eal/common/include/rte_dev.h            |  89 ++++++
>>   lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
>>   lib/librte_eal/linuxapp/eal/eal_alarm.c            |   5 +
>>   lib/librte_eal/linuxapp/eal/eal_dev.c              | 356 +++++++++++++++++++++
>>   .../linuxapp/eal/include/exec-env/rte_dev.h        | 106 ++++++
>>   lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
>>   lib/librte_pci/rte_pci.c                           |  20 ++
>>   lib/librte_pci/rte_pci.h                           |  17 +
>>   21 files changed, 1301 insertions(+), 1 deletion(-)
>>   create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>>   create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>>
>> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
>> index b8e2178..d58dbf6 100644
>> --- a/drivers/bus/pci/bsd/pci.c
>> +++ b/drivers/bus/pci/bsd/pci.c
>> @@ -126,6 +126,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>>   	}
>>   }
>>
>> +/* re-map pci device */
>> +int
>> +rte_pci_remap_device(struct rte_pci_device *dev)
>> +{
>> +	int ret;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	switch (dev->kdrv) {
>> +	case RTE_KDRV_NIC_UIO:
>> +		ret = pci_uio_remap_resource(dev);
>> +		break;
>> +	default:
>> +		RTE_LOG(DEBUG, EAL,
>> +			"  Not managed by a supported kernel driver, skipped\n");
>> +		ret = 1;
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   void
>>   pci_uio_free_resource(struct rte_pci_device *dev,
>>   		struct mapped_pci_resource *uio_res)
>> @@ -678,3 +701,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
>>
>>   	return ret;
>>   }
>> +
>> +int
>> +rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
>> +{
>> +	return -1;
>> +}
>> +
>> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
>> index 5da6728..792fd2c 100644
>> --- a/drivers/bus/pci/linux/pci.c
>> +++ b/drivers/bus/pci/linux/pci.c
>> @@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>>   	}
>>   }
>>
>> +/* re-map pci device */
>> +int
>> +rte_pci_remap_device(struct rte_pci_device *dev)
>> +{
>> +	int ret = -1;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	switch (dev->kdrv) {
>> +	case RTE_KDRV_VFIO:
>> +#ifdef VFIO_PRESENT
>> +		/* nothing to do */
>> +#endif
>> +		break;
>> +	case RTE_KDRV_IGB_UIO:
>> +	case RTE_KDRV_UIO_GENERIC:
>> +		if (rte_eal_using_phys_addrs()) {
>> +			/* map resources for devices that use uio */
>> +			ret = pci_uio_remap_resource(dev);
>> +		}
>> +		break;
>> +	default:
>> +		RTE_LOG(DEBUG, EAL,
>> +			"  Not managed by a supported kernel driver, skipped\n");
>> +		ret = 1;
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   void *
>>   pci_find_max_end_va(void)
>>   {
>> @@ -386,6 +418,8 @@ pci_scan_one(const char *dirname, const struct
>> rte_pci_addr *addr)
>>   		rte_pci_add_device(dev);
>>   	}
>>
>> +	dev->device.state = DEVICE_PARSED;
>> +	TAILQ_INIT(&(dev->device.uev_cbs));
>>   	return 0;
>>   }
>>
>> @@ -854,3 +888,56 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
>>
>>   	return ret;
>>   }
>> +
>> +int
>> +rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
>> +{
>> +	char drv_bind_path[1024];
>> +	char drv_override_path[1024]; /* sysfs driver_override path */
>> +	int drv_override_fd = -1;
>> +	int drv_bind_fd = -1;
>> +
>> +	RTE_SET_USED(drv_type);
>> +
>> +	snprintf(drv_override_path, sizeof(drv_override_path),
>> +		"/sys/bus/pci/devices/%s/driver_override", dev_name);
>> +
>> +	/* specify the driver for a device by writing to driver_override */
>> +	drv_override_fd = open(drv_override_path, O_WRONLY);
>> +	if (drv_override_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +			drv_override_path, strerror(errno));
>> +		goto err;
>> +	}
>> +
>> +	if (write(drv_override_fd, drv_type, strlen(drv_type)) < 0) {
>> +		RTE_LOG(ERR, EAL,
>> +			"Error: bind failed - Cannot write "
>> +			"driver %s to device %s\n", drv_type, dev_name);
>> +		goto err;
>> +	}
>> +
>> +	close(drv_override_fd);
>> +
>> +	snprintf(drv_bind_path, sizeof(drv_bind_path),
>> +		"/sys/bus/pci/drivers/%s/bind", drv_type);
>> +
>> +	/* do the bind by writing device to the specific driver  */
>> +	drv_bind_fd = open(drv_bind_path, O_WRONLY | O_APPEND);
>> +	if (drv_bind_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +			drv_bind_path, strerror(errno));
>> +		goto err;
>> +	}
>> +
>> +	if (write(drv_bind_fd, dev_name, strlen(dev_name)) < 0)
>> +		goto err;
>> +
>> +	close(drv_bind_fd);
>> +	return 0;
>> +err:
>> +	close(drv_override_fd);
>> +	close(drv_bind_fd);
>> +	return -1;
>> +}
>> +
>> diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
>> index f342c47..5838402 100644
>> --- a/drivers/bus/pci/linux/pci_init.h
>> +++ b/drivers/bus/pci/linux/pci_init.h
>> @@ -58,6 +58,7 @@ int pci_uio_alloc_resource(struct rte_pci_device *dev,
>>   		struct mapped_pci_resource **uio_res);
>>   void pci_uio_free_resource(struct rte_pci_device *dev,
>>   		struct mapped_pci_resource *uio_res);
>> +int pci_uio_remap_resource(struct rte_pci_device *dev);
>>   int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int
>> res_idx,
>>   		struct mapped_pci_resource *uio_res, int map_idx);
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 104fdf9..5417b32 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -282,6 +282,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
>>   		if (rc > 0)
>>   			/* positive value means driver doesn't support it */
>>   			continue;
>> +		dev->device.state = DEVICE_PROBED;
>>   		return 0;
>>   	}
>>   	return 1;
>> @@ -481,6 +482,7 @@ rte_pci_insert_device(struct rte_pci_device
>> *exist_pci_dev,
>>   void
>>   rte_pci_remove_device(struct rte_pci_device *pci_dev)
>>   {
>> +	RTE_LOG(DEBUG, EAL, " rte_pci_remove_device for device list\n");
>>   	TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
>>   }
>>
>> @@ -502,6 +504,44 @@ pci_find_device(const struct rte_device *start,
>> rte_dev_cmp_t cmp,
>>   	return NULL;
>>   }
>>
>> +static struct rte_device *
>> +pci_find_device_by_name(const struct rte_device *start,
>> +		rte_dev_cmp_name_t cmp_name,
>> +		const void *data)
>> +{
>> +	struct rte_pci_device *dev;
>> +
>> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +		if (start && &dev->device == start) {
>> +			start = NULL; /* starting point found */
>> +			continue;
>> +		}
>> +		if (cmp_name(dev->device.name, data) == 0)
>> +			return &dev->device;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int
>> +pci_remap_device(struct rte_device *dev)
>> +{
>> +	struct rte_pci_device *pdev;
>> +	int ret;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	pdev = RTE_DEV_TO_PCI(dev);
>> +
>> +	/* remap resources for devices that use igb_uio */
>> +	ret = rte_pci_remap_device(pdev);
>> +	if (ret != 0)
>> +		RTE_LOG(ERR, EAL, "failed to remap device %s\n",
>> +			dev->name);
>> +	return ret;
>> +}
>> +
>>   static int
>>   pci_plug(struct rte_device *dev)
>>   {
>> @@ -528,10 +568,13 @@ struct rte_pci_bus rte_pci_bus = {
>>   		.scan = rte_pci_scan,
>>   		.probe = rte_pci_probe,
>>   		.find_device = pci_find_device,
>> +		.find_device_by_name = pci_find_device_by_name,
>>   		.plug = pci_plug,
>>   		.unplug = pci_unplug,
>>   		.parse = pci_parse,
>>   		.get_iommu_class = rte_pci_get_iommu_class,
>> +		.remap_device = pci_remap_device,
>> +		.bind_driver = rte_pci_dev_bind_driver,
>>   	},
>>   	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>   	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>> diff --git a/drivers/bus/pci/pci_common_uio.c
>> b/drivers/bus/pci/pci_common_uio.c
>> index 0671131..8cb4009 100644
>> --- a/drivers/bus/pci/pci_common_uio.c
>> +++ b/drivers/bus/pci/pci_common_uio.c
>> @@ -176,6 +176,34 @@ pci_uio_unmap(struct mapped_pci_resource
>> *uio_res)
>>   	}
>>   }
>>
>> +/* remap the PCI resource of a PCI device in private virtual memory */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev)
>> +{
>> +	int i;
>> +	uint64_t phaddr;
>> +	void *map_address;
>> +
>> +	/* Map all BARs */
>> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
>> +		/* skip empty BAR */
>> +		phaddr = dev->mem_resource[i].phys_addr;
>> +		if (phaddr == 0)
>> +			continue;
>> +		map_address = pci_map_private_resource(
>> +				dev->mem_resource[i].addr, 0,
>> +				(size_t)dev->mem_resource[i].len);
>> +		if (map_address == MAP_FAILED)
>> +			goto error;
>> +		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
>> +		dev->mem_resource[i].addr = map_address;
>> +	}
>> +
>> +	return 0;
>> +error:
>> +	return -1;
>> +}
>> +
>>   static struct mapped_pci_resource *
>>   pci_uio_find_resource(struct rte_pci_device *dev)
>>   {
>> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
>> index 2283f09..10baa1a 100644
>> --- a/drivers/bus/pci/private.h
>> +++ b/drivers/bus/pci/private.h
>> @@ -202,6 +202,18 @@ void pci_uio_free_resource(struct rte_pci_device
>> *dev,
>>   		struct mapped_pci_resource *uio_res);
>>
>>   /**
>> + * Remap the PCI UIO resource.
>> + *
>> + * @param dev
>> + *   Pointer to the rte_pci_device structure.
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev);
>> +
>> +/**
>>    * Map device memory to uio resource
>>    *
>>    * This function is private to EAL.
>> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
>> index d4a2996..1662f3b 100644
>> --- a/drivers/bus/pci/rte_bus_pci.h
>> +++ b/drivers/bus/pci/rte_bus_pci.h
>> @@ -52,6 +52,8 @@ extern "C" {
>>   #include <sys/queue.h>
>>   #include <stdint.h>
>>   #include <inttypes.h>
>> +#include <unistd.h>
>> +#include <fcntl.h>
>>
>>   #include <rte_debug.h>
>>   #include <rte_interrupts.h>
>> @@ -197,6 +199,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
>>   void rte_pci_unmap_device(struct rte_pci_device *dev);
>>
>>   /**
>> + * Remap this device
>> + *
>> + * @param dev
>> + *   A pointer to a rte_pci_device structure describing the device
>> + *   to use
>> + */
>> +int rte_pci_remap_device(struct rte_pci_device *dev);
>> +
>> +/**
>>    * Dump the content of the PCI bus.
>>    *
>>    * @param f
>> @@ -333,6 +344,20 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
>>   void rte_pci_ioport_write(struct rte_pci_ioport *p,
>>   		const void *data, size_t len, off_t offset);
>>
>> +/**
>> + * Bind a device to a specific type of driver.
>> + *
>> + * @param dev_name
>> + *  The device name.
>> + * @param drv_type
>> + *  The specific driver's type.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type);
>> +
>>   #ifdef __cplusplus
>>   }
>>   #endif
>> diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
>> index fd7736d..773f6e0 100644
>> --- a/drivers/bus/vdev/vdev.c
>> +++ b/drivers/bus/vdev/vdev.c
>> @@ -323,6 +323,39 @@ vdev_find_device(const struct rte_device *start,
>> rte_dev_cmp_t cmp,
>>   	return NULL;
>>   }
>>
>> +static struct rte_device *
>> +vdev_find_device_by_name(const struct rte_device *start,
>> +		rte_dev_cmp_name_t cmp_name,
>> +		const void *data)
>> +{
>> +	struct rte_vdev_device *dev;
>> +
>> +	TAILQ_FOREACH(dev, &vdev_device_list, next) {
>> +		if (start && &dev->device == start) {
>> +			start = NULL;
>> +			continue;
>> +		}
>> +		if (cmp_name(dev->device.name, data) == 0)
>> +			return &dev->device;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static int
>> +vdev_remap_device(struct rte_device *dev)
>> +{
>> +	RTE_SET_USED(dev);
>> +	return 0;
>> +}
>> +
>> +static int
>> +vdev_bind_driver(const char *dev_name, const char *drv_type)
>> +{
>> +	RTE_SET_USED(dev_name);
>> +	RTE_SET_USED(drv_type);
>> +	return 0;
>> +}
>> +
>>   static int
>>   vdev_plug(struct rte_device *dev)
>>   {
>> @@ -339,9 +372,12 @@ static struct rte_bus rte_vdev_bus = {
>>   	.scan = vdev_scan,
>>   	.probe = vdev_probe,
>>   	.find_device = vdev_find_device,
>> +	.find_device_by_name = vdev_find_device_by_name,
>>   	.plug = vdev_plug,
>>   	.unplug = vdev_unplug,
>>   	.parse = vdev_parse,
>> +	.remap_device = vdev_remap_device,
>> +	.bind_driver = vdev_bind_driver,
>>   };
>>
>>   RTE_REGISTER_BUS(vdev, rte_vdev_bus);
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c
>> b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..6ea9a74
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> @@ -0,0 +1,64 @@
>> +/*-
>> + *   Copyright(c) 2010-2017 Intel Corporation.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
>> FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>> COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>> INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
>> OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>> AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
>> TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
>> DAMAGE.
>> + */
>> +
>> +#include <stdio.h>
>> +#include <string.h>
>> +#include <inttypes.h>
>> +#include <sys/queue.h>
>> +#include <sys/signalfd.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +#include <sys/epoll.h>
>> +#include <unistd.h>
>> +#include <signal.h>
>> +#include <stdbool.h>
>> +
>> +#include <rte_malloc.h>
>> +#include <rte_bus.h>
>> +#include <rte_dev.h>
>> +#include <rte_devargs.h>
>> +#include <rte_debug.h>
>> +#include <rte_log.h>
>> +
>> +#include "eal_thread.h"
>> +
>> +int
>> +rte_dev_monitor_start(void)
>> +{
>> +	return -1;
>> +}
>> +
>> +int
>> +rte_dev_monitor_stop(void)
>> +{
>> +	return -1;
>> +}
>> diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> new file mode 100644
>> index 0000000..6a6feb5
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> @@ -0,0 +1,106 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
>> FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>> COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>> INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
>> OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>> AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
>> TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
>> DAMAGE.
>> + */
>> +
>> +#ifndef _RTE_DEV_H_
>> +#error "don't include this file directly, please include generic <rte_dev.h>"
>> +#endif
>> +
>> +#ifndef _RTE_LINUXAPP_DEV_H_
>> +#define _RTE_LINUXAPP_DEV_H_
>> +
>> +#include <stdio.h>
>> +
>> +#include <rte_dev.h>
>> +
>> +#define RTE_EAL_UEV_MSG_LEN 4096
>> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +enum uev_subsystem {
>> +	UEV_SUBSYSTEM_UIO,
>> +	UEV_SUBSYSTEM_VFIO,
>> +	UEV_SUBSYSTEM_PCI,
>> +	UEV_SUBSYSTEM_MAX
>> +};
>> +
>> +enum uev_monitor_netlink_group {
>> +	UEV_MONITOR_KERNEL,
>> +	UEV_MONITOR_UDEV,
>> +};
>> +
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_eal_dev_event_type {
>> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
>> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
>> +	RTE_EAL_DEV_EVENT_REMOVE,
>> +					/**< device removing event */
>> +	RTE_EAL_DEV_EVENT_CHANGE,
>> +					/**< device status change event */
>> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move
>> event */
>> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
>> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
>> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum
>> */
>> +};
>> +
>> +struct rte_eal_uevent {
>> +	enum rte_eal_dev_event_type type;	/**< device event type */
>> +	int subsystem;				/**< subsystem id */
>> +	char *devname;				/**< device name */
>> +	enum uev_monitor_netlink_group group;	/**< device netlink
>> group */
>> +};
>> +
>> +/**
>> + * Start the device uevent monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_monitor_start(void);
>> +
>> +/**
>> + * Stop the device uevent monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +
>> +int
>> +rte_dev_monitor_stop(void);
>> +
>> +#endif /* _RTE_LINUXAPP_DEV_H_ */
>> diff --git a/lib/librte_eal/common/eal_common_bus.c
>> b/lib/librte_eal/common/eal_common_bus.c
>> index 3e022d5..b7219c9 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -51,8 +51,11 @@ rte_bus_register(struct rte_bus *bus)
>>   	RTE_VERIFY(bus->scan);
>>   	RTE_VERIFY(bus->probe);
>>   	RTE_VERIFY(bus->find_device);
>> +	RTE_VERIFY(bus->find_device_by_name);
>>   	/* Buses supporting driver plug also require unplug. */
>>   	RTE_VERIFY(!bus->plug || bus->unplug);
>> +	RTE_VERIFY(bus->remap_device);
>> +	RTE_VERIFY(bus->bind_driver);
>>
>>   	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
>>   	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
>> @@ -170,6 +173,14 @@ cmp_rte_device(const struct rte_device *dev1,
>> const void *_dev2)
>>   }
>>
>>   static int
>> +cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
>> +{
>> +	const char *dev_name2 = _dev_name2;
>> +
>> +	return strcmp(dev_name1, dev_name2);
>> +}
>> +
>> +static int
>>   bus_find_device(const struct rte_bus *bus, const void *_dev)
>>   {
>>   	struct rte_device *dev;
>> @@ -178,6 +189,25 @@ bus_find_device(const struct rte_bus *bus, const
>> void *_dev)
>>   	return dev == NULL;
>>   }
>>
>> +static struct rte_device *
>> +bus_find_device_by_name(const struct rte_bus *bus, const void
>> *_dev_name)
>> +{
>> +	struct rte_device *dev;
>> +
>> +	dev = bus->find_device_by_name(NULL, cmp_rte_device_name,
>> _dev_name);
>> +	return dev;
>> +}
>> +
>> +struct rte_device *
>> +
>> +rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
>> +{
>> +	struct rte_device *dev;
>> +
>> +	dev = bus_find_device_by_name(bus, _dev_name);
>> +	return dev;
>> +}
>> +
>>   struct rte_bus *
>>   rte_bus_find_by_device(const struct rte_device *dev)
>>   {
>> diff --git a/lib/librte_eal/common/eal_common_dev.c
>> b/lib/librte_eal/common/eal_common_dev.c
>> index dda8f58..47909e8 100644
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -42,9 +42,31 @@
>>   #include <rte_devargs.h>
>>   #include <rte_debug.h>
>>   #include <rte_log.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_malloc.h>
>>
>>   #include "eal_private.h"
>>
>> +/* spinlock for device callbacks */
>> +static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
>> +
>> +/**
>> + * The user application callback description.
>> + *
>> + * It contains callback address to be registered by user application,
>> + * the pointer to the parameters for callback, and the event type.
>> + */
>> +struct rte_eal_dev_callback {
>> +	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
>> +	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
>> +	void *cb_arg;                           /**< Parameter for callback */
>> +	void *ret_param;                        /**< Return parameter */
>> +	enum rte_eal_dev_event_type event;      /**< device event type */
>> +	uint32_t active;                        /**< Callback is executing */
>> +};
>> +
>> +static struct rte_eal_dev_callback *dev_add_cb;
>> +
>>   static int cmp_detached_dev_name(const struct rte_device *dev,
>>   	const void *_name)
>>   {
>> @@ -234,3 +256,150 @@ int rte_eal_hotplug_remove(const char *busname,
>> const char *devname)
>>   	rte_eal_devargs_remove(busname, devname);
>>   	return ret;
>>   }
>> +
>> +int
>> +rte_eal_dev_monitor_enable(void)
>> +{
>> +	int ret;
>> +
>> +	ret = rte_dev_monitor_start();
>> +	if (ret)
>> +		RTE_LOG(ERR, EAL, "Can not init device monitor\n");
>> +	return ret;
>> +}
>> +
>> +int
>> +rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
>> +{
>> +	struct rte_eal_dev_callback *user_cb;
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
> What about checking that the device pointer is not NULL?
Sure.
>
>> +	rte_spinlock_lock(&rte_dev_cb_lock);
>> +
>> +	if (TAILQ_EMPTY(&(device->uev_cbs)))
>> +		TAILQ_INIT(&(device->uev_cbs));
>> +
>> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
>> +		user_cb = NULL;
>> +	} else {
>> +		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
>> +			if (user_cb->cb_fn == cb_fn &&
>> +				user_cb->cb_arg == cb_arg &&
>> +				user_cb->event == event) {
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	/* create a new callback. */
>> +	if (user_cb == NULL) {
>> +		/* allocate a new interrupt callback entity */
>> +		user_cb = rte_zmalloc("eal device event",
>> +					sizeof(*user_cb), 0);
>> +		if (user_cb == NULL) {
>> +			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
> Missing rte_spinlock_unlock.
got it.
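For reference, one common way to avoid this kind of leak is to route every exit taken after the lock is acquired through a single unlock label. The following is a minimal standalone sketch, not the patch's code: it assumes a pthread mutex in place of rte_spinlock_t and calloc() in place of rte_zmalloc() so that it builds without DPDK, and the names cb_register, struct cb_entry, noop_cb and fail_alloc are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Assumption: pthread mutex stands in for rte_spinlock_t. */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, simplified callback record. */
struct cb_entry {
	int (*cb_fn)(void *arg);
	void *cb_arg;
};

static struct cb_entry *registered; /* single slot, for brevity */
static int fail_alloc;              /* hook to simulate allocation failure */

/* Example callback, used only to exercise the sketch. */
static int
noop_cb(void *arg)
{
	(void)arg;
	return 0;
}

static int
cb_register(int (*cb_fn)(void *), void *cb_arg)
{
	struct cb_entry *e;
	int ret = 0;

	if (cb_fn == NULL)
		return -EINVAL; /* lock not taken yet, plain return is fine */

	pthread_mutex_lock(&cb_lock);

	e = fail_alloc ? NULL : calloc(1, sizeof(*e));
	if (e == NULL) {
		ret = -ENOMEM;
		goto unlock; /* never return with the lock held */
	}
	e->cb_fn = cb_fn;
	e->cb_arg = cb_arg;
	free(registered);
	registered = e;

unlock:
	pthread_mutex_unlock(&cb_lock);
	return ret;
}
```

With this shape, adding another failure branch later only needs a "goto unlock", so the lock cannot be left held by accident.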
>
>> +			return -ENOMEM;
>> +		}
>> +		user_cb->cb_fn = cb_fn;
>> +		user_cb->cb_arg = cb_arg;
>> +		user_cb->event = event;
>> +		if (event == RTE_EAL_DEV_EVENT_ADD)
>> +			dev_add_cb = user_cb;
> Can only one DPDK entity register for the ADD callback?
>
> I suggest adding an option to register for all devices, perhaps by using a dummy device that holds all the "ALL_DEVICES" callbacks per event.
> "All" means past, present and future devices; this way one callback can be called for all devices, and more than one DPDK entity can register for an ADD/NEW event.
> What about NEW instead of ADD?
>
> I also suggest adding the device pointer as a parameter to the callback (which will be managed by EAL).
If you mean dev_add_cb, the "add" refers to adding a device, not adding
a callback. If you mean the device event type, ADD is consistent with
the type from the kernel side, although a better name could be found.
One callback for all devices makes sense; I will think about that.
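To illustrate the "one callback for all devices" idea raised above, here is a minimal standalone sketch. It is only an assumption about how such a scheme might look, not the patch's API: a NULL device name at registration means "every device", entries live in one global list, and locking is omitted for brevity. All names (cb_register, cb_dispatch, struct cb_entry, EV_ADD, EV_REMOVE, count_cb) are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical event ids for the sketch. */
enum { EV_ADD = 1, EV_REMOVE = 2 };

typedef void (*dev_event_cb)(const char *dev_name, int event, void *arg);

struct cb_entry {
	TAILQ_ENTRY(cb_entry) next;
	char dev_name[64]; /* only meaningful when all_devices == 0 */
	int all_devices;   /* 1: entry matches every device name */
	int event;
	dev_event_cb fn;
	void *arg;
};

static TAILQ_HEAD(, cb_entry) cb_list = TAILQ_HEAD_INITIALIZER(cb_list);

/* Example callback: counts invocations through its argument. */
static void
count_cb(const char *dev_name, int event, void *arg)
{
	(void)dev_name;
	(void)event;
	(*(int *)arg)++;
}

/* dev_name == NULL registers the callback for all devices. */
static int
cb_register(const char *dev_name, int event, dev_event_cb fn, void *arg)
{
	struct cb_entry *e;

	if (fn == NULL)
		return -EINVAL;
	if (dev_name != NULL && strlen(dev_name) >= sizeof(e->dev_name))
		return -EINVAL;

	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return -ENOMEM;

	if (dev_name == NULL)
		e->all_devices = 1;
	else
		snprintf(e->dev_name, sizeof(e->dev_name), "%s", dev_name);
	e->event = event;
	e->fn = fn;
	e->arg = arg;
	TAILQ_INSERT_TAIL(&cb_list, e, next);
	return 0;
}

/* Calls every matching callback; returns how many were called. */
static int
cb_dispatch(const char *dev_name, int event)
{
	struct cb_entry *e;
	int called = 0;

	TAILQ_FOREACH(e, &cb_list, next) {
		if (e->event != event)
			continue;
		if (!e->all_devices && strcmp(e->dev_name, dev_name) != 0)
			continue;
		e->fn(dev_name, event, e->arg);
		called++;
	}
	return called;
}
```

Because entries are matched by name rather than by an existing rte_device pointer, several entities can register for an ADD event before the device exists, and the device name is handed to each callback.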
>> +		else
>> +			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb,
>> next);
>> +	}
>> +
>> +	rte_spinlock_unlock(&rte_dev_cb_lock);
>> +	return 0;
>> +}
>> +
>> +int
>> +rte_dev_callback_unregister(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
>> +{
>> +	int ret;
>> +	struct rte_eal_dev_callback *cb, *next;
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
>> +	rte_spinlock_lock(&rte_dev_cb_lock);
>> +
>> +	ret = 0;
>> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
>> +		rte_free(dev_add_cb);
>> +		dev_add_cb = NULL;
>> +	} else {
> Device NULL checking?
yes.
>> +		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
>> +		      cb = next) {
>> +
>> +			next = TAILQ_NEXT(cb, next);
>> +
>> +			if (cb->cb_fn != cb_fn || cb->event != event ||
>> +					(cb->cb_arg != (void *)-1 &&
>> +					cb->cb_arg != cb_arg))
>> +				continue;
>> +
>> +			/*
>> +			 * if this callback is not executing right now,
>> +			 * then remove it.
>> +			 */
>> +			if (cb->active == 0) {
>> +				TAILQ_REMOVE(&(device->uev_cbs), cb,
>> next);
>> +				rte_free(cb);
>> +			} else {
>> +				ret = -EAGAIN;
>> +			}
>> +		}
>> +	}
>> +	rte_spinlock_unlock(&rte_dev_cb_lock);
>> +	return ret;
>> +}
>> +
>> +int
>> +_rte_dev_callback_process(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			void *cb_arg, void *ret_param)
>> +{
>> +	struct rte_eal_dev_callback dev_cb;
>> +	struct rte_eal_dev_callback *cb_lst;
>> +	int rc = 0;
>> +
>> +	rte_spinlock_lock(&rte_dev_cb_lock);
>> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
>> +		if (cb_arg != NULL)
>> +			dev_add_cb->cb_arg = cb_arg;
>> +
>> +		if (ret_param != NULL)
>> +			dev_add_cb->ret_param = ret_param;
>> +
>> +		rte_spinlock_unlock(&rte_dev_cb_lock);
> Can't someone free it while it is running?
> I suggest keeping the lock held.
> Callbacks must then not be allowed to use this mechanism, to prevent deadlock.
It seems that would introduce a deadlock here; let's look into it more.
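One possible middle ground between "hold the lock across the callback" and "drop the lock and keep using the shared pointer" is to copy the callback record while the lock is held and invoke only the copy. The sketch below is a standalone illustration under stated assumptions, not the patch's code: a pthread mutex stands in for rte_spinlock_t, and struct dev_cb, add_cb, process_add_event and count_cb are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

typedef int (*dev_cb_fn)(int event, void *cb_arg, void *ret_param);

/* Hypothetical, simplified callback record. */
struct dev_cb {
	dev_cb_fn cb_fn;
	void *cb_arg;
	int event;
};

/* Assumption: pthread mutex stands in for rte_spinlock_t. */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev_cb *add_cb; /* may be cleared or freed by unregister */

/* Example callback: counts invocations and echoes the event id. */
static int
count_cb(int event, void *cb_arg, void *ret_param)
{
	(void)ret_param;
	(*(int *)cb_arg)++;
	return event;
}

static int
process_add_event(void *cb_arg, void *ret_param)
{
	struct dev_cb snapshot;

	pthread_mutex_lock(&cb_lock);
	if (add_cb == NULL) {
		/* nothing registered: do not dereference */
		pthread_mutex_unlock(&cb_lock);
		return 0;
	}
	snapshot = *add_cb; /* copy while the record cannot be freed */
	if (cb_arg != NULL)
		snapshot.cb_arg = cb_arg;
	pthread_mutex_unlock(&cb_lock);

	/* The shared record is no longer touched past this point. */
	return snapshot.cb_fn(snapshot.event, snapshot.cb_arg, ret_param);
}
```

This only protects the record itself. Whoever unregisters must still make sure the callback is not running before freeing whatever cb_arg points to, so it narrows the race rather than settling the locking question.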
>> +		rc = dev_add_cb->cb_fn(dev_add_cb->event,
>> +				dev_add_cb->cb_arg, dev_add_cb->ret_param);
>> +		rte_spinlock_lock(&rte_dev_cb_lock);
>> +	} else {
>> +		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
>> +			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
>> +				continue;
>> +			dev_cb = *cb_lst;
>> +			cb_lst->active = 1;
>> +			if (cb_arg != NULL)
>> +				dev_cb.cb_arg = cb_arg;
>> +			if (ret_param != NULL)
>> +				dev_cb.ret_param = ret_param;
>> +
>> +			rte_spinlock_unlock(&rte_dev_cb_lock);
> The current active flag doesn't make this thread safe; I suggest keeping the lock held.
> Scenario:
> 	1. Thread A sees active = 0 in the unregister function.
> 	2. Context switch.
> 	3. Thread B starts the callback.
> 	4. Context switch.
> 	5. Thread A frees it.
> 	6. Context switch.
> 	7. Seg fault in Thread B.
Same as above.
>> +			rc = dev_cb.cb_fn(dev_cb.event,
>> +					dev_cb.cb_arg, dev_cb.ret_param);
>> +			rte_spinlock_lock(&rte_dev_cb_lock);
>> +			cb_lst->active = 0;
>> +		}
>> +	}
>> +	rte_spinlock_unlock(&rte_dev_cb_lock);
>> +	return rc;
>> +}
>> diff --git a/lib/librte_eal/common/include/rte_bus.h
>> b/lib/librte_eal/common/include/rte_bus.h
>> index 6fb0834..6c4ae31 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -122,6 +122,34 @@ typedef struct rte_device *
>>   			 const void *data);
>>
>>   /**
>> + * Device iterator to find a device on a bus.
>> + *
>> + * This function returns an rte_device if one of those held by the bus
>> + * matches the data passed as parameter.
>> + *
>> + * If the comparison function returns zero this function should stop iterating
>> + * over any more devices. To continue a search the device of a previous
>> search
>> + * can be passed via the start parameter.
>> + *
>> + * @param cmp
>> + *	the device name comparison function.
>> + *
>> + * @param data
>> + *	Data to compare each device against.
>> + *
>> + * @param start
>> + *	starting point for the iteration
>> + *
>> + * @return
>> + *	The first device matching the data, NULL if none exists.
>> + */
>> +typedef struct rte_device *
>> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
>> +			 rte_dev_cmp_name_t cmp,
>> +			 const void *data);
>> +
>> +
>> +/**
>>    * Implementation specific probe function which is responsible for linking
>>    * devices on that bus with applicable drivers.
>>    *
>> @@ -168,6 +196,37 @@ typedef int (*rte_bus_unplug_t)(struct rte_device
>> *dev);
>>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>
>>   /**
>> + * Implementation specific remap function which is responsible for
>> remmaping
>> + * devices on that bus from original share memory resource to a private
>> memory
>> + * resource for the sake of device has been removal.
>> + *
>> + * @param dev
>> + *	Device pointer that was returned by a previous call to find_device.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
>> +
>> +/**
>> + * Implementation specific bind driver function which is responsible for bind
>> + * a explicit type of driver with a devices on that bus.
>> + *
>> + * @param dev_name
>> + *	device textual description.
>> + *
>> + * @param drv_type
>> + *	driver type textual description.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_bind_driver_t)(const char *dev_name,
>> +				const char *drv_type);
>> +
>> +/**
>>    * Bus scan policies
>>    */
>>   enum rte_bus_scan_mode {
>> @@ -206,9 +265,13 @@ struct rte_bus {
>>   	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
>>   	rte_bus_probe_t probe;       /**< Probe devices on bus */
>>   	rte_bus_find_device_t find_device; /**< Find a device on the bus */
>> +	rte_bus_find_device_by_name_t find_device_by_name;
>> +				     /**< Find a device on the bus */
>>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver
>> */
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>> +	rte_bus_remap_device_t remap_device;       /**< remap a device */
>> +	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device
>> */
>>   	struct rte_bus_conf conf;    /**< Bus configuration */
>>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu
>> class */
>>   };
>> @@ -306,6 +369,12 @@ struct rte_bus *rte_bus_find(const struct rte_bus
>> *start, rte_bus_cmp_t cmp,
>>   struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>>
>>   /**
>> + * Find the registered bus for a particular device.
>> + */
>> +struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
>> +				const void *dev_name);
>> +
>> +/**
>>    * Find the registered bus for a given name.
>>    */
>>   struct rte_bus *rte_bus_find_by_name(const char *busname);
>> diff --git a/lib/librte_eal/common/include/rte_dev.h
>> b/lib/librte_eal/common/include/rte_dev.h
>> index 9342e0c..19971d0 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -51,6 +51,15 @@ extern "C" {
>>
>>   #include <rte_log.h>
>>
>> +#include <exec-env/rte_dev.h>
>> +
>> +typedef int (*rte_eal_dev_cb_fn)(enum rte_eal_dev_event_type event,
>> +					void *cb_arg, void *ret_param);
>> +
>> +struct rte_eal_dev_callback;
>> +/** @internal Structure to keep track of registered callbacks */
>> +TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
>> +
>>   __attribute__((format(printf, 2, 0)))
>>   static inline void
>>   rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
>> @@ -157,6 +166,13 @@ struct rte_driver {
>>    */
>>   #define RTE_DEV_NAME_MAX_LEN 64
>>
>> +enum device_state {
>> +	DEVICE_UNDEFINED,
>> +	DEVICE_FAULT,
>> +	DEVICE_PARSED,
>> +	DEVICE_PROBED,
>> +};
>> +
>>   /**
>>    * A structure describing a generic device.
>>    */
>> @@ -166,6 +182,9 @@ struct rte_device {
>>   	const struct rte_driver *driver;/**< Associated driver */
>>   	int numa_node;                /**< NUMA node connection */
>>   	struct rte_devargs *devargs;  /**< Device user arguments */
>> +	enum device_state state;  /**< Device state */
>> +	/** User application callbacks for device event */
>> +	struct rte_eal_dev_cb_list uev_cbs;
>>   };
>>
>>   /**
>> @@ -248,6 +267,8 @@ int rte_eal_hotplug_remove(const char *busname,
>> const char *devname);
>>    */
>>   typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void
>> *data);
>>
>> +typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void
>> *data);
>> +
>>   #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
>>
>>   #define RTE_PMD_EXPORT_NAME(name, idx) \
>> @@ -293,4 +314,72 @@ __attribute__((used)) = str
>>   }
>>   #endif
>>
>> +/**
>> + * It enable the device event monitoring for a specific event.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_eal_dev_monitor_enable(void);
>> +/**
>> + * It registers the callback for the specific event. Multiple
>> + * callbacks cal be registered at the same time.
>> + * @param event
>> + *  The device event type.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
>> +
>> +/**
>> + * It unregisters the callback according to the specified event.
>> + *
>> + * @param event
>> + *  The event type which corresponding to the callback.
>> + * @param cb_fn
>> + *  callback address.
>> + *  address of parameter for callback, (void *)-1 means to remove all
>> + *  registered which has the same callback address.
>> + *
>> + * @return
>> + *  - On success, return the number of callback entities removed.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_unregister(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
>> +
>> +/**
>> + * @internal Executes all the user application registered callbacks for
>> + * the specific device. It is for DPDK internal user only. User
>> + * application should not call it directly.
>> + *
>> + * @param event
>> + *  The device event type.
>> + * @param cb_arg
>> + *  callback parameter.
>> + * @param ret_param
>> + *  To pass data back to user application.
>> + *  This allows the user application to decide if a particular function
>> + *  is permitted or not.
>> + *
>> + * @return
>> + *  - On success, return zero.
>> + *  - On failure, a negative value.
>> + */
>> +int
>> +_rte_dev_callback_process(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			void *cb_arg, void *ret_param);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/Makefile
>> b/lib/librte_eal/linuxapp/eal/Makefile
>> index 5a7b8b2..05a2437 100644
>> --- a/lib/librte_eal/linuxapp/eal/Makefile
>> +++ b/lib/librte_eal/linuxapp/eal/Makefile
>> @@ -67,6 +67,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) +=
>> eal_lcore.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
>> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
>>
>>   # from common dir
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
>> @@ -120,7 +121,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
>>   CFLAGS_eal_thread.o += -Wno-return-type
>>   endif
>>
>> -INC := rte_kni_common.h
>> +INC := rte_kni_common.h rte_dev.h
>>
>>   SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
>>   	$(addprefix include/exec-env/,$(INC))
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> b/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> index 8e4a775..29e73a7 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> @@ -209,6 +209,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
>> void *cb_arg)
>>   	int count = 0;
>>   	int err = 0;
>>   	int executing;
>> +	int ret;
>>
>>   	if (!cb_fn) {
>>   		rte_errno = EINVAL;
>> @@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
>> void *cb_arg)
>>   			}
>>   			ap_prev = ap;
>>   		}
>> +
>> +		ret |= rte_intr_callback_unregister(&intr_handle,
>> +				eal_alarm_callback, NULL);
>> +
>>   		rte_spinlock_unlock(&alarm_list_lk);
>>   	} while (executing != 0);
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..49fd0dc
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -0,0 +1,356 @@
>> +/*-
>> + *   Copyright(c) 2010-2017 Intel Corporation.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
>> FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>> COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>> INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
>> OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>> AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
>> TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
>> DAMAGE.
>> + */
>> +
>> +#include <stdio.h>
>> +#include <string.h>
>> +#include <inttypes.h>
>> +#include <sys/queue.h>
>> +#include <sys/signalfd.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +#include <sys/epoll.h>
>> +#include <unistd.h>
>> +#include <signal.h>
>> +#include <stdbool.h>
>> +
>> +#include <rte_malloc.h>
>> +#include <rte_bus.h>
>> +#include <rte_dev.h>
>> +#include <rte_devargs.h>
>> +#include <rte_debug.h>
>> +#include <rte_log.h>
>> +
>> +#include "eal_thread.h"
>> +
>> +/* uev monitoring thread */
>> +static pthread_t uev_monitor_thread;
>> +
>> +bool udev_exit = true;
>> +
>> +bool no_request_thread = true;
>> +
>> +static void sig_handler(int signum)
>> +{
>> +	if (signum == SIGINT || signum == SIGTERM)
>> +		rte_dev_monitor_stop();
>> +}
>> +
>> +static int
>> +dev_monitor_fd_new(void)
>> +{
>> +
>> +	int uevent_fd;
>> +
>> +	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
>> +			SOCK_NONBLOCK,
>> +			NETLINK_KOBJECT_UEVENT);
>> +	if (uevent_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
>> +		return -1;
>> +	}
>> +	return uevent_fd;
>> +}
>> +
>> +static int
>> +dev_monitor_enable(int netlink_fd)
>> +{
>> +	struct sockaddr_nl addr;
>> +	int ret;
>> +	int size = 64 * 1024;
>> +	int nonblock = 1;
>> +
>> +	memset(&addr, 0, sizeof(addr));
>> +	addr.nl_family = AF_NETLINK;
>> +	addr.nl_pid = 0;
>> +	addr.nl_groups = 0xffffffff;
>> +
>> +	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
>> +		RTE_LOG(ERR, EAL, "bind failed\n");
>> +		goto err;
>> +	}
>> +
>> +	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size,
>> sizeof(size));
>> +
>> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
>> +	if (ret != 0) {
>> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
>> +		goto err;
>> +	}
>> +	return 0;
>> +err:
>> +	close(netlink_fd);
>> +	return -1;
>> +}
>> +
>> +static void
>> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
>> +{
>> +	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	int i = 0;
>> +
>> +	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +
>> +	while (i < RTE_EAL_UEV_MSG_LEN) {
>> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
>> +			if (*buf)
>> +				break;
>> +			buf++;
>> +		}
>> +		if (!strncmp(buf, "libudev", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			event->group = UEV_MONITOR_UDEV;
>> +		}
>> +		if (!strncmp(buf, "ACTION=", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			snprintf(action, sizeof(action), "%s", buf);
>> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
>> +			buf += 8;
>> +			i += 8;
>> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
>> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +			buf += 10;
>> +			i += 10;
>> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +			buf += 14;
>> +			i += 14;
>> +			snprintf(pci_slot_name, sizeof(subsystem), "%s",
>> buf);
>> +			event->devname = pci_slot_name;
>> +		}
>> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
>> +			if (*buf == '\0')
>> +				break;
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	if (!strncmp(subsystem, "pci", 3))
>> +		event->subsystem = UEV_SUBSYSTEM_PCI;
>> +	if (!strncmp(action, "add", 3))
>> +		event->type = RTE_EAL_DEV_EVENT_ADD;
>> +	if (!strncmp(action, "remove", 6))
>> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
>> +	event->devname = pci_slot_name;
>> +}
>> +
>> +static int
>> +dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
>> +{
>> +	int ret;
>> +	char buf[RTE_EAL_UEV_MSG_LEN];
>> +
>> +	memset(uevent, 0, sizeof(struct rte_eal_uevent));
>> +	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
>> +
>> +	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
>> +	if (ret < 0) {
>> +		RTE_LOG(ERR, EAL,
>> +		"Socket read error(%d): %s\n",
>> +		errno, strerror(errno));
>> +		return -1;
>> +	} else if (ret == 0)
>> +		/* connection closed */
>> +		return -1;
>> +
>> +	dev_uev_parse(buf, uevent);
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +dev_uev_process(struct epoll_event *events, int nfds)
>> +{
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	struct rte_eal_uevent uevent;
>> +	int ret;
>> +	int i;
>> +
>> +	for (i = 0; i < nfds; i++) {
>> +		/**
>> +		 * check device uevent from kernel side, no need to check
>> +		 * uevent from udev.
>> +		 */
>> +		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
>> +			(uevent.group == UEV_MONITOR_UDEV))
>> +			return 0;
>> +
>> +		/* default handle all pci devcie when is being hot plug */
>> +		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
>> +			bus = rte_bus_find_by_name("pci");
>> +			dev = rte_bus_find_device(bus, uevent.devname);
>> +			if (uevent.type == RTE_EAL_DEV_EVENT_REMOVE) {
>> +
>> +				if ((!dev) || dev->state ==
>> DEVICE_UNDEFINED)
>> +					return 0;
>> +				dev->state = DEVICE_FAULT;
>> +
>> +				/**
>> +				 * remap the resource to be fake
>> +				 * before user's removal processing
>> +				 */
>> +				ret = bus->remap_device(dev);
>> +				if (!ret)
>> +
>> 	return(_rte_dev_callback_process(dev,
>> +					  RTE_EAL_DEV_EVENT_REMOVE,
>> +					  NULL, NULL));
> What is the reason for keeping this device in the EAL device list after the removal?
> I suggest removing it (driver remove, bus remove and EAL remove) after the callbacks have run.
> This way EAL can initiate all device removals.
Agreed, the device should be removed from the EAL device list after the removal.
>> +			} else if (uevent.type == RTE_EAL_DEV_EVENT_ADD)
>> {
>> +				if (dev == NULL) {
>> +					/**
>> +					 * bind the driver to the device
>> +					 * before user's add processing
>> +					 */
>> +					bus->bind_driver(
>> +						uevent.devname,
>> +						"igb_uio");
>> +
> Similar comments here:
> EAL can initiate all device probe operations by adding the device and probing it here before the callbacks run.
> Then the device pointer can also be passed to the callbacks.
Passing a device pointer could require more changes; let's think about
it more.
>> 	return(_rte_dev_callback_process(NULL,
>> +					  RTE_EAL_DEV_EVENT_ADD,
>> +					  uevent.devname, NULL));
>> +				}
>> +			}
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * It builds/rebuilds up the epoll file descriptor with all the
>> + * file descriptors being waited on. Then handles the interrupts.
>> + *
>> + * @param arg
>> + *  pointer. (unused)
>> + *
>> + * @return
>> + *  never return;
>> + */
>> +static __attribute__((noreturn)) void *
>> +dev_uev_monitoring(__rte_unused void *arg)
>> +{
>> +	struct sigaction act;
>> +	sigset_t mask;
>> +	int netlink_fd;
>> +	struct epoll_event ep_kernel;
>> +	int fd_ep;
>> +
>> +	udev_exit = false;
>> +
>> +	/* set signal handlers */
>> +	memset(&act, 0x00, sizeof(struct sigaction));
>> +	act.sa_handler = sig_handler;
>> +	sigemptyset(&act.sa_mask);
>> +	act.sa_flags = SA_RESTART;
>> +	sigaction(SIGINT, &act, NULL);
>> +	sigaction(SIGTERM, &act, NULL);
>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGINT);
>> +	sigaddset(&mask, SIGTERM);
>> +	sigprocmask(SIG_UNBLOCK, &mask, NULL);
>> +
>> +	fd_ep = epoll_create1(EPOLL_CLOEXEC);
>> +	if (fd_ep < 0) {
>> +		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
>> +		goto out;
>> +	}
>> +
>> +	netlink_fd = dev_monitor_fd_new();
>> +
>> +	if (dev_monitor_enable(netlink_fd) < 0) {
>> +		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
>> +		goto out;
>> +	}
>> +
>> +	memset(&ep_kernel, 0, sizeof(struct epoll_event));
>> +	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
>> +	ep_kernel.data.fd = netlink_fd;
>> +	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
>> +		&ep_kernel) < 0) {
>> +		RTE_LOG(ERR, EAL, "error addding fd to epoll: %m\n");
>> +		goto out;
>> +	}
>> +
>> +	while (!udev_exit) {
>> +		int fdcount;
>> +		struct epoll_event ev[1];
>> +
>> +		fdcount = epoll_wait(fd_ep, ev, 1, -1);
>> +		if (fdcount < 0) {
>> +			if (errno != EINTR)
>> +				RTE_LOG(ERR, EAL, "error receiving uevent "
>> +					"message: %m\n");
>> +				continue;
>> +			}
>> +
>> +		/* epoll_wait has at least one fd ready to read */
>> +		if (dev_uev_process(ev, fdcount) < 0) {
>> +			if (errno != EINTR)
>> +				RTE_LOG(ERR, EAL, "error processing uevent
>> "
>> +					"message: %m\n");
>> +		}
>> +	}
>> +out:
>> +	if (fd_ep >= 0)
>> +		close(fd_ep);
>> +	if (netlink_fd >= 0)
>> +		close(netlink_fd);
>> +	rte_panic("uev monitoring fail\n");
>> +}
>> +
>> +int
>> +rte_dev_monitor_start(void)
>> +{
> Maybe add an option to also run it via a new EAL command-line parameter?
Good idea.
>> +	int ret;
>> +
>> +	if (!no_request_thread)
>> +		return 0;
>> +	no_request_thread = false;
>> +
>> +	/* create the host thread to wait/handle the uevent from kernel */
>> +	ret = pthread_create(&uev_monitor_thread, NULL,
>> +		dev_uev_monitoring, NULL);
> What is the reason for opening a new thread for hotplug?
> Why not use the current DPDK host thread via the alarm mechanism?
I would appreciate it if you could explain what you see as the
disadvantage of a new thread here and the advantage of the alarm
mechanism in this case.
>> +	return ret;
>> +}
>> +
>> +int
>> +rte_dev_monitor_stop(void)
>> +{
>> +	udev_exit = true;
>> +	no_request_thread = true;
>> +	return 0;
>> +}
>> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>> b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>> new file mode 100644
>> index 0000000..6a6feb5
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>> @@ -0,0 +1,106 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#ifndef _RTE_DEV_H_
>> +#error "don't include this file directly, please include generic <rte_dev.h>"
>> +#endif
>> +
>> +#ifndef _RTE_LINUXAPP_DEV_H_
>> +#define _RTE_LINUXAPP_DEV_H_
>> +
>> +#include <stdio.h>
>> +
>> +#include <rte_dev.h>
>> +
>> +#define RTE_EAL_UEV_MSG_LEN 4096
>> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +enum uev_subsystem {
>> +	UEV_SUBSYSTEM_UIO,
>> +	UEV_SUBSYSTEM_VFIO,
>> +	UEV_SUBSYSTEM_PCI,
>> +	UEV_SUBSYSTEM_MAX
>> +};
>> +
>> +enum uev_monitor_netlink_group {
>> +	UEV_MONITOR_KERNEL,
>> +	UEV_MONITOR_UDEV,
>> +};
>> +
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_eal_dev_event_type {
>> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
>> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
>> +	RTE_EAL_DEV_EVENT_REMOVE,
>> +					/**< device removing event */
>> +	RTE_EAL_DEV_EVENT_CHANGE,
>> +					/**< device status change event */
>> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
>> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
>> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
>> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
>> +};
>> +
>> +struct rte_eal_uevent {
>> +	enum rte_eal_dev_event_type type;	/**< device event type */
>> +	int subsystem;				/**< subsystem id */
>> +	char *devname;				/**< device name */
>> +	enum uev_monitor_netlink_group group;	/**< device netlink group */
>> +};
>> +
>> +/**
>> + * Start the device uevent monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_monitor_start(void);
>> +
>> +/**
>> + * Stop the device uevent monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +
>> +int
>> +rte_dev_monitor_stop(void);
>> +
>> +#endif /* _RTE_LINUXAPP_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> index a3a98c1..d0e07b4 100644
>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> @@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct
>> inode *inode)
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>
>> +	/* check if device have been remove before release */
>> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
>> +		pr_info("The device have been removed\n");
>> +		return -1;
>> +	}
>> +
>>   	/* disable interrupts */
>>   	igbuio_pci_disable_interrupts(udev);
>>
>> diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
>> index 0160fc1..feb5fd7 100644
>> --- a/lib/librte_pci/rte_pci.c
>> +++ b/lib/librte_pci/rte_pci.c
>> @@ -172,6 +172,26 @@ rte_pci_addr_parse(const char *str, struct
>> rte_pci_addr *addr)
>>   	return -1;
>>   }
>>
>> +/* map a private resource from an address*/
>> +void *
>> +pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
>> +{
>> +	void *mapaddr;
>> +
>> +	mapaddr = mmap(requested_addr, size,
>> +			   PROT_READ | PROT_WRITE,
>> +			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>> +	if (mapaddr == MAP_FAILED) {
>> +		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
>> +			"%s (%p)\n",
>> +			__func__, requested_addr,
>> +			(unsigned long)size, (unsigned long)offset,
>> +			strerror(errno), mapaddr);
>> +	} else
>> +		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
>> +
>> +	return mapaddr;
>> +}
>>
>>   /* map a particular resource from a file */
>>   void *
>> diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
>> index 4f2cd18..f6091a6 100644
>> --- a/lib/librte_pci/rte_pci.h
>> +++ b/lib/librte_pci/rte_pci.h
>> @@ -227,6 +227,23 @@ int rte_pci_addr_cmp(const struct rte_pci_addr
>> *addr,
>>   int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
>>
>>   /**
>> + * @internal
>> + * Map to a particular private resource.
>> + *
>> + * @param requested_addr
>> + *      The starting address for the new mapping range.
>> + * @param offset
>> + *      The offset for the mapping range.
>> + * @param size
>> + *      The size for the mapping range.
>> + * @return
>> + *   - On success, the function returns a pointer to the mapped area.
>> + *   - On error, the value MAP_FAILED is returned.
>> + */
>> +void *pci_map_private_resource(void *requested_addr, off_t offset,
>> +		size_t size);
>> +
>> +/**
>>    * Map a particular resource from a file.
>>    *
>>    * @param requested_addr
>> --
>> 2.7.4


* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-02 17:02                         ` Matan Azrad
  2018-01-08  5:26                           ` Guo, Jia
@ 2018-01-08  6:05                           ` Guo, Jia
  1 sibling, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-08  6:05 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	Thomas Monjalon, helin.zhang, Mordechay Haimovsky

Adding one more comment.


On 1/3/2018 1:02 AM, Matan Azrad wrote:
> Hi Jeff
>
> I may be touching on points from previous discussions, but please see some comments and questions below.
>
> From: Jeff Guo:
>> This patch aims to add a general uevent mechanism to the EAL device layer
>> to enable hot plug monitoring of all Linux kernel objects. Users can use these
>> APIs to monitor and read the device status information sent from the kernel
>> side and then handle it accordingly, for example by detaching or attaching the
>> device, and can also use it to do fail-safe work smoothly.
>>
>> 1) About uevent monitoring:
>> a: add an epoll loop that polls the netlink socket to monitor device
>>     uevents, and add device_state to struct rte_device to track the
>>     device state machine.
>> b: add enum rte_eal_dev_event_type and struct rte_eal_uevent.
>> c: add the APIs below to the rte eal device common layer.
>>     rte_eal_dev_monitor_enable
>>     rte_dev_callback_register
>>     rte_dev_callback_unregister
>>     _rte_dev_callback_process
>>     rte_dev_monitor_start
>>     rte_dev_monitor_stop
>>
>> 2) About the failure handler, using pci uio as an example:
>>     add pci_remap_device in the bus layer and the functions below to process it:
>>     rte_pci_remap_device
>>     pci_uio_remap_resource
>>     pci_map_private_resource
>>     add rte_pci_dev_bind_driver to bind a pci device to an explicit driver.
>>
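To make the uevent flow above a bit more concrete, here is a minimal sketch of how a raw payload read from the netlink socket might be mapped to an event type. It assumes the usual kernel uevent layout (a run of NUL-terminated "KEY=value" strings); the sketch_* names and the trimmed enum are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed mirror of the event enum proposed in the patch. */
enum sketch_dev_event_type {
	SKETCH_DEV_EVENT_UNKNOWN,
	SKETCH_DEV_EVENT_ADD,
	SKETCH_DEV_EVENT_REMOVE,
	SKETCH_DEV_EVENT_CHANGE,
};

/*
 * Walk a uevent payload: a sequence of NUL-terminated "KEY=value"
 * strings packed into buf[0..len). Return the value for key, or NULL
 * if the key is absent or the payload is not properly terminated.
 */
static const char *
sketch_uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t off = 0;

	while (off < len) {
		const char *field = buf + off;
		const char *end = memchr(field, '\0', len - off);
		size_t flen;

		if (end == NULL)
			break; /* unterminated trailing field */
		flen = (size_t)(end - field);
		if (flen > klen && strncmp(field, key, klen) == 0 &&
		    field[klen] == '=')
			return field + klen + 1;
		off += flen + 1;
	}
	return NULL;
}

/* Translate the ACTION field into the sketch event type. */
static enum sketch_dev_event_type
sketch_uevent_type(const char *buf, size_t len)
{
	const char *action = sketch_uevent_get(buf, len, "ACTION");

	if (action == NULL)
		return SKETCH_DEV_EVENT_UNKNOWN;
	if (strcmp(action, "add") == 0)
		return SKETCH_DEV_EVENT_ADD;
	if (strcmp(action, "remove") == 0)
		return SKETCH_DEV_EVENT_REMOVE;
	if (strcmp(action, "change") == 0)
		return SKETCH_DEV_EVENT_CHANGE;
	return SKETCH_DEV_EVENT_UNKNOWN;
}
```

The epoll thread would call something like sketch_uevent_type() on each buffer it receives and then run the callbacks registered for that event.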
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v7->v6:
>> a.modify vdev part according to the vdev rework
>> b.re-define and split the func into common and bus specific code
>> c.fix some incorrect issue.
>> d.fix the system hang after sending packets.
>> ---
>>   drivers/bus/pci/bsd/pci.c                          |  30 ++
>>   drivers/bus/pci/linux/pci.c                        |  87 +++++
>>   drivers/bus/pci/linux/pci_init.h                   |   1 +
>>   drivers/bus/pci/pci_common.c                       |  43 +++
>>   drivers/bus/pci/pci_common_uio.c                   |  28 ++
>>   drivers/bus/pci/private.h                          |  12 +
>>   drivers/bus/pci/rte_bus_pci.h                      |  25 ++
>>   drivers/bus/vdev/vdev.c                            |  36 +++
>>   lib/librte_eal/bsdapp/eal/eal_dev.c                |  64 ++++
>>   .../bsdapp/eal/include/exec-env/rte_dev.h          | 106 ++++++
>>   lib/librte_eal/common/eal_common_bus.c             |  30 ++
>>   lib/librte_eal/common/eal_common_dev.c             | 169 ++++++++++
>>   lib/librte_eal/common/include/rte_bus.h            |  69 ++++
>>   lib/librte_eal/common/include/rte_dev.h            |  89 ++++++
>>   lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
>>   lib/librte_eal/linuxapp/eal/eal_alarm.c            |   5 +
>>   lib/librte_eal/linuxapp/eal/eal_dev.c              | 356
>> +++++++++++++++++++++
>>   .../linuxapp/eal/include/exec-env/rte_dev.h        | 106 ++++++
>>   lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
>>   lib/librte_pci/rte_pci.c                           |  20 ++
>>   lib/librte_pci/rte_pci.h                           |  17 +
>>   21 files changed, 1301 insertions(+), 1 deletion(-)
>>   create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>>   create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>>
>> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
>> index b8e2178..d58dbf6 100644
>> --- a/drivers/bus/pci/bsd/pci.c
>> +++ b/drivers/bus/pci/bsd/pci.c
>> @@ -126,6 +126,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>>   	}
>>   }
>>
>> +/* re-map pci device */
>> +int
>> +rte_pci_remap_device(struct rte_pci_device *dev)
>> +{
>> +	int ret;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	switch (dev->kdrv) {
>> +	case RTE_KDRV_NIC_UIO:
>> +		ret = pci_uio_remap_resource(dev);
>> +		break;
>> +	default:
>> +		RTE_LOG(DEBUG, EAL,
>> +			"  Not managed by a supported kernel driver, skipped\n");
>> +		ret = 1;
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   void
>>   pci_uio_free_resource(struct rte_pci_device *dev,
>>   		struct mapped_pci_resource *uio_res)
>> @@ -678,3 +701,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
>>
>>   	return ret;
>>   }
>> +
>> +int
>> +rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
>> +{
>> +	return -1;
>> +}
>> +
>> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
>> index 5da6728..792fd2c 100644
>> --- a/drivers/bus/pci/linux/pci.c
>> +++ b/drivers/bus/pci/linux/pci.c
>> @@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>>   	}
>>   }
>>
>> +/* Map pci device */
>> +int
>> +rte_pci_remap_device(struct rte_pci_device *dev)
>> +{
>> +	int ret = -1;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	switch (dev->kdrv) {
>> +	case RTE_KDRV_VFIO:
>> +#ifdef VFIO_PRESENT
>> +		/* no thing to do */
>> +#endif
>> +		break;
>> +	case RTE_KDRV_IGB_UIO:
>> +	case RTE_KDRV_UIO_GENERIC:
>> +		if (rte_eal_using_phys_addrs()) {
>> +			/* map resources for devices that use uio */
>> +			ret = pci_uio_remap_resource(dev);
>> +		}
>> +		break;
>> +	default:
>> +		RTE_LOG(DEBUG, EAL,
>> +			"  Not managed by a supported kernel driver, skipped\n");
>> +		ret = 1;
>> +		break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>>   void *
>>   pci_find_max_end_va(void)
>>   {
>> @@ -386,6 +418,8 @@ pci_scan_one(const char *dirname, const struct
>> rte_pci_addr *addr)
>>   		rte_pci_add_device(dev);
>>   	}
>>
>> +	dev->device.state = DEVICE_PARSED;
>> +	TAILQ_INIT(&(dev->device.uev_cbs));
>>   	return 0;
>>   }
>>
>> @@ -854,3 +888,56 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
>>
>>   	return ret;
>>   }
>> +
>> +int
>> +rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
>> +{
>> +	char drv_bind_path[1024];
>> +	char drv_override_path[1024]; /* contains the /dev/uioX */
>> +	int drv_override_fd;
>> +	int drv_bind_fd;
>> +
>> +	RTE_SET_USED(drv_type);
>> +
>> +	snprintf(drv_override_path, sizeof(drv_override_path),
>> +		"/sys/bus/pci/devices/%s/driver_override", dev_name);
>> +
>> +	/* specify the driver for a device by writing to driver_override */
>> +	drv_override_fd = open(drv_override_path, O_WRONLY);
>> +	if (drv_override_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +			drv_override_path, strerror(errno));
>> +		goto err;
>> +	}
>> +
>> +	if (write(drv_override_fd, drv_type, sizeof(drv_type)) < 0) {
>> +		RTE_LOG(ERR, EAL,
>> +			"Error: bind failed - Cannot write "
>> +			"driver %s to device %s\n", drv_type, dev_name);
>> +		goto err;
>> +	}
>> +
>> +	close(drv_override_fd);
>> +
>> +	snprintf(drv_bind_path, sizeof(drv_bind_path),
>> +		"/sys/bus/pci/drivers/%s/bind", drv_type);
>> +
>> +	/* do the bind by writing device to the specific driver  */
>> +	drv_bind_fd = open(drv_bind_path, O_WRONLY | O_APPEND);
>> +	if (drv_bind_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>> +			drv_bind_path, strerror(errno));
>> +		goto err;
>> +	}
>> +
>> +	if (write(drv_bind_fd, dev_name, sizeof(dev_name)) < 0)
>> +		goto err;
>> +
>> +	close(drv_bind_fd);
>> +	return 0;
>> +err:
>> +	close(drv_override_fd);
>> +	close(drv_bind_fd);
>> +	return -1;
>> +}
>> +
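A note on the two write() calls above: drv_type and dev_name are "const char *" parameters, so sizeof() on them gives the pointer size rather than the string length. Also, the err label can reach close(drv_bind_fd) before that variable has been assigned, and it closes drv_override_fd a second time when a later step fails. A minimal sketch of helpers that would avoid both, with hypothetical names that are not part of the patch:

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a NUL-terminated string to fd, taking the
 * byte count from strlen() instead of sizeof() on a pointer.
 * Returns 0 when the whole string was written, -1 otherwise.
 */
static int
sketch_write_str(int fd, const char *s)
{
	size_t len = strlen(s);
	ssize_t n = write(fd, s, len);

	return (n >= 0 && (size_t)n == len) ? 0 : -1;
}

/* Close a descriptor only if it holds a valid (non-negative) value. */
static void
sketch_close_if_open(int fd)
{
	if (fd >= 0)
		close(fd);
}
```

With both descriptors initialized to -1, and reset to -1 after a successful close, a single error label can call sketch_close_if_open() on each of them safely.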
>> diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
>> index f342c47..5838402 100644
>> --- a/drivers/bus/pci/linux/pci_init.h
>> +++ b/drivers/bus/pci/linux/pci_init.h
>> @@ -58,6 +58,7 @@ int pci_uio_alloc_resource(struct rte_pci_device *dev,
>>   		struct mapped_pci_resource **uio_res);
>>   void pci_uio_free_resource(struct rte_pci_device *dev,
>>   		struct mapped_pci_resource *uio_res);
>> +int pci_uio_remap_resource(struct rte_pci_device *dev);
>>   int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int
>> res_idx,
>>   		struct mapped_pci_resource *uio_res, int map_idx);
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 104fdf9..5417b32 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -282,6 +282,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
>>   		if (rc > 0)
>>   			/* positive value means driver doesn't support it */
>>   			continue;
>> +		dev->device.state = DEVICE_PROBED;
>>   		return 0;
>>   	}
>>   	return 1;
>> @@ -481,6 +482,7 @@ rte_pci_insert_device(struct rte_pci_device
>> *exist_pci_dev,
>>   void
>>   rte_pci_remove_device(struct rte_pci_device *pci_dev)
>>   {
>> +	RTE_LOG(DEBUG, EAL, " rte_pci_remove_device for device list\n");
>>   	TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
>>   }
>>
>> @@ -502,6 +504,44 @@ pci_find_device(const struct rte_device *start,
>> rte_dev_cmp_t cmp,
>>   	return NULL;
>>   }
>>
>> +static struct rte_device *
>> +pci_find_device_by_name(const struct rte_device *start,
>> +		rte_dev_cmp_name_t cmp_name,
>> +		const void *data)
>> +{
>> +	struct rte_pci_device *dev;
>> +
>> +	FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +		if (start && &dev->device == start) {
>> +			start = NULL; /* starting point found */
>> +			continue;
>> +		}
>> +		if (cmp_name(dev->device.name, data) == 0)
>> +			return &dev->device;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int
>> +pci_remap_device(struct rte_device *dev)
>> +{
>> +	struct rte_pci_device *pdev;
>> +	int ret;
>> +
>> +	if (dev == NULL)
>> +		return -EINVAL;
>> +
>> +	pdev = RTE_DEV_TO_PCI(dev);
>> +
>> +	/* remap resources for devices that use igb_uio */
>> +	ret = rte_pci_remap_device(pdev);
>> +	if (ret != 0)
>> +		RTE_LOG(ERR, EAL, "failed to remap device %s",
>> +			dev->name);
>> +	return ret;
>> +}
>> +
>>   static int
>>   pci_plug(struct rte_device *dev)
>>   {
>> @@ -528,10 +568,13 @@ struct rte_pci_bus rte_pci_bus = {
>>   		.scan = rte_pci_scan,
>>   		.probe = rte_pci_probe,
>>   		.find_device = pci_find_device,
>> +		.find_device_by_name = pci_find_device_by_name,
>>   		.plug = pci_plug,
>>   		.unplug = pci_unplug,
>>   		.parse = pci_parse,
>>   		.get_iommu_class = rte_pci_get_iommu_class,
>> +		.remap_device = pci_remap_device,
>> +		.bind_driver = rte_pci_dev_bind_driver,
>>   	},
>>   	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>   	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>> diff --git a/drivers/bus/pci/pci_common_uio.c
>> b/drivers/bus/pci/pci_common_uio.c
>> index 0671131..8cb4009 100644
>> --- a/drivers/bus/pci/pci_common_uio.c
>> +++ b/drivers/bus/pci/pci_common_uio.c
>> @@ -176,6 +176,34 @@ pci_uio_unmap(struct mapped_pci_resource
>> *uio_res)
>>   	}
>>   }
>>
>> +/* remap the PCI resource of a PCI device in private virtual memory */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev)
>> +{
>> +	int i;
>> +	uint64_t phaddr;
>> +	void *map_address;
>> +
>> +	/* Map all BARs */
>> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
>> +		/* skip empty BAR */
>> +		phaddr = dev->mem_resource[i].phys_addr;
>> +		if (phaddr == 0)
>> +			continue;
>> +		map_address = pci_map_private_resource(
>> +				dev->mem_resource[i].addr, 0,
>> +				(size_t)dev->mem_resource[i].len);
>> +		if (map_address == MAP_FAILED)
>> +			goto error;
>> +		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
>> +		dev->mem_resource[i].addr = map_address;
>> +	}
>> +
>> +	return 0;
>> +error:
>> +	return -1;
>> +}
>> +
>>   static struct mapped_pci_resource *
>>   pci_uio_find_resource(struct rte_pci_device *dev)
>>   {
>> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
>> index 2283f09..10baa1a 100644
>> --- a/drivers/bus/pci/private.h
>> +++ b/drivers/bus/pci/private.h
>> @@ -202,6 +202,18 @@ void pci_uio_free_resource(struct rte_pci_device
>> *dev,
>>   		struct mapped_pci_resource *uio_res);
>>
>>   /**
>> + * remap the pci uio resource..
>> + *
>> + * @param dev
>> + *   Point to the struct rte pci device.
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev);
>> +
>> +/**
>>    * Map device memory to uio resource
>>    *
>>    * This function is private to EAL.
>> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
>> index d4a2996..1662f3b 100644
>> --- a/drivers/bus/pci/rte_bus_pci.h
>> +++ b/drivers/bus/pci/rte_bus_pci.h
>> @@ -52,6 +52,8 @@ extern "C" {
>>   #include <sys/queue.h>
>>   #include <stdint.h>
>>   #include <inttypes.h>
>> +#include <unistd.h>
>> +#include <fcntl.h>
>>
>>   #include <rte_debug.h>
>>   #include <rte_interrupts.h>
>> @@ -197,6 +199,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
>>   void rte_pci_unmap_device(struct rte_pci_device *dev);
>>
>>   /**
>> + * Remap this device
>> + *
>> + * @param dev
>> + *   A pointer to a rte_pci_device structure describing the device
>> + *   to use
>> + */
>> +int rte_pci_remap_device(struct rte_pci_device *dev);
>> +
>> +/**
>>    * Dump the content of the PCI bus.
>>    *
>>    * @param f
>> @@ -333,6 +344,20 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
>>   void rte_pci_ioport_write(struct rte_pci_ioport *p,
>>   		const void *data, size_t len, off_t offset);
>>
>> +/**
>> + * It can be used to bind a device to a specific type of driver.
>> + *
>> + * @param dev_name
>> + *  The device name.
>> + * @param drv_type
>> + *  The specific driver's type.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type);
>> +
>>   #ifdef __cplusplus
>>   }
>>   #endif
>> diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
>> index fd7736d..773f6e0 100644
>> --- a/drivers/bus/vdev/vdev.c
>> +++ b/drivers/bus/vdev/vdev.c
>> @@ -323,6 +323,39 @@ vdev_find_device(const struct rte_device *start,
>> rte_dev_cmp_t cmp,
>>   	return NULL;
>>   }
>>
>> +static struct rte_device *
>> +vdev_find_device_by_name(const struct rte_device *start,
>> +		rte_dev_cmp_name_t cmp_name,
>> +		const void *data)
>> +{
>> +	struct rte_vdev_device *dev;
>> +
>> +	TAILQ_FOREACH(dev, &vdev_device_list, next) {
>> +		if (start && &dev->device == start) {
>> +			start = NULL;
>> +			continue;
>> +		}
>> +		if (cmp_name(dev->device.name, data) == 0)
>> +			return &dev->device;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static int
>> +vdev_remap_device(struct rte_device *dev)
>> +{
>> +	RTE_SET_USED(dev);
>> +	return 0;
>> +}
>> +
>> +static int
>> +vdev_bind_driver(const char *dev_name, const char *drv_type)
>> +{
>> +	RTE_SET_USED(dev_name);
>> +	RTE_SET_USED(drv_type);
>> +	return 0;
>> +}
>> +
>>   static int
>>   vdev_plug(struct rte_device *dev)
>>   {
>> @@ -339,9 +372,12 @@ static struct rte_bus rte_vdev_bus = {
>>   	.scan = vdev_scan,
>>   	.probe = vdev_probe,
>>   	.find_device = vdev_find_device,
>> +	.find_device_by_name = vdev_find_device_by_name,
>>   	.plug = vdev_plug,
>>   	.unplug = vdev_unplug,
>>   	.parse = vdev_parse,
>> +	.remap_device = vdev_remap_device,
>> +	.bind_driver = vdev_bind_driver,
>>   };
>>
>>   RTE_REGISTER_BUS(vdev, rte_vdev_bus);
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c
>> b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..6ea9a74
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> @@ -0,0 +1,64 @@
>> +/*-
>> + *   Copyright(c) 2010-2017 Intel Corporation.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#include <stdio.h>
>> +#include <string.h>
>> +#include <inttypes.h>
>> +#include <sys/queue.h>
>> +#include <sys/signalfd.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +#include <sys/epoll.h>
>> +#include <unistd.h>
>> +#include <signal.h>
>> +#include <stdbool.h>
>> +
>> +#include <rte_malloc.h>
>> +#include <rte_bus.h>
>> +#include <rte_dev.h>
>> +#include <rte_devargs.h>
>> +#include <rte_debug.h>
>> +#include <rte_log.h>
>> +
>> +#include "eal_thread.h"
>> +
>> +int
>> +rte_dev_monitor_start(void)
>> +{
>> +	return -1;
>> +}
>> +
>> +int
>> +rte_dev_monitor_stop(void)
>> +{
>> +	return -1;
>> +}
>> diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> new file mode 100644
>> index 0000000..6a6feb5
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> @@ -0,0 +1,106 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#ifndef _RTE_DEV_H_
>> +#error "don't include this file directly, please include generic <rte_dev.h>"
>> +#endif
>> +
>> +#ifndef _RTE_LINUXAPP_DEV_H_
>> +#define _RTE_LINUXAPP_DEV_H_
>> +
>> +#include <stdio.h>
>> +
>> +#include <rte_dev.h>
>> +
>> +#define RTE_EAL_UEV_MSG_LEN 4096
>> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +enum uev_subsystem {
>> +	UEV_SUBSYSTEM_UIO,
>> +	UEV_SUBSYSTEM_VFIO,
>> +	UEV_SUBSYSTEM_PCI,
>> +	UEV_SUBSYSTEM_MAX
>> +};
>> +
>> +enum uev_monitor_netlink_group {
>> +	UEV_MONITOR_KERNEL,
>> +	UEV_MONITOR_UDEV,
>> +};
>> +
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_eal_dev_event_type {
>> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
>> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
>> +	RTE_EAL_DEV_EVENT_REMOVE,
>> +					/**< device removing event */
>> +	RTE_EAL_DEV_EVENT_CHANGE,
>> +					/**< device status change event */
>> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
>> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
>> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
>> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
>> +};
>> +
>> +struct rte_eal_uevent {
>> +	enum rte_eal_dev_event_type type;	/**< device event type */
>> +	int subsystem;				/**< subsystem id */
>> +	char *devname;				/**< device name */
>> +	enum uev_monitor_netlink_group group;	/**< device netlink group */
>> +};
>> +
>> +/**
>> + * Start the device uevent monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_monitor_start(void);
>> +
>> +/**
>> + * Stop the device uevent monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +
>> +int
>> +rte_dev_monitor_stop(void);
>> +
>> +#endif /* _RTE_LINUXAPP_DEV_H_ */
>> diff --git a/lib/librte_eal/common/eal_common_bus.c
>> b/lib/librte_eal/common/eal_common_bus.c
>> index 3e022d5..b7219c9 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -51,8 +51,11 @@ rte_bus_register(struct rte_bus *bus)
>>   	RTE_VERIFY(bus->scan);
>>   	RTE_VERIFY(bus->probe);
>>   	RTE_VERIFY(bus->find_device);
>> +	RTE_VERIFY(bus->find_device_by_name);
>>   	/* Buses supporting driver plug also require unplug. */
>>   	RTE_VERIFY(!bus->plug || bus->unplug);
>> +	RTE_VERIFY(bus->remap_device);
>> +	RTE_VERIFY(bus->bind_driver);
>>
>>   	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
>>   	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
>> @@ -170,6 +173,14 @@ cmp_rte_device(const struct rte_device *dev1,
>> const void *_dev2)
>>   }
>>
>>   static int
>> +cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
>> +{
>> +	const char *dev_name2 = _dev_name2;
>> +
>> +	return strcmp(dev_name1, dev_name2);
>> +}
>> +
>> +static int
>>   bus_find_device(const struct rte_bus *bus, const void *_dev)
>>   {
>>   	struct rte_device *dev;
>> @@ -178,6 +189,25 @@ bus_find_device(const struct rte_bus *bus, const
>> void *_dev)
>>   	return dev == NULL;
>>   }
>>
>> +static struct rte_device *
>> +bus_find_device_by_name(const struct rte_bus *bus, const void
>> *_dev_name)
>> +{
>> +	struct rte_device *dev;
>> +
>> +	dev = bus->find_device_by_name(NULL, cmp_rte_device_name, _dev_name);
>> +	return dev;
>> +}
>> +
>> +struct rte_device *
>> +
>> +rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
>> +{
>> +	struct rte_device *dev;
>> +
>> +	dev = bus_find_device_by_name(bus, _dev_name);
>> +	return dev;
>> +}
>> +
>>   struct rte_bus *
>>   rte_bus_find_by_device(const struct rte_device *dev)
>>   {
>> diff --git a/lib/librte_eal/common/eal_common_dev.c
>> b/lib/librte_eal/common/eal_common_dev.c
>> index dda8f58..47909e8 100644
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -42,9 +42,31 @@
>>   #include <rte_devargs.h>
>>   #include <rte_debug.h>
>>   #include <rte_log.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_malloc.h>
>>
>>   #include "eal_private.h"
>>
>> +/* spinlock for device callbacks */
>> +static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
>> +
>> +/**
>> + * The user application callback description.
>> + *
>> + * It contains callback address to be registered by user application,
>> + * the pointer to the parameters for callback, and the event type.
>> + */
>> +struct rte_eal_dev_callback {
>> +	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
>> +	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
>> +	void *cb_arg;                           /**< Parameter for callback */
>> +	void *ret_param;                        /**< Return parameter */
>> +	enum rte_eal_dev_event_type event;      /**< device event type */
>> +	uint32_t active;                        /**< Callback is executing */
>> +};
>> +
>> +static struct rte_eal_dev_callback *dev_add_cb;
>> +
>>   static int cmp_detached_dev_name(const struct rte_device *dev,
>>   	const void *_name)
>>   {
>> @@ -234,3 +256,150 @@ int rte_eal_hotplug_remove(const char *busname,
>> const char *devname)
>>   	rte_eal_devargs_remove(busname, devname);
>>   	return ret;
>>   }
>> +
>> +int
>> +rte_eal_dev_monitor_enable(void)
>> +{
>> +	int ret;
>> +
>> +	ret = rte_dev_monitor_start();
>> +	if (ret)
>> +		RTE_LOG(ERR, EAL, "Can not init device monitor\n");
>> +	return ret;
>> +}
>> +
>> +int
>> +rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
>> +{
>> +	struct rte_eal_dev_callback *user_cb;
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
> What about checking that the device pointer is not NULL?
>
>> +	rte_spinlock_lock(&rte_dev_cb_lock);
>> +
>> +	if (TAILQ_EMPTY(&(device->uev_cbs)))
>> +		TAILQ_INIT(&(device->uev_cbs));
>> +
>> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
>> +		user_cb = NULL;
>> +	} else {
>> +		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
>> +			if (user_cb->cb_fn == cb_fn &&
>> +				user_cb->cb_arg == cb_arg &&
>> +				user_cb->event == event) {
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	/* create a new callback. */
>> +	if (user_cb == NULL) {
>> +		/* allocate a new interrupt callback entity */
>> +		user_cb = rte_zmalloc("eal device event",
>> +					sizeof(*user_cb), 0);
>> +		if (user_cb == NULL) {
>> +			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
> Missing rte_spinlock_unlock.
>
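A minimal sketch of the early-return fix, not the patch itself. It assumes a
pthread mutex as a stand-in for rte_dev_cb_lock and a hypothetical cb_alloc
hook in place of rte_zmalloc, so that it builds without DPDK. All names here
(cb_entry, cb_register, cb_alloc, failing_alloc, noop_cb) are illustrative.
The only point is that every exit taken after the lock is acquired goes
through a single unlock label.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for rte_dev_cb_lock (assumption: a plain pthread mutex). */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;

struct cb_entry {
	void (*fn)(void *);
	void *arg;
};

/* Hypothetical allocator hook so the failure path can be exercised. */
void *(*cb_alloc)(size_t) = malloc;

void *failing_alloc(size_t size) { (void)size; return NULL; }

void noop_cb(void *arg) { (void)arg; }

int
cb_register(struct cb_entry **slot, void (*fn)(void *), void *arg)
{
	struct cb_entry *e;
	int ret = 0;

	/* Argument checks happen before the lock is taken. */
	if (slot == NULL || fn == NULL)
		return -EINVAL;
	assert(cb_alloc != NULL);

	pthread_mutex_lock(&cb_lock);

	e = cb_alloc(sizeof(*e));
	if (e == NULL) {
		/* Do not return here: the lock is still held. */
		ret = -ENOMEM;
		goto unlock;
	}
	e->fn = fn;
	e->arg = arg;
	*slot = e;

unlock:
	pthread_mutex_unlock(&cb_lock);
	return ret;
}
```

With this shape, a failed allocation leaves the lock free for the next caller
instead of blocking every later register/unregister call.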
>> +			return -ENOMEM;
>> +		}
>> +		user_cb->cb_fn = cb_fn;
>> +		user_cb->cb_arg = cb_arg;
>> +		user_cb->event = event;
>> +		if (event == RTE_EAL_DEV_EVENT_ADD)
>> +			dev_add_cb = user_cb;
> Can only one DPDK entity register for the ADD callback?
>
> I suggest adding an option to register for all devices, perhaps by using a dummy device that holds the "ALL_DEVICES" callbacks per event.
> "All" means past, present and future devices. This way one callback can be called for all devices, and more than one DPDK entity can register for an ADD/NEW event.
> What about NEW instead of ADD?
>
> I also suggest adding the device pointer as a parameter to the callback (it would be managed by EAL).
>
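One possible shape for the "all devices" registration suggested above, as a
rough sketch only. The names (dev_event, dev_event_cb, all_dev_cb_register,
all_dev_cb_notify, record_name, NAME_LEN) are hypothetical and not part of
the patch or of DPDK. Locking is left out to keep it short, and the device
is identified by name rather than by an rte_device pointer so that the
sketch builds on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

#define NAME_LEN 32

enum dev_event { DEV_EVENT_NEW, DEV_EVENT_REMOVE };	/* hypothetical */

/* The callback is told which device the event is about. */
typedef void (*dev_event_cb)(const char *devname, enum dev_event ev,
			     void *arg);

struct all_dev_cb {
	TAILQ_ENTRY(all_dev_cb) next;
	dev_event_cb fn;
	void *arg;
	enum dev_event ev;
};

/* One global list, so any number of subscribers can register per event. */
static TAILQ_HEAD(, all_dev_cb) all_dev_cbs =
	TAILQ_HEAD_INITIALIZER(all_dev_cbs);

int
all_dev_cb_register(enum dev_event ev, dev_event_cb fn, void *arg)
{
	struct all_dev_cb *cb;

	if (fn == NULL)
		return -1;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	cb->fn = fn;
	cb->arg = arg;
	cb->ev = ev;
	TAILQ_INSERT_TAIL(&all_dev_cbs, cb, next);
	return 0;
}

/* Call every subscriber of this event; return how many were called. */
int
all_dev_cb_notify(const char *devname, enum dev_event ev)
{
	struct all_dev_cb *cb;
	int called = 0;

	assert(devname != NULL);
	TAILQ_FOREACH(cb, &all_dev_cbs, next) {
		if (cb->ev != ev)
			continue;
		cb->fn(devname, ev, cb->arg);
		called++;
	}
	return called;
}

/* Sample subscriber: copies the device name into a NAME_LEN byte buffer. */
void
record_name(const char *devname, enum dev_event ev, void *arg)
{
	(void)ev;
	snprintf((char *)arg, NAME_LEN, "%s", devname);
}
```

Because the list is not tied to a single rte_device, a subscriber registered
before a device exists would still be notified when that device shows up.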
>> +		else
>> +			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb,
>> next);
>> +	}
>> +
>> +	rte_spinlock_unlock(&rte_dev_cb_lock);
>> +	return 0;
>> +}
>> +
>> +int
>> +rte_dev_callback_unregister(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
>> +{
>> +	int ret;
>> +	struct rte_eal_dev_callback *cb, *next;
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
>> +	rte_spinlock_lock(&rte_dev_cb_lock);
>> +
>> +	ret = 0;
>> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
>> +		rte_free(dev_add_cb);
>> +		dev_add_cb = NULL;
>> +	} else {
> Device NULL checking?
>
>> +		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
>> +		      cb = next) {
>> +
>> +			next = TAILQ_NEXT(cb, next);
>> +
>> +			if (cb->cb_fn != cb_fn || cb->event != event ||
>> +					(cb->cb_arg != (void *)-1 &&
>> +					cb->cb_arg != cb_arg))
>> +				continue;
>> +
>> +			/*
>> +			 * if this callback is not executing right now,
>> +			 * then remove it.
>> +			 */
>> +			if (cb->active == 0) {
>> +				TAILQ_REMOVE(&(device->uev_cbs), cb,
>> next);
>> +				rte_free(cb);
>> +			} else {
>> +				ret = -EAGAIN;
>> +			}
>> +		}
>> +	}
>> +	rte_spinlock_unlock(&rte_dev_cb_lock);
>> +	return ret;
>> +}
>> +
>> +int
>> +_rte_dev_callback_process(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			void *cb_arg, void *ret_param)
>> +{
>> +	struct rte_eal_dev_callback dev_cb;
>> +	struct rte_eal_dev_callback *cb_lst;
>> +	int rc = 0;
>> +
>> +	rte_spinlock_lock(&rte_dev_cb_lock);
>> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
>> +		if (cb_arg != NULL)
>> +			dev_add_cb->cb_arg = cb_arg;
>> +
>> +		if (ret_param != NULL)
>> +			dev_add_cb->ret_param = ret_param;
>> +
>> +		rte_spinlock_unlock(&rte_dev_cb_lock);
> Can't someone free it while it is running?
> I suggest keeping the lock held.
> To prevent deadlock, callbacks must then not be allowed to use this mechanism.
>
>> +		rc = dev_add_cb->cb_fn(dev_add_cb->event,
>> +				dev_add_cb->cb_arg, dev_add_cb->ret_param);
>> +		rte_spinlock_lock(&rte_dev_cb_lock);
>> +	} else {
>> +		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
>> +			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
>> +				continue;
>> +			dev_cb = *cb_lst;
>> +			cb_lst->active = 1;
>> +			if (cb_arg != NULL)
>> +				dev_cb.cb_arg = cb_arg;
>> +			if (ret_param != NULL)
>> +				dev_cb.ret_param = ret_param;
>> +
>> +			rte_spinlock_unlock(&rte_dev_cb_lock);
> The current active flag does not make this thread safe; I suggest keeping the lock held.
> Scenario:
> 	1. Thread A sees active = 0 in the unregister function.
> 	2. Context switch.
> 	3. Thread B starts the callback.
> 	4. Context switch.
> 	5. Thread A frees it.
> 	6. Context switch.
> 	7. Seg fault in Thread B.
>
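A minimal sketch of the "keep the lock held" suggestion above, under stated
assumptions: a pthread mutex stands in for rte_dev_cb_lock, events are plain
ints, and all names (dev_cb, dev_cb_add, dev_cb_del, dev_cb_process,
count_cb) are made up for illustration. Since the lock stays held while a
callback runs, dev_cb_del can never free an entry that dev_cb_process is
still using, and no active flag is needed. The cost is the one mentioned in
the review: a callback must not call dev_cb_add or dev_cb_del itself, or it
deadlocks on the non-recursive lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

struct dev_cb {
	TAILQ_ENTRY(dev_cb) next;
	int (*fn)(int event, void *arg);
	void *arg;
	int event;
};
TAILQ_HEAD(dev_cb_list, dev_cb);

/* Stand-in for rte_dev_cb_lock (assumption: a plain pthread mutex). */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dev_cb_list cbs = TAILQ_HEAD_INITIALIZER(cbs);

int
dev_cb_add(int event, int (*fn)(int, void *), void *arg)
{
	struct dev_cb *cb;

	assert(fn != NULL);
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	cb->fn = fn;
	cb->arg = arg;
	cb->event = event;

	pthread_mutex_lock(&cb_lock);
	TAILQ_INSERT_TAIL(&cbs, cb, next);
	pthread_mutex_unlock(&cb_lock);
	return 0;
}

/* Remove matching entries; return how many were removed. */
int
dev_cb_del(int event, int (*fn)(int, void *))
{
	struct dev_cb *cb, *tmp;
	int removed = 0;

	pthread_mutex_lock(&cb_lock);
	for (cb = TAILQ_FIRST(&cbs); cb != NULL; cb = tmp) {
		tmp = TAILQ_NEXT(cb, next);
		if (cb->fn != fn || cb->event != event)
			continue;
		TAILQ_REMOVE(&cbs, cb, next);
		free(cb);
		removed++;
	}
	pthread_mutex_unlock(&cb_lock);
	return removed;
}

/* Run callbacks with the lock held, so no entry can be freed meanwhile. */
int
dev_cb_process(int event)
{
	struct dev_cb *cb;
	int rc = 0;

	pthread_mutex_lock(&cb_lock);
	TAILQ_FOREACH(cb, &cbs, next) {
		if (cb->event == event)
			rc = cb->fn(event, cb->arg);
	}
	pthread_mutex_unlock(&cb_lock);
	return rc;
}

/* Sample callback: counts how often it was invoked. */
int
count_cb(int event, void *arg)
{
	(void)event;
	(*(int *)arg)++;
	return 0;
}
```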
>> +			rc = dev_cb.cb_fn(dev_cb.event,
>> +					dev_cb.cb_arg, dev_cb.ret_param);
>> +			rte_spinlock_lock(&rte_dev_cb_lock);
>> +			cb_lst->active = 0;
>> +		}
>> +	}
>> +	rte_spinlock_unlock(&rte_dev_cb_lock);
>> +	return rc;
>> +}
>> diff --git a/lib/librte_eal/common/include/rte_bus.h
>> b/lib/librte_eal/common/include/rte_bus.h
>> index 6fb0834..6c4ae31 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -122,6 +122,34 @@ typedef struct rte_device *
>>   			 const void *data);
>>
>>   /**
>> + * Device iterator to find a device on a bus.
>> + *
>> + * This function returns an rte_device if one of those held by the bus
>> + * matches the data passed as parameter.
>> + *
>> + * If the comparison function returns zero this function should stop iterating
>> + * over any more devices. To continue a search the device of a previous
>> search
>> + * can be passed via the start parameter.
>> + *
>> + * @param cmp
>> + *	the device name comparison function.
>> + *
>> + * @param data
>> + *	Data to compare each device against.
>> + *
>> + * @param start
>> + *	starting point for the iteration
>> + *
>> + * @return
>> + *	The first device matching the data, NULL if none exists.
>> + */
>> +typedef struct rte_device *
>> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
>> +			 rte_dev_cmp_name_t cmp,
>> +			 const void *data);
>> +
>> +
>> +/**
>>    * Implementation specific probe function which is responsible for linking
>>    * devices on that bus with applicable drivers.
>>    *
>> @@ -168,6 +196,37 @@ typedef int (*rte_bus_unplug_t)(struct rte_device
>> *dev);
>>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>
>>   /**
>> + * Implementation specific remap function which is responsible for
>> remmaping
>> + * devices on that bus from original share memory resource to a private
>> memory
>> + * resource for the sake of device has been removal.
>> + *
>> + * @param dev
>> + *	Device pointer that was returned by a previous call to find_device.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
>> +
>> +/**
>> + * Implementation specific bind driver function which is responsible for bind
>> + * a explicit type of driver with a devices on that bus.
>> + *
>> + * @param dev_name
>> + *	device textual description.
>> + *
>> + * @param drv_type
>> + *	driver type textual description.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_bind_driver_t)(const char *dev_name,
>> +				const char *drv_type);
>> +
>> +/**
>>    * Bus scan policies
>>    */
>>   enum rte_bus_scan_mode {
>> @@ -206,9 +265,13 @@ struct rte_bus {
>>   	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
>>   	rte_bus_probe_t probe;       /**< Probe devices on bus */
>>   	rte_bus_find_device_t find_device; /**< Find a device on the bus */
>> +	rte_bus_find_device_by_name_t find_device_by_name;
>> +				     /**< Find a device on the bus */
>>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver
>> */
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>> +	rte_bus_remap_device_t remap_device;       /**< remap a device */
>> +	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device
>> */
>>   	struct rte_bus_conf conf;    /**< Bus configuration */
>>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu
>> class */
>>   };
>> @@ -306,6 +369,12 @@ struct rte_bus *rte_bus_find(const struct rte_bus
>> *start, rte_bus_cmp_t cmp,
>>   struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
>>
>>   /**
>> + * Find the registered bus for a particular device.
>> + */
>> +struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
>> +				const void *dev_name);
>> +
>> +/**
>>    * Find the registered bus for a given name.
>>    */
>>   struct rte_bus *rte_bus_find_by_name(const char *busname);
>> diff --git a/lib/librte_eal/common/include/rte_dev.h
>> b/lib/librte_eal/common/include/rte_dev.h
>> index 9342e0c..19971d0 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -51,6 +51,15 @@ extern "C" {
>>
>>   #include <rte_log.h>
>>
>> +#include <exec-env/rte_dev.h>
>> +
>> +typedef int (*rte_eal_dev_cb_fn)(enum rte_eal_dev_event_type event,
>> +					void *cb_arg, void *ret_param);
>> +
>> +struct rte_eal_dev_callback;
>> +/** @internal Structure to keep track of registered callbacks */
>> +TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
>> +
>>   __attribute__((format(printf, 2, 0)))
>>   static inline void
>>   rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
>> @@ -157,6 +166,13 @@ struct rte_driver {
>>    */
>>   #define RTE_DEV_NAME_MAX_LEN 64
>>
>> +enum device_state {
>> +	DEVICE_UNDEFINED,
>> +	DEVICE_FAULT,
>> +	DEVICE_PARSED,
>> +	DEVICE_PROBED,
>> +};
>> +
>>   /**
>>    * A structure describing a generic device.
>>    */
>> @@ -166,6 +182,9 @@ struct rte_device {
>>   	const struct rte_driver *driver;/**< Associated driver */
>>   	int numa_node;                /**< NUMA node connection */
>>   	struct rte_devargs *devargs;  /**< Device user arguments */
>> +	enum device_state state;  /**< Device state */
>> +	/** User application callbacks for device event */
>> +	struct rte_eal_dev_cb_list uev_cbs;
>>   };
>>
>>   /**
>> @@ -248,6 +267,8 @@ int rte_eal_hotplug_remove(const char *busname,
>> const char *devname);
>>    */
>>   typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void
>> *data);
>>
>> +typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void
>> *data);
>> +
>>   #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
>>
>>   #define RTE_PMD_EXPORT_NAME(name, idx) \
>> @@ -293,4 +314,72 @@ __attribute__((used)) = str
>>   }
>>   #endif
>>
>> +/**
>> + * It enable the device event monitoring for a specific event.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_eal_dev_monitor_enable(void);
>> +/**
>> + * It registers the callback for the specific event. Multiple
>> + * callbacks cal be registered at the same time.
>> + * @param event
>> + *  The device event type.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
>> +
>> +/**
>> + * It unregisters the callback according to the specified event.
>> + *
>> + * @param event
>> + *  The event type which corresponding to the callback.
>> + * @param cb_fn
>> + *  callback address.
>> + *  address of parameter for callback, (void *)-1 means to remove all
>> + *  registered which has the same callback address.
>> + *
>> + * @return
>> + *  - On success, return the number of callback entities removed.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_unregister(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
>> +
>> +/**
>> + * @internal Executes all the user application registered callbacks for
>> + * the specific device. It is for DPDK internal user only. User
>> + * application should not call it directly.
>> + *
>> + * @param event
>> + *  The device event type.
>> + * @param cb_arg
>> + *  callback parameter.
>> + * @param ret_param
>> + *  To pass data back to user application.
>> + *  This allows the user application to decide if a particular function
>> + *  is permitted or not.
>> + *
>> + * @return
>> + *  - On success, return zero.
>> + *  - On failure, a negative value.
>> + */
>> +int
>> +_rte_dev_callback_process(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			void *cb_arg, void *ret_param);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/Makefile
>> b/lib/librte_eal/linuxapp/eal/Makefile
>> index 5a7b8b2..05a2437 100644
>> --- a/lib/librte_eal/linuxapp/eal/Makefile
>> +++ b/lib/librte_eal/linuxapp/eal/Makefile
>> @@ -67,6 +67,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) +=
>> eal_lcore.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
>> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
>>
>>   # from common dir
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
>> @@ -120,7 +121,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
>>   CFLAGS_eal_thread.o += -Wno-return-type
>>   endif
>>
>> -INC := rte_kni_common.h
>> +INC := rte_kni_common.h rte_dev.h
>>
>>   SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
>>   	$(addprefix include/exec-env/,$(INC))
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> b/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> index 8e4a775..29e73a7 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> @@ -209,6 +209,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
>> void *cb_arg)
>>   	int count = 0;
>>   	int err = 0;
>>   	int executing;
>> +	int ret;
>>
>>   	if (!cb_fn) {
>>   		rte_errno = EINVAL;
>> @@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
>> void *cb_arg)
>>   			}
>>   			ap_prev = ap;
>>   		}
>> +
>> +		ret |= rte_intr_callback_unregister(&intr_handle,
>> +				eal_alarm_callback, NULL);
>> +
>>   		rte_spinlock_unlock(&alarm_list_lk);
>>   	} while (executing != 0);
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..49fd0dc
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -0,0 +1,356 @@
>> +/*-
>> + *   Copyright(c) 2010-2017 Intel Corporation.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
>> FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>> COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>> INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
>> OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>> AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
>> TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
>> DAMAGE.
>> + */
>> +
>> +#include <stdio.h>
>> +#include <string.h>
>> +#include <inttypes.h>
>> +#include <sys/queue.h>
>> +#include <sys/signalfd.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +#include <sys/epoll.h>
>> +#include <unistd.h>
>> +#include <signal.h>
>> +#include <stdbool.h>
>> +
>> +#include <rte_malloc.h>
>> +#include <rte_bus.h>
>> +#include <rte_dev.h>
>> +#include <rte_devargs.h>
>> +#include <rte_debug.h>
>> +#include <rte_log.h>
>> +
>> +#include "eal_thread.h"
>> +
>> +/* uev monitoring thread */
>> +static pthread_t uev_monitor_thread;
>> +
>> +bool udev_exit = true;
>> +
>> +bool no_request_thread = true;
>> +
>> +static void sig_handler(int signum)
>> +{
>> +	if (signum == SIGINT || signum == SIGTERM)
>> +		rte_dev_monitor_stop();
>> +}
>> +
>> +static int
>> +dev_monitor_fd_new(void)
>> +{
>> +
>> +	int uevent_fd;
>> +
>> +	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
>> +			SOCK_NONBLOCK,
>> +			NETLINK_KOBJECT_UEVENT);
>> +	if (uevent_fd < 0) {
>> +		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
>> +		return -1;
>> +	}
>> +	return uevent_fd;
>> +}
>> +
>> +static int
>> +dev_monitor_enable(int netlink_fd)
>> +{
>> +	struct sockaddr_nl addr;
>> +	int ret;
>> +	int size = 64 * 1024;
>> +	int nonblock = 1;
>> +
>> +	memset(&addr, 0, sizeof(addr));
>> +	addr.nl_family = AF_NETLINK;
>> +	addr.nl_pid = 0;
>> +	addr.nl_groups = 0xffffffff;
>> +
>> +	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
>> +		RTE_LOG(ERR, EAL, "bind failed\n");
>> +		goto err;
>> +	}
>> +
>> +	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size,
>> sizeof(size));
>> +
>> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
>> +	if (ret != 0) {
>> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
>> +		goto err;
>> +	}
>> +	return 0;
>> +err:
>> +	close(netlink_fd);
>> +	return -1;
>> +}
>> +
>> +static void
>> +dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
>> +{
>> +	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +	int i = 0;
>> +
>> +	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +
>> +	while (i < RTE_EAL_UEV_MSG_LEN) {
>> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
>> +			if (*buf)
>> +				break;
>> +			buf++;
>> +		}
>> +		if (!strncmp(buf, "libudev", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			event->group = UEV_MONITOR_UDEV;
>> +		}
>> +		if (!strncmp(buf, "ACTION=", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			snprintf(action, sizeof(action), "%s", buf);
>> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
>> +			buf += 8;
>> +			i += 8;
>> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
>> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +			buf += 10;
>> +			i += 10;
>> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +			buf += 14;
>> +			i += 14;
>> +			snprintf(pci_slot_name, sizeof(subsystem), "%s",
>> buf);
>> +			event->devname = pci_slot_name;
>> +		}
>> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
>> +			if (*buf == '\0')
>> +				break;
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	if (!strncmp(subsystem, "pci", 3))
>> +		event->subsystem = UEV_SUBSYSTEM_PCI;
>> +	if (!strncmp(action, "add", 3))
>> +		event->type = RTE_EAL_DEV_EVENT_ADD;
>> +	if (!strncmp(action, "remove", 6))
>> +		event->type = RTE_EAL_DEV_EVENT_REMOVE;
>> +	event->devname = pci_slot_name;
>> +}
>> +
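Note that dev_uev_parse() above leaves event->devname pointing at the local
pci_slot_name array, which goes out of scope when the function returns. A
rough sketch of an alternative that copies the values into caller-owned
storage follows. It assumes the usual kernel uevent layout (a header followed
by NUL-separated KEY=VALUE strings); uev_info, uev_take and uev_parse are
hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128

/* Caller-owned result: values are copied, not pointed to. */
struct uev_info {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If `field` starts with `key`, copy the rest into dst and return 1. */
static int
uev_take(const char *field, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(field, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", field + klen);
	return 1;
}

/* Walk the NUL-separated fields in buf[0..len) and fill *out. */
void
uev_parse(const char *buf, size_t len, struct uev_info *out)
{
	size_t i = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);

		/* Ignore a trailing field that is not NUL-terminated. */
		if (end == NULL)
			break;

		if (!uev_take(field, "ACTION=", out->action,
			      sizeof(out->action)) &&
		    !uev_take(field, "SUBSYSTEM=", out->subsystem,
			      sizeof(out->subsystem)))
			uev_take(field, "PCI_SLOT_NAME=", out->pci_slot_name,
				 sizeof(out->pci_slot_name));

		/* Skip this field and its terminating NUL. */
		i += (size_t)(end - field) + 1;
	}
}
```

The caller would pass the byte count returned by recv() as len, so the
parser never scans past the data that was actually received.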
>> +static int
>> +dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
>> +{
>> +	int ret;
>> +	char buf[RTE_EAL_UEV_MSG_LEN];
>> +
>> +	memset(uevent, 0, sizeof(struct rte_eal_uevent));
>> +	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
>> +
>> +	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
>> +	if (ret < 0) {
>> +		RTE_LOG(ERR, EAL,
>> +		"Socket read error(%d): %s\n",
>> +		errno, strerror(errno));
>> +		return -1;
>> +	} else if (ret == 0)
>> +		/* connection closed */
>> +		return -1;
>> +
>> +	dev_uev_parse(buf, uevent);
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +dev_uev_process(struct epoll_event *events, int nfds)
>> +{
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	struct rte_eal_uevent uevent;
>> +	int ret;
>> +	int i;
>> +
>> +	for (i = 0; i < nfds; i++) {
>> +		/**
>> +		 * check device uevent from kernel side, no need to check
>> +		 * uevent from udev.
>> +		 */
>> +		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
>> +			(uevent.group == UEV_MONITOR_UDEV))
>> +			return 0;
>> +
>> +		/* default handle all pci devcie when is being hot plug */
>> +		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
>> +			bus = rte_bus_find_by_name("pci");
>> +			dev = rte_bus_find_device(bus, uevent.devname);
>> +			if (uevent.type == RTE_EAL_DEV_EVENT_REMOVE) {
>> +
>> +				if ((!dev) || dev->state ==
>> DEVICE_UNDEFINED)
>> +					return 0;
>> +				dev->state = DEVICE_FAULT;
>> +
>> +				/**
>> +				 * remap the resource to be fake
>> +				 * before user's removal processing
>> +				 */
>> +				ret = bus->remap_device(dev);
>> +				if (!ret)
>> +					return(_rte_dev_callback_process(dev,
>> +					  RTE_EAL_DEV_EVENT_REMOVE,
>> +					  NULL, NULL));
> What is the reason for keeping this device in the EAL device list after the removal?
> I suggest removing it (driver remove, bus remove and EAL remove) after the callbacks have run.
> This way EAL can initiate all device removals.
The device is removed from the device list by the EAL device detach
function, which is called while the callbacks run. Does that address your concern?
>> +			} else if (uevent.type == RTE_EAL_DEV_EVENT_ADD)
>> {
>> +				if (dev == NULL) {
>> +					/**
>> +					 * bind the driver to the device
>> +					 * before user's add processing
>> +					 */
>> +					bus->bind_driver(
>> +						uevent.devname,
>> +						"igb_uio");
>> +
> Similar comment here:
> EAL can initiate all device probe operations by adding the device and probing it here, before the callbacks run.
> Then the device pointer can also be passed to the callbacks.
>
>> 	return(_rte_dev_callback_process(NULL,
>> +					  RTE_EAL_DEV_EVENT_ADD,
>> +					  uevent.devname, NULL));
>> +				}
>> +			}
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * It builds/rebuilds up the epoll file descriptor with all the
>> + * file descriptors being waited on. Then handles the interrupts.
>> + *
>> + * @param arg
>> + *  pointer. (unused)
>> + *
>> + * @return
>> + *  never return;
>> + */
>> +static __attribute__((noreturn)) void *
>> +dev_uev_monitoring(__rte_unused void *arg)
>> +{
>> +	struct sigaction act;
>> +	sigset_t mask;
>> +	int netlink_fd;
>> +	struct epoll_event ep_kernel;
>> +	int fd_ep;
>> +
>> +	udev_exit = false;
>> +
>> +	/* set signal handlers */
>> +	memset(&act, 0x00, sizeof(struct sigaction));
>> +	act.sa_handler = sig_handler;
>> +	sigemptyset(&act.sa_mask);
>> +	act.sa_flags = SA_RESTART;
>> +	sigaction(SIGINT, &act, NULL);
>> +	sigaction(SIGTERM, &act, NULL);
>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGINT);
>> +	sigaddset(&mask, SIGTERM);
>> +	sigprocmask(SIG_UNBLOCK, &mask, NULL);
>> +
>> +	fd_ep = epoll_create1(EPOLL_CLOEXEC);
>> +	if (fd_ep < 0) {
>> +		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
>> +		goto out;
>> +	}
>> +
>> +	netlink_fd = dev_monitor_fd_new();
>> +
>> +	if (dev_monitor_enable(netlink_fd) < 0) {
>> +		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
>> +		goto out;
>> +	}
>> +
>> +	memset(&ep_kernel, 0, sizeof(struct epoll_event));
>> +	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
>> +	ep_kernel.data.fd = netlink_fd;
>> +	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
>> +		&ep_kernel) < 0) {
>> +		RTE_LOG(ERR, EAL, "error addding fd to epoll: %m\n");
>> +		goto out;
>> +	}
>> +
>> +	while (!udev_exit) {
>> +		int fdcount;
>> +		struct epoll_event ev[1];
>> +
>> +		fdcount = epoll_wait(fd_ep, ev, 1, -1);
>> +		if (fdcount < 0) {
>> +			if (errno != EINTR)
>> +				RTE_LOG(ERR, EAL, "error receiving uevent "
>> +					"message: %m\n");
>> +				continue;
>> +			}
>> +
>> +		/* epoll_wait has at least one fd ready to read */
>> +		if (dev_uev_process(ev, fdcount) < 0) {
>> +			if (errno != EINTR)
>> +				RTE_LOG(ERR, EAL, "error processing uevent
>> "
>> +					"message: %m\n");
>> +		}
>> +	}
>> +out:
>> +	if (fd_ep >= 0)
>> +		close(fd_ep);
>> +	if (netlink_fd >= 0)
>> +		close(netlink_fd);
>> +	rte_panic("uev monitoring fail\n");
>> +}
>> +
>> +int
>> +rte_dev_monitor_start(void)
>> +{
> Maybe add an option to also start it through a new EAL command line parameter?
>
>> +	int ret;
>> +
>> +	if (!no_request_thread)
>> +		return 0;
>> +	no_request_thread = false;
>> +
>> +	/* create the host thread to wait/handle the uevent from kernel */
>> +	ret = pthread_create(&uev_monitor_thread, NULL,
>> +		dev_uev_monitoring, NULL);
> What is the reason for opening a new thread for hotplug?
> Why not use the current DPDK host thread through the alarm mechanism?
>
>> +	return ret;
>> +}
>> +
>> +int
>> +rte_dev_monitor_stop(void)
>> +{
>> +	udev_exit = true;
>> +	no_request_thread = true;
>> +	return 0;
>> +}
>> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>> b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>> new file mode 100644
>> index 0000000..6a6feb5
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
>> @@ -0,0 +1,106 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above copyright
>> + *       notice, this list of conditions and the following disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
>> FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>> COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>> INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>> NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
>> OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>> AND ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
>> TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
>> DAMAGE.
>> + */
>> +
>> +#ifndef _RTE_DEV_H_
>> +#error "don't include this file directly, please include generic <rte_dev.h>"
>> +#endif
>> +
>> +#ifndef _RTE_LINUXAPP_DEV_H_
>> +#define _RTE_LINUXAPP_DEV_H_
>> +
>> +#include <stdio.h>
>> +
>> +#include <rte_dev.h>
>> +
>> +#define RTE_EAL_UEV_MSG_LEN 4096
>> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +enum uev_subsystem {
>> +	UEV_SUBSYSTEM_UIO,
>> +	UEV_SUBSYSTEM_VFIO,
>> +	UEV_SUBSYSTEM_PCI,
>> +	UEV_SUBSYSTEM_MAX
>> +};
>> +
>> +enum uev_monitor_netlink_group {
>> +	UEV_MONITOR_KERNEL,
>> +	UEV_MONITOR_UDEV,
>> +};
>> +
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_eal_dev_event_type {
>> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
>> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
>> +	RTE_EAL_DEV_EVENT_REMOVE,
>> +					/**< device removing event */
>> +	RTE_EAL_DEV_EVENT_CHANGE,
>> +					/**< device status change event */
>> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move
>> event */
>> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
>> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
>> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum
>> */
>> +};
>> +
>> +struct rte_eal_uevent {
>> +	enum rte_eal_dev_event_type type;	/**< device event type */
>> +	int subsystem;				/**< subsystem id */
>> +	char *devname;				/**< device name */
>> +	enum uev_monitor_netlink_group group;	/**< device netlink
>> group */
>> +};
>> +
>> +/**
>> + * Start the device uevent monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_monitor_start(void);
>> +
>> +/**
>> + * Stop the device uevent monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +
>> +int
>> +rte_dev_monitor_stop(void);
>> +
>> +#endif /* _RTE_LINUXAPP_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> index a3a98c1..d0e07b4 100644
>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> @@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct
>> inode *inode)
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>
>> +	/* check if device have been remove before release */
>> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
>> +		pr_info("The device have been removed\n");
>> +		return -1;
>> +	}
>> +
>>   	/* disable interrupts */
>>   	igbuio_pci_disable_interrupts(udev);
>>
>> diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
>> index 0160fc1..feb5fd7 100644
>> --- a/lib/librte_pci/rte_pci.c
>> +++ b/lib/librte_pci/rte_pci.c
>> @@ -172,6 +172,26 @@ rte_pci_addr_parse(const char *str, struct
>> rte_pci_addr *addr)
>>   	return -1;
>>   }
>>
>> +/* map a private resource from an address*/
>> +void *
>> +pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
>> +{
>> +	void *mapaddr;
>> +
>> +	mapaddr = mmap(requested_addr, size,
>> +			   PROT_READ | PROT_WRITE,
>> +			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
>> -1, 0);
>> +	if (mapaddr == MAP_FAILED) {
>> +		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
>> +			"%s (%p)\n",
>> +			__func__, requested_addr,
>> +			(unsigned long)size, (unsigned long)offset,
>> +			strerror(errno), mapaddr);
>> +	} else
>> +		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n",
>> mapaddr);
>> +
>> +	return mapaddr;
>> +}
>>
>>   /* map a particular resource from a file */
>>   void *
>> diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
>> index 4f2cd18..f6091a6 100644
>> --- a/lib/librte_pci/rte_pci.h
>> +++ b/lib/librte_pci/rte_pci.h
>> @@ -227,6 +227,23 @@ int rte_pci_addr_cmp(const struct rte_pci_addr
>> *addr,
>>   int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
>>
>>   /**
>> + * @internal
>> + * Map to a particular private resource.
>> + *
>> + * @param requested_addr
>> + *      The starting address for the new mapping range.
>> + * @param offset
>> + *      The offset for the mapping range.
>> + * @param size
>> + *      The size for the mapping range.
>> + * @return
>> + *   - On success, the function returns a pointer to the mapped area.
>> + *   - On error, the value MAP_FAILED is returned.
>> + */
>> +void *pci_map_private_resource(void *requested_addr, off_t offset,
>> +		size_t size);
>> +
>> +/**
>>    * Map a particular resource from a file.
>>    *
>>    * @param requested_addr
>> --
>> 2.7.4

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-08  5:26                           ` Guo, Jia
@ 2018-01-08  8:14                             ` Matan Azrad
  0 siblings, 0 replies; 494+ messages in thread
From: Matan Azrad @ 2018-01-08  8:14 UTC (permalink / raw)
  To: Guo, Jia, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	Thomas Monjalon, helin.zhang, Mordechay Haimovsky

Hi Guo,
I am answering both threads here.

 From: Guo, Jia, Monday, January 8, 2018 7:27 AM
> On 1/3/2018 1:02 AM, Matan Azrad wrote:
> > Hi Jeff
> >
> > Maybe I'm touching in previous discussions but please see some
> comments\questions.
> >
> > From: Jeff Guo:
> >> This patch aim to add a general uevent mechanism in eal device layer,
> >> to enable all linux kernel object hot plug monitoring, so user could use
> these
> >> APIs to monitor and read out the device status info that sent from the
> kernel
> >> side, then corresponding to handle it, such as detach or attach the
> >> device, and even benefit to use it to do smoothly fail safe work.
> >>
> >> 1) About uevent monitoring:
> >> a: add one epolling to poll the netlink socket, to monitor the uevent of
> >>     the device, add device_state in struct of rte_device, to identify the
> >>     device state machine.
> >> b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
> >> c: add below API in rte eal device common layer.
> >>     rte_eal_dev_monitor_enable
> >>     rte_dev_callback_register
> >>     rte_dev_callback_unregister
> >>     _rte_dev_callback_process
> >>     rte_dev_monitor_start
> >>     rte_dev_monitor_stop
> >>
> >> 2) About failure handler, use pci uio for example,
> >>     add pci_remap_device in bus layer and below function to process it:
> >>     rte_pci_remap_device
> >>     pci_uio_remap_resource
> >>     pci_map_private_resource
> >>     add rte_pci_dev_bind_driver to bind pci device with explicit driver.
> >>
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
<snip>
> >> +		user_cb->cb_arg = cb_arg;
> >> +		user_cb->event = event;
> >> +		if (event == RTE_EAL_DEV_EVENT_ADD)
> >> +			dev_add_cb = user_cb;
> > Only one dpdk entity can register to ADD callback?
> >
> > I suggest to add option to register all devices maybe by using dummy
> device which will include all the "ALL_DEVICES"  callbacks per event.
> > All means past, present and future devices, by this way 1 callback can be
> called for all the devices and more than one dpdk entity could register to  an
> ADD\NEW event.
> > What's about NEW instead of ADD?
> >
> > I also suggest to add the device pointer as a parameter to the
> callback(which will be managed by EAL).
> if you talk about dev_add_cb, the add means device add not cb add, if
> you talk about dev event type, the ADD type is consistent with the type
> form kernel side, anyway could be find a better.

I'm talking about the following:
1. dev_add_cb can hold only 1 callback. Why? Can't 2 callbacks be registered to the RTE_EAL_DEV_EVENT_ADD event? (Actually, there is a memory leak in this case: the second registration overwrites the pointer to the first.)
2. A suggestion to register the same callback to "all" devices with 1 call.
3. A suggestion to add a parameter to the callback functions: the device pointer.
4. A suggestion to rename RTE_EAL_DEV_EVENT_ADD to RTE_EAL_DEV_EVENT_NEW.
5. A hint on how to implement 1 and 2 with a dummy device.
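
To make points 1 to 3 concrete, here is a minimal sketch, not the code from the patch. All names prefixed with sk_ are hypothetical stand-ins, locking is left out for brevity, and a NULL device at registration is assumed to mean "all devices":

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the device and event types. */
struct sk_device { const char *name; };
enum sk_event { SK_EVENT_NEW, SK_EVENT_REMOVE };

/* The device pointer is passed to the callback (point 3). */
typedef void (*sk_cb_fn)(struct sk_device *dev, enum sk_event ev,
			 void *cb_arg);

struct sk_cb {
	TAILQ_ENTRY(sk_cb) next;
	struct sk_device *dev;	/* NULL means "all devices" (point 2) */
	enum sk_event ev;
	sk_cb_fn fn;
	void *cb_arg;		/* opaque, fixed at registration */
};

/* One list for every event, so any number of callbacks fit (point 1). */
static TAILQ_HEAD(, sk_cb) sk_cbs = TAILQ_HEAD_INITIALIZER(sk_cbs);

int
sk_cb_register(struct sk_device *dev, enum sk_event ev,
	       sk_cb_fn fn, void *cb_arg)
{
	struct sk_cb *cb;

	assert(fn != NULL);
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	cb->dev = dev;
	cb->ev = ev;
	cb->fn = fn;
	cb->cb_arg = cb_arg;
	TAILQ_INSERT_TAIL(&sk_cbs, cb, next);
	return 0;
}

/* Run every callback matching the event and the device; return the count. */
int
sk_cb_process(struct sk_device *dev, enum sk_event ev)
{
	struct sk_cb *cb;
	int called = 0;

	TAILQ_FOREACH(cb, &sk_cbs, next) {
		if (cb->ev != ev)
			continue;
		if (cb->dev != NULL && cb->dev != dev)
			continue;
		cb->fn(dev, ev, cb->cb_arg);
		called++;
	}
	return called;
}

/* Example callback: counts invocations through its opaque argument. */
void
sk_count_cb(struct sk_device *dev, enum sk_event ev, void *cb_arg)
{
	(void)dev;
	(void)ev;
	(*(int *)cb_arg)++;
}
```

With this shape, two entities can both register for the NEW event with a NULL device, and both are called for any device that shows up later.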


> but for 1 callback for all device, it is make sense , i will think about that.
> >> +		else
> >> +			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb,
> >> next);
> >> +	}
> >> +
> >> +	rte_spinlock_unlock(&rte_dev_cb_lock);
> >> +	return 0;
> >> +}
<snip>
> >> +int
> >> +_rte_dev_callback_process(struct rte_device *device,
> >> +			enum rte_eal_dev_event_type event,
> >> +			void *cb_arg, void *ret_param)
> >> +{
> >> +	struct rte_eal_dev_callback dev_cb;
> >> +	struct rte_eal_dev_callback *cb_lst;
> >> +	int rc = 0;
> >> +
> >> +	rte_spinlock_lock(&rte_dev_cb_lock);
> >> +	if (event == RTE_EAL_DEV_EVENT_ADD) {
> >> +		if (cb_arg != NULL)
> >> +			dev_add_cb->cb_arg = cb_arg;
> >> +
> >> +		if (ret_param != NULL)
> >> +			dev_add_cb->ret_param = ret_param;
> >> +
> >> +		rte_spinlock_unlock(&rte_dev_cb_lock);
> > Can't someone free it when it running?
> > I suggest to  keep the lock locked.
> > Callbacks are not allowed to use this mechanism to prevent deadlock.
> seems it would bring some deadlock here, let's check it more.

A deadlock can occur only when a callback itself tries to use this mechanism. I think that is OK; you just need to document it for the user.

> >> +		rc = dev_add_cb->cb_fn(dev_add_cb->event,
> >> +				dev_add_cb->cb_arg, dev_add_cb-
> >>> ret_param);
> >> +		rte_spinlock_lock(&rte_dev_cb_lock);
> >> +	} else {
> >> +		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
> >> +			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
> >> +				continue;
> >> +			dev_cb = *cb_lst;
> >> +			cb_lst->active = 1;
> >> +			if (cb_arg != NULL)
> >> +				dev_cb.cb_arg = cb_arg;
> >> +			if (ret_param != NULL)
> >> +				dev_cb.ret_param = ret_param;
> >> +
> >> +			rte_spinlock_unlock(&rte_dev_cb_lock);
> > The current active flag doesn't do it  thread safe here, I suggest to keep the
> lock locked.
> > Scenario:
> > 	1. Thread A see active = 0 in unregister function.
> > 	2. Context switch.
> > 	3. Thread B start the callback.
> > 	4. Context switch.
> > 	5. Thread A free it.
> > 	6. Context switch.
> > 	7. Seg fault in Thread B.
> the same as above.
The same as above. I think the active flag doesn't solve the race, and you must solve it for both cases.
I suggest just keeping the lock held and documenting the possible deadlock when callback code uses this mechanism.
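
A rough sketch of what I mean by keeping the lock held, with hypothetical sk_ names and a single callback slot for brevity. The per-thread flag that turns the documented deadlock into an -EDEADLK error is an extra assumption of this sketch, not something the patch has:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*sk_cb_fn)(void *cb_arg);

static atomic_flag sk_lock = ATOMIC_FLAG_INIT;	/* minimal spinlock */
static _Thread_local int sk_in_cb;	/* set while this thread runs a callback */
static sk_cb_fn sk_fn;
static void *sk_arg;

static void
sk_lock_take(void)
{
	while (atomic_flag_test_and_set_explicit(&sk_lock,
						 memory_order_acquire))
		; /* spin */
}

static void
sk_lock_drop(void)
{
	atomic_flag_clear_explicit(&sk_lock, memory_order_release);
}

int
sk_register(sk_cb_fn fn, void *arg)
{
	assert(fn != NULL);
	if (sk_in_cb)
		return -EDEADLK;	/* called from inside a callback */
	sk_lock_take();
	sk_fn = fn;
	sk_arg = arg;
	sk_lock_drop();
	return 0;
}

int
sk_unregister(void)
{
	if (sk_in_cb)
		return -EDEADLK;	/* called from inside a callback */
	sk_lock_take();
	sk_fn = NULL;
	sk_arg = NULL;
	sk_lock_drop();
	return 0;
}

int
sk_process(void)
{
	int rc = 0;

	sk_lock_take();
	if (sk_fn != NULL) {
		sk_in_cb = 1;
		/* Lock stays held: no other thread can unregister now. */
		rc = sk_fn(sk_arg);
		sk_in_cb = 0;
	}
	sk_lock_drop();
	return rc;
}

/* Example of a callback that (wrongly) tries to unregister itself. */
int
sk_bad_cb(void *arg)
{
	(void)arg;
	return sk_unregister();
}
```

Because unregister and process take the same lock, the free in step 5 of the scenario above cannot run between steps 3 and 7.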

<snip> 
> >> +			rc = dev_cb.cb_fn(dev_cb.event,
> >> +					dev_cb.cb_arg, dev_cb.ret_param);
> >> +			rte_spinlock_lock(&rte_dev_cb_lock);
> >> +			cb_lst->active = 0;
> >> +		}
> >> +	}
> >> +	rte_spinlock_unlock(&rte_dev_cb_lock);
> >> +	return rc;
> >> +}
> >> 	return(_rte_dev_callback_process(dev,
> >> +					  RTE_EAL_DEV_EVENT_REMOVE,
> >> +					  NULL, NULL));
> > What is the reason to keep this device in EAL device list after the removal?
> > I suggest to remove it (driver remove, bus remove and EAL remove) after
> the callbacks running.
> > By this way EAL can initiate all device removals.
> agree, device should be remove from the eal device list after the removal.

I suggest using rte_eal_hotplug_remove().

<Brought over from the second thread>
> it will do device  removal from the device list by the eal device detach 
> function in the call backs running. does it fulfill your concerns.

I mean the removal\probe should be initiated by the EAL and not by the user's callbacks.

> >> +			} else if (uevent.type == RTE_EAL_DEV_EVENT_ADD)
> >> {
> >> +				if (dev == NULL) {
> >> +					/**
> >> +					 * bind the driver to the device
> >> +					 * before user's add processing
> >> +					 */
> >> +					bus->bind_driver(
> >> +						uevent.devname,
> >> +						"igb_uio");
> >> +
> > Similar comments here:
> > EAL can initiate all device probe operations by adding the device and
> probing it here before the callback running.
> > Then, also the device pointer can be passed to the callbacks.
> pass a device pointer could be bring some more change, let's think about
> more.

Yes, I know, but it will help the user, especially for the ADD(NEW) and REMOVE events.

Here you can use rte_eal_hotplug_add().

> >> 	return(_rte_dev_callback_process(NULL,
> >> +					  RTE_EAL_DEV_EVENT_ADD,
> >> +					  uevent.devname, NULL));
> >> +				}
> >> +			}
> >> +		}
> >> +	}
> >> +	return 0;
> >> +}
<snip>
> >> +int
> >> +rte_dev_monitor_start(void)
> >> +{
> > Maybe add option to run it also by new EAL command line parameter?
> good idea.
> >> +	int ret;
> >> +
> >> +	if (!no_request_thread)
> >> +		return 0;

Look, there is a race here as well; no_request_thread doesn't solve it.
Maybe the EAL parameter should be the only way to run it (just don't expose this API). I think the default should be TRUE.
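
If the API stays exposed, one possible way to close the check-then-set race is an atomic compare-and-swap. This is only a sketch with hypothetical sk_ names; the thread creation itself is left out:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Zero-initialised, so the monitor starts out as "not running". */
static atomic_bool sk_monitor_running;

/*
 * Returns 1 if this caller won the right to start the monitor,
 * 0 if it was already running. Only one concurrent caller can win.
 */
int
sk_monitor_start(void)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&sk_monitor_running,
					    &expected, true))
		return 0;
	/* The winner would create the monitoring thread here. */
	return 1;
}

/* Returns 1 if this caller stopped the monitor, 0 if it was not running. */
int
sk_monitor_stop(void)
{
	bool expected = true;

	if (!atomic_compare_exchange_strong(&sk_monitor_running,
					    &expected, false))
		return 0;
	/* The winner would signal the thread to exit and join it here. */
	return 1;
}
```

The test and the update happen in one atomic step, so two threads calling start at the same time cannot both create a thread.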

> >> +	no_request_thread = false;
> >> +
> >> +	/* create the host thread to wait/handle the uevent from kernel */
> >> +	ret = pthread_create(&uev_monitor_thread, NULL,
> >> +		dev_uev_monitoring, NULL);
> > What is the reason to open new thread for hotplug?
> > Why not to use the current dpdk host thread by the alarm mechanism?
> appropriate if you could talk about what you mean the disadvantage of
> new thread here and the advantage of alarm mechanism at the case?

One more thread can complicate things: the user will need to synchronize his alarm\interrupt callback code (host thread) with his hotplug callback code (hotplug thread).

> >> +	return ret;
> >> +}
> >> +
> >> +int
> >> +rte_dev_monitor_stop(void)
> >> +{
> >> +	udev_exit = true;
> >> +	no_request_thread = true;
> >> +	return 0;
> >> +}
<snip>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-03  1:42                       ` [PATCH v7 1/2] eal: " Jeff Guo
  2018-01-02 17:02                         ` Matan Azrad
@ 2018-01-09  0:39                         ` Thomas Monjalon
  2018-01-09  8:25                           ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-09  0:39 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, shreyansh.jain, jingjing.wu, helin.zhang,
	motih, harry.van.haaren

Hi Jeff,

I am surprised that there is not a lot of review of these very
important patches. Maybe it is not easy to review.
Let's try to progress in the following days.

This patch is really too big with a lot of concepts spread
in separate files, so it is difficult to properly review.
Please, try to split in several patches, bringing only one concept
per patch.

First, you can introduce the new events and the callback API.
The second patch (and the most important one) would be to bring
the uevent parsing for Linux (and void implementation for BSD).
Then you can add and explain some patches around PCI mapping.

Finally, there is the kernel binding effort - this one will probably
be ignored for 18.02, because it is another huge topic.
Without bothering with kernel binding, we can at least remove a device,
get a notification, and eventually re-add it. It is a good first step.
Anyway your testpmd patch tests exactly this scenario (totally new
devices are not seen).

More comments below:

03/01/2018 02:42, Jeff Guo:
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> @@ -0,0 +1,64 @@
> +/*-
> + *   Copyright(c) 2010-2017 Intel Corporation.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:

Please check how the Intel copyright and BSD license are now formatted
with the SPDX tag.

> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
> +enum rte_eal_dev_event_type {
> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
> +	RTE_EAL_DEV_EVENT_REMOVE,
> +					/**< device removing event */
> +	RTE_EAL_DEV_EVENT_CHANGE,
> +					/**< device status change event */
> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
> +};

The comments are not useful.
Please better explain what change, move, online, etc. mean.

The shorter prefix RTE_DEV is preferred over RTE_EAL_DEV.

This file is full of definitions which must be common, not specific
to BSD or Linux. Please move it.

> +int
> +_rte_dev_callback_process(struct rte_device *device,
> +			enum rte_eal_dev_event_type event,
> +			void *cb_arg, void *ret_param)

cb_arg must be an opaque parameter which is registered with the
callback and passed later. No need as parameter of this function.

ret_param is not needed at all. The kernel event will be just
translated as rte_eal_dev_event_type (rte_dev_event after rename).
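
As a rough sketch of that translation, assuming the standard Linux uevent action strings ("add", "remove", "change", "move", "online", "offline") and hypothetical sk_ names, with comments that try to say what each event means:

```c
#include <stddef.h>
#include <string.h>

enum sk_dev_event {
	SK_DEV_EVENT_UNKNOWN,	/**< action string not recognised */
	SK_DEV_EVENT_ADD,	/**< kernel object was added */
	SK_DEV_EVENT_REMOVE,	/**< kernel object was removed */
	SK_DEV_EVENT_CHANGE,	/**< state or attributes of the object changed */
	SK_DEV_EVENT_MOVE,	/**< object was moved or renamed in sysfs */
	SK_DEV_EVENT_ONLINE,	/**< object was brought online */
	SK_DEV_EVENT_OFFLINE,	/**< object was taken offline */
};

/* Translate the ACTION value of a kernel uevent into the enum above. */
enum sk_dev_event
sk_dev_event_parse(const char *action)
{
	static const struct {
		const char *name;
		enum sk_dev_event ev;
	} map[] = {
		{ "add", SK_DEV_EVENT_ADD },
		{ "remove", SK_DEV_EVENT_REMOVE },
		{ "change", SK_DEV_EVENT_CHANGE },
		{ "move", SK_DEV_EVENT_MOVE },
		{ "online", SK_DEV_EVENT_ONLINE },
		{ "offline", SK_DEV_EVENT_OFFLINE },
	};
	size_t i;

	if (action == NULL)
		return SK_DEV_EVENT_UNKNOWN;
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strcmp(action, map[i].name) == 0)
			return map[i].ev;
	return SK_DEV_EVENT_UNKNOWN;
}
```

The callback then only needs the event type and its registered opaque argument; nothing has to travel through a ret_param.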

> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
>  /**
> + * Device iterator to find a device on a bus.
> + *
> + * This function returns an rte_device if one of those held by the bus
> + * matches the data passed as parameter.
> + *
> + * If the comparison function returns zero this function should stop iterating
> + * over any more devices. To continue a search the device of a previous search
> + * can be passed via the start parameter.
> + *
> + * @param cmp
> + *	the device name comparison function.
> + *
> + * @param data
> + *	Data to compare each device against.
> + *
> + * @param start
> + *	starting point for the iteration
> + *
> + * @return
> + *	The first device matching the data, NULL if none exists.
> + */
> +typedef struct rte_device *
> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
> +			 rte_dev_cmp_name_t cmp,
> +			 const void *data);

Why is it needed? There is already rte_bus_find_device_t.
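
A lookup by name looks doable with the generic op and a name comparison callback. A minimal sketch with hypothetical sk_ types, assuming the existing op takes a start device, a comparison function returning zero on match, and opaque data:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device on a bus's device list. */
struct sk_device {
	const char *name;
	struct sk_device *next;
};

/* Generic comparison callback: returns 0 when the device matches data. */
typedef int (*sk_dev_cmp_t)(const struct sk_device *dev, const void *data);

/*
 * Generic find: start == NULL searches from the head of the list,
 * otherwise the search continues after start.
 */
struct sk_device *
sk_find_device(struct sk_device *head, const struct sk_device *start,
	       sk_dev_cmp_t cmp, const void *data)
{
	struct sk_device *dev = (start != NULL) ? start->next : head;

	for (; dev != NULL; dev = dev->next)
		if (cmp(dev, data) == 0)
			return dev;
	return NULL;
}

/* Name lookup is just one more comparison function; data is the name. */
int
sk_cmp_name(const struct sk_device *dev, const void *data)
{
	return strcmp(dev->name, (const char *)data);
}
```

The device name taken from the uevent would be passed as data, so no second find op is required in the bus structure.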

> +/**
> + * Implementation specific remap function which is responsible for remmaping
> + * devices on that bus from original share memory resource to a private memory
> + * resource for the sake of device has been removal.
> + *
> + * @param dev
> + *	Device pointer that was returned by a previous call to find_device.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);

You need to better explain why this remap op is needed,
and when exactly it is called.

> @@ -206,9 +265,13 @@ struct rte_bus {
>  	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
>  	rte_bus_probe_t probe;       /**< Probe devices on bus */
>  	rte_bus_find_device_t find_device; /**< Find a device on the bus */
> +	rte_bus_find_device_by_name_t find_device_by_name;
> +				     /**< Find a device on the bus */
>  	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
> +	rte_bus_remap_device_t remap_device;       /**< remap a device */
> +	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device */
>  	struct rte_bus_conf conf;    /**< Bus configuration */
>  	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>  };

Every new op must be introduced in a separate patch
(if not completely removed).

> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> +enum device_state {
> +	DEVICE_UNDEFINED,
> +	DEVICE_FAULT,
> +	DEVICE_PARSED,
> +	DEVICE_PROBED,
> +};

These constants must be prefixed with RTE_
and documented with doxygen please.

> +/**
> + * It enable the device event monitoring for a specific event.

This comment must be reworded.

> + *
> + * @param none

useless

> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_eal_dev_monitor_enable(void);

I suggest dropping this function, which just calls rte_dev_monitor_start.

> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> @@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
>  			}
>  			ap_prev = ap;
>  		}
> +
> +		ret |= rte_intr_callback_unregister(&intr_handle,
> +				eal_alarm_callback, NULL);
> +
>  		rte_spinlock_unlock(&alarm_list_lk);
>  	} while (executing != 0);

Looks to be unrelated.
If it is a fix, please do a separate patch.

> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> +int
> +rte_dev_monitor_start(void)

What about adding "event" in the name of this function?
	rte_dev_event_monitor_start

> +{
> +	int ret;
> +
> +	if (!no_request_thread)
> +		return 0;
> +	no_request_thread = false;
> +
> +	/* create the host thread to wait/handle the uevent from kernel */
> +	ret = pthread_create(&uev_monitor_thread, NULL,
> +		dev_uev_monitoring, NULL);
> +	return ret;
> +}

I think you should use rte_service for thread management.

> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
>  
> +	/* check if device have been remove before release */
> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
> +		pr_info("The device have been removed\n");
> +		return -1;
> +	}

This looks to be unrelated. Separate patch please.


End of first pass review. There are some basic requirements that other
maintainers (especially at Intel) could have reviewed earlier.
Let's try to improve it quickly for 18.02, thanks.
If we are short on time, we should at least focus on adding the
events/callback API and the Linux events implementation.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09  0:39                         ` Thomas Monjalon
@ 2018-01-09  8:25                           ` Guo, Jia
  2018-01-09 10:31                             ` Mordechay Haimovsky
  2018-01-09 11:38                             ` Thomas Monjalon
  0 siblings, 2 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-09  8:25 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, shreyansh.jain, jingjing.wu, helin.zhang,
	motih, harry.van.haaren



On 1/9/2018 8:39 AM, Thomas Monjalon wrote:
> Hi Jeff,
>
> I am surprised that there is not a lot of review of these very
> important patches. Maybe it is not easy to review.
> Let's try to progress in the following days.
>
> This patch is really too big with a lot of concepts spread
> in separate files, so it is difficult to properly review.
> Please, try to split in several patches, bringing only one concept
> per patch.
>
> At first, you can introduce the new events and the callback API.
> The second patch (and the most important one) would be to bring
> the uevent parsing for Linux (and void implementation for BSD).
> Then you can add and explain some patches around PCI mapping.
>
> At last there is the kernel binding effort - this one will probably
> be ignored for 18.02, because it is another huge topic.
> Without bothering with kernel binding, we can at least remove a device,
> get a notification, and eventually re-add it. It is a good first step.
> Anyway your testpmd patch tests exactly this scenario (totally new
> devices are not seen).
I will split it up to make it easier for you all to review. As for
kernel binding, I just make it automatic, in contrast to the manual
binding done the first time, and it is part of the hot plug flow. So I
suggest reviewing it further to check that it has no side effects and
is workable; I would ask to keep it in.
> More comments below:
>
> 03/01/2018 02:42, Jeff Guo:
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> @@ -0,0 +1,64 @@
>> +/*-
>> + *   Copyright(c) 2010-2017 Intel Corporation.
>> + *   All rights reserved.
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
> Please check how Intel Copyright and BSD license is newly formatted
> with SPDX tag.
>
got it.
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> +enum rte_eal_dev_event_type {
>> +	RTE_EAL_DEV_EVENT_UNKNOWN,	/**< unknown event type */
>> +	RTE_EAL_DEV_EVENT_ADD,		/**< device adding event */
>> +	RTE_EAL_DEV_EVENT_REMOVE,
>> +					/**< device removing event */
>> +	RTE_EAL_DEV_EVENT_CHANGE,
>> +					/**< device status change event */
>> +	RTE_EAL_DEV_EVENT_MOVE,		/**< device sys path move event */
>> +	RTE_EAL_DEV_EVENT_ONLINE,	/**< device online event */
>> +	RTE_EAL_DEV_EVENT_OFFLINE,	/**< device offline event */
>> +	RTE_EAL_DEV_EVENT_MAX		/**< max value of this enum */
>> +};
> The comments are not useful.
> Please better explain what is change, move, online, etc.
>
> The shorter prefix RTE_DEV is preferred over RTE_EAL_DEV.
>
> This file is full of definitions which must be common, not specific
> to BSD or Linux. Please move it.
Will move it to a better place.
>> +int
>> +_rte_dev_callback_process(struct rte_device *device,
>> +			enum rte_eal_dev_event_type event,
>> +			void *cb_arg, void *ret_param)
> cb_arg must be an opaque parameter which is registered with the
> callback and passed later. No need as parameter of this function.
>
> ret_param is not needed at all. The kernel event will be just
> translated as rte_eal_dev_event_type (rte_dev_event after rename).
I suggest keeping one of them, so that a new parameter such as device
info can be passed via ret_param. cb_arg is already set at registration
and is not needed here anymore, so I will delete it.
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>>   /**
>> + * Device iterator to find a device on a bus.
>> + *
>> + * This function returns an rte_device if one of those held by the bus
>> + * matches the data passed as parameter.
>> + *
>> + * If the comparison function returns zero this function should stop iterating
>> + * over any more devices. To continue a search the device of a previous search
>> + * can be passed via the start parameter.
>> + *
>> + * @param cmp
>> + *	the device name comparison function.
>> + *
>> + * @param data
>> + *	Data to compare each device against.
>> + *
>> + * @param start
>> + *	starting point for the iteration
>> + *
>> + * @return
>> + *	The first device matching the data, NULL if none exists.
>> + */
>> +typedef struct rte_device *
>> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
>> +			 rte_dev_cmp_name_t cmp,
>> +			 const void *data);
> Why is it needed? There is already rte_bus_find_device_t.
Because rte_bus_find_device_t just finds a device structure in the
device list, but here we need to find a device structure by the device
name that comes from the uevent info.


>> +/**
>> + * Implementation specific remap function which is responsible for remmaping
>> + * devices on that bus from original share memory resource to a private memory
>> + * resource for the sake of device has been removal.
>> + *
>> + * @param dev
>> + *	Device pointer that was returned by a previous call to find_device.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
> You need to better explain why this remap op is needed,
> and when it is called exactly?
sure.
>> @@ -206,9 +265,13 @@ struct rte_bus {
>>   	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
>>   	rte_bus_probe_t probe;       /**< Probe devices on bus */
>>   	rte_bus_find_device_t find_device; /**< Find a device on the bus */
>> +	rte_bus_find_device_by_name_t find_device_by_name;
>> +				     /**< Find a device on the bus */
>>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>> +	rte_bus_remap_device_t remap_device;       /**< remap a device */
>> +	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device */
>>   	struct rte_bus_conf conf;    /**< Bus configuration */
>>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>   };
> Every new op must be introduced in a separate patch
> (if not completely removed).
Makes sense.
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> +enum device_state {
>> +	DEVICE_UNDEFINED,
>> +	DEVICE_FAULT,
>> +	DEVICE_PARSED,
>> +	DEVICE_PROBED,
>> +};
> These constants must prefixed with RTE_
> and documented with doxygen please.
got it.
>> +/**
>> + * It enable the device event monitoring for a specific event.
> This comment must be reworded.
ok.
>> + *
>> + * @param none
> useless
thanks.
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_eal_dev_monitor_enable(void);
> I suggest to drop this function which is just calling rte_dev_monitor_start.
This needs more discussion. I suggest keeping it: let
rte_dev_monitor_start stay separately in the platform code, and let the
user call the common rte_eal_dev_monitor_enable.
>> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
>> @@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
>>   			}
>>   			ap_prev = ap;
>>   		}
>> +
>> +		ret |= rte_intr_callback_unregister(&intr_handle,
>> +				eal_alarm_callback, NULL);
>> +
>>   		rte_spinlock_unlock(&alarm_list_lk);
>>   	} while (executing != 0);
> Looks to be unrelated.
> If it is a fix, please do a separate patch.
ok.
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +int
>> +rte_dev_monitor_start(void)
> What about adding "event" in the name of this function?
> 	rte_dev_event_monitor_start
"dev monitor" does sound odd; agreed.
>> +{
>> +	int ret;
>> +
>> +	if (!no_request_thread)
>> +		return 0;
>> +	no_request_thread = false;
>> +
>> +	/* create the host thread to wait/handle the uevent from kernel */
>> +	ret = pthread_create(&uev_monitor_thread, NULL,
>> +		dev_uev_monitoring, NULL);
>> +	return ret;
>> +}
> I think you should use rte_service for thread management.
Thanks for the info. It is a good mechanism that I did not even know
about before. I will study it and use it.
>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>> @@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>   
>> +	/* check if device have been remove before release */
>> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
>> +		pr_info("The device have been removed\n");
>> +		return -1;
>> +	}
> This looks to be unrelated. Separate patch please.
>
got it.
> End of first pass review. There are some basic requirements that other
> maintainers (especially at Intel) could have reviewed earlier.
> Let's try to improve it quickly for 18.02, thanks.
> If we are short in time, we should at least focus on adding the
> events/callback API and the Linux events implementation.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09  8:25                           ` Guo, Jia
@ 2018-01-09 10:31                             ` Mordechay Haimovsky
  2018-01-09 10:47                               ` Thomas Monjalon
  2018-01-09 11:45                               ` Guo, Jia
  2018-01-09 11:38                             ` Thomas Monjalon
  1 sibling, 2 replies; 494+ messages in thread
From: Mordechay Haimovsky @ 2018-01-09 10:31 UTC (permalink / raw)
  To: Guo, Jia, Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, shreyansh.jain, jingjing.wu, helin.zhang,
	harry.van.haaren

Hi Jeff,
The following will not work for Mellanox devices (see inline)
"i will  separate it for you all to benefit  for review.  for kernel binding, i just let it automatically compare with the first time manually binding, and it is the part of he hot plug flow. so i suggest to review more about that if it is not side effect and workable, beg for keep on. "
Also, please note the following compilation errors (warnings promoted by -Werror):
...  lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'dev_uev_monitoring':
..  lib/librte_eal/linuxapp/eal/eal_dev.c:331:8: error: 'netlink_fd' may be used uninitialized in this function [-Werror=maybe-uninit



... drivers/bus/pci/linux/pci.c: In function 'rte_pci_dev_bind_driver':

... /drivers/bus/pci/linux/pci.c:940:7: error: 'drv_bind_fd' may be used uninitialized in this function [-Werror=maybe-uninitialized]

You are releasing uninitialized FDs in the error path of both routines.
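
One common way to avoid this, shown as a sketch only: initialise the descriptor to -1 and close it in the error path only if it was actually opened. The helper name sk_write_file is hypothetical and is not one of the two routines named above:

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes of value to path; return 0 on success, -1 on error. */
int
sk_write_file(const char *path, const char *value, size_t len)
{
	int fd = -1;	/* initialised, so the cleanup path is always safe */
	int ret = -1;

	/* An early exit like this is what leaves fd unset in the warning. */
	if (path == NULL || value == NULL)
		goto cleanup;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		goto cleanup;

	if (write(fd, value, len) != (ssize_t)len)
		goto cleanup;

	ret = 0;
cleanup:
	if (fd >= 0)	/* close only a descriptor that was really opened */
		close(fd);
	return ret;
}
```

With the initialiser and the guard in place, the compiler has nothing left to flag as maybe-uninitialized.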
Moti H.
From: Guo, Jia [mailto:jia.guo@intel.com]
Sent: Tuesday, January 9, 2018 10:26 AM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org; stephen@networkplumber.org; bruce.richardson@intel.com; ferruh.yigit@intel.com; gaetan.rivet@6wind.com; konstantin.ananyev@intel.com; shreyansh.jain@nxp.com; jingjing.wu@intel.com; helin.zhang@intel.com; Mordechay Haimovsky <motih@mellanox.com>; harry.van.haaren@intel.com
Subject: Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug




On 1/9/2018 8:39 AM, Thomas Monjalon wrote:

Hi Jeff,



I am surprised that there is not a lot of review of these very

important patches. Maybe it is not easy to review.

Let's try to progress in the following days.



This patch is really too big with a lot of concepts spread

in separate files, so it is difficult to properly review.

Please, try to split in several patches, bringing only one concept

per patch.



At first, you can introduce the new events and the callback API.

The second patch (and the most important one) would be to bring

the uevent parsing for Linux (and void implementation for BSD).

Then you can add and explain some patches around PCI mapping.



At last there is the kernel binding effort - this one will probably

be ignored for 18.02, because it is another huge topic.

Without bothering with kernel binding, we can at least remove a device,

get a notification, and eventually re-add it. It is a good first step.

Anyway your testpmd patch tests exactly this scenario (totally new

devices are not seen).
i will  separate it for you all to benefit  for review.  for kernel binding, i just let it automatically compare with the first time manually binding, and it is the part of he hot plug flow. so i suggest to review more about that if it is not side effect and workable, beg for keep on.
This will not work for Mellanox, which uses several drivers and services in order to map the device and device queues to user space.
For example, the mlx4 PMD (PMD for ConnectX-3 devices) requires the mlx4_core, mlx4_en and mlx4_ib drivers to be loaded, and
the rdma-core user-space libraries and daemons to be loaded.




More comments below:



03/01/2018 02:42, Jeff Guo:

--- /dev/null

+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c

@@ -0,0 +1,64 @@

+/*-

+ *   Copyright(c) 2010-2017 Intel Corporation.

+ *   All rights reserved.

+ *

+ *   Redistribution and use in source and binary forms, with or without

+ *   modification, are permitted provided that the following conditions

+ *   are met:



Please check how Intel Copyright and BSD license is newly formatted

with SPDX tag.


got it.




--- /dev/null

+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h

+enum rte_eal_dev_event_type {

+ RTE_EAL_DEV_EVENT_UNKNOWN,     /**< unknown event type */

+ RTE_EAL_DEV_EVENT_ADD,         /**< device adding event */

+ RTE_EAL_DEV_EVENT_REMOVE,

+                                /**< device removing event */

+ RTE_EAL_DEV_EVENT_CHANGE,

+                                /**< device status change event */

+ RTE_EAL_DEV_EVENT_MOVE,               /**< device sys path move event */

+ RTE_EAL_DEV_EVENT_ONLINE,      /**< device online event */

+ RTE_EAL_DEV_EVENT_OFFLINE,     /**< device offline event */

+ RTE_EAL_DEV_EVENT_MAX          /**< max value of this enum */

+};



The comments are not useful.

Please better explain what is change, move, online, etc.



The shorter prefix RTE_DEV is preferred over RTE_EAL_DEV.



This file is full of definitions which must be common, not specific

to BSD or Linux. Please move it.
will move it to the better place.




+int

+_rte_dev_callback_process(struct rte_device *device,

+                 enum rte_eal_dev_event_type event,

+                 void *cb_arg, void *ret_param)



cb_arg must be an opaque parameter which is registered with the

callback and passed later. No need as parameter of this function.



ret_param is not needed at all. The kernel event will be just

translated as rte_eal_dev_event_type (rte_dev_event after rename).
suggest hold one to let new param, such as device info, add by ret_param, so cb_arg have set when register and no use anymore, delete it.




--- a/lib/librte_eal/common/include/rte_bus.h

+++ b/lib/librte_eal/common/include/rte_bus.h

 /**

+ * Device iterator to find a device on a bus.

+ *

+ * This function returns an rte_device if one of those held by the bus

+ * matches the data passed as parameter.

+ *

+ * If the comparison function returns zero this function should stop iterating

+ * over any more devices. To continue a search the device of a previous search

+ * can be passed via the start parameter.

+ *

+ * @param cmp

+ *       the device name comparison function.

+ *

+ * @param data

+ *       Data to compare each device against.

+ *

+ * @param start

+ *       starting point for the iteration

+ *

+ * @return

+ *       The first device matching the data, NULL if none exists.

+ */

+typedef struct rte_device *

+(*rte_bus_find_device_by_name_t)(const struct rte_device *start,

+                  rte_dev_cmp_name_t cmp,

+                  const void *data);



Why is it needed? There is already rte_bus_find_device_t.
Because rte_bus_find_device_t just finds a device structure in the device list, but here we need to find a device structure by the device name that comes from the uevent info.




+/**

+ * Implementation specific remap function which is responsible for remmaping

+ * devices on that bus from original share memory resource to a private memory

+ * resource for the sake of device has been removal.

+ *

+ * @param dev

+ *       Device pointer that was returned by a previous call to find_device.

+ *

+ * @return

+ *       0 on success.

+ *       !0 on error.

+ */

+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);



You need to better explain why this remap op is needed,

and when it is called exactly?
sure.




@@ -206,9 +265,13 @@ struct rte_bus {

  rte_bus_scan_t scan;         /**< Scan for devices attached to bus */

  rte_bus_probe_t probe;       /**< Probe devices on bus */

  rte_bus_find_device_t find_device; /**< Find a device on the bus */

+ rte_bus_find_device_by_name_t find_device_by_name;

+                             /**< Find a device on the bus */

  rte_bus_plug_t plug;         /**< Probe single device for drivers */

  rte_bus_unplug_t unplug;     /**< Remove single device from driver */

  rte_bus_parse_t parse;       /**< Parse a device name */

+ rte_bus_remap_device_t remap_device;       /**< remap a device */

+ rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device */

  struct rte_bus_conf conf;    /**< Bus configuration */

  rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */

 };



Every new op must be introduced in a separate patch

(if not completely removed).
make sense.




--- a/lib/librte_eal/common/include/rte_dev.h

+++ b/lib/librte_eal/common/include/rte_dev.h

+enum device_state {

+ DEVICE_UNDEFINED,

+ DEVICE_FAULT,

+ DEVICE_PARSED,

+ DEVICE_PROBED,

+};



These constants must prefixed with RTE_

and documented with doxygen please.
got it.




+/**

+ * It enable the device event monitoring for a specific event.



This comment must be reworded.
ok.




+ *

+ * @param none



useless
thanks.




+ * @return

+ *   - On success, zero.

+ *   - On failure, a negative value.

+ */

+int

+rte_eal_dev_monitor_enable(void);



I suggest to drop this function which is just calling rte_dev_monitor_start.
This needs more discussion. I suggest keeping it: rte_dev_monitor_start stays in the platform-specific code, and users call the common rte_eal_dev_monitor_enable.



--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c

+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c

@@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)

                  }

                  ap_prev = ap;

          }

+

+         ret |= rte_intr_callback_unregister(&intr_handle,

+                        eal_alarm_callback, NULL);

+

          rte_spinlock_unlock(&alarm_list_lk);

  } while (executing != 0);



Looks to be unrelated.

If it is a fix, please do a separate patch.
ok.




--- /dev/null

+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c

+int

+rte_dev_monitor_start(void)



What about adding "event" in the name of this function?

 rte_dev_event_monitor_start
"dev monitor" sounds odd, agreed.




+{

+ int ret;

+

+ if (!no_request_thread)

+         return 0;

+ no_request_thread = false;

+

+ /* create the host thread to wait/handle the uevent from kernel */

+ ret = pthread_create(&uev_monitor_thread, NULL,

+         dev_uev_monitoring, NULL);

+ return ret;

+}



I think you should use rte_service for thread management.
Thanks for the info. It is a good mechanism that I did not know about before; I will study it and use it.




--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c

+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c

@@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)

  struct rte_uio_pci_dev *udev = info->priv;

  struct pci_dev *dev = udev->pdev;



+ /* check if device have been remove before release */

+ if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {

+         pr_info("The device have been removed\n");

+         return -1;

+ }



This looks to be unrelated. Separate patch please.


got it.




End of first pass review. There are some basic requirements that other

maintainers (especially at Intel) could have reviewed earlier.

Let's try to improve it quickly for 18.02, thanks.

If we are short in time, we should at least focus on adding the

events/callback API and the Linux events implementation.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 10:31                             ` Mordechay Haimovsky
@ 2018-01-09 10:47                               ` Thomas Monjalon
  2018-01-09 11:39                                 ` Guo, Jia
  2018-01-09 11:45                               ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-09 10:47 UTC (permalink / raw)
  To: Guo, Jia
  Cc: Mordechay Haimovsky, dev, stephen, bruce.richardson,
	ferruh.yigit, gaetan.rivet, konstantin.ananyev, shreyansh.jain,
	jingjing.wu, helin.zhang, harry.van.haaren

09/01/2018 11:31, Mordechay Haimovsky:
> From: Guo, Jia [mailto:jia.guo@intel.com]
> > On 1/9/2018 8:39 AM, Thomas Monjalon wrote:
> > > At last there is the kernel binding effort - this one will probably
> > > be ignored for 18.02, because it is another huge topic.
> > > Without bothering with kernel binding, we can at least remove a device,
> > > get a notification, and eventually re-add it. It is a good first step.
> > > Anyway your testpmd patch tests exactly this scenario (totally new
> > > devices are not seen).
> > 
> > i will  separate it for you all to benefit  for review.  for kernel
> > binding, i just let it automatically compare with the first time manually
> > binding, and it is the part of he hot plug flow. so i suggest to review
> > more about that if it is not side effect and workable, beg for keep on.
> 
> This will not work for Mellanox which uses several drivers and services
> in order to map the device and device queues to user space. For example, 
> the mlx4 PMD (PMD for ConnectX-3 devices) requires that mlx4_core mlx4_en
> and mlx4_ib drivers to be loaded, and for RDM -core user-space libraries
> and daemons to be loaded.

Yes automatic binding is a feature which requires more work.
It cannot be ready for 18.02.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09  8:25                           ` Guo, Jia
  2018-01-09 10:31                             ` Mordechay Haimovsky
@ 2018-01-09 11:38                             ` Thomas Monjalon
  2018-01-09 11:58                               ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-09 11:38 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, shreyansh.jain, jingjing.wu, helin.zhang,
	motih, harry.van.haaren

09/01/2018 09:25, Guo, Jia:
> On 1/9/2018 8:39 AM, Thomas Monjalon wrote:
> >> +int
> >> +_rte_dev_callback_process(struct rte_device *device,
> >> +			enum rte_eal_dev_event_type event,
> >> +			void *cb_arg, void *ret_param)
> > 
> > cb_arg must be an opaque parameter which is registered with the
> > callback and passed later. No need as parameter of this function.
> >
> > ret_param is not needed at all. The kernel event will be just
> > translated as rte_eal_dev_event_type (rte_dev_event after rename).
> 
> suggest hold one to let new param, such as device info, add by 
> ret_param, so cb_arg have set when register and no use anymore, delete it.

Sorry I don't understand. Please rephrase.

> >> --- a/lib/librte_eal/common/include/rte_bus.h
> >> +++ b/lib/librte_eal/common/include/rte_bus.h
> >>   /**
> >> + * Device iterator to find a device on a bus.
> >> + *
> >> + * This function returns an rte_device if one of those held by the bus
> >> + * matches the data passed as parameter.
> >> + *
> >> + * If the comparison function returns zero this function should stop iterating
> >> + * over any more devices. To continue a search the device of a previous search
> >> + * can be passed via the start parameter.
> >> + *
> >> + * @param cmp
> >> + *	the device name comparison function.
> >> + *
> >> + * @param data
> >> + *	Data to compare each device against.
> >> + *
> >> + * @param start
> >> + *	starting point for the iteration
> >> + *
> >> + * @return
> >> + *	The first device matching the data, NULL if none exists.
> >> + */
> >> +typedef struct rte_device *
> >> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
> >> +			 rte_dev_cmp_name_t cmp,
> >> +			 const void *data);
> > 
> > Why is it needed? There is already rte_bus_find_device_t.
> 
> because the rte_bus_find_device_t just find a device structure in the 
> device list, but here need to find a device structure by device name 
> which come from uevent info.

I don't understand how it is different?
Looking at the code, it is a copy/paste except it is dedicated
to name comparison.
You can remove rte_bus_find_device_by_name_t and provide a
comparison function which looks at name.
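
A minimal sketch of that suggestion, using simplified stand-in types rather than the real rte_device and rte_bus definitions (all sketch_* names are hypothetical): the generic iterator stays unchanged, and name lookup becomes one more comparison function.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for illustration; the real DPDK structure differs. */
struct sketch_device {
	const char *name;
	struct sketch_device *next;
};

/* Generic comparison callback: returns 0 on match, like strcmp(). */
typedef int (*sketch_dev_cmp_t)(const struct sketch_device *dev,
				const void *data);

/* Generic iterator: first device matching `data`, searching after `start`
 * when it is not NULL, otherwise from the head of the list. */
struct sketch_device *
sketch_find_device(struct sketch_device *head,
		   const struct sketch_device *start,
		   sketch_dev_cmp_t cmp, const void *data)
{
	struct sketch_device *dev = head;

	if (start != NULL)
		dev = start->next;
	for (; dev != NULL; dev = dev->next)
		if (cmp(dev, data) == 0)
			return dev;
	return NULL;
}

/* Name matching needs no dedicated bus op, only this comparison. */
int
sketch_cmp_dev_name(const struct sketch_device *dev, const void *data)
{
	return strcmp(dev->name, (const char *)data);
}
```

A caller would pass the device name taken from the uevent as `data`, for example sketch_find_device(head, NULL, sketch_cmp_dev_name, "0000:02:00.0").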

> >> +int
> >> +rte_eal_dev_monitor_enable(void);
> > 
> > I suggest to drop this function which is just calling rte_dev_monitor_start.
> 
> more discuss, i suggest keep on it , let rte_dev_monitor_start 
> separately stay on the platform code and let user commonly call 
> rte_eal_dev_monitor_enable.

Then you may need a disable function.
It will end up to be like start/stop.
I think it is just redundant.

If kept, please rename it to rte_dev_event_monitor_enable.

> >> +	/* create the host thread to wait/handle the uevent from kernel */
> >> +	ret = pthread_create(&uev_monitor_thread, NULL,
> >> +		dev_uev_monitoring, NULL);
> >> +	return ret;
> >> +}
> > 
> > I think you should use rte_service for thread management.
> 
> thanks for your info, such a good mechanism to use that  i even not know 
> that before. i will study and use it.

OK, good.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 10:47                               ` Thomas Monjalon
@ 2018-01-09 11:39                                 ` Guo, Jia
  2018-01-09 11:44                                   ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-01-09 11:39 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Mordechay Haimovsky, dev, stephen, Richardson, Bruce, Yigit,
	Ferruh, gaetan.rivet, Ananyev, Konstantin, shreyansh.jain, Wu,
	Jingjing, Zhang, Helin, Van Haaren, Harry

So how should I split the patch into smaller patches: by using stub or null implementations of the functions? I think we should consider whether that is an economical approach right now, or whether I could instead explain the code in more detail for those of you who are not familiar with the background. I have sent v8; please check. Thanks, all.

Best regards,
Jeff Guo

-----Original Message-----
From: Thomas Monjalon [mailto:thomas@monjalon.net] 
Sent: Tuesday, January 9, 2018 6:48 PM
To: Guo, Jia <jia.guo@intel.com>
Cc: Mordechay Haimovsky <motih@mellanox.com>; dev@dpdk.org; stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
Subject: Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug

09/01/2018 11:31, Mordechay Haimovsky:
> From: Guo, Jia [mailto:jia.guo@intel.com]
> > On 1/9/2018 8:39 AM, Thomas Monjalon wrote:
> > > At last there is the kernel binding effort - this one will 
> > > probably be ignored for 18.02, because it is another huge topic.
> > > Without bothering with kernel binding, we can at least remove a 
> > > device, get a notification, and eventually re-add it. It is a good first step.
> > > Anyway your testpmd patch tests exactly this scenario (totally new 
> > > devices are not seen).
> > 
> > i will  separate it for you all to benefit  for review.  for kernel 
> > binding, i just let it automatically compare with the first time 
> > manually binding, and it is the part of he hot plug flow. so i 
> > suggest to review more about that if it is not side effect and workable, beg for keep on.
> 
> This will not work for Mellanox which uses several drivers and 
> services in order to map the device and device queues to user space. 
> For example, the mlx4 PMD (PMD for ConnectX-3 devices) requires that 
> mlx4_core mlx4_en and mlx4_ib drivers to be loaded, and for RDM -core 
> user-space libraries and daemons to be loaded.

Yes automatic binding is a feature which requires more work.
It cannot be ready for 18.02.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 11:39                                 ` Guo, Jia
@ 2018-01-09 11:44                                   ` Thomas Monjalon
  2018-01-09 12:08                                     ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-09 11:44 UTC (permalink / raw)
  To: Guo, Jia
  Cc: Mordechay Haimovsky, dev, stephen, Richardson, Bruce, Yigit,
	Ferruh, gaetan.rivet, Ananyev, Konstantin, shreyansh.jain, Wu,
	Jingjing, Zhang, Helin, Van Haaren, Harry

09/01/2018 12:39, Guo, Jia:
> So, how can separate the patch into more small patch, use stake or null implement in function. I think we should consider if it is a economic way now, if I could explain more detail in code for you all not very familiar the background? I have sent v8, please check, thanks all. 

The v8 is not split enough.
Please try to address all my comments.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 10:31                             ` Mordechay Haimovsky
  2018-01-09 10:47                               ` Thomas Monjalon
@ 2018-01-09 11:45                               ` Guo, Jia
  1 sibling, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-09 11:45 UTC (permalink / raw)
  To: Mordechay Haimovsky, Thomas Monjalon
  Cc: dev, stephen, Richardson, Bruce, Yigit, Ferruh, gaetan.rivet,
	Ananyev, Konstantin, shreyansh.jain, Wu, Jingjing, Zhang, Helin,
	Van Haaren, Harry

Got it. It seems that will be more complex work to do later; I still need to debug the feature further so that it works for more drivers.
Best regards,
Jeff Guo

From: Mordechay Haimovsky [mailto:motih@mellanox.com]
Sent: Tuesday, January 9, 2018 6:32 PM
To: Guo, Jia <jia.guo@intel.com>; Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org; stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
Subject: RE: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug

Hi Jeff,
The following will not work for Mellanox devices (see inline)
"i will  separate it for you all to benefit  for review.  for kernel binding, i just let it automatically compare with the first time manually binding, and it is the part of he hot plug flow. so i suggest to review more about that if it is not side effect and workable, beg for keep on. "
Also please note the following compilation warnings,
...  lib/librte_eal/linuxapp/eal/eal_dev.c: In function 'dev_uev_monitoring':
..  lib/librte_eal/linuxapp/eal/eal_dev.c:331:8: error: 'netlink_fd' may be used uninitialized in this function [-Werror=maybe-uninit

... drivers/bus/pci/linux/pci.c: In function 'rte_pci_dev_bind_driver':
... /drivers/bus/pci/linux/pci.c:940:7: error: 'drv_bind_fd' may be used uninitialized in this function [-Werror=maybe-uninitialized]

You are releasing uninitialized FDs in the error path of both routines.
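
As a sketch of the usual fix for this class of warning (a hypothetical helper, not the code from the patch): initialize each descriptor to -1 and close only the ones that were actually opened, so every error path can share one cleanup label.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens two files, then releases them again.
 * Returns 0 if both could be opened, -1 otherwise. */
int
sketch_open_pair(const char *first_path, const char *second_path)
{
	int first_fd = -1;	/* -1 marks "not opened yet" */
	int second_fd = -1;
	int ret = -1;

	first_fd = open(first_path, O_RDONLY);
	if (first_fd < 0)
		goto cleanup;

	second_fd = open(second_path, O_RDONLY);
	if (second_fd < 0)
		goto cleanup;

	ret = 0;

cleanup:
	/* Only valid descriptors are closed, whichever path got here. */
	if (second_fd >= 0)
		close(second_fd);
	if (first_fd >= 0)
		close(first_fd);
	return ret;
}
```

Because both variables have a defined value before the first goto, the compiler no longer sees a path on which close() receives an uninitialized descriptor.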
Moti H.
From: Guo, Jia [mailto:jia.guo@intel.com]
Sent: Tuesday, January 9, 2018 10:26 AM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org; stephen@networkplumber.org; bruce.richardson@intel.com; ferruh.yigit@intel.com; gaetan.rivet@6wind.com; konstantin.ananyev@intel.com; shreyansh.jain@nxp.com; jingjing.wu@intel.com; helin.zhang@intel.com; Mordechay Haimovsky <motih@mellanox.com>; harry.van.haaren@intel.com
Subject: Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug




On 1/9/2018 8:39 AM, Thomas Monjalon wrote:

Hi Jeff,



I am surprised that there is not a lot of review of these very

important patches. Maybe it is not easy to review.

Let's try to progress in the following days.



This patch is really too big with a lot of concepts spread

in separate files, so it is difficult to properly review.

Please, try to split in several patches, bringing only one concept

per patch.



At first, you can introduce the new events and the callback API.

The second patch (and the most important one) would be to bring

the uevent parsing for Linux (and void implementation for BSD).

Then you can add and explain some patches around PCI mapping.



At last there is the kernel binding effort - this one will probably

be ignored for 18.02, because it is another huge topic.

Without bothering with kernel binding, we can at least remove a device,

get a notification, and eventually re-add it. It is a good first step.

Anyway your testpmd patch tests exactly this scenario (totally new

devices are not seen).
I will split it to make review easier for you all. For kernel binding, I just make it automatic, compared with the manual binding done the first time, and it is part of the hot plug flow. So I suggest reviewing it further; if it is workable and has no side effects, I would ask to keep it.
This will not work for Mellanox which uses several drivers and services in order to map the device and device queues to user space.
For example,  the mlx4 PMD (PMD for ConnectX-3 devices) requires that mlx4_core mlx4_en and mlx4_ib drivers to be loaded, and
for RDM -core user-space libraries and daemons to be loaded.




More comments below:



03/01/2018 02:42, Jeff Guo:

--- /dev/null

+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c

@@ -0,0 +1,64 @@

+/*-

+ *   Copyright(c) 2010-2017 Intel Corporation.

+ *   All rights reserved.

+ *

+ *   Redistribution and use in source and binary forms, with or without

+ *   modification, are permitted provided that the following conditions

+ *   are met:



Please check how Intel Copyright and BSD license is newly formatted

with SPDX tag.


got it.



--- /dev/null

+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h

+enum rte_eal_dev_event_type {

+ RTE_EAL_DEV_EVENT_UNKNOWN,     /**< unknown event type */

+ RTE_EAL_DEV_EVENT_ADD,         /**< device adding event */

+ RTE_EAL_DEV_EVENT_REMOVE,

+                                /**< device removing event */

+ RTE_EAL_DEV_EVENT_CHANGE,

+                                /**< device status change event */

+ RTE_EAL_DEV_EVENT_MOVE,               /**< device sys path move event */

+ RTE_EAL_DEV_EVENT_ONLINE,      /**< device online event */

+ RTE_EAL_DEV_EVENT_OFFLINE,     /**< device offline event */

+ RTE_EAL_DEV_EVENT_MAX          /**< max value of this enum */

+};



The comments are not useful.

Please better explain what is change, move, online, etc.



The shorter prefix RTE_DEV is preferred over RTE_EAL_DEV.



This file is full of definitions which must be common, not specific

to BSD or Linux. Please move it.
Will move it to a better place.



+int

+_rte_dev_callback_process(struct rte_device *device,

+                 enum rte_eal_dev_event_type event,

+                 void *cb_arg, void *ret_param)



cb_arg must be an opaque parameter which is registered with the

callback and passed later. No need as parameter of this function.



ret_param is not needed at all. The kernel event will be just

translated as rte_eal_dev_event_type (rte_dev_event after rename).
I suggest keeping one of them: ret_param can carry new information, such as device info, while cb_arg is already set at registration and is no longer needed here, so I will delete it.



--- a/lib/librte_eal/common/include/rte_bus.h

+++ b/lib/librte_eal/common/include/rte_bus.h

 /**

+ * Device iterator to find a device on a bus.

+ *

+ * This function returns an rte_device if one of those held by the bus

+ * matches the data passed as parameter.

+ *

+ * If the comparison function returns zero this function should stop iterating

+ * over any more devices. To continue a search the device of a previous search

+ * can be passed via the start parameter.

+ *

+ * @param cmp

+ *       the device name comparison function.

+ *

+ * @param data

+ *       Data to compare each device against.

+ *

+ * @param start

+ *       starting point for the iteration

+ *

+ * @return

+ *       The first device matching the data, NULL if none exists.

+ */

+typedef struct rte_device *

+(*rte_bus_find_device_by_name_t)(const struct rte_device *start,

+                  rte_dev_cmp_name_t cmp,

+                  const void *data);



Why is it needed? There is already rte_bus_find_device_t.
Because rte_bus_find_device_t just finds a device structure in the device list, but here we need to find a device structure by the device name that comes from the uevent info.




+/**

+ * Implementation specific remap function which is responsible for remmaping

+ * devices on that bus from original share memory resource to a private memory

+ * resource for the sake of device has been removal.

+ *

+ * @param dev

+ *       Device pointer that was returned by a previous call to find_device.

+ *

+ * @return

+ *       0 on success.

+ *       !0 on error.

+ */

+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);



You need to better explain why this remap op is needed,

and when it is called exactly?
sure.



@@ -206,9 +265,13 @@ struct rte_bus {

  rte_bus_scan_t scan;         /**< Scan for devices attached to bus */

  rte_bus_probe_t probe;       /**< Probe devices on bus */

  rte_bus_find_device_t find_device; /**< Find a device on the bus */

+ rte_bus_find_device_by_name_t find_device_by_name;

+                             /**< Find a device on the bus */

  rte_bus_plug_t plug;         /**< Probe single device for drivers */

  rte_bus_unplug_t unplug;     /**< Remove single device from driver */

  rte_bus_parse_t parse;       /**< Parse a device name */

+ rte_bus_remap_device_t remap_device;       /**< remap a device */

+ rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device */

  struct rte_bus_conf conf;    /**< Bus configuration */

  rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */

 };



Every new op must be introduced in a separate patch

(if not completely removed).
make sense.



--- a/lib/librte_eal/common/include/rte_dev.h

+++ b/lib/librte_eal/common/include/rte_dev.h

+enum device_state {

+ DEVICE_UNDEFINED,

+ DEVICE_FAULT,

+ DEVICE_PARSED,

+ DEVICE_PROBED,

+};



These constants must prefixed with RTE_

and documented with doxygen please.
got it.



+/**

+ * It enable the device event monitoring for a specific event.



This comment must be reworded.
ok.



+ *

+ * @param none



useless
thanks.



+ * @return

+ *   - On success, zero.

+ *   - On failure, a negative value.

+ */

+int

+rte_eal_dev_monitor_enable(void);



I suggest to drop this function which is just calling rte_dev_monitor_start.
This needs more discussion. I suggest keeping it: rte_dev_monitor_start stays in the platform-specific code, and users call the common rte_eal_dev_monitor_enable.



--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c

+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c

@@ -259,6 +260,10 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)

                  }

                  ap_prev = ap;

          }

+

+         ret |= rte_intr_callback_unregister(&intr_handle,

+                        eal_alarm_callback, NULL);

+

          rte_spinlock_unlock(&alarm_list_lk);

  } while (executing != 0);



Looks to be unrelated.

If it is a fix, please do a separate patch.
ok.



--- /dev/null

+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c

+int

+rte_dev_monitor_start(void)



What about adding "event" in the name of this function?

 rte_dev_event_monitor_start
"dev monitor" sounds odd, agreed.



+{

+ int ret;

+

+ if (!no_request_thread)

+         return 0;

+ no_request_thread = false;

+

+ /* create the host thread to wait/handle the uevent from kernel */

+ ret = pthread_create(&uev_monitor_thread, NULL,

+         dev_uev_monitoring, NULL);

+ return ret;

+}



I think you should use rte_service for thread management.
Thanks for the info. It is a good mechanism that I did not know about before; I will study it and use it.



--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c

+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c

@@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)

  struct rte_uio_pci_dev *udev = info->priv;

  struct pci_dev *dev = udev->pdev;



+ /* check if device have been remove before release */

+ if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {

+         pr_info("The device have been removed\n");

+         return -1;

+ }



This looks to be unrelated. Separate patch please.


got it.



End of first pass review. There are some basic requirements that other

maintainers (especially at Intel) could have reviewed earlier.

Let's try to improve it quickly for 18.02, thanks.

If we are short in time, we should at least focus on adding the

events/callback API and the Linux events implementation.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 11:38                             ` Thomas Monjalon
@ 2018-01-09 11:58                               ` Guo, Jia
  2018-01-09 13:40                                 ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-01-09 11:58 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, shreyansh.jain, jingjing.wu, helin.zhang,
	motih, harry.van.haaren



On 1/9/2018 7:38 PM, Thomas Monjalon wrote:
> 09/01/2018 09:25, Guo, Jia:
>> On 1/9/2018 8:39 AM, Thomas Monjalon wrote:
>>>> +int
>>>> +_rte_dev_callback_process(struct rte_device *device,
>>>> +			enum rte_eal_dev_event_type event,
>>>> +			void *cb_arg, void *ret_param)
>>> cb_arg must be an opaque parameter which is registered with the
>>> callback and passed later. No need as parameter of this function.
>>>
>>> ret_param is not needed at all. The kernel event will be just
>>> translated as rte_eal_dev_event_type (rte_dev_event after rename).
>> suggest hold one to let new param, such as device info, add by
>> ret_param, so cb_arg have set when register and no use anymore, delete it.
> Sorry I don't understand. Please rephrase.
Please see that part of v8. I need ret_param to pass the device name
to the user through the callback.
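
To make the two positions concrete, here is a minimal sketch with hypothetical names (it is neither the v7 nor the v8 code): cb_arg is stored once at registration, as requested in the review, and the device name is handed to the user as an ordinary callback argument, so the process function needs neither cb_arg nor ret_param.

```c
#include <stddef.h>

enum sketch_dev_event { SKETCH_DEV_EVENT_ADD, SKETCH_DEV_EVENT_REMOVE };

/* User callback: receives the device name, the event and its own cb_arg. */
typedef void (*sketch_dev_event_cb)(const char *dev_name,
				    enum sketch_dev_event event,
				    void *cb_arg);

#define SKETCH_MAX_CALLBACKS 8

struct sketch_cb_entry {
	sketch_dev_event_cb fn;
	void *cb_arg;		/* opaque, stored at registration */
};

static struct sketch_cb_entry sketch_cbs[SKETCH_MAX_CALLBACKS];
static size_t sketch_cb_count;

/* Register a callback together with its opaque argument. */
int
sketch_dev_event_callback_register(sketch_dev_event_cb fn, void *cb_arg)
{
	if (fn == NULL || sketch_cb_count == SKETCH_MAX_CALLBACKS)
		return -1;
	sketch_cbs[sketch_cb_count].fn = fn;
	sketch_cbs[sketch_cb_count].cb_arg = cb_arg;
	sketch_cb_count++;
	return 0;
}

/* Invoke every registered callback; returns how many were called. */
int
sketch_dev_callback_process(const char *dev_name, enum sketch_dev_event event)
{
	size_t i;

	for (i = 0; i < sketch_cb_count; i++)
		sketch_cbs[i].fn(dev_name, event, sketch_cbs[i].cb_arg);
	return (int)sketch_cb_count;
}

/* Example callback: counts removals in the int that cb_arg points to. */
void
sketch_count_removals(const char *dev_name, enum sketch_dev_event event,
		      void *cb_arg)
{
	(void)dev_name;
	if (event == SKETCH_DEV_EVENT_REMOVE)
		(*(int *)cb_arg)++;
}
```

An application would register sketch_count_removals with a pointer to its own counter; the monitor side then calls sketch_dev_callback_process with the name parsed from the uevent.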
>>>> --- a/lib/librte_eal/common/include/rte_bus.h
>>>> +++ b/lib/librte_eal/common/include/rte_bus.h
>>>>    /**
>>>> + * Device iterator to find a device on a bus.
>>>> + *
>>>> + * This function returns an rte_device if one of those held by the bus
>>>> + * matches the data passed as parameter.
>>>> + *
>>>> + * If the comparison function returns zero this function should stop iterating
>>>> + * over any more devices. To continue a search the device of a previous search
>>>> + * can be passed via the start parameter.
>>>> + *
>>>> + * @param cmp
>>>> + *	the device name comparison function.
>>>> + *
>>>> + * @param data
>>>> + *	Data to compare each device against.
>>>> + *
>>>> + * @param start
>>>> + *	starting point for the iteration
>>>> + *
>>>> + * @return
>>>> + *	The first device matching the data, NULL if none exists.
>>>> + */
>>>> +typedef struct rte_device *
>>>> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
>>>> +			 rte_dev_cmp_name_t cmp,
>>>> +			 const void *data);
>>> Why is it needed? There is already rte_bus_find_device_t.
>> because the rte_bus_find_device_t just find a device structure in the
>> device list, but here need to find a device structure by device name
>> which come from uevent info.
> I don't understand how it is different?
> Looking at the code, it is a copy/paste except it is dedicated
> to name comparison.
> You can remove rte_bus_find_device_by_name_t and provide a
> comparison function which looks at name.
I mean that if the device has been removed and then reinserted, the
device structure has not been constructed yet when we only have the
device name from the uevent message. In this case, can I still use the
original find device function?
>>>> +int
>>>> +rte_eal_dev_monitor_enable(void);
>>> I suggest to drop this function which is just calling rte_dev_monitor_start.
>> more discuss, i suggest keep on it , let rte_dev_monitor_start
>> separately stay on the platform code and let user commonly call
>> rte_eal_dev_monitor_enable.
> Then you may need a disable function.
> It will end up to be like start/stop.
> I think it is just redundant.
>
> If kept, please rename it to rte_dev_event_monitor_enable.
>>>> +	/* create the host thread to wait/handle the uevent from kernel */
>>>> +	ret = pthread_create(&uev_monitor_thread, NULL,
>>>> +		dev_uev_monitoring, NULL);
>>>> +	return ret;
>>>> +}
>>> I think you should use rte_service for thread management.
>> thanks for your info, such a good mechanism to use that  i even not know
>> that before. i will study and use it.
> OK, good.
>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 11:44                                   ` Thomas Monjalon
@ 2018-01-09 12:08                                     ` Guo, Jia
  2018-01-09 12:42                                       ` Gaëtan Rivet
  2018-01-09 13:44                                       ` Thomas Monjalon
  0 siblings, 2 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-09 12:08 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Mordechay Haimovsky, dev, stephen, Richardson, Bruce, Yigit,
	Ferruh, gaetan.rivet, Ananyev, Konstantin, shreyansh.jain, Wu,
	Jingjing, Zhang, Helin, Van Haaren, Harry

Your comments about splitting make total sense, no doubt about that. My question is: if I split the API from the functional part, should the functions be given null or stub implementations in the API patch? Any other good ideas or tips for that?

Best regards,
Jeff Guo


-----Original Message-----
From: Thomas Monjalon [mailto:thomas@monjalon.net] 
Sent: Tuesday, January 9, 2018 7:45 PM
To: Guo, Jia <jia.guo@intel.com>
Cc: Mordechay Haimovsky <motih@mellanox.com>; dev@dpdk.org; stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
Subject: Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug

09/01/2018 12:39, Guo, Jia:
> So, how should I separate the patch into smaller patches? Should I use stub or null implementations in the functions? I think we should consider whether that is an economical approach now. Or could I explain the code in more detail for those not very familiar with the background? I have sent v8, please check. Thanks all.

The v8 is not split enough.
Please try to address all my comments.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 12:08                                     ` Guo, Jia
@ 2018-01-09 12:42                                       ` Gaëtan Rivet
  2018-01-10  9:29                                         ` Guo, Jia
  2018-01-09 13:44                                       ` Thomas Monjalon
  1 sibling, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2018-01-09 12:42 UTC (permalink / raw)
  To: Guo, Jia
  Cc: Thomas Monjalon, Mordechay Haimovsky, dev, stephen, Richardson,
	Bruce, Yigit, Ferruh, Ananyev, Konstantin, shreyansh.jain, Wu,
	Jingjing, Zhang, Helin, Van Haaren, Harry

Hi Jeff,

On Tue, Jan 09, 2018 at 12:08:52PM +0000, Guo, Jia wrote:
> Your comments about splitting the patch make total sense, no doubt about that. My question is: if I split the API from the functional part, should the functions be given a null or stub implementation? Any other good ideas or tips for that?
> 

Please avoid top-posting on the mailing list, it is confusing when
reading a thread intertwined with inner-posted mails.

Regarding your issue, it is fine to propose a first skeleton API with
bare implementations, then progressively use your new functions where
relevant.

It is only necessary to ensure compilation is always possible between
each patch. The API itself need not be usable, as long as the patch
order remains coherent and meaningful for review.

Otherwise, sorry about not doing a review earlier, I didn't think I knew
enough about uevent to provide useful comments. However after a quick
reading I may be able to provide a few remarks.

I will wait for your split before doing so.

> Best regards,
> Jeff Guo
> 
> 
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net] 
> Sent: Tuesday, January 9, 2018 7:45 PM
> To: Guo, Jia <jia.guo@intel.com>
> Cc: Mordechay Haimovsky <motih@mellanox.com>; dev@dpdk.org; stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug
> 
> 09/01/2018 12:39, Guo, Jia:
> > So, how should I separate the patch into smaller patches? Should I use stub or null implementations in the functions? I think we should consider whether that is an economical approach now. Or could I explain the code in more detail for those not very familiar with the background? I have sent v8, please check. Thanks all.
> 
> The v8 is not split enough.
> Please try to address all my comments.

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 11:58                               ` Guo, Jia
@ 2018-01-09 13:40                                 ` Thomas Monjalon
  0 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-09 13:40 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, shreyansh.jain, jingjing.wu, helin.zhang,
	motih, harry.van.haaren

> >>>> --- a/lib/librte_eal/common/include/rte_bus.h
> >>>> +++ b/lib/librte_eal/common/include/rte_bus.h
> >>>>    /**
> >>>> + * Device iterator to find a device on a bus.
> >>>> + *
> >>>> + * This function returns an rte_device if one of those held by the bus
> >>>> + * matches the data passed as parameter.
> >>>> + *
> >>>> + * If the comparison function returns zero this function should stop iterating
> >>>> + * over any more devices. To continue a search the device of a previous search
> >>>> + * can be passed via the start parameter.
> >>>> + *
> >>>> + * @param cmp
> >>>> + *	the device name comparison function.
> >>>> + *
> >>>> + * @param data
> >>>> + *	Data to compare each device against.
> >>>> + *
> >>>> + * @param start
> >>>> + *	starting point for the iteration
> >>>> + *
> >>>> + * @return
> >>>> + *	The first device matching the data, NULL if none exists.
> >>>> + */
> >>>> +typedef struct rte_device *
> >>>> +(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
> >>>> +			 rte_dev_cmp_name_t cmp,
> >>>> +			 const void *data);
> >>> Why is it needed? There is already rte_bus_find_device_t.
> >> because the rte_bus_find_device_t just find a device structure in the
> >> device list, but here need to find a device structure by device name
> >> which come from uevent info.
> > I don't understand how it is different?
> > Looking at the code, it is a copy/paste except it is dedicated
> > to name comparison.
> > You can remove rte_bus_find_device_by_name_t and provide a
> > comparison function which looks at name.
> I mean that if the device has been removed and then inserted again, the 
> device has not been constructed yet when we only have the device name from 
> the uevent message. In this case, could I still use the original find device function?

The device won't be in the list if it is not yet scanned.
Anyway your function checks the same list.

Let's stop this discussion for now and continue when the need for
this function is better explained in a separate patch.
You really need to introduce things one patch at a time and explain
why you introduce them in the message of each patch.
Thanks
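
For illustration, the suggestion above (reuse the generic iterator and pass
a comparison function that looks at the name) could look roughly like the
sketch below. The struct and the array-based iterator demo_find_device are
simplified stand-ins, not the real DPDK definitions; only the rte_dev_cmp_t
callback shape follows the cmp_rte_device function visible in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the DPDK type; only the field needed here. */
struct rte_device {
	const char *name;
};

/* Generic comparison callback: returns 0 on a match. */
typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void *data);

/* Name comparison expressed as an ordinary rte_dev_cmp_t callback. */
static int
cmp_dev_name(const struct rte_device *dev, const void *data)
{
	const char *name = data;

	return strcmp(dev->name, name);
}

/*
 * Hypothetical generic iterator over a plain array of devices.
 * When start is not NULL, the search resumes after that device.
 */
static struct rte_device *
demo_find_device(struct rte_device *devs, size_t n,
		 const struct rte_device *start,
		 rte_dev_cmp_t cmp, const void *data)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (start != NULL) {
			if (&devs[i] == start)
				start = NULL; /* starting point found */
			continue;
		}
		if (cmp(&devs[i], data) == 0)
			return &devs[i];
	}
	return NULL;
}
```

With this shape, a lookup by the name taken from a uevent is just
demo_find_device(devs, n, NULL, cmp_dev_name, name), and no second
find-by-name callback type is needed in the bus structure.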

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 12:08                                     ` Guo, Jia
  2018-01-09 12:42                                       ` Gaëtan Rivet
@ 2018-01-09 13:44                                       ` Thomas Monjalon
  2018-01-10  9:32                                         ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-09 13:44 UTC (permalink / raw)
  To: Guo, Jia
  Cc: Mordechay Haimovsky, dev, stephen, Richardson, Bruce, Yigit,
	Ferruh, gaetan.rivet, Ananyev, Konstantin, shreyansh.jain, Wu,
	Jingjing, Zhang, Helin, Van Haaren, Harry

09/01/2018 13:08, Guo, Jia:
> Your comments about splitting the patch make total sense, no doubt about that. My question is: if I split the API from the functional part, should the functions be given a null or stub implementation? Any other good ideas or tips for that?

Yes when introducing the callback API first, there will be no
implementation, so the callbacks are not called.
If needed you can have some empty functions.
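
A minimal sketch of what such an API-only first patch could contain,
assuming the rte_dev_monitor_start/rte_dev_monitor_stop names from this
series. Returning -ENOTSUP is an assumption made here for illustration
(the BSD stubs in the v8 patch return -1).

```c
#include <errno.h>

/* Prototypes as they would appear in the public header. */
int rte_dev_monitor_start(void);
int rte_dev_monitor_stop(void);

/* Placeholder: no monitoring yet, so report "not supported". */
int
rte_dev_monitor_start(void)
{
	return -ENOTSUP;
}

/* Placeholder: nothing was started, so there is nothing to stop. */
int
rte_dev_monitor_stop(void)
{
	return -ENOTSUP;
}
```

Each later patch can then replace one placeholder body with the real
implementation while the tree keeps compiling at every step.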

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V8 0/3] add uevent mechanism in eal framework
  2018-01-03  1:42                       ` [PATCH v7 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-10  3:30                         ` Jeff Guo
  2018-01-10  3:30                           ` [PATCH V8 1/3] eal: add uevent monitor for hot plug Jeff Guo
                                             ` (2 more replies)
  0 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  3:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

So far, for hot plug in dpdk, we already have the hot plug add/remove
api and the fail-safe driver to offload the fail-safe work from the app
user. But a general event api is still missing, since the interrupt
events related to hot plug differ between devices and
drivers, such as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so in order to make the hot plug feature easy to use
for pci drivers, something must be done to detect the remove event at the
kernel level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor the
hot plug status of devices, not only uio/vfio devices on the pci bus
but also others, such as cpu/usb/pci-express
bus devices.

The idea is as below.

a.The uevent message from FD monitoring that will be useful:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b.add uevent monitoring mechanism:
add several general apis to enable uevent monitoring.

c.add common uevent handler and uevent failure handler:
device uevents should be handled at the bus or device layer, and memory read
and write failures during hot removal should be handled correctly before the detach.

d.show example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show usage.
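
As a rough illustration of item a, the sketch below looks up one KEY=VALUE
field in such a message. It assumes the raw buffer read from the netlink
socket holds NUL-terminated records placed back to back after the
action@devpath header; uevent_get_value is a hypothetical helper name and
is not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of "key" inside a raw uevent buffer of
 * "len" bytes, or NULL if the key is absent. Records are expected to be
 * NUL-terminated "KEY=VALUE" strings; the "action@devpath" header never
 * matches because it does not start with "KEY=".
 */
static const char *
uevent_get_value(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t pos = 0;

	while (pos < len) {
		const char *rec = buf + pos;
		const char *end = memchr(rec, '\0', len - pos);
		size_t rlen;

		if (end == NULL)
			break; /* unterminated trailing record: ignore it */
		rlen = (size_t)(end - rec);
		if (rlen > klen && rec[klen] == '=' &&
		    strncmp(rec, key, klen) == 0)
			return rec + klen + 1;
		pos += rlen + 1;
	}
	return NULL;
}
```

A handler could call uevent_get_value(buf, nbytes, "ACTION") and
uevent_get_value(buf, nbytes, "SUBSYSTEM") to decide whether the message
is a uio removal before looking up the device.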

patchset history:
v8->v7:
1.use rte_service to replace pthread management.
2.fix define issue and copyright issue
3.fix some lock issues

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect code.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, prepare hot plug work by default for
all pci devices, then let the app decide which devices need
hot plug.
2.manage event callbacks per device.
3.fix some system hang issues when igb_uio is released.
4.move the pci part to bus-pci, based on the bus rework.
5.add hot plug policy in app: show an example that uses a hotplug list to
decide which devices need hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API as common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer. Add function to find a device by name.
4.Replace the individual fd bound to a single device with a common fd that polls all devices.
5.Add registration of hot insertion monitoring and processing, add function to auto bind driver before user adds device
6.Refine some coding style and typo issues
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variable hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix the dual device fd issue.
2.fix some typos.


Jeff Guo (3):
  eal: add uevent monitor for hot plug
  igb_uio: fix device removal issuse for hotplug
  app/testpmd: use uevent to monitor hotplug

 app/test-pmd/testpmd.c                             | 178 ++++++++++
 app/test-pmd/testpmd.h                             |   9 +
 drivers/bus/pci/bsd/pci.c                          |  30 ++
 drivers/bus/pci/linux/pci.c                        |  87 +++++
 drivers/bus/pci/pci_common.c                       |  43 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  25 ++
 drivers/bus/vdev/vdev.c                            |  36 ++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  37 ++
 .../bsdapp/eal/include/exec-env/rte_dev.h          |  39 +++
 lib/librte_eal/common/eal_common_bus.c             |  30 ++
 lib/librte_eal/common/eal_common_dev.c             | 160 +++++++++
 lib/librte_eal/common/include/rte_bus.h            |  71 ++++
 lib/librte_eal/common/include/rte_dev.h            | 128 +++++++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 375 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        |  39 +++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 21 files changed, 1372 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V8 1/3] eal: add uevent monitor for hot plug
  2018-01-10  3:30                         ` [PATCH V8 0/3] add uevent mechanism in eal framework Jeff Guo
@ 2018-01-10  3:30                           ` Jeff Guo
  2018-01-10  3:30                           ` [PATCH V8 2/3] igb_uio: fix device removal issuse for hotplug Jeff Guo
  2018-01-10  3:30                           ` [PATCH V8 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  3:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

This patch aims to add a general uevent mechanism to the eal device layer,
to enable hot plug monitoring of all linux kernel objects, so users can use
these APIs to monitor and read out the device status info sent from
the kernel side, then handle it accordingly, such as detaching or attaching
the device, and even use it to do fail-safe work smoothly.

1) About uevent monitoring:
a: add one epoll instance to poll the netlink socket, to monitor the uevents of
   the device, and add device_state to struct rte_device, to identify the
   device state machine.
b: add enum rte_eal_dev_event_type and struct rte_eal_uevent.
c: add the below APIs in the rte eal device common layer.
   rte_dev_evt_monitor_enable
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_monitor_start
   rte_dev_monitor_stop

2) About the failure handler, using pci uio as an example:
   add pci_remap_device in the bus layer and the below functions to process it:
   rte_pci_remap_device
   pci_uio_remap_resource
   pci_map_private_resource
   add rte_pci_dev_bind_driver to bind pci device with explicit driver.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v8->v7:
a.use rte_service to replace pthread management.
b.fix define issue and copyright issue
c.fix some lock issues
---
 drivers/bus/pci/bsd/pci.c                          |  30 ++
 drivers/bus/pci/linux/pci.c                        |  87 +++++
 drivers/bus/pci/pci_common.c                       |  43 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  25 ++
 drivers/bus/vdev/vdev.c                            |  36 ++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  37 ++
 .../bsdapp/eal/include/exec-env/rte_dev.h          |  39 +++
 lib/librte_eal/common/eal_common_bus.c             |  30 ++
 lib/librte_eal/common/eal_common_dev.c             | 160 +++++++++
 lib/librte_eal/common/include/rte_bus.h            |  71 ++++
 lib/librte_eal/common/include/rte_dev.h            | 128 +++++++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 375 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        |  39 +++
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 18 files changed, 1179 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index b8e2178..d58dbf6 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -126,6 +126,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(dev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void
 pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res)
@@ -678,3 +701,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	return -1;
+}
+
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 5da6728..a09644a 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -145,6 +145,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* Re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* nothing to do */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(dev);
+		}
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -386,6 +418,8 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		rte_pci_add_device(dev);
 	}
 
+	dev->device.state = RTE_DEV_PARSED;
+	TAILQ_INIT(&(dev->device.uev_cbs));
 	return 0;
 }
 
@@ -854,3 +888,56 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	char drv_bind_path[1024];
+	char drv_override_path[1024]; /* contains the /dev/uioX */
+	int drv_override_fd;
+	int drv_bind_fd;
+
+	RTE_SET_USED(drv_type);
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_override_fd, drv_type, strlen(drv_type)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", drv_type, dev_name);
+		goto err;
+	}
+
+	close(drv_override_fd);
+
+	snprintf(drv_bind_path, sizeof(drv_bind_path),
+		"/sys/bus/pci/drivers/%s/bind", drv_type);
+
+	/* do the bind by writing device to the specific driver  */
+	drv_bind_fd = open(drv_bind_path, O_WRONLY | O_APPEND);
+	if (drv_bind_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_bind_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_bind_fd, dev_name, strlen(dev_name)) < 0)
+		goto err;
+
+	close(drv_bind_fd);
+	return 0;
+err:
+	close(drv_override_fd);
+	close(drv_bind_fd);
+	return -1;
+}
+
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 104fdf9..54601a9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -282,6 +282,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		if (rc > 0)
 			/* positive value means driver doesn't support it */
 			continue;
+		dev->device.state = RTE_DEV_PROBED;
 		return 0;
 	}
 	return 1;
@@ -481,6 +482,7 @@ rte_pci_insert_device(struct rte_pci_device *exist_pci_dev,
 void
 rte_pci_remove_device(struct rte_pci_device *pci_dev)
 {
+	RTE_LOG(DEBUG, EAL, " rte_pci_remove_device for device list\n");
 	TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
 }
 
@@ -502,6 +504,44 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+pci_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_pci_device *dev;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (start && &dev->device == start) {
+			start = NULL; /* starting point found */
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+
+	return NULL;
+}
+
+static int
+pci_remap_device(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices that use igb_uio */
+	ret = rte_pci_remap_device(pdev);
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to remap device %s\n",
+			dev->name);
+	return ret;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -528,10 +568,13 @@ struct rte_pci_bus rte_pci_bus = {
 		.scan = rte_pci_scan,
 		.probe = rte_pci_probe,
 		.find_device = pci_find_device,
+		.find_device_by_name = pci_find_device_by_name,
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.remap_device = pci_remap_device,
+		.bind_driver = rte_pci_dev_bind_driver,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 0671131..8cb4009 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -176,6 +176,34 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+	/* Map all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
+			continue;
+		map_address = pci_map_private_resource(
+				dev->mem_resource[i].addr, 0,
+				(size_t)dev->mem_resource[i].len);
+		if (map_address == MAP_FAILED)
+			goto error;
+		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
+		dev->mem_resource[i].addr = map_address;
+	}
+
+	return 0;
+error:
+	return -1;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2283f09..10baa1a 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -202,6 +202,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the pci uio resource.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index d4a2996..1662f3b 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -52,6 +52,8 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+#include <unistd.h>
+#include <fcntl.h>
 
 #include <rte_debug.h>
 #include <rte_interrupts.h>
@@ -197,6 +199,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
 void rte_pci_unmap_device(struct rte_pci_device *dev);
 
 /**
+ * Remap this device
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ */
+int rte_pci_remap_device(struct rte_pci_device *dev);
+
+/**
  * Dump the content of the PCI bus.
  *
  * @param f
@@ -333,6 +344,20 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
 void rte_pci_ioport_write(struct rte_pci_ioport *p,
 		const void *data, size_t len, off_t offset);
 
+/**
+ * It can be used to bind a device to a specific type of driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param drv_type
+ *  The specific driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index fd7736d..773f6e0 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -323,6 +323,39 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+vdev_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_vdev_device *dev;
+
+	TAILQ_FOREACH(dev, &vdev_device_list, next) {
+		if (start && &dev->device == start) {
+			start = NULL;
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+	return NULL;
+}
+
+static int
+vdev_remap_device(struct rte_device *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
+static int
+vdev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	RTE_SET_USED(dev_name);
+	RTE_SET_USED(drv_type);
+	return 0;
+}
+
 static int
 vdev_plug(struct rte_device *dev)
 {
@@ -339,9 +372,12 @@ static struct rte_bus rte_vdev_bus = {
 	.scan = vdev_scan,
 	.probe = vdev_probe,
 	.find_device = vdev_find_device,
+	.find_device_by_name = vdev_find_device_by_name,
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
+	.remap_device = vdev_remap_device,
+	.bind_driver = vdev_bind_driver,
 };
 
 RTE_REGISTER_BUS(vdev, rte_vdev_bus);
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..2a9113a
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_monitor_start(void)
+{
+	return -1;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..70413b3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 3e022d5..b7219c9 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -51,8 +51,11 @@ rte_bus_register(struct rte_bus *bus)
 	RTE_VERIFY(bus->scan);
 	RTE_VERIFY(bus->probe);
 	RTE_VERIFY(bus->find_device);
+	RTE_VERIFY(bus->find_device_by_name);
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
+	RTE_VERIFY(bus->remap_device);
+	RTE_VERIFY(bus->bind_driver);
 
 	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
 	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
@@ -170,6 +173,14 @@ cmp_rte_device(const struct rte_device *dev1, const void *_dev2)
 }
 
 static int
+cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
+{
+	const char *dev_name2 = _dev_name2;
+
+	return strcmp(dev_name1, dev_name2);
+}
+
+static int
 bus_find_device(const struct rte_bus *bus, const void *_dev)
 {
 	struct rte_device *dev;
@@ -178,6 +189,25 @@ bus_find_device(const struct rte_bus *bus, const void *_dev)
 	return dev == NULL;
 }
 
+static struct rte_device *
+bus_find_device_by_name(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus->find_device_by_name(NULL, cmp_rte_device_name, _dev_name);
+	return dev;
+}
+
+struct rte_device *
+
+rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus_find_device_by_name(bus, _dev_name);
+	return dev;
+}
+
 struct rte_bus *
 rte_bus_find_by_device(const struct rte_device *dev)
 {
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..4f87cec 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,31 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_eal_dev_callback {
+	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
+	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Parameter for callback */
+	void *ret_param;                        /**< Return parameter */
+	enum rte_dev_event_type event;      /**< device event type */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+static struct rte_eal_dev_callback *dev_add_cb;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +256,141 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_evt_monitor_enable(void)
+{
+	int ret;
+
+	ret = rte_dev_monitor_start();
+	if (ret)
+		RTE_LOG(ERR, EAL, "Can not init device monitor\n");
+	return ret;
+}
+
+int
+rte_dev_callback_register(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_eal_dev_callback *user_cb;
+
+	if (!cb_fn || device == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	if (TAILQ_EMPTY(&(device->uev_cbs)))
+		TAILQ_INIT(&(device->uev_cbs));
+
+	if (event == RTE_DEV_EVENT_ADD) {
+		user_cb = NULL;
+	} else {
+		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
+			if (user_cb->cb_fn == cb_fn &&
+				user_cb->cb_arg == cb_arg &&
+				user_cb->event == event) {
+				break;
+			}
+		}
+	}
+
+	/* create a new callback. */
+	if (user_cb == NULL) {
+		/* allocate a new interrupt callback entity */
+		user_cb = rte_zmalloc("eal device event",
+					sizeof(*user_cb), 0);
+		if (user_cb == NULL) {
+			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			return -ENOMEM;
+		}
+		user_cb->cb_fn = cb_fn;
+		user_cb->cb_arg = cb_arg;
+		user_cb->event = event;
+		if (event == RTE_DEV_EVENT_ADD)
+			dev_add_cb = user_cb;
+		else
+			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb, next);
+	}
+
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return 0;
+}
+
+int
+rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_eal_dev_callback *cb, *next;
+
+	if (!cb_fn || device == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	ret = 0;
+	if (event == RTE_DEV_EVENT_ADD) {
+		rte_free(dev_add_cb);
+		dev_add_cb = NULL;
+	} else {
+		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
+		      cb = next) {
+
+			next = TAILQ_NEXT(cb, next);
+
+			if (cb->cb_fn != cb_fn || cb->event != event ||
+					(cb->cb_arg != (void *)-1 &&
+					cb->cb_arg != cb_arg))
+				continue;
+
+			/*
+			 * if this callback is not executing right now,
+			 * then remove it.
+			 */
+			if (cb->active == 0) {
+				TAILQ_REMOVE(&(device->uev_cbs), cb, next);
+				rte_free(cb);
+			} else {
+				ret = -EAGAIN;
+			}
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_dev_event_type event,
+			void *ret_param)
+{
+	struct rte_eal_dev_callback dev_cb;
+	struct rte_eal_dev_callback *cb_lst;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+	if (event == RTE_DEV_EVENT_ADD) {
+		if (ret_param != NULL)
+			dev_add_cb->ret_param = ret_param;
+
+		rc = dev_add_cb->cb_fn(dev_add_cb->event,
+				dev_add_cb->cb_arg, dev_add_cb->ret_param);
+	} else {
+		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
+			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
+				continue;
+			dev_cb = *cb_lst;
+			cb_lst->active = 1;
+			if (ret_param != NULL)
+				dev_cb.ret_param = ret_param;
+			rc = dev_cb.cb_fn(dev_cb.event,
+					dev_cb.cb_arg, dev_cb.ret_param);
+			cb_lst->active = 0;
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..fb03a74 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -122,6 +122,34 @@ typedef struct rte_device *
 			 const void *data);
 
 /**
+ * Device iterator to find a device on a bus.
+ *
+ * This function returns an rte_device if one of those held by the bus
+ * matches the data passed as parameter.
+ *
+ * If the comparison function returns zero this function should stop iterating
+ * over any more devices. To continue a search the device of a previous search
+ * can be passed via the start parameter.
+ *
+ * @param cmp
+ *	the device name comparison function.
+ *
+ * @param data
+ *	Data to compare each device against.
+ *
+ * @param start
+ *	starting point for the iteration
+ *
+ * @return
+ *	The first device matching the data, NULL if none exists.
+ */
+typedef struct rte_device *
+(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
+			 rte_dev_cmp_name_t cmp,
+			 const void *data);
+
+
+/**
  * Implementation specific probe function which is responsible for linking
  * devices on that bus with applicable drivers.
  *
@@ -168,6 +196,39 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific remap function which is responsible for
+ * remapping devices on that bus from the original shared memory resource
+ * to a private memory resource once a device has been removed.
+ * When a device removal event from the kernel is detected, call this
+ * function before any further operation on the device hardware.
+ *
+ * @param dev
+ *	Device pointer that was returned by a previous call to find_device.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
+
+/**
+ * Implementation specific bind driver function which is responsible for
+ * binding an explicit type of driver to a device on that bus.
+ *
+ * @param dev_name
+ *	device textual description.
+ *
+ * @param drv_type
+ *	driver type textual description.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_bind_driver_t)(const char *dev_name,
+				const char *drv_type);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -206,9 +267,13 @@ struct rte_bus {
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
 	rte_bus_find_device_t find_device; /**< Find a device on the bus */
+	rte_bus_find_device_by_name_t find_device_by_name;
+				     /**< Find a device on the bus by name */
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_remap_device_t remap_device;       /**< remap a device */
+	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
@@ -306,6 +371,12 @@ struct rte_bus *rte_bus_find(const struct rte_bus *start, rte_bus_cmp_t cmp,
 struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
 
 /**
+ * Find a device on the given bus by its name.
+ */
+struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
+				const void *dev_name);
+
+/**
  * Find the registered bus for a given name.
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 9342e0c..f518db9 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,62 @@ extern "C" {
 
 #include <rte_log.h>
 
+#include <exec-env/rte_dev.h>
+
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been parsed on bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,
+				/**< device being removed */
+	RTE_DEV_EVENT_CHANGE,
+				/**< device status being changed,
+				 * e.g. charger percent
+				 */
+	RTE_DEV_EVENT_MOVE,	/**< device sysfs path being moved */
+	RTE_DEV_EVENT_ONLINE,	/**< device being enabled */
+	RTE_DEV_EVENT_OFFLINE,	/**< device being disabled */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+typedef int (*rte_eal_dev_cb_fn)(enum rte_dev_event_type event,
+					void *cb_arg, void *ret_param);
+
+struct rte_eal_dev_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -166,6 +222,9 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	enum rte_dev_state state;  /**< Device state */
+	/** User application callbacks for device event */
+	struct rte_eal_dev_cb_list uev_cbs;
 };
 
 /**
@@ -248,6 +307,8 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
  */
 typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void *data);
 
+typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void *data);
+
 #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
 
 #define RTE_PMD_EXPORT_NAME(name, idx) \
@@ -293,4 +354,71 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * Enable device event monitoring.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_evt_monitor_enable(void);
+/**
+ * It registers the callback for the specific event. Multiple
+ * callbacks can be registered at the same time.
+ * @param event
+ *  The device event type.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * It unregisters the callback according to the specified event.
+ * @param event
+ *  The event type which corresponds to the callback.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered callbacks which have the same callback address.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device
+ *  The device the event applies to.
+ * @param event
+ *  The device event type.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_dev_event_type event,
+			void *ret_param);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5a7b8b2..05a2437 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -67,6 +67,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
@@ -120,7 +121,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
 endif
 
-INC := rte_kni_common.h
+INC := rte_kni_common.h rte_dev.h
 
 SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
 	$(addprefix include/exec-env/,$(INC))
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..2a4eb78
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,375 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+
+bool service_no_init = true;
+
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static void sig_handler(int signum)
+{
+	if (signum == SIGINT || signum == SIGTERM)
+		rte_dev_monitor_stop();
+}
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static void
+dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			event->group = UEV_MONITOR_UDEV;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = pci_slot_name;
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = UEV_SUBSYSTEM_PCI;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_eal_uevent));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
+static int
+dev_uev_process(struct epoll_event *events, int nfds)
+{
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	struct rte_eal_uevent uevent;
+	int ret;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
+			(uevent.group == UEV_MONITOR_UDEV))
+			return 0;
+
+		/* by default, handle all PCI devices being hot plugged */
+		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
+			bus = rte_bus_find_by_name("pci");
+			dev = rte_bus_find_device(bus, uevent.devname);
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+
+				if ((!dev) || dev->state == RTE_DEV_UNDEFINED)
+					return 0;
+				dev->state = RTE_DEV_FAULT;
+
+				/**
+				 * remap the resource to be fake
+				 * before user's removal processing
+				 */
+				ret = bus->remap_device(dev);
+				if (!ret)
+					return(_rte_dev_callback_process(dev,
+					  RTE_DEV_EVENT_REMOVE,
+					  NULL));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				if (dev == NULL) {
+					/**
+					 * bind the driver to the device
+					 * before user's add processing
+					 */
+					bus->bind_driver(
+						uevent.devname,
+						"igb_uio");
+					return(_rte_dev_callback_process(NULL,
+					  RTE_DEV_EVENT_ADD,
+					  uevent.devname));
+				}
+			}
+		}
+	}
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the interrupts.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	struct sigaction act;
+	sigset_t mask;
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep;
+
+	service_exit = false;
+
+	/* set signal handlers */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_handler = sig_handler;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGINT, &act, NULL);
+	sigaction(SIGTERM, &act, NULL);
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGINT);
+	sigaddset(&mask, SIGTERM);
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	uint32_t id;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore_1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail\n");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32 "\n",
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service\n");
+		return ret;
+	}
+	ret = rte_service_component_runstate_set(id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component\n");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service\n");
+		return ret;
+	}
+	rte_service_lcore_start(slcore_1);
+	service_no_init = false;
+	return 0;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	service_exit = true;
+	service_no_init = true;
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..70413b3
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
index 0160fc1..feb5fd7 100644
--- a/lib/librte_pci/rte_pci.c
+++ b/lib/librte_pci/rte_pci.c
@@ -172,6 +172,26 @@ rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr)
 	return -1;
 }
 
+/* map a private resource at a requested address */
+void *
+pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
+{
+	void *mapaddr;
+
+	mapaddr = mmap(requested_addr, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	if (mapaddr == MAP_FAILED) {
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
+			"%s (%p)\n",
+			__func__, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno), mapaddr);
+	} else
+		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+
+	return mapaddr;
+}
 
 /* map a particular resource from a file */
 void *
diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
index 4f2cd18..f6091a6 100644
--- a/lib/librte_pci/rte_pci.h
+++ b/lib/librte_pci/rte_pci.h
@@ -227,6 +227,23 @@ int rte_pci_addr_cmp(const struct rte_pci_addr *addr,
 int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
 
 /**
+ * @internal
+ * Map to a particular private resource.
+ *
+ * @param requested_addr
+ *      The starting address for the new mapping range.
+ * @param offset
+ *      The offset for the mapping range.
+ * @param size
+ *      The size for the mapping range.
+ * @return
+ *   - On success, the function returns a pointer to the mapped area.
+ *   - On error, the value MAP_FAILED is returned.
+ */
+void *pci_map_private_resource(void *requested_addr, off_t offset,
+		size_t size);
+
+/**
  * Map a particular resource from a file.
  *
  * @param requested_addr
-- 
2.7.4


* [PATCH V8 2/3] igb_uio: fix device removal issuse for hotplug
  2018-01-10  3:30                         ` [PATCH V8 0/3] add uevent mechanism in eal framework Jeff Guo
  2018-01-10  3:30                           ` [PATCH V8 1/3] eal: add uevent monitor for hot plug Jeff Guo
@ 2018-01-10  3:30                           ` Jeff Guo
  2018-01-10  3:30                           ` [PATCH V8 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  3:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

When a device is hot-unplugged, its uio resources have already become
invalid. To avoid hanging the system, skip the operations on those
resources in the igb_uio PCI release function.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
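For context, a minimal sketch (not part of this patch) of how the same
removal looks from userspace. The kernel sets state_remove_uevent_sent
on a kobject once it has emitted the remove uevent, and that uevent
reaches a NETLINK_KOBJECT_UEVENT socket as NUL-separated KEY=VALUE
strings. The names sketch_uevent, sketch_uevent_parse and match_key are
hypothetical, and copying the values out of the receive buffer is an
assumption about what a caller would want, not what the EAL code in 1/3
does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; not part of the patch. */
enum sketch_action { SKETCH_UNKNOWN, SKETCH_ADD, SKETCH_REMOVE };

struct sketch_uevent {
	enum sketch_action action;
	char subsystem[32];
	char slot[32];	/* PCI_SLOT_NAME value, e.g. "0000:00:03.0" */
};

/* Return the value part if s (n bytes) starts with key, else NULL. */
static const char *
match_key(const char *s, size_t n, const char *key, size_t *val_len)
{
	size_t k = strlen(key);

	if (n < k || memcmp(s, key, k) != 0)
		return NULL;
	*val_len = n - k;
	return s + k;
}

/* Copy val_len bytes into dst, truncating if needed, and terminate. */
static void
copy_value(char *dst, size_t dst_len, const char *val, size_t val_len)
{
	if (val_len >= dst_len)
		val_len = dst_len - 1;
	memcpy(dst, val, val_len);
	dst[val_len] = '\0';
}

/*
 * Walk a payload of len bytes made of NUL-separated strings and copy
 * the fields of interest into ev. Values are copied, so ev stays valid
 * after buf is reused or goes out of scope.
 */
void
sketch_uevent_parse(const char *buf, size_t len, struct sketch_uevent *ev)
{
	const char *p = buf;
	const char *end = buf + len;

	assert(buf != NULL && ev != NULL);
	memset(ev, 0, sizeof(*ev));

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));
		size_t n = nul ? (size_t)(nul - p) : (size_t)(end - p);
		const char *v;
		size_t vl;

		if ((v = match_key(p, n, "ACTION=", &vl)) != NULL) {
			if (vl == 3 && memcmp(v, "add", 3) == 0)
				ev->action = SKETCH_ADD;
			else if (vl == 6 && memcmp(v, "remove", 6) == 0)
				ev->action = SKETCH_REMOVE;
		} else if ((v = match_key(p, n, "SUBSYSTEM=", &vl)) != NULL) {
			copy_value(ev->subsystem, sizeof(ev->subsystem), v, vl);
		} else if ((v = match_key(p, n, "PCI_SLOT_NAME=", &vl)) != NULL) {
			copy_value(ev->slot, sizeof(ev->slot), v, vl);
		}

		/* Step past this string and its terminator, if any. */
		p += n;
		if (p < end)
			p++;
	}
}
```

A caller would pass the buffer and byte count returned by recv() on the
uevent socket, then act only when action is SKETCH_REMOVE and subsystem
is "pci".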
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index a3a98c1..d0e07b4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if the device has been removed before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
+		pr_info("The device has been removed\n");
+		return -1;
+	}
+
 	/* disable interrupts */
 	igbuio_pci_disable_interrupts(udev);
 
-- 
2.7.4


* [PATCH V8 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-10  3:30                         ` [PATCH V8 0/3] add uevent mechanism in eal framework Jeff Guo
  2018-01-10  3:30                           ` [PATCH V8 1/3] eal: add uevent monitor for hot plug Jeff Guo
  2018-01-10  3:30                           ` [PATCH V8 2/3] igb_uio: fix device removal issuse for hotplug Jeff Guo
@ 2018-01-10  3:30                           ` Jeff Guo
  2018-01-10  9:12                             ` [PATCH V9 0/5] add uevent mechanism in eal framework Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  3:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle hot removal and hot insertion events.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v8->v7:
fix some define issues
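
The register-then-dispatch flow that testpmd relies on can be sketched
in isolation as below. This is a simplified illustration under stated
assumptions, not the EAL implementation: all sketch_* names are
hypothetical, the list lives in the caller rather than in struct
rte_device, and the spinlock and "active" flag used in 1/3 are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical event and callback types, for illustration only. */
enum sketch_event { SKETCH_EV_ADD, SKETCH_EV_REMOVE };

typedef int (*sketch_cb_fn)(enum sketch_event event, void *cb_arg);

struct sketch_cb {
	TAILQ_ENTRY(sketch_cb) next;	/* list linkage */
	sketch_cb_fn fn;		/* callback address */
	void *arg;			/* parameter for callback */
	enum sketch_event event;	/* event this entry listens for */
};

TAILQ_HEAD(sketch_cb_list, sketch_cb);

/*
 * Register fn for event on an initialized list. Registering the same
 * (fn, arg, event) triple twice is a no-op.
 */
int
sketch_cb_register(struct sketch_cb_list *list, enum sketch_event event,
		   sketch_cb_fn fn, void *arg)
{
	struct sketch_cb *cb;

	if (list == NULL || fn == NULL)
		return -EINVAL;

	TAILQ_FOREACH(cb, list, next) {
		if (cb->fn == fn && cb->arg == arg && cb->event == event)
			return 0;
	}

	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -ENOMEM;
	cb->fn = fn;
	cb->arg = arg;
	cb->event = event;
	TAILQ_INSERT_TAIL(list, cb, next);
	return 0;
}

/* Invoke every callback registered for event; return how many ran. */
int
sketch_cb_process(struct sketch_cb_list *list, enum sketch_event event)
{
	struct sketch_cb *cb;
	int called = 0;

	assert(list != NULL);
	TAILQ_FOREACH(cb, list, next) {
		if (cb->event != event)
			continue;
		cb->fn(event, cb->arg);
		called++;
	}
	return called;
}

/* Example callback: counts removal events through cb_arg. */
int
sketch_count_cb(enum sketch_event event, void *cb_arg)
{
	if (event == SKETCH_EV_REMOVE)
		(*(int *)cb_arg)++;
	return 0;
}
```

In testpmd terms, the callback would not do the detach itself but would
schedule it (the patch uses rte_eal_alarm_set for that), since the
callback runs in the monitoring context.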
---
 app/test-pmd/testpmd.c | 178 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 187 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index c3ab448..d624ec1 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -401,6 +401,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -408,6 +410,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(enum rte_dev_event_type type,
+			      void *param, void *ret_param);
+static int eth_uevent_callback_register(portid_t pid);
+static int in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1757,6 +1766,31 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t pid) {
+	int diag;
+	struct rte_eth_dev *dev;
+	enum rte_dev_event_type dev_event_type;
+
+	/* register the uevent callback */
+	dev = &rte_eth_devices[pid];
+	for (dev_event_type = RTE_DEV_EVENT_ADD;
+		 dev_event_type < RTE_DEV_EVENT_CHANGE;
+		 dev_event_type++) {
+		diag = rte_dev_callback_register(dev->device, dev_event_type,
+			eth_uevent_callback,
+			(void *)(intptr_t)pid);
+		if (diag) {
+			printf("Failed to setup uevent callback for"
+				" device event %d\n",
+				dev_event_type);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1773,6 +1807,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1784,6 +1820,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1810,6 +1848,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1833,6 +1874,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1917,6 +1961,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	portid_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(INFO, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -1959,6 +2046,88 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 }
 
 static int
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return 1;
+
+	return 0;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(enum rte_dev_event_type type, void *arg,
+		  void *ret_param)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+	static char *device_name;
+
+	RTE_SET_USED(ret_param);
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		device_name = malloc(strlen((const char *)ret_param) + 1);
+		strcpy(device_name, ret_param);
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
 	uint16_t i;
@@ -2438,6 +2607,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_dev_evt_monitor_enable();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 1639d27..1136c4b 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -92,6 +92,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;                /* request device name */
+	enum rte_dev_event_type event;      /**< device event type */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4


* [PATCH V9 0/5] add uevent mechanism in eal framework
  2018-01-10  3:30                           ` [PATCH V8 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-10  9:12                             ` Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 1/5] eal: add uevent monitor api and callback func Jeff Guo
                                                 ` (4 more replies)
  0 siblings, 5 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  9:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

So far, for hot plug in DPDK, we already have the hot plug add/remove
API and the fail-safe driver, which offloads the fail-safe work from the
application. But there is still no general event API to detect hotplug
events for all drivers. Today the hotplug interrupt event differs between
devices and drivers, such as mlx4, PCI drivers and others.

Take the hot removal event as an example: not all PCI drivers expose the
remove interrupt. So, to make the hot plug feature easy to use with PCI
drivers, something must be done to detect the remove event at the kernel
level and offer a new line of interrupt to user land.

The kernel's kobject uevent mechanism can be used to monitor the hot plug
status of devices, covering not only uio/vfio devices on the PCI bus but
also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a. The uevent message read from the monitored FD, which carries the useful information:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. Add the uevent monitoring mechanism:
add several general APIs to enable uevent monitoring.

c. Add a common uevent handler and a uevent failure handler:
uevents of a device should be handled at the bus or device layer, and memory read
and write failures on hot removal should be handled correctly before detaching.

d. Show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show usage.
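
To illustrate item a: the kernel delivers a uevent as a sequence of
NUL-terminated "KEY=VALUE" strings, so fields such as ACTION or DEVNAME can
be picked out with a simple scan. The following is only a minimal sketch of
that idea. uevent_get() is a hypothetical helper, it is not part of this
patch set, and it assumes the whole message is already in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of this patch set): look up "key" in a
 * uevent payload made of NUL-terminated "KEY=VALUE" strings.
 * Returns a pointer to VALUE inside buf, or NULL if the key is absent.
 */
static const char *
uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen, pos = 0;

	assert(buf != NULL && key != NULL);
	klen = strlen(key);

	while (pos < len) {
		const char *field = buf + pos;
		const char *end = memchr(field, '\0', len - pos);
		size_t flen;

		if (end == NULL)	/* unterminated trailing field: stop */
			break;
		flen = (size_t)(end - field);
		/* match "KEY=" exactly, so "DEV" does not match "DEVNAME" */
		if (flen > klen && field[klen] == '=' &&
		    memcmp(field, key, klen) == 0)
			return field + klen + 1;
		pos += flen + 1;
	}
	return NULL;
}
```

With the message shown in item a, looking up "ACTION" would yield "remove"
and "DEVNAME" would yield "uio2". The leading "remove@/devices/..." string
is skipped because it contains no matching "KEY=" prefix.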

patchset history:
v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix define issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, by default prepare hot plug handling for
all pci devices, then let the app decide which devices need hot plug.
2.modify to manage event callback in each device.
3.fix some system hang issues when igb_uio is released.
4.move the pci part to bus-pci, based on the bus rework.
5.add hot plug policy in app, show an example that uses a hotplug list to
decide which devices need hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common code, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer. Add function to find device by name.
4.Replace the individual fd bound to each device with a common fd that polls all devices.
5.Add registration of hot insertion monitoring and processing, add function to auto bind driver before user adds device
6.Refine some coding style and typo issues
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variables of hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix dual device fd issue.
2.fix some typos.

Jeff Guo (5):
  eal: add uevent monitor api and callback func
  eal: add uevent pass and process function
  app/testpmd: use uevent to monitor hotplug
  pci_uio: add uevent hotplug failure handler in pci
  pci: add driver auto bind for hot insertion

 app/test-pmd/testpmd.c                             | 179 ++++++++++
 app/test-pmd/testpmd.h                             |   9 +
 drivers/bus/pci/bsd/pci.c                          |  30 ++
 drivers/bus/pci/linux/pci.c                        |  87 +++++
 drivers/bus/pci/pci_common.c                       |  43 +++
 drivers/bus/pci/pci_common_uio.c                   |  28 ++
 drivers/bus/pci/private.h                          |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  25 ++
 drivers/bus/vdev/vdev.c                            |  36 ++
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  36 ++
 .../bsdapp/eal/include/exec-env/rte_dev.h          |  39 +++
 lib/librte_eal/common/eal_common_bus.c             |  30 ++
 lib/librte_eal/common/eal_common_dev.c             | 150 +++++++++
 lib/librte_eal/common/include/rte_bus.h            |  71 ++++
 lib/librte_eal/common/include/rte_dev.h            | 119 +++++++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 375 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        |  39 +++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   6 +
 lib/librte_pci/rte_pci.c                           |  20 ++
 lib/librte_pci/rte_pci.h                           |  17 +
 21 files changed, 1353 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

-- 
2.7.4


* [PATCH V9 1/5] eal: add uevent monitor api and callback func
  2018-01-10  9:12                             ` [PATCH V9 0/5] add uevent mechanism in eal framework Jeff Guo
@ 2018-01-10  9:12                               ` Jeff Guo
  2018-01-10 16:34                                 ` Stephen Hemminger
  2018-01-11  1:43                                 ` Thomas Monjalon
  2018-01-10  9:12                               ` [PATCH V9 2/5] eal: add uevent pass and process function Jeff Guo
                                                 ` (3 subsequent siblings)
  4 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  9:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

This patch aims to add a general uevent mechanism in the eal device layer,
to enable monitoring of all linux kernel object uevents. Users can use these
APIs to monitor and read out the device status info sent from the
kernel side, then handle it accordingly. For example, on detecting a hotplug
uevent, the user can detach or attach the device, which also helps
to do fail-safe work smoothly.

About uevent monitoring:
a: add one epoll instance to poll the netlink socket and monitor the uevents
   of the device.
b: add enum of rte_dev_event_type and struct of rte_eal_uevent.
c: add below APIs in rte eal device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_monitor_start
   rte_dev_monitor_stop
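
The register/process pair listed in item c follows a common callback-list
pattern. The following standalone sketch shows that pattern with a TAILQ.
It is an illustration under stated assumptions, not the code in this patch:
every name here (ev_type, ev_callback_register, ev_callback_process,
count_cb) is hypothetical, and locking is left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

enum ev_type { EV_UNKNOWN, EV_ADD, EV_REMOVE };	/* hypothetical */

typedef int (*ev_cb_fn)(enum ev_type event, void *cb_arg, void *ret_param);

struct ev_callback {
	TAILQ_ENTRY(ev_callback) next;
	ev_cb_fn cb_fn;
	void *cb_arg;
	enum ev_type event;
};

TAILQ_HEAD(ev_cb_list, ev_callback);

/* Register cb_fn for event; registering the same triple twice is a no-op. */
static int
ev_callback_register(struct ev_cb_list *list, enum ev_type event,
		     ev_cb_fn cb_fn, void *cb_arg)
{
	struct ev_callback *cb;

	assert(list != NULL);
	if (cb_fn == NULL)
		return -1;

	TAILQ_FOREACH(cb, list, next) {
		if (cb->cb_fn == cb_fn && cb->cb_arg == cb_arg &&
		    cb->event == event)
			return 0;
	}

	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	cb->cb_fn = cb_fn;
	cb->cb_arg = cb_arg;
	cb->event = event;
	TAILQ_INSERT_TAIL(list, cb, next);
	return 0;
}

/* Invoke every callback registered for event; return how many ran. */
static int
ev_callback_process(struct ev_cb_list *list, enum ev_type event,
		    void *ret_param)
{
	struct ev_callback *cb;
	int called = 0;

	assert(list != NULL);
	TAILQ_FOREACH(cb, list, next) {
		if (cb->event != event)
			continue;
		cb->cb_fn(event, cb->cb_arg, ret_param);
		called++;
	}
	return called;
}

/* Example callback: counts its invocations through cb_arg. */
static int
count_cb(enum ev_type event, void *cb_arg, void *ret_param)
{
	(void)event;
	(void)ret_param;
	++*(int *)cb_arg;
	return 0;
}
```

An application would initialise the list with TAILQ_INIT(), register
count_cb for EV_REMOVE, and the monitor would later call
ev_callback_process() when a matching uevent arrives.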

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v9->v8:
split the patch set into small and explicit patch
---
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  36 ++++
 .../bsdapp/eal/include/exec-env/rte_dev.h          |  39 ++++
 lib/librte_eal/common/eal_common_dev.c             | 150 ++++++++++++++
 lib/librte_eal/common/include/rte_dev.h            |  98 +++++++++
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 223 +++++++++++++++++++++
 .../linuxapp/eal/include/exec-env/rte_dev.h        |  39 ++++
 7 files changed, 587 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h

diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..7fdc2c0
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_monitor_start(void)
+{
+	return -1;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..70413b3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_BSDAPP_DEV_H_
+#define _RTE_BSDAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_BSDAPP_DEV_H_ */
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..24c410e 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,32 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_eal_dev_callback {
+	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
+	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Parameter for callback */
+	void *ret_param;                        /**< Return parameter */
+	enum rte_dev_event_type event;      /**< device event type */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callback for all new devices added onto the bus */
+static struct rte_eal_dev_callback *dev_add_cb;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +257,130 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_callback_register(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_eal_dev_callback *user_cb;
+
+	if (!cb_fn || device == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	if (TAILQ_EMPTY(&(device->uev_cbs)))
+		TAILQ_INIT(&(device->uev_cbs));
+
+	if (event == RTE_DEV_EVENT_ADD) {
+		user_cb = NULL;
+	} else {
+		TAILQ_FOREACH(user_cb, &(device->uev_cbs), next) {
+			if (user_cb->cb_fn == cb_fn &&
+				user_cb->cb_arg == cb_arg &&
+				user_cb->event == event) {
+				break;
+			}
+		}
+	}
+
+	/* create a new callback. */
+	if (user_cb == NULL) {
+		/* allocate a new device event callback entity */
+		user_cb = rte_zmalloc("eal device event",
+					sizeof(*user_cb), 0);
+		if (user_cb == NULL) {
+			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
+			rte_spinlock_unlock(&rte_dev_cb_lock);
+			return -ENOMEM;
+		}
+		user_cb->cb_fn = cb_fn;
+		user_cb->cb_arg = cb_arg;
+		user_cb->event = event;
+		if (event == RTE_DEV_EVENT_ADD)
+			dev_add_cb = user_cb;
+		else
+			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb, next);
+	}
+
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return 0;
+}
+
+int
+rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_eal_dev_callback *cb, *next;
+
+	if (!cb_fn || device == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+
+	ret = 0;
+	if (event == RTE_DEV_EVENT_ADD) {
+		rte_free(dev_add_cb);
+		dev_add_cb = NULL;
+	} else {
+		for (cb = TAILQ_FIRST(&(device->uev_cbs)); cb != NULL;
+		      cb = next) {
+
+			next = TAILQ_NEXT(cb, next);
+
+			if (cb->cb_fn != cb_fn || cb->event != event ||
+					(cb->cb_arg != (void *)-1 &&
+					cb->cb_arg != cb_arg))
+				continue;
+
+			/*
+			 * if this callback is not executing right now,
+			 * then remove it.
+			 */
+			if (cb->active == 0) {
+				TAILQ_REMOVE(&(device->uev_cbs), cb, next);
+				rte_free(cb);
+			} else {
+				ret = -EAGAIN;
+			}
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_dev_event_type event,
+			void *ret_param)
+{
+	struct rte_eal_dev_callback dev_cb;
+	struct rte_eal_dev_callback *cb_lst;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_dev_cb_lock);
+	if (event == RTE_DEV_EVENT_ADD) {
+		if (ret_param != NULL)
+			dev_add_cb->ret_param = ret_param;
+
+		rc = dev_add_cb->cb_fn(dev_add_cb->event,
+				dev_add_cb->cb_arg, dev_add_cb->ret_param);
+	} else {
+		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
+			if (cb_lst->cb_fn == NULL || cb_lst->event != event)
+				continue;
+			dev_cb = *cb_lst;
+			cb_lst->active = 1;
+			if (ret_param != NULL)
+				dev_cb.ret_param = ret_param;
+			rc = dev_cb.cb_fn(dev_cb.event,
+					dev_cb.cb_arg, dev_cb.ret_param);
+			cb_lst->active = 0;
+		}
+	}
+	rte_spinlock_unlock(&rte_dev_cb_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 9342e0c..ab12862 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,44 @@ extern "C" {
 
 #include <rte_log.h>
 
+#include <exec-env/rte_dev.h>
+
+enum uev_monitor_netlink_group {
+	UEV_MONITOR_KERNEL,
+	UEV_MONITOR_UDEV,
+};
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,
+				/**< device being removed */
+	RTE_DEV_EVENT_CHANGE,
+				/**< device status being changed,
+				 * e.g. charger percent
+				 */
+	RTE_DEV_EVENT_MOVE,	/**< device sysfs path being moved */
+	RTE_DEV_EVENT_ONLINE,	/**< device being enabled */
+	RTE_DEV_EVENT_OFFLINE,	/**< device being disabled */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_eal_uevent {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+	enum uev_monitor_netlink_group group;	/**< device netlink group */
+};
+
+typedef int (*rte_eal_dev_cb_fn)(enum rte_dev_event_type event,
+					void *cb_arg, void *ret_param);
+
+struct rte_eal_dev_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_eal_dev_cb_list, rte_eal_dev_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -166,6 +204,8 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	/** User application callbacks for device event */
+	struct rte_eal_dev_cb_list uev_cbs;
 };
 
 /**
@@ -293,4 +333,62 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * It registers the callback for the specific event. Multiple
+ * callbacks can be registered at the same time.
+ * @param event
+ *  The device event type.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * It unregisters the callback according to the specified event.
+ *
+ * @param event
+ *  The event type which corresponding to the callback.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  parameter for callback; (void *)-1 removes all with the same cb_fn.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(struct rte_device *device,
+			enum rte_dev_event_type event,
+			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param event
+ *  The device event type.
+ * @param device
+ *  The device whose registered callbacks are executed.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(struct rte_device *device,
+			enum rte_dev_event_type event,
+			void *ret_param);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 588c0bd..b9f5d31 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
@@ -92,7 +93,7 @@ ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
 endif
 
-INC := rte_kni_common.h
+INC := rte_kni_common.h rte_dev.h
 
 SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \
 	$(addprefix include/exec-env/,$(INC))
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..4812dbc
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,223 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+
+bool service_no_init = true;
+
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the interrupts.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	uint32_t id;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore_1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail\n");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32,
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service");
+		return ret;
+	}
+	ret = rte_service_component_runstate_set(id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service");
+		return ret;
+	}
+	rte_service_lcore_start(slcore_1);
+	service_no_init = false;
+	return 0;
+}
+
+int
+rte_dev_monitor_stop(void)
+{
+	service_exit = true;
+	service_no_init = true;
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
new file mode 100644
index 0000000..70413b3
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_dev.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_DEV_H_
+#error "don't include this file directly, please include generic <rte_dev.h>"
+#endif
+
+#ifndef _RTE_LINUXAPP_DEV_H_
+#define _RTE_LINUXAPP_DEV_H_
+
+#include <stdio.h>
+
+#include <rte_dev.h>
+
+/**
+ * Start the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_monitor_start(void);
+
+/**
+ * Stop the device uevent monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_monitor_stop(void);
+
+#endif /* _RTE_LINUXAPP_DEV_H_ */
-- 
2.7.4


* [PATCH V9 2/5] eal: add uevent pass and process function
  2018-01-10  9:12                             ` [PATCH V9 0/5] add uevent mechanism in eal framework Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 1/5] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-10  9:12                               ` Jeff Guo
  2018-01-11 14:05                                 ` [PATCH V10 1/2] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
                                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  9:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

In order to handle the uevents detected from the kernel side,
add uevent processing functions, using the hot plug event as an example to
show how the uevent mechanism passes and processes a uevent.

For uevent passing and processing, add the functions below in the linux eal dev layer.
FreeBSD does not support uevent, so they are left unimplemented there.
a.dev_uev_parse
b.dev_uev_receive
c.dev_uev_process
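
One step in the parse stage is turning the ACTION string of a uevent into
an event type that callbacks can switch on. The sketch below shows one way
to do that with a small lookup table. It is an assumption-laden
illustration only: uev_type and uev_action_to_type() are hypothetical names
and are not the functions added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum uev_type {		/* hypothetical, mirrors a subset of event types */
	UEV_UNKNOWN,
	UEV_ADD,
	UEV_REMOVE,
	UEV_CHANGE,
};

/* Map the ACTION string of a uevent to an event type. */
static enum uev_type
uev_action_to_type(const char *action)
{
	static const struct {
		const char *name;
		enum uev_type type;
	} map[] = {
		{ "add", UEV_ADD },
		{ "remove", UEV_REMOVE },
		{ "change", UEV_CHANGE },
	};
	size_t i;

	assert(action != NULL);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(action, map[i].name) == 0)
			return map[i].type;
	}
	/* anything not in the table is reported as unknown */
	return UEV_UNKNOWN;
}
```

Keeping the mapping in a table means a new action only needs one more row,
and unrecognised actions fall through to the unknown type instead of being
misreported.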

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v9->v8:
split the patch set into small and explicit patch
---
 drivers/bus/pci/pci_common.c            |  20 ++++++
 drivers/bus/vdev/vdev.c                 |  20 ++++++
 lib/librte_eal/common/eal_common_bus.c  |  28 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  36 ++++++++++
 lib/librte_eal/common/include/rte_dev.h |  21 ++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 120 +++++++++++++++++++++++++++++++-
 6 files changed, 243 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 104fdf9..c4415a0 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -502,6 +502,25 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_device *
+pci_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_pci_device *dev;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (start && &dev->device == start) {
+			start = NULL; /* starting point found */
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+
+	return NULL;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -528,6 +547,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.scan = rte_pci_scan,
 		.probe = rte_pci_probe,
 		.find_device = pci_find_device,
+		.find_device_by_name = pci_find_device_by_name,
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index fd7736d..cac2aa0 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -323,6 +323,25 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+
+static struct rte_device *
+vdev_find_device_by_name(const struct rte_device *start,
+		rte_dev_cmp_name_t cmp_name,
+		const void *data)
+{
+	struct rte_vdev_device *dev;
+
+	TAILQ_FOREACH(dev, &vdev_device_list, next) {
+		if (start && &dev->device == start) {
+			start = NULL;
+			continue;
+		}
+		if (cmp_name(dev->device.name, data) == 0)
+			return &dev->device;
+	}
+	return NULL;
+}
+
 static int
 vdev_plug(struct rte_device *dev)
 {
@@ -339,6 +358,7 @@ static struct rte_bus rte_vdev_bus = {
 	.scan = vdev_scan,
 	.probe = vdev_probe,
 	.find_device = vdev_find_device,
+	.find_device_by_name = vdev_find_device_by_name,
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 3e022d5..efd5539 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -51,6 +51,7 @@ rte_bus_register(struct rte_bus *bus)
 	RTE_VERIFY(bus->scan);
 	RTE_VERIFY(bus->probe);
 	RTE_VERIFY(bus->find_device);
+	RTE_VERIFY(bus->find_device_by_name);
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
 
@@ -170,6 +171,14 @@ cmp_rte_device(const struct rte_device *dev1, const void *_dev2)
 }
 
 static int
+cmp_rte_device_name(const char *dev_name1, const void *_dev_name2)
+{
+	const char *dev_name2 = _dev_name2;
+
+	return strcmp(dev_name1, dev_name2);
+}
+
+static int
 bus_find_device(const struct rte_bus *bus, const void *_dev)
 {
 	struct rte_device *dev;
@@ -178,6 +187,25 @@ bus_find_device(const struct rte_bus *bus, const void *_dev)
 	return dev == NULL;
 }
 
+static struct rte_device *
+bus_find_device_by_name(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus->find_device_by_name(NULL, cmp_rte_device_name, _dev_name);
+	return dev;
+}
+
+struct rte_device *
+
+rte_bus_find_device(const struct rte_bus *bus, const void *_dev_name)
+{
+	struct rte_device *dev;
+
+	dev = bus_find_device_by_name(bus, _dev_name);
+	return dev;
+}
+
 struct rte_bus *
 rte_bus_find_by_device(const struct rte_device *dev)
 {
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..6dcfdb3 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -122,6 +122,34 @@ typedef struct rte_device *
 			 const void *data);
 
 /**
+ * Device iterator to find a device on a bus by name.
+ *
+ * This function returns an rte_device if one of those held by the bus
+ * matches the data passed as parameter.
+ *
+ * If the comparison function returns zero this function should stop iterating
+ * over any more devices. To continue a search the device of a previous search
+ * can be passed via the start parameter.
+ *
+ * @param cmp
+ *	the device name comparison function.
+ *
+ * @param data
+ *	Data to compare each device against.
+ *
+ * @param start
+ *	starting point for the iteration
+ *
+ * @return
+ *	The first device matching the data, NULL if none exists.
+ */
+typedef struct rte_device *
+(*rte_bus_find_device_by_name_t)(const struct rte_device *start,
+			 rte_dev_cmp_name_t cmp,
+			 const void *data);
+
+
+/**
  * Implementation specific probe function which is responsible for linking
  * devices on that bus with applicable drivers.
  *
@@ -206,6 +234,8 @@ struct rte_bus {
 	rte_bus_scan_t scan;         /**< Scan for devices attached to bus */
 	rte_bus_probe_t probe;       /**< Probe devices on bus */
 	rte_bus_find_device_t find_device; /**< Find a device on the bus */
+	rte_bus_find_device_by_name_t find_device_by_name;
+				     /**< Find a device on the bus by name */
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
@@ -306,6 +336,12 @@ struct rte_bus *rte_bus_find(const struct rte_bus *start, rte_bus_cmp_t cmp,
 struct rte_bus *rte_bus_find_by_device(const struct rte_device *dev);
 
 /**
+ * Find a device on the given bus by its name.
+ */
+struct rte_device *rte_bus_find_device(const struct rte_bus *bus,
+				const void *dev_name);
+
+/**
  * Find the registered bus for a given name.
  */
 struct rte_bus *rte_bus_find_by_name(const char *busname);
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index ab12862..d394ad3 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -53,10 +53,28 @@ extern "C" {
 
 #include <exec-env/rte_dev.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been parsed on bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum uev_subsystem {
+	UEV_SUBSYSTEM_UIO,
+	UEV_SUBSYSTEM_VFIO,
+	UEV_SUBSYSTEM_PCI,
+	UEV_SUBSYSTEM_MAX
+};
+
 enum uev_monitor_netlink_group {
 	UEV_MONITOR_KERNEL,
 	UEV_MONITOR_UDEV,
 };
+
 /**
  * The device event type.
  */
@@ -204,6 +222,7 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	enum rte_dev_state state;  /**< Device state */
 	/** User application callbacks for device event */
 	struct rte_eal_dev_cb_list uev_cbs;
 };
@@ -288,6 +307,8 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
  */
 typedef int (*rte_dev_cmp_t)(const struct rte_device *dev, const void *data);
 
+typedef int (*rte_dev_cmp_name_t)(const char *dev_name, const void *data);
+
 #define RTE_PMD_EXPORT_NAME_ARRAY(n, idx) n##idx[]
 
 #define RTE_PMD_EXPORT_NAME(name, idx) \
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 4812dbc..9d347f2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -79,10 +79,126 @@ dev_monitor_enable(int netlink_fd)
 	return -1;
 }
 
+static void
+dev_uev_parse(const char *buf, struct rte_eal_uevent *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			event->group = UEV_MONITOR_UDEV;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = pci_slot_name;
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = UEV_SUBSYSTEM_PCI;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+}
+
 static int
-dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+dev_uev_receive(int fd, struct rte_eal_uevent *uevent)
 {
-	/* TODO: device uevent processing */
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_eal_uevent));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
+static int
+dev_uev_process(struct epoll_event *events, int nfds)
+{
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	struct rte_eal_uevent uevent;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
+			(uevent.group == UEV_MONITOR_UDEV))
+			return 0;
+
+		/* by default, handle every pci device that is being hot-plugged */
+		if (uevent.subsystem == UEV_SUBSYSTEM_PCI) {
+			bus = rte_bus_find_by_name("pci");
+			dev = rte_bus_find_device(bus, uevent.devname);
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+
+				if ((!dev) || dev->state == RTE_DEV_UNDEFINED)
+					return 0;
+				return(_rte_dev_callback_process(dev,
+				  RTE_DEV_EVENT_REMOVE,
+				  NULL));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				if (dev == NULL) {
+					return(_rte_dev_callback_process(NULL,
+					  RTE_DEV_EVENT_ADD,
+					  uevent.devname));
+				}
+			}
+		}
+	}
 	return 0;
 }
 
-- 
2.7.4


* [PATCH V9 3/5] app/testpmd: use uevent to monitor hotplug
  2018-01-10  9:12                             ` [PATCH V9 0/5] add uevent mechanism in eal framework Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 1/5] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 2/5] eal: add uevent pass and process function Jeff Guo
@ 2018-01-10  9:12                               ` Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 4/5] pci_uio: add uevent hotplug failure handler in pci Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 5/5] pci: add driver auto bind for hot insertion Jeff Guo
  4 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  9:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle the hot removal event and the
hot insertion event.
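
For context, the hot removal and hot insertion events handled here start
out as kernel uevents: a netlink message made of NUL-separated records,
a header such as "add@/devices/..." followed by KEY=VALUE pairs. The
sketch below shows one way to pull out the fields of interest. It is a
minimal illustration, not the parser from this series: struct uev_fields,
take_field() and uev_parse() are hypothetical names, and the result type
owns its strings so no pointer to a local buffer escapes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128 /* hypothetical, mirrors RTE_EAL_UEV_MSG_ELEM_LEN */

/* Hypothetical result type; it owns copies of the parsed values. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If rec starts with key (e.g. "ACTION="), copy the value into dst. */
static void
take_field(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) == 0)
		snprintf(dst, dst_len, "%s", rec + klen);
}

/* Walk the NUL-separated records in buf[0..len) and collect the fields. */
static void
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break; /* unterminated tail: ignore it */

		take_field(rec, "ACTION=", out->action, sizeof(out->action));
		take_field(rec, "SUBSYSTEM=", out->subsystem,
			   sizeof(out->subsystem));
		take_field(rec, "PCI_SLOT_NAME=", out->pci_slot_name,
			   sizeof(out->pci_slot_name));

		i += (size_t)(end - rec) + 1; /* skip record and its NUL */
	}
}
```

Passing the received length instead of scanning a fixed-size buffer keeps
the walk inside the bytes that recv() actually returned.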

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v9->v8:
split the patch set into smaller, more explicit patches
---
 app/test-pmd/testpmd.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 188 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9414d0e..37c859a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -373,6 +373,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -380,6 +382,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(enum rte_dev_event_type type,
+			      void *param, void *ret_param);
+static int eth_uevent_callback_register(portid_t pid);
+static int in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1729,6 +1738,32 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t pid)
+{
+	int diag;
+	struct rte_eth_dev *dev;
+	enum rte_dev_event_type dev_event_type;
+
+	/* register the uevent callback */
+	dev = &rte_eth_devices[pid];
+	for (dev_event_type = RTE_DEV_EVENT_ADD;
+		 dev_event_type < RTE_DEV_EVENT_CHANGE;
+		 dev_event_type++) {
+		diag = rte_dev_callback_register(dev->device, dev_event_type,
+			eth_uevent_callback,
+			(void *)(intptr_t)pid);
+		if (diag) {
+			printf("Failed to setup uevent callback for"
+				" device event %d\n",
+				dev_event_type);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1745,6 +1780,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1756,6 +1793,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1782,6 +1821,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1805,6 +1847,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1889,6 +1934,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	portid_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(INFO, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -1931,6 +2019,88 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 }
 
 static int
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return 1;
+
+	return 0;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s: cannot allocate memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(enum rte_dev_event_type type, void *arg,
+		  void *ret_param)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+	static char *device_name;
+
+	RTE_SET_USED(ret_param);
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		device_name = strdup((const char *)ret_param);
+		if (device_name == NULL ||
+		    rte_eal_alarm_set(500000,
+			add_uevent_callback, device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
 	uint16_t i;
@@ -2415,6 +2585,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_dev_monitor_start();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 2a266fd..64254e6 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Hotplug request list */
+	const char *dev_name;                /**< requested device name */
+	enum rte_dev_event_type event;      /**< device event type */
+};
+
+/** @internal Structure to keep track of hotplug requests */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4


* [PATCH V9 4/5] pci_uio: add uevent hotplug failure handler in pci
  2018-01-10  9:12                             ` [PATCH V9 0/5] add uevent mechanism in eal framework Jeff Guo
                                                 ` (2 preceding siblings ...)
  2018-01-10  9:12                               ` [PATCH V9 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-10  9:12                               ` Jeff Guo
  2018-01-10  9:12                               ` [PATCH V9 5/5] pci: add driver auto bind for hot insertion Jeff Guo
  4 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  9:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

When a hot removal uevent is detected for a device, the device resource
becomes invalid. To avoid unexpected use of this resource, remap the device
resource to fake memory, so that the application keeps running instead of
triggering a core dump.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v9->v8:
split the patch set into smaller, more explicit patches
---
 drivers/bus/pci/bsd/pci.c                 | 23 ++++++++++++++++++++
 drivers/bus/pci/linux/pci.c               | 34 ++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common.c              | 22 +++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c          | 28 +++++++++++++++++++++++++
 drivers/bus/pci/private.h                 | 12 +++++++++++
 drivers/bus/pci/rte_bus_pci.h             | 11 ++++++++++
 drivers/bus/vdev/vdev.c                   |  9 +++++++-
 lib/librte_eal/common/eal_common_bus.c    |  1 +
 lib/librte_eal/common/include/rte_bus.h   | 17 +++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c     | 35 ++++++++++++++++++++++++++++---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  6 ++++++
 lib/librte_pci/rte_pci.c                  | 20 ++++++++++++++++++
 lib/librte_pci/rte_pci.h                  | 17 +++++++++++++++
 13 files changed, 231 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 655b34b..d7165b9 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -97,6 +97,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(dev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void
 pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 25f907e..7aa3079 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -116,6 +116,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* Remap pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* nothing to do */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(dev);
+		}
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -357,6 +389,8 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		rte_pci_add_device(dev);
 	}
 
+	dev->device.state = RTE_DEV_PARSED;
+	TAILQ_INIT(&(dev->device.uev_cbs));
 	return 0;
 }
 
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index c4415a0..3fbe9d7 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -282,6 +282,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		if (rc > 0)
 			/* positive value means driver doesn't support it */
 			continue;
+		dev->device.state = RTE_DEV_PROBED;
 		return 0;
 	}
 	return 1;
@@ -481,6 +482,7 @@ rte_pci_insert_device(struct rte_pci_device *exist_pci_dev,
 void
 rte_pci_remove_device(struct rte_pci_device *pci_dev)
 {
+	RTE_LOG(DEBUG, EAL, " rte_pci_remove_device for device list\n");
 	TAILQ_REMOVE(&rte_pci_bus.device_list, pci_dev, next);
 }
 
@@ -522,6 +524,25 @@ pci_find_device_by_name(const struct rte_device *start,
 }
 
 static int
+pci_remap_device(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices that use igb_uio */
+	ret = rte_pci_remap_device(pdev);
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to remap device %s\n",
+			dev->name);
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -552,6 +573,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.remap_device = pci_remap_device,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index dd84ec8..a4bc473 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -147,6 +147,34 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
+			continue;
+		map_address = pci_map_private_resource(
+				dev->mem_resource[i].addr, 0,
+				(size_t)dev->mem_resource[i].len);
+		if (map_address == MAP_FAILED)
+			goto error;
+		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
+		dev->mem_resource[i].addr = map_address;
+	}
+
+	return 0;
+error:
+	return -1;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2283f09..10baa1a 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -202,6 +202,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the pci uio resource.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index d4a2996..65337eb 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -52,6 +52,8 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+#include <unistd.h>
+#include <fcntl.h>
 
 #include <rte_debug.h>
 #include <rte_interrupts.h>
@@ -197,6 +199,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
 void rte_pci_unmap_device(struct rte_pci_device *dev);
 
 /**
+ * Remap this device
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ */
+int rte_pci_remap_device(struct rte_pci_device *dev);
+
+/**
  * Dump the content of the PCI bus.
  *
  * @param f
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index cac2aa0..c9cd369 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -323,7 +323,6 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
-
 static struct rte_device *
 vdev_find_device_by_name(const struct rte_device *start,
 		rte_dev_cmp_name_t cmp_name,
@@ -343,6 +342,13 @@ vdev_find_device_by_name(const struct rte_device *start,
 }
 
 static int
+vdev_remap_device(struct rte_device *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
+static int
 vdev_plug(struct rte_device *dev)
 {
 	return vdev_probe_all_drivers(RTE_DEV_TO_VDEV(dev));
@@ -362,6 +368,7 @@ static struct rte_bus rte_vdev_bus = {
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
+	.remap_device = vdev_remap_device,
 };
 
 RTE_REGISTER_BUS(vdev, rte_vdev_bus);
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index efd5539..bdb0e54 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -54,6 +54,7 @@ rte_bus_register(struct rte_bus *bus)
 	RTE_VERIFY(bus->find_device_by_name);
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
+	RTE_VERIFY(bus->remap_device);
 
 	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
 	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6dcfdb3..78990bc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -196,6 +196,22 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific remap function which is responsible for
+ * remapping a device on that bus from its original shared memory resource
+ * to a private memory resource once the device has been removed.
+ * When a device removal event is received from the kernel, call this
+ * function before performing any further operation on the device hardware.
+ *
+ * @param dev
+ *	Device pointer that was returned by a previous call to find_device.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -239,6 +255,7 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_remap_device_t remap_device;       /**< remap a device */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9d347f2..d68ef9b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -32,6 +32,12 @@ bool service_no_init = true;
 
 #define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
 
+static void sig_handler(int signum)
+{
+	if (signum == SIGINT || signum == SIGTERM)
+		rte_dev_monitor_stop();
+}
+
 static int
 dev_monitor_fd_new(void)
 {
@@ -168,6 +174,7 @@ dev_uev_process(struct epoll_event *events, int nfds)
 	struct rte_bus *bus;
 	struct rte_device *dev;
 	struct rte_eal_uevent uevent;
+	int ret;
 	int i;
 
 	for (i = 0; i < nfds; i++) {
@@ -187,9 +194,17 @@ dev_uev_process(struct epoll_event *events, int nfds)
 
 				if ((!dev) || dev->state == RTE_DEV_UNDEFINED)
 					return 0;
-				return(_rte_dev_callback_process(dev,
-				  RTE_DEV_EVENT_REMOVE,
-				  NULL));
+				dev->state = RTE_DEV_FAULT;
+
+				/**
+				 * remap the resource to be fake
+				 * before user's removal processing
+				 */
+				ret = bus->remap_device(dev);
+				if (!ret)
+					return(_rte_dev_callback_process(dev,
+					  RTE_DEV_EVENT_REMOVE,
+					  NULL));
 			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
 				if (dev == NULL) {
 					return(_rte_dev_callback_process(NULL,
@@ -215,12 +230,26 @@ dev_uev_process(struct epoll_event *events, int nfds)
  */
 static int32_t dev_uev_monitoring(__rte_unused void *arg)
 {
+	struct sigaction act;
+	sigset_t mask;
 	int netlink_fd = -1;
 	struct epoll_event ep_kernel;
 	int fd_ep = -1;
 
 	service_exit = false;
 
+	/* set signal handlers */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_handler = sig_handler;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_RESTART;
+	sigaction(SIGINT, &act, NULL);
+	sigaction(SIGTERM, &act, NULL);
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGINT);
+	sigaddset(&mask, SIGTERM);
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+
 	fd_ep = epoll_create1(EPOLL_CLOEXEC);
 	if (fd_ep < 0) {
 		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index a3a98c1..d0e07b4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -354,6 +354,12 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if the device has been removed before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
+		pr_info("The device has been removed\n");
+		return -1;
+	}
+
 	/* disable interrupts */
 	igbuio_pci_disable_interrupts(udev);
 
diff --git a/lib/librte_pci/rte_pci.c b/lib/librte_pci/rte_pci.c
index 0160fc1..feb5fd7 100644
--- a/lib/librte_pci/rte_pci.c
+++ b/lib/librte_pci/rte_pci.c
@@ -172,6 +172,26 @@ rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr)
 	return -1;
 }
 
+/* map a private resource at an address */
+void *
+pci_map_private_resource(void *requested_addr, off_t offset, size_t size)
+{
+	void *mapaddr;
+
+	mapaddr = mmap(requested_addr, size,
+			   PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	if (mapaddr == MAP_FAILED) {
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%p, 0x%lx, 0x%lx): "
+			"%s (%p)\n",
+			__func__, requested_addr,
+			(unsigned long)size, (unsigned long)offset,
+			strerror(errno), mapaddr);
+	} else
+		RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
+
+	return mapaddr;
+}
 
 /* map a particular resource from a file */
 void *
diff --git a/lib/librte_pci/rte_pci.h b/lib/librte_pci/rte_pci.h
index 4f2cd18..f6091a6 100644
--- a/lib/librte_pci/rte_pci.h
+++ b/lib/librte_pci/rte_pci.h
@@ -227,6 +227,23 @@ int rte_pci_addr_cmp(const struct rte_pci_addr *addr,
 int rte_pci_addr_parse(const char *str, struct rte_pci_addr *addr);
 
 /**
+ * @internal
+ * Map to a particular private resource.
+ *
+ * @param requested_addr
+ *      The starting address for the new mapping range.
+ * @param offset
+ *      The offset for the mapping range.
+ * @param size
+ *      The size for the mapping range.
+ * @return
+ *   - On success, the function returns a pointer to the mapped area.
+ *   - On error, the value MAP_FAILED is returned.
+ */
+void *pci_map_private_resource(void *requested_addr, off_t offset,
+		size_t size);
+
+/**
  * Map a particular resource from a file.
  *
  * @param requested_addr
-- 
2.7.4


* [PATCH V9 5/5] pci: add driver auto bind for hot insertion
  2018-01-10  9:12                             ` [PATCH V9 0/5] add uevent mechanism in eal framework Jeff Guo
                                                 ` (3 preceding siblings ...)
  2018-01-10  9:12                               ` [PATCH V9 4/5] pci_uio: add uevent hotplug failure handler in pci Jeff Guo
@ 2018-01-10  9:12                               ` Jeff Guo
  2018-03-21  6:11                                 ` [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio Jeff Guo
  4 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-10  9:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

Normally the NIC driver is bound before the application starts. To bind a
driver automatically while the application is running, an auto-bind
function is needed. This benefits the hot-insertion case: when a
hot-insertion uevent is detected for a device, the driver is bound
according to a user policy and the device is then attached, so the
application keeps running smoothly and automatically when hotplug occurs.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v9->v8:
split the patch set into smaller, more explicit patches
---
 drivers/bus/pci/bsd/pci.c               |  7 +++++
 drivers/bus/pci/linux/pci.c             | 53 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common.c            |  1 +
 drivers/bus/pci/rte_bus_pci.h           | 14 +++++++++
 drivers/bus/vdev/vdev.c                 |  9 ++++++
 lib/librte_eal/common/eal_common_bus.c  |  1 +
 lib/librte_eal/common/include/rte_bus.h | 18 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  7 +++++
 8 files changed, 110 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index d7165b9..2d1d24f 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -672,3 +672,10 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	return -1;
+}
+
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 7aa3079..cec1489 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -859,3 +859,56 @@ rte_pci_ioport_unmap(struct rte_pci_ioport *p)
 
 	return ret;
 }
+
+int
+rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	char drv_bind_path[1024];
+	char drv_override_path[1024]; /* sysfs driver_override path */
+	int drv_override_fd = -1;
+	int drv_bind_fd = -1;
+
+	RTE_SET_USED(drv_type);
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_override_fd, drv_type, strlen(drv_type)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", drv_type, dev_name);
+		goto err;
+	}
+
+	close(drv_override_fd);
+
+	snprintf(drv_bind_path, sizeof(drv_bind_path),
+		"/sys/bus/pci/drivers/%s/bind", drv_type);
+
+	/* do the bind by writing device to the specific driver  */
+	drv_bind_fd = open(drv_bind_path, O_WRONLY | O_APPEND);
+	if (drv_bind_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_bind_path, strerror(errno));
+		goto err;
+	}
+
+	if (write(drv_bind_fd, dev_name, strlen(dev_name)) < 0)
+		goto err;
+
+	close(drv_bind_fd);
+	return 0;
+err:
+	close(drv_override_fd);
+	close(drv_bind_fd);
+	return -1;
+}
+
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 3fbe9d7..54601a9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -574,6 +574,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.remap_device = pci_remap_device,
+		.bind_driver = rte_pci_dev_bind_driver,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 65337eb..1662f3b 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -344,6 +344,20 @@ void rte_pci_ioport_read(struct rte_pci_ioport *p,
 void rte_pci_ioport_write(struct rte_pci_ioport *p,
 		const void *data, size_t len, off_t offset);
 
+/**
+ * It can be used to bind a device to a specific type of driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param drv_type
+ *  The specific driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_pci_dev_bind_driver(const char *dev_name, const char *drv_type);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index c9cd369..773f6e0 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -349,6 +349,14 @@ vdev_remap_device(struct rte_device *dev)
 }
 
 static int
+vdev_bind_driver(const char *dev_name, const char *drv_type)
+{
+	RTE_SET_USED(dev_name);
+	RTE_SET_USED(drv_type);
+	return 0;
+}
+
+static int
 vdev_plug(struct rte_device *dev)
 {
 	return vdev_probe_all_drivers(RTE_DEV_TO_VDEV(dev));
@@ -369,6 +377,7 @@ static struct rte_bus rte_vdev_bus = {
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
 	.remap_device = vdev_remap_device,
+	.bind_driver = vdev_bind_driver,
 };
 
 RTE_REGISTER_BUS(vdev, rte_vdev_bus);
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index bdb0e54..b7219c9 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -55,6 +55,7 @@ rte_bus_register(struct rte_bus *bus)
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
 	RTE_VERIFY(bus->remap_device);
+	RTE_VERIFY(bus->bind_driver);
 
 	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
 	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 78990bc..fb03a74 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -212,6 +212,23 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
 
 /**
+ * Implementation-specific bind function, responsible for binding
+ * an explicit type of driver to a device on that bus.
+ *
+ * @param dev_name
+ *	device textual description.
+ *
+ * @param drv_type
+ *	driver type textual description.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_bind_driver_t)(const char *dev_name,
+				const char *drv_type);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -256,6 +273,7 @@ struct rte_bus {
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	rte_bus_remap_device_t remap_device;       /**< remap a device */
+	rte_bus_bind_driver_t bind_driver; /**< bind a driver for bus device */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index d68ef9b..35db461 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -207,6 +207,13 @@ dev_uev_process(struct epoll_event *events, int nfds)
 					  NULL));
 			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
 				if (dev == NULL) {
+					/**
+					 * bind the driver to the device
+					 * before user's add processing
+					 */
+					bus->bind_driver(
+						uevent.devname,
+						"igb_uio");
 					return(_rte_dev_callback_process(NULL,
 					  RTE_DEV_EVENT_ADD,
 					  uevent.devname));
-- 
2.7.4
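
The two sysfs writes that the patch above performs (driver_override first,
then bind) can be sketched roughly as below. This is only an illustration
under stated assumptions, not the patch's code: write_sysfs_attr(),
bind_pci_driver() and the sysfs_root parameter are hypothetical names that
do not exist in DPDK. The sketch sizes each write with strlen() instead of
sizeof() on a pointer, and closes each descriptor exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a string to an existing sysfs-style file. */
int
write_sysfs_attr(const char *path, const char *value)
{
	size_t len;
	ssize_t n;
	int fd, err;

	assert(path != NULL && value != NULL);
	len = strlen(value);	/* string length, not sizeof(pointer) */

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	err = (n < 0) ? errno : 0;
	close(fd);		/* closed exactly once on every path */

	if (err != 0)
		return -err;
	return ((size_t)n == len) ? 0 : -EIO;
}

/* Hypothetical wrapper; sysfs_root would normally be "/sys". */
int
bind_pci_driver(const char *sysfs_root, const char *dev_name,
		const char *drv_name)
{
	char path[1024];
	int ret;

	/* step 1: tell the PCI core which driver may claim the device */
	ret = snprintf(path, sizeof(path),
		"%s/bus/pci/devices/%s/driver_override", sysfs_root, dev_name);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		return -ENAMETOOLONG;
	ret = write_sysfs_attr(path, drv_name);
	if (ret < 0)
		return ret;

	/* step 2: ask that driver to bind the device */
	ret = snprintf(path, sizeof(path),
		"%s/bus/pci/drivers/%s/bind", sysfs_root, drv_name);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		return -ENAMETOOLONG;
	return write_sysfs_attr(path, dev_name);
}
```

A caller would use something like bind_pci_driver("/sys", "0000:03:00.0",
"igb_uio"); the root is a parameter only so the helper can be exercised
against a scratch directory.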


* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 12:42                                       ` Gaëtan Rivet
@ 2018-01-10  9:29                                         ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-10  9:29 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: Thomas Monjalon, Mordechay Haimovsky, dev, stephen, Richardson,
	Bruce, Yigit, Ferruh, Ananyev, Konstantin, shreyansh.jain, Wu,
	Jingjing, Zhang, Helin, Van Haaren, Harry



On 1/9/2018 8:42 PM, Gaëtan Rivet wrote:
> Hi Jeff,
>
> On Tue, Jan 09, 2018 at 12:08:52PM +0000, Guo, Jia wrote:
>> Your comments about split it totally make sense ,no doubt that, but my question is that if split api with the funcational , so the function part should be set null implement or stake. Any other good idea or tip for that.
>>
> Please avoid top-posting on the mailing list, it is confusing when
> reading a thread intertwined with inner-posted mails.
>
> Regarding your issue, it is fine to propose a first skeleton API with
> bare implementations, then progressively use your new functions where
> relevant.
>
> It is only necessary to ensure compilation is always possible between
> each patch. The API itself need not be usable, as long as the patch
> order remains coherent and meaningful for review.
>
> Otherwise, sorry about not doing a review earlier, I didn't think I knew
> enough about uevent to provide useful comments. However after a quick
> reading I may be able to provide a few remarks.
>
> I will wait for your split before doing so.
Makes sense. A new version of the patch set has been sent, for your reference.
>> Best regards,
>> Jeff Guo
>>
>>
>> -----Original Message-----
>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>> Sent: Tuesday, January 9, 2018 7:45 PM
>> To: Guo, Jia <jia.guo@intel.com>
>> Cc: Mordechay Haimovsky <motih@mellanox.com>; dev@dpdk.org; stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Ananyev, Konstantin <konstantin.ananyev@intel.com>; shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug
>>
>> 09/01/2018 12:39, Guo, Jia:
>>> So, how can separate the patch into more small patch, use stake or null implement in function. I think we should consider if it is a economic way now, if I could explain more detail in code for you all not very familiar the background? I have sent v8, please check, thanks all.
>> The v8 is not split enough.
>> Please try to address all my comments.


* Re: [PATCH v7 1/2] eal: add uevent monitor for hot plug
  2018-01-09 13:44                                       ` Thomas Monjalon
@ 2018-01-10  9:32                                         ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-10  9:32 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Mordechay Haimovsky, dev, stephen, Richardson, Bruce, Yigit,
	Ferruh, gaetan.rivet, Ananyev, Konstantin, shreyansh.jain, Wu,
	Jingjing, Zhang, Helin, Van Haaren, Harry



On 1/9/2018 9:44 PM, Thomas Monjalon wrote:
> 09/01/2018 13:08, Guo, Jia:
>> Your comments about split it totally make sense ,no doubt that, but my question is that if split api with the funcational , so the function part should be set null implement or stake. Any other good idea or tip for that.
> Yes when introducing the callback API first, there will be no
> implementation, so the callbacks are not called.
> If needed you can have some empty functions.
I think we all want to make the review as effective as possible, so
the v9 patch set has been sent; please check.


* Re: [PATCH V9 1/5] eal: add uevent monitor api and callback func
  2018-01-10  9:12                               ` [PATCH V9 1/5] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-10 16:34                                 ` Stephen Hemminger
  2018-01-11  1:43                                 ` Thomas Monjalon
  1 sibling, 0 replies; 494+ messages in thread
From: Stephen Hemminger @ 2018-01-10 16:34 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, gaetan.rivet, konstantin.ananyev,
	jblunck, shreyansh.jain, jingjing.wu, dev, thomas, helin.zhang,
	motih

On Wed, 10 Jan 2018 17:12:20 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +static int
> +dev_monitor_fd_new(void)
> +{
> +
> +	int uevent_fd;
> +
> +	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
> +			SOCK_NONBLOCK,
> +			NETLINK_KOBJECT_UEVENT);
> +	if (uevent_fd < 0) {

If you used a blocking socket, then epoll would not be necessary.
There is one netlink socket for whole system, and the thread is only
reading from one fd.
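
To make the suggestion concrete, here is a minimal sketch (not the patch's
code) of a monitor that simply blocks in recv() on the uevent socket, so no
epoll instance is needed. All function and enum names are hypothetical. It
assumes the kernel's uevent multicast group (group 1), whose messages begin
with "action@devpath".

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

enum uev_action { UEV_UNKNOWN, UEV_ADD, UEV_REMOVE };

/* Classify a raw kernel uevent message of the form "action@devpath". */
enum uev_action
uevent_action(const char *buf, size_t len)
{
	assert(buf != NULL);
	if (len >= 4 && memcmp(buf, "add@", 4) == 0)
		return UEV_ADD;
	if (len >= 7 && memcmp(buf, "remove@", 7) == 0)
		return UEV_REMOVE;
	return UEV_UNKNOWN;
}

/* Open a blocking (no SOCK_NONBLOCK) uevent socket on kernel group 1. */
int
uevent_open_blocking(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
			NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;	/* kernel uevent multicast group */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Sleep in recv() until a message arrives; return on error or EOF. */
void
uevent_loop(int fd, void (*handler)(enum uev_action, const char *))
{
	char buf[4096];
	ssize_t n;

	while ((n = recv(fd, buf, sizeof(buf) - 1, 0)) > 0) {
		buf[n] = '\0';
		handler(uevent_action(buf, (size_t)n), buf);
	}
}
```

One trade-off with this shape: stopping the thread needs some way to
interrupt the blocking recv(), which the sketch does not address.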


* Re: [PATCH V9 1/5] eal: add uevent monitor api and callback func
  2018-01-10  9:12                               ` [PATCH V9 1/5] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-10 16:34                                 ` Stephen Hemminger
@ 2018-01-11  1:43                                 ` Thomas Monjalon
  2018-01-11 14:24                                   ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-11  1:43 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

Hi,

Thanks for splitting the patches.
I will review the first one today. Please see below.

10/01/2018 10:12, Jeff Guo:
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> +int
> +rte_dev_monitor_start(void)
> +{
> +	return -1;
> +}
> +
> +int
> +rte_dev_monitor_stop(void)
> +{
> +	return -1;
> +}

You should add a log to show it is not supported.

> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
> +#ifndef _RTE_DEV_H_
> +#error "don't include this file directly, please include generic <rte_dev.h>"
> +#endif

Why create a different rte_dev.h for BSD and Linux?
This is an API, it should be the same.

> +/**
> + * Start the device uevent monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_monitor_start(void);
> +
> +/**
> + * Stop the device uevent monitoring .
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +
> +int
> +rte_dev_monitor_stop(void);

> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -42,9 +42,32 @@
>  #include <rte_devargs.h>
>  #include <rte_debug.h>
>  #include <rte_log.h>
> +#include <rte_spinlock.h>
> +#include <rte_malloc.h>
>  
>  #include "eal_private.h"
>  
> +/* spinlock for device callbacks */
> +static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;

Please rename to rte_dev_event_lock.
Let's use rte_dev_event_ prefix consistently.

> + * The user application callback description.
> + *
> + * It contains callback address to be registered by user application,
> + * the pointer to the parameters for callback, and the event type.
> + */
> +struct rte_eal_dev_callback {

Rename to rte_dev_event?

> +	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
> +	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */

Rename to rte_dev_event_callback?

> +	void *cb_arg;                           /**< Parameter for callback */

Comment should be about opaque context.

> +	void *ret_param;                        /**< Return parameter */
> +	enum rte_dev_event_type event;      /**< device event type */
> +	uint32_t active;                        /**< Callback is executing */

Why is active needed?

> +};
> +
> +/* A genaral callback for all new devices be added onto the bus */
> +static struct rte_eal_dev_callback *dev_add_cb;

It should not be a different callback for new devices.
You must allow registering the callback for all and new devices.
Please look how it's done for ethdev:
	https://dpdk.org/patch/32900/
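
One possible way to let a single registration cover both a named device and
any device, including ones that appear later, is a wildcard name. The
sketch below is an assumption for illustration only: dev_event_name_match()
is a hypothetical helper, and the NULL-means-all convention is not taken
from the patch. It compares names by content rather than by pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical matcher: a NULL registered name means "every device,
 * including ones not yet known"; otherwise names are compared by
 * content with strcmp(), never by pointer value.
 */
int
dev_event_name_match(const char *registered, const char *incoming)
{
	assert(incoming != NULL);
	if (registered == NULL)
		return 1;
	return strcmp(registered, incoming) == 0;
}
```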

> +int
> +rte_dev_callback_register(struct rte_device *device,
> +			enum rte_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
> +{

Why pass an event type at registration?
I think the event processing dispatch must be done in the callback,
not at registration.
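
A rough sketch of what dispatching inside the callback could look like,
with one registration receiving every event type. The type and function
names here are hypothetical and deliberately not rte_-prefixed, since the
final API may well differ.

```c
#include <assert.h>
#include <stddef.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

/* One callback type for all events; the event is an argument. */
typedef void (*dev_event_cb_fn)(const char *dev_name,
				enum dev_event_type event, void *cb_arg);

/* Example of an opaque application context passed as cb_arg. */
struct hotplug_stats {
	int added;
	int removed;
};

/* The callback itself decides what to do for each event type. */
void
app_dev_event_cb(const char *dev_name, enum dev_event_type event,
		void *cb_arg)
{
	struct hotplug_stats *stats = cb_arg;

	(void)dev_name;
	assert(stats != NULL);

	switch (event) {
	case DEV_EVENT_ADD:
		stats->added++;
		break;
	case DEV_EVENT_REMOVE:
		stats->removed++;
		break;
	default:
		break;
	}
}
```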

> +		/* allocate a new interrupt callback entity */
> +		user_cb = rte_zmalloc("eal device event",
> +					sizeof(*user_cb), 0);

No need to use rte_malloc here.
Please check this callback API patch:
	https://dpdk.org/patch/33144/

> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> +enum uev_monitor_netlink_group {
> +	UEV_MONITOR_KERNEL,
> +	UEV_MONITOR_UDEV,
> +};

Please keep a namespace prefix like RTE_DEV_EVENT_ (same for enum name).
Some comments are missing for these constants.

> +/**
> + * The device event type.
> + */
> +enum rte_dev_event_type {
> +	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
> +	RTE_DEV_EVENT_ADD,	/**< device being added */
> +	RTE_DEV_EVENT_REMOVE,
> +				/**< device being removed */
> +	RTE_DEV_EVENT_CHANGE,
> +				/**< device status being changed,
> +				 * etc charger percent
> +				 */

What does "status changed" mean?
What does "charger percent" mean?

> +	RTE_DEV_EVENT_MOVE,	/**< device sysfs path being moved */

sysfs is Linux specific

> +	RTE_DEV_EVENT_ONLINE,	/**< device being enable */

You mean a device can be added but not enabled?
So enabling is switching it on by a register? or something else?

> +	RTE_DEV_EVENT_OFFLINE,	/**< device being disable */
> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
> +};
> +
> +struct rte_eal_uevent {
> +	enum rte_dev_event_type type;	/**< device event type */
> +	int subsystem;				/**< subsystem id */
> +	char *devname;				/**< device name */
> +	enum uev_monitor_netlink_group group;	/**< device netlink group */
> +};

I don't understand why this struct is exposed in the public API.
Please rename from rte_eal_ to rte_dev_.

> @@ -166,6 +204,8 @@ struct rte_device {
>  	const struct rte_driver *driver;/**< Associated driver */
>  	int numa_node;                /**< NUMA node connection */
>  	struct rte_devargs *devargs;  /**< Device user arguments */
> +	/** User application callbacks for device event */
> +	struct rte_eal_dev_cb_list uev_cbs;

Do not use uev word in API, it refers to uevent which is implementation
specific. You can name it event_callbacks.

I am afraid this change is breaking the ABI.
For the first time, 18.02 will be ABI stable.
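
To illustrate the ABI concern with simplified stand-in structs (these are
not the real DPDK definitions): adding a member to a struct that other
structs embed changes its size, so every member placed after it moves.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a TAILQ head: two pointers. */
struct cb_list_sketch {
	void *first;
	void **last;
};

/* Simplified device struct before the change. */
struct device_v1 {
	const void *driver;
	int numa_node;
	void *devargs;
};

/* Same struct with a callback list appended. */
struct device_v2 {
	const void *driver;
	int numa_node;
	void *devargs;
	struct cb_list_sketch event_callbacks;
};

/* Bus-specific devices embed the generic one as their first member. */
struct bus_device_v1 {
	struct device_v1 device;
	int vendor_id;
};

struct bus_device_v2 {
	struct device_v2 device;
	int vendor_id;
};

size_t
vendor_offset_v1(void)
{
	return offsetof(struct bus_device_v1, vendor_id);
}

size_t
vendor_offset_v2(void)
{
	return offsetof(struct bus_device_v2, vendor_id);
}

/* A binary built against v1 would read vendor_id at the wrong offset. */
void
check_layout_shift(void)
{
	assert(sizeof(struct device_v2) > sizeof(struct device_v1));
	assert(vendor_offset_v2() > vendor_offset_v1());
}
```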

> +/**
> + * It registers the callback for the specific event. Multiple
> + * callbacks cal be registered at the same time.
> + * @param event
> + *  The device event type.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_register(struct rte_device *device,
> +			enum rte_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
> +
> +/**
> + * It unregisters the callback according to the specified event.
> + *
> + * @param event
> + *  The event type which corresponding to the callback.
> + * @param cb_fn
> + *  callback address.
> + *  address of parameter for callback, (void *)-1 means to remove all
> + *  registered which has the same callback address.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_unregister(struct rte_device *device,
> +			enum rte_dev_event_type event,
> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);

Such new functions should be added as experimental.

There will probably be more to review in this patch.
Let's progress on these comments please.


* [PATCH V10 1/2] eal: add uevent monitor api and callback func
  2018-01-10  9:12                               ` [PATCH V9 2/5] eal: add uevent pass and process function Jeff Guo
@ 2018-01-11 14:05                                 ` Jeff Guo
  2018-01-11 14:05                                   ` [PATCH V10 2/2] eal: add uevent pass and process function Jeff Guo
  2018-01-14 23:16                                   ` [PATCH V10 1/2] " Thomas Monjalon
  0 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-11 14:05 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

This patch aims to add a general uevent mechanism to the EAL device layer,
enabling monitoring of all Linux kernel object uevents. Users can use these
APIs to monitor and read the device status information sent from the
kernel side and handle it accordingly; for example, on detecting a hotplug
uevent, the user can detach or attach the device. This also helps
fail-safe handling work smoothly.

About uevent monitoring:
a: add an epoll loop to poll the netlink socket, to monitor the uevents of
   the device.
b: add enum rte_dev_event_type and struct rte_dev_event.
c: add the APIs below to the rte eal device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_evt_mntr_start
   rte_dev_evt_mntr_stop

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
V10->V9:
a. fix the prefix issue.
b. use a common callback list for all devices and all event types, instead
   of adding a callback parameter to the device struct.
c. delete some unused parts.
---
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  38 ++++++
 lib/librte_eal/common/eal_common_dev.c  | 124 ++++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h | 101 +++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 223 ++++++++++++++++++++++++++++++++
 5 files changed, 487 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..32c17e8
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_evt_mntr_start(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
+
+int
+rte_dev_evt_mntr_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..b602535 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,31 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the event type.
+ */
+struct rte_dev_event_callback {
+	TAILQ_ENTRY(rte_dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name; /**< device name */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callback list for all registered devices */
+static struct rte_dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +256,105 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_callback_register(char *dev_name,
+			rte_dev_event_cb_fn cb_fn, void *cb_arg)
+{
+	struct rte_dev_event_callback *event_cb = NULL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn &&
+			event_cb->cb_arg == cb_arg &&
+			event_cb->dev_name == dev_name)
+			break;
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		/* allocate a new user callback entity */
+		event_cb = calloc(1, sizeof(struct rte_dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = dev_name;
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return (event_cb == NULL) ? -1 : 0;
+}
+
+int
+rte_dev_callback_unregister(char *dev_name,
+			rte_dev_event_cb_fn cb_fn, void *cb_arg)
+{
+	int ret;
+	struct rte_dev_event_callback *event_cb, *next;
+
+	if (!cb_fn || dev_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	ret = 0;
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+	      event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (event_cb->cb_fn != cb_fn ||
+				(event_cb->cb_arg != (void *)-1 &&
+				event_cb->cb_arg != cb_arg) ||
+				event_cb->dev_name != dev_name)
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			free(event_cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(char *dev_name,
+			enum rte_dev_event_type event)
+{
+	struct rte_dev_event_callback dev_cb;
+	struct rte_dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (dev_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
+		if (cb_lst->cb_fn == NULL || cb_lst->dev_name != dev_name)
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		rc = dev_cb.cb_fn(event,
+				dev_cb.cb_arg);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 9342e0c..fea037a 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,29 @@ extern "C" {
 
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;				/**< subsystem id */
+	char *devname;				/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(enum rte_dev_event_type event,
+					void *cb_arg);
+
+struct rte_dev_event_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_dev_event_cb_list, rte_dev_event_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -293,4 +316,82 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * It registers the callback for the specific device. Multiple
+ * callbacks can be registered at the same time.
+ * @param dev_name
+ *  The device name.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(char *dev_name,
+			rte_dev_event_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * It unregisters the callback according to the specified device.
+ *
+ * @param dev_name
+ *  The device name which corresponds to the callback.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  callback parameter; (void *)-1 removes all with the same cb_fn.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(char *dev_name,
+			rte_dev_event_cb_fn cb_fn, void *cb_arg);
+
+/**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param event
+ *  The device event type.
+ * @param cb_arg
+ *  callback parameter.
+ * @param ret_param
+ *  To pass data back to user application.
+ *  This allows the user application to decide if a particular function
+ *  is permitted or not.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(char *dev_name,
+			enum rte_dev_event_type event);
+
+/**
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_evt_mntr_start(void);
+
+/**
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+
+int
+rte_dev_evt_mntr_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 588c0bd..43b00e5 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..bc32aab
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,223 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+
+bool service_no_init = true;
+
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the interrupts.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_evt_mntr_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	uint32_t id;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore_1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail\n");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32 "\n",
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service\n");
+		return ret;
+	}
+	ret = rte_service_component_runstate_set(id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component\n");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service\n");
+		return ret;
+	}
+	rte_service_lcore_start(slcore_1);
+	service_no_init = false;
+	return 0;
+}
+
+int
+rte_dev_evt_mntr_stop(void)
+{
+	service_exit = true;
+	service_no_init = true;
+	return 0;
+}
-- 
2.7.4


* [PATCH V10 2/2] eal: add uevent pass and process function
  2018-01-11 14:05                                 ` [PATCH V10 1/2] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-11 14:05                                   ` Jeff Guo
  2018-01-14 23:24                                     ` Thomas Monjalon
  2018-01-15 10:48                                     ` [PATCH V11 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-14 23:16                                   ` [PATCH V10 1/2] " Thomas Monjalon
  1 sibling, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-11 14:05 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

In order to handle the uevents that have been detected from the kernel
side, add a uevent processing function. The hotplug event is used as an
example to show how the uevent mechanism passes and processes a uevent.

For uevent passing and processing, add the functions below in the Linux
EAL dev layer. FreeBSD does not support uevents, so the functions are
left unimplemented there.
a.dev_uev_parse
b.dev_uev_receive
c.dev_uev_process
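
For illustration only, a minimal standalone sketch of the parsing step
is given below. It assumes the kernel uevent payload is a sequence of
NUL-terminated "KEY=value" strings, as the parser in this patch also
assumes. The names uev_info, uev_match, uev_parse and UEV_ELEM_LEN are
hypothetical and are not part of the patch or of DPDK.

```c
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128	/* hypothetical; mirrors RTE_EAL_UEV_MSG_ELEM_LEN */

enum uev_action { UEV_ACTION_UNKNOWN, UEV_ACTION_ADD, UEV_ACTION_REMOVE };

/* Hypothetical result struct: values are stored by copy, not by pointer. */
struct uev_info {
	enum uev_action action;
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If the field starts with key, copy the rest of the field into dst. */
static int
uev_match(const char *field, size_t flen, const char *key,
	  char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (flen < klen || memcmp(field, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%.*s", (int)(flen - klen), field + klen);
	return 1;
}

/* Walk the NUL-separated fields packed in buf[0..len). */
static void
uev_parse(const char *buf, size_t len, struct uev_info *info)
{
	char action[UEV_ELEM_LEN] = "";
	size_t pos = 0;

	memset(info, 0, sizeof(*info));
	while (pos < len) {
		const char *field = buf + pos;
		const char *end = memchr(field, '\0', len - pos);
		size_t flen = end ? (size_t)(end - field) : len - pos;

		if (!uev_match(field, flen, "ACTION=", action, sizeof(action)) &&
		    !uev_match(field, flen, "SUBSYSTEM=", info->subsystem,
			       sizeof(info->subsystem)))
			uev_match(field, flen, "PCI_SLOT_NAME=",
				  info->pci_slot_name,
				  sizeof(info->pci_slot_name));
		pos += flen + 1;	/* skip the field and its terminator */
	}
	if (strcmp(action, "add") == 0)
		info->action = UEV_ACTION_ADD;
	else if (strcmp(action, "remove") == 0)
		info->action = UEV_ACTION_REMOVE;
}
```

Every read in this sketch is bounded by the received length, and the
slot name is kept by value in the result, so nothing in it refers to a
buffer that goes out of scope once parsing returns.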

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
V10->V9:
delete some unused parts
---
 lib/librte_eal/common/include/rte_dev.h |  23 +++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 109 +++++++++++++++++++++++++++++++-
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index fea037a..a3166f7 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,23 @@ extern "C" {
 
 #include <rte_log.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device have been parsed on bus*/
+	RTE_DEV_PROBED,	/**< devcie have been probed driver  */
+};
+
+enum rte_dev_subsystem {
+	RTE_DEV_SUBSYSTEM_UIO,
+	RTE_DEV_SUBSYSTEM_VFIO,
+	RTE_DEV_SUBSYSTEM_PCI,
+	RTE_DEV_SUBSYSTEM_MAX
+};
+
 /**
  * The device event type.
  */
@@ -61,10 +78,16 @@ enum rte_dev_event_type {
 	RTE_DEV_EVENT_MAX	/**< max value of this enum */
 };
 
+enum event_monitor_netlink_group {
+	RTE_DEV_EVENT_MONITOR_KERNEL,
+	RTE_DEV_EVENT_MONITOR_UDEV,
+};
+
 struct rte_dev_event {
 	enum rte_dev_event_type type;	/**< device event type */
 	int subsystem;				/**< subsystem id */
 	char *devname;				/**< device name */
+	enum event_monitor_netlink_group group;	/**< device netlink group */
 };
 
 typedef int (*rte_dev_event_cb_fn)(enum rte_dev_event_type event,
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index bc32aab..1d0ec33 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -79,10 +79,115 @@ dev_monitor_enable(int netlink_fd)
 	return -1;
 }
 
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			event->group = RTE_DEV_EVENT_MONITOR_UDEV;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = pci_slot_name;
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = RTE_DEV_SUBSYSTEM_UIO;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+			"Socket read error(%d): %s\n",
+			errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
 static int
-dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+dev_uev_process(struct epoll_event *events, int nfds)
 {
-	/* TODO: device uevent processing */
+	struct rte_dev_event uevent;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if ((dev_uev_receive(events[i].data.fd, &uevent)) ||
+			(uevent.group == RTE_DEV_EVENT_MONITOR_UDEV))
+			return 0;
+
+		/* by default, handle all pci devices being hot plugged */
+		if (uevent.subsystem == RTE_DEV_SUBSYSTEM_UIO) {
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+				return(_rte_dev_callback_process(NULL,
+				  RTE_DEV_EVENT_REMOVE));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				return(_rte_dev_callback_process(NULL,
+				  RTE_DEV_EVENT_ADD));
+			}
+		}
+	}
 	return 0;
 }
 
-- 
2.7.4


* Re: [PATCH V9 1/5] eal: add uevent monitor api and callback func
  2018-01-11  1:43                                 ` Thomas Monjalon
@ 2018-01-11 14:24                                   ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-11 14:24 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/11/2018 9:43 AM, Thomas Monjalon wrote:
> Hi,
>
> Thanks for splitting the patches.
> I will review the first one today. Please see below.
>
> 10/01/2018 10:12, Jeff Guo:
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> +int
>> +rte_dev_monitor_start(void)
>> +{
>> +	return -1;
>> +}
>> +
>> +int
>> +rte_dev_monitor_stop(void)
>> +{
>> +	return -1;
>> +}
> You should add a log to show it is not supported.
ok.
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/include/exec-env/rte_dev.h
>> +#ifndef _RTE_DEV_H_
>> +#error "don't include this file directly, please include generic <rte_dev.h>"
>> +#endif
> Why creating different rte_dev.h for BSD and Linux?
> This is an API, it should be the same.
If there is no need for it at this time, I will combine them into one file.
>> +/**
>> + * Start the device uevent monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_monitor_start(void);
>> +
>> +/**
>> + * Stop the device uevent monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +
>> +int
>> +rte_dev_monitor_stop(void);
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -42,9 +42,32 @@
>>   #include <rte_devargs.h>
>>   #include <rte_debug.h>
>>   #include <rte_log.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_malloc.h>
>>   
>>   #include "eal_private.h"
>>   
>> +/* spinlock for device callbacks */
>> +static rte_spinlock_t rte_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
> Please rename to rte_dev_event_lock.
> Let's use rte_dev_event_ prefix consistently.
Agreed, I will make it consistent.
>> + * The user application callback description.
>> + *
>> + * It contains callback address to be registered by user application,
>> + * the pointer to the parameters for callback, and the event type.
>> + */
>> +struct rte_eal_dev_callback {
> Rename to rte_dev_event?
>> +	TAILQ_ENTRY(rte_eal_dev_callback) next; /**< Callbacks list */
>> +	rte_eal_dev_cb_fn cb_fn;                /**< Callback address */
> Rename to rte_dev_event_callback?
>> +	void *cb_arg;                           /**< Parameter for callback */
> Comment should be about opaque context.
>> +	void *ret_param;                        /**< Return parameter */
>> +	enum rte_dev_event_type event;      /**< device event type */
>> +	uint32_t active;                        /**< Callback is executing */
> Why active is needed?
It is to avoid the lock when unregistering a callback.
>> +};
>> +
>> +/* A genaral callback for all new devices be added onto the bus */
>> +static struct rte_eal_dev_callback *dev_add_cb;
> It should not be a different callback for new devices.
> You must allow registering the callback for all and new devices.
> Please look how it's done for ethdev:
> 	https://dpdk.org/patch/32900/
The reason for this special callback is that when a new device is added onto
the bus, there is no device instance yet to store the callback in. I saw the
ethdev solution; it is based on ports, which would not make sense in the rte
device layer. So I am trying to drop the callback from rte_device and store
the device name in the callback instead; please see my v10 patch.
>> +int
>> +rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg)
>> +{
> Why passing an event type at registration?
> I think the event processing dispatch must be done in the callback,
> not at registration.
Makes sense: just register for all event types of the device, and let EAL
pass the event to the callback.
>> +		/* allocate a new interrupt callback entity */
>> +		user_cb = rte_zmalloc("eal device event",
>> +					sizeof(*user_cb), 0);
> No need to use rte_malloc here.
> Please check this callback API patch:
> 	https://dpdk.org/patch/33144/
That would help consolidate the code. But could you tell me why
rte_zmalloc should not be used?
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> +enum uev_monitor_netlink_group {
>> +	UEV_MONITOR_KERNEL,
>> +	UEV_MONITOR_UDEV,
>> +};
> Please keep a namespace prefix like RTE_DEV_EVENT_ (same for enum name).
> Some comments are missing for these constants.
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_dev_event_type {
>> +	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
>> +	RTE_DEV_EVENT_ADD,	/**< device being added */
>> +	RTE_DEV_EVENT_REMOVE,
>> +				/**< device being removed */
>> +	RTE_DEV_EVENT_CHANGE,
>> +				/**< device status being changed,
>> +				 * etc charger percent
>> +				 */
> What means status changed?
> What means charger percent?
Status changed means that the object path or some other attribute changed;
charger percent is just an example of a kobject status. I don't think we
should explicitly identify all of them, so I will delete it until we want to use it.
>> +	RTE_DEV_EVENT_MOVE,	/**< device sysfs path being moved */
> sysfs is Linux specific
>
>> +	RTE_DEV_EVENT_ONLINE,	/**< device being enable */
> You mean a device can be added but not enabled?
> So enabling is switching it on by a register? or something else?
>
>> +	RTE_DEV_EVENT_OFFLINE,	/**< device being disable */
>> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
>> +};
>> +
>> +struct rte_eal_uevent {
>> +	enum rte_dev_event_type type;	/**< device event type */
>> +	int subsystem;				/**< subsystem id */
>> +	char *devname;				/**< device name */
>> +	enum uev_monitor_netlink_group group;	/**< device netlink group */
>> +};
> I don't understand why this struct is exposed in the public API.
> Please rename from rte_eal_ to rte_dev_.
I will rename uevent to event.
>> @@ -166,6 +204,8 @@ struct rte_device {
>>   	const struct rte_driver *driver;/**< Associated driver */
>>   	int numa_node;                /**< NUMA node connection */
>>   	struct rte_devargs *devargs;  /**< Device user arguments */
>> +	/** User application callbacks for device event */
>> +	struct rte_eal_dev_cb_list uev_cbs;
> Do not use uev word in API, it refers to uevent which is implementation
> specific. You can name it event_callbacks.
>
> I am afraid this change is breaking the ABI.
> For the first time, 18.02 will be ABI stable.
I will not modify the rte_device struct in v10.
>> +/**
>> + * It registers the callback for the specific event. Multiple
>> + * callbacks cal be registered at the same time.
>> + * @param event
>> + *  The device event type.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_register(struct rte_device *device,
>> +			enum rte_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
>> +
>> +/**
>> + * It unregisters the callback according to the specified event.
>> + *
>> + * @param event
>> + *  The event type which corresponding to the callback.
>> + * @param cb_fn
>> + *  callback address.
>> + *  address of parameter for callback, (void *)-1 means to remove all
>> + *  registered which has the same callback address.
>> + *
>> + * @return
>> + *  - On success, return the number of callback entities removed.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_unregister(struct rte_device *device,
>> +			enum rte_dev_event_type event,
>> +			rte_eal_dev_cb_fn cb_fn, void *cb_arg);
> Such new functions should be added as experimental.
>
> There will be probably more to review in this patch.
> Let's progress on these comments please.
thanks for your review!


* Re: [PATCH V10 1/2] eal: add uevent monitor api and callback func
  2018-01-11 14:05                                 ` [PATCH V10 1/2] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-11 14:05                                   ` [PATCH V10 2/2] eal: add uevent pass and process function Jeff Guo
@ 2018-01-14 23:16                                   ` Thomas Monjalon
  2018-01-15 10:55                                     ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-14 23:16 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

Hi,

11/01/2018 15:05, Jeff Guo:
> +/* A genaral callback for all registerd devices */

Typos: genaral, registerd

So the callback is only for registered devices?
What about hotplugged devices?

> +/**
> + * It registers the callback for the specific event. Multiple
> + * callbacks cal be registered at the same time.
> + * @param event
> + *  The device event type.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_register(char *dev_name,
> +			rte_dev_event_cb_fn cb_fn, void *cb_arg);
> +
> +/**
> + * It unregisters the callback according to the specified event.
> + *
> + * @param event
> + *  The event type which corresponding to the callback.
> + * @param cb_fn
> + *  callback address.
> + *  address of parameter for callback, (void *)-1 means to remove all
> + *  registered which has the same callback address.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_unregister(char *dev_name,
> +			rte_dev_event_cb_fn cb_fn, void *cb_arg);

These new functions should be tagged as experimental.

> +/**
> + * Start the device event monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_evt_mntr_start(void);

Should be experimental too, as every new public function.

Please avoid shortening function name too much.
rte_dev_event_monitor_start is more pleasant to read.


* Re: [PATCH V10 2/2] eal: add uevent pass and process function
  2018-01-11 14:05                                   ` [PATCH V10 2/2] eal: add uevent pass and process function Jeff Guo
@ 2018-01-14 23:24                                     ` Thomas Monjalon
  2018-01-15 10:52                                       ` Guo, Jia
  2018-01-15 10:48                                     ` [PATCH V11 1/3] eal: add uevent monitor api and callback func Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-14 23:24 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

11/01/2018 15:05, Jeff Guo:
> +enum rte_dev_state {
> +	RTE_DEV_UNDEFINED,	/**< unknown device state */
> +	RTE_DEV_FAULT,	/**< device fault or error */
> +	RTE_DEV_PARSED,	/**< device have been parsed on bus*/
> +	RTE_DEV_PROBED,	/**< devcie have been probed driver  */
> +};

Let's start with nitpicks: please be careful with spacing in comments.
+ typo: devcie
+ grammar: device has

What does "parsed on bus" mean? Is it "scanned"?

> +enum rte_dev_subsystem {
> +	RTE_DEV_SUBSYSTEM_UIO,
> +	RTE_DEV_SUBSYSTEM_VFIO,
> +	RTE_DEV_SUBSYSTEM_PCI,
> +	RTE_DEV_SUBSYSTEM_MAX
> +};

I don't think PCI and UIO/VFIO should be described at the same level.
Can you re-use the enum rte_kernel_driver?

> +enum event_monitor_netlink_group {
> +	RTE_DEV_EVENT_MONITOR_KERNEL,
> +	RTE_DEV_EVENT_MONITOR_UDEV,
> +};

This enum should be prefixed with rte_

> +	enum event_monitor_netlink_group group;	/**< device netlink group */

netlink is specific to Linux.
I don't think it should be in a generic API struct.


* [PATCH V11 1/3] eal: add uevent monitor api and callback func
  2018-01-11 14:05                                   ` [PATCH V10 2/2] eal: add uevent pass and process function Jeff Guo
  2018-01-14 23:24                                     ` Thomas Monjalon
@ 2018-01-15 10:48                                     ` Jeff Guo
  2018-01-15 10:48                                       ` [PATCH V11 2/3] eal: add uevent pass and process function Jeff Guo
                                                         ` (2 more replies)
  1 sibling, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-15 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

This patch aims to add a general uevent mechanism in the EAL device layer,
to enable monitoring of all Linux kernel object uevents. Users can use these
APIs to monitor and read out the device status info sent from the kernel
side, and then handle it accordingly. For example, when a hotplug uevent is
detected, the user can detach or attach the device. It also helps the
fail-safe work proceed smoothly.
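
As a rough sketch of the kernel-side monitoring described above, the
code below opens a NETLINK_KOBJECT_UEVENT socket and waits for one
message. It is an illustration under assumptions, not the code of this
patch: it subscribes to multicast group 1 only (messages sent by the
kernel itself), it uses poll() instead of the epoll loop to stay short,
and the helper names uevent_socket_open and uevent_wait are hypothetical.

```c
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Open a non-blocking netlink socket subscribed to kernel uevents.
 * Returns the fd, or -1 on failure.
 */
static int
uevent_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;	/* group 1: uevents emitted by the kernel */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Wait up to timeout_ms for one uevent and copy it into buf, which is
 * always NUL-terminated on success.
 * Returns the message length, 0 on timeout, or -1 on error.
 */
static ssize_t
uevent_wait(int fd, char *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	ssize_t n;
	int ret;

	if (len == 0)
		return -1;
	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0)
		return ret;	/* 0: timeout, -1: poll error */
	n = recv(fd, buf, len - 1, MSG_DONTWAIT);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

A caller would typically open the socket once, then call uevent_wait()
in a loop and hand each received buffer to the uevent parser.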

About uevent monitoring:
a: add one epoll instance to poll the netlink socket, to monitor the uevents
   of the device.
b: add the enum rte_dev_event_type and the struct rte_dev_event.
c: add the APIs below in the rte eal device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_event_monitor_start
   rte_dev_event_monitor_stop
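
To make the register/unregister/process flow of the APIs above concrete,
here is a simplified, single-threaded sketch of a callback list keyed by
device name. It is not the implementation in this patch: locking is
left out on purpose, the device name is copied into each entry, and all
names (dev_event_cb, dev_event_cb_register and so on) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef int (*dev_event_cb_fn)(const char *dev_name,
			       enum dev_event_type event, void *cb_arg);

struct dev_event_cb {
	TAILQ_ENTRY(dev_event_cb) next;	/* list linkage */
	dev_event_cb_fn cb_fn;		/* user callback */
	void *cb_arg;			/* opaque user context */
	char *dev_name;			/* private copy of the device name */
};

TAILQ_HEAD(dev_event_cb_list, dev_event_cb);
static struct dev_event_cb_list dev_event_cbs =
	TAILQ_HEAD_INITIALIZER(dev_event_cbs);

/* Register cb_fn for dev_name; a duplicate registration is a no-op. */
static int
dev_event_cb_register(const char *dev_name, dev_event_cb_fn cb_fn, void *cb_arg)
{
	struct dev_event_cb *cb;
	size_t len;

	if (dev_name == NULL || cb_fn == NULL)
		return -1;
	TAILQ_FOREACH(cb, &dev_event_cbs, next) {
		if (cb->cb_fn == cb_fn && cb->cb_arg == cb_arg &&
		    strcmp(cb->dev_name, dev_name) == 0)
			return 0;
	}
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	len = strlen(dev_name) + 1;
	cb->dev_name = malloc(len);
	if (cb->dev_name == NULL) {
		free(cb);
		return -1;
	}
	memcpy(cb->dev_name, dev_name, len);
	cb->cb_fn = cb_fn;
	cb->cb_arg = cb_arg;
	TAILQ_INSERT_TAIL(&dev_event_cbs, cb, next);
	return 0;
}

/* Remove every matching entry; returns how many were removed. */
static int
dev_event_cb_unregister(const char *dev_name, dev_event_cb_fn cb_fn,
			void *cb_arg)
{
	struct dev_event_cb *cb, *tmp;
	int removed = 0;

	if (dev_name == NULL || cb_fn == NULL)
		return -1;
	for (cb = TAILQ_FIRST(&dev_event_cbs); cb != NULL; cb = tmp) {
		tmp = TAILQ_NEXT(cb, next);
		if (cb->cb_fn != cb_fn || cb->cb_arg != cb_arg ||
		    strcmp(cb->dev_name, dev_name) != 0)
			continue;
		TAILQ_REMOVE(&dev_event_cbs, cb, next);
		free(cb->dev_name);
		free(cb);
		removed++;
	}
	return removed;
}

/* Call every callback registered for dev_name; returns the number called. */
static int
dev_event_cb_process(const char *dev_name, enum dev_event_type event)
{
	struct dev_event_cb *cb;
	int called = 0;

	if (dev_name == NULL)
		return -1;
	TAILQ_FOREACH(cb, &dev_event_cbs, next) {
		if (strcmp(cb->dev_name, dev_name) != 0)
			continue;
		cb->cb_fn(dev_name, event, cb->cb_arg);
		called++;
	}
	return called;
}

/* Example callback: counts invocations through cb_arg. */
static int
count_cb(const char *dev_name, enum dev_event_type event, void *cb_arg)
{
	(void)dev_name;
	(void)event;
	(*(int *)cb_arg)++;
	return 0;
}
```

Keying the list on the device name means a callback can be registered
before any device instance exists, which is the hotplug-add case
discussed in the review.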

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v11->v10:
fix some typos and add the experimental tag in the new file.
---
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  38 ++++++
 lib/librte_eal/common/eal_common_dev.c  | 125 ++++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h | 117 +++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 223 ++++++++++++++++++++++++++++++++
 5 files changed, 504 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..83ffdee
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..f87e769 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,31 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct rte_dev_event_callback {
+	TAILQ_ENTRY(rte_dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name; /**< device name */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callback for all registered devices */
+static struct rte_dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +256,106 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback *event_cb = NULL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn &&
+			event_cb->cb_arg == cb_arg &&
+			!strcmp(event_cb->dev_name, device_name))
+			break;
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		/* allocate a new, zeroed user callback entity */
+		event_cb = calloc(1, sizeof(struct rte_dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = device_name;
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return (event_cb == NULL) ? -1 : 0;
+}
+
+int
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret;
+	struct rte_dev_event_callback *event_cb, *next;
+
+	if (!cb_fn || device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	ret = 0;
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+	      event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (event_cb->cb_fn != cb_fn ||
+				(event_cb->cb_arg != (void *)-1 &&
+				event_cb->cb_arg != cb_arg) ||
+				strcmp(event_cb->dev_name, device_name))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			free(event_cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback dev_cb;
+	struct rte_dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
+		if (cb_lst->cb_fn == NULL || cb_lst->cb_arg != cb_arg ||
+			strcmp(cb_lst->dev_name, device_name))
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		rc = dev_cb.cb_fn(device_name, event,
+				dev_cb.cb_arg);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 9342e0c..f6c9acb 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,30 @@ extern "C" {
 
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
+struct rte_dev_event_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_dev_event_cb_list, rte_dev_event_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -293,4 +317,97 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific event. Multiple
+ * callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified event.
+ *
+ * @param device_name
+ *  The device name.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered callbacks which have the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * @internal Executes all the callbacks registered by the user
+ * application for the specific device. It is for DPDK internal use only.
+ * User applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  The device event type.
+ *  See enum rte_dev_event_type.
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 588c0bd..43b00e5 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..4f4beb5
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,223 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+
+bool service_no_init = true;
+
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the interrupts.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	uint32_t id;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore_1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail\n");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32 "\n",
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service\n");
+		return ret;
+	}
+	ret = rte_service_component_runstate_set(id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component\n");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service\n");
+		return ret;
+	}
+	rte_service_lcore_start(slcore_1);
+	service_no_init = false;
+	return 0;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	service_exit = true;
+	service_no_init = true;
+	return 0;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V11 2/3] eal: add uevent pass and process function
  2018-01-15 10:48                                     ` [PATCH V11 1/3] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-15 10:48                                       ` Jeff Guo
  2018-01-17 22:00                                         ` Thomas Monjalon
  2018-01-15 10:48                                       ` [PATCH V11 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-01-17 21:59                                       ` [PATCH V11 " Thomas Monjalon
  2 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-15 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

To handle the uevents detected from the kernel side, add uevent
processing functions. Hot plug events serve as the example that shows
how the uevent mechanism passes and processes a uevent.

For uevent passing and processing, add the functions below to the Linux
EAL dev layer. FreeBSD does not support uevents, so the functions are
left unimplemented there.
a.dev_uev_parse
b.dev_uev_receive
c.dev_uev_process

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v11->v10:
fix some typos.
---
 lib/librte_eal/common/include/rte_dev.h |  17 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 110 +++++++++++++++++++++++++++++++-
 2 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index f6c9acb..0dbbaa8 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,23 @@ extern "C" {
 
 #include <rte_log.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been scanned on bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum rte_dev_event_subsystem {
+	RTE_DEV_EVENT_SUBSYSTEM_UIO,
+	RTE_DEV_EVENT_SUBSYSTEM_VFIO,
+	RTE_DEV_EVENT_SUBSYSTEM_PCI,
+	RTE_DEV_EVENT_SUBSYSTEM_MAX
+};
+
 /**
  * The device event type.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 4f4beb5..bda4618 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -79,10 +79,116 @@ dev_monitor_enable(int netlink_fd)
 	return -1;
 }
 
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = pci_slot_name;
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
 static int
-dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+dev_uev_process(struct epoll_event *events, int nfds)
 {
-	/* TODO: device uevent processing */
+	struct rte_dev_event uevent;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (dev_uev_receive(events[i].data.fd, &uevent))
+			return 0;
+
+		/* by default, handle all PCI devices being hot plugged */
+		if (uevent.subsystem == RTE_DEV_EVENT_SUBSYSTEM_UIO) {
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_REMOVE, NULL));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_ADD, NULL));
+			}
+		}
+	}
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V11 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-15 10:48                                     ` [PATCH V11 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-15 10:48                                       ` [PATCH V11 2/3] eal: add uevent pass and process function Jeff Guo
@ 2018-01-15 10:48                                       ` Jeff Guo
  2018-01-18  4:12                                         ` [PATCH V12 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-17 21:59                                       ` [PATCH V11 " Thomas Monjalon
  2 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-15 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

Use testpmd as an example to show how an application can request
and use uevent monitoring to handle the hot removal event and the
hot insertion event.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v11->v10:
modify callback register calling.
---
 app/test-pmd/testpmd.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 177 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9414d0e..87e3753 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -373,6 +373,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -380,6 +382,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(char *device_name, enum rte_dev_event_type type,
+			      void *param);
+static int eth_uevent_callback_register(portid_t port_id);
+static int in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1729,6 +1738,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the uevent callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_uevent_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup uevent callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1745,6 +1775,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1756,6 +1788,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1782,6 +1816,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1805,6 +1842,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1889,6 +1929,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -1931,6 +2014,82 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 }
 
 static int
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return 1;
+
+	return 0;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(char *device_name, enum rte_dev_event_type type, void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, (void *)device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
 	uint16_t i;
@@ -2415,6 +2574,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_dev_event_monitor_start();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 2a266fd..64254e6 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;              /**< requested device name */
+	enum rte_dev_event_type event;      /**< device event type */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V10 2/2] eal: add uevent pass and process function
  2018-01-14 23:24                                     ` Thomas Monjalon
@ 2018-01-15 10:52                                       ` Guo, Jia
  2018-01-15 11:29                                         ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-01-15 10:52 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/15/2018 7:24 AM, Thomas Monjalon wrote:
> 11/01/2018 15:05, Jeff Guo:
>> +enum rte_dev_state {
>> +	RTE_DEV_UNDEFINED,	/**< unknown device state */
>> +	RTE_DEV_FAULT,	/**< device fault or error */
>> +	RTE_DEV_PARSED,	/**< device have been parsed on bus*/
>> +	RTE_DEV_PROBED,	/**< devcie have been probed driver  */
>> +};
> Let's start with nitpicks: please be careful with spacing in comments.
> + typo: devcie
> + grammar: device has
>
> What means parsed on bus? Is it "scanned"?
Yes, what I mean is scanned.
>> +enum rte_dev_subsystem {
>> +	RTE_DEV_SUBSYSTEM_UIO,
>> +	RTE_DEV_SUBSYSTEM_VFIO,
>> +	RTE_DEV_SUBSYSTEM_PCI,
>> +	RTE_DEV_SUBSYSTEM_MAX
>> +};
> I don't think PCI and UIO/VFIO should be described at the same level.
> Can you re-use the enum rte_kernel_driver?

rte_kernel_driver might not qualify for that use, since this is the event subsystem. It includes pci/uio/vfio, which are the strings that identify each subsystem. I will rename it to rte_dev_event_subsystem.

>> +enum event_monitor_netlink_group {
>> +	RTE_DEV_EVENT_MONITOR_KERNEL,
>> +	RTE_DEV_EVENT_MONITOR_UDEV,
>> +};
> This enum should be prefixed with rte_
sure.
>> +	enum event_monitor_netlink_group group;	/**< device netlink group */
> netlink is specific to Linux.
> I don't think it should be in a generic API struct.
agree.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V10 1/2] eal: add uevent monitor api and callback func
  2018-01-14 23:16                                   ` [PATCH V10 1/2] " Thomas Monjalon
@ 2018-01-15 10:55                                     ` Guo, Jia
  2018-01-15 11:32                                       ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-01-15 10:55 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/15/2018 7:16 AM, Thomas Monjalon wrote:
> Hi,
>
> 11/01/2018 15:05, Jeff Guo:
>> +/* A genaral callback for all registerd devices */
> Typos: genaral, registerd
>
> So the callback is only for registered devices?
> What about hotplugged devices?
The hotplugged devices are managed by the application. If a device was
registered beforehand and is in the app's hotplug device list, it will
always be monitored whenever it is plugged in or out. The EAL only cares
about the registered devices.
>> +/**
>> + * It registers the callback for the specific event. Multiple
>> + * callbacks cal be registered at the same time.
>> + * @param event
>> + *  The device event type.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_register(char *dev_name,
>> +			rte_dev_event_cb_fn cb_fn, void *cb_arg);
>> +
>> +/**
>> + * It unregisters the callback according to the specified event.
>> + *
>> + * @param event
>> + *  The event type which corresponding to the callback.
>> + * @param cb_fn
>> + *  callback address.
>> + *  address of parameter for callback, (void *)-1 means to remove all
>> + *  registered which has the same callback address.
>> + *
>> + * @return
>> + *  - On success, return the number of callback entities removed.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_unregister(char *dev_name,
>> +			rte_dev_event_cb_fn cb_fn, void *cb_arg);
> These new functions should be tagged as experimental.
got it.
>> +/**
>> + * Start the device event monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_evt_mntr_start(void);
> Should be experimental too, as every new public functions.
>
> Please avoid shortening function name too much.
> rte_dev_event_monitor_start is more pleasant to read.
Yes, I will choose the more pleasant name.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V10 2/2] eal: add uevent pass and process function
  2018-01-15 10:52                                       ` Guo, Jia
@ 2018-01-15 11:29                                         ` Thomas Monjalon
  2018-01-15 15:33                                           ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-15 11:29 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

15/01/2018 11:52, Guo, Jia:
> On 1/15/2018 7:24 AM, Thomas Monjalon wrote:
> > 11/01/2018 15:05, Jeff Guo:
> >> +enum rte_dev_subsystem {
> >> +	RTE_DEV_SUBSYSTEM_UIO,
> >> +	RTE_DEV_SUBSYSTEM_VFIO,
> >> +	RTE_DEV_SUBSYSTEM_PCI,
> >> +	RTE_DEV_SUBSYSTEM_MAX
> >> +};
> > 
> > I don't think PCI and UIO/VFIO should be described at the same level.
> > Can you re-use the enum rte_kernel_driver?
> 
> rte_kernel_driver might be not qualify for that use, since that is the event sumsystem, it include pci/uio/vfio, such strings to identify each subsystem. i will modify it to be rte_dev_event_subsystem.

I don't understand this classification.
A device can be both PCI and VFIO.
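
To make the objection concrete, here is a rough sketch that keeps the
bus and the kernel driver as two independent fields instead of one
flat enum. All names here (ex_bus, ex_kdrv, ex_dev_class, ex_classify)
are hypothetical and exist neither in the patch nor in DPDK. The driver
strings are only the usual kernel module names, assumed for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; not part of the patch or of DPDK. */
enum ex_bus { EX_BUS_UNKNOWN, EX_BUS_PCI };
enum ex_kdrv { EX_KDRV_NONE, EX_KDRV_UIO, EX_KDRV_VFIO };

/* A device is described on two axes: where it sits and what binds it. */
struct ex_dev_class {
	enum ex_bus bus;
	enum ex_kdrv kdrv;
};

static struct ex_dev_class
ex_classify(const char *bus_name, const char *driver_name)
{
	struct ex_dev_class c = { EX_BUS_UNKNOWN, EX_KDRV_NONE };

	if (bus_name != NULL && strcmp(bus_name, "pci") == 0)
		c.bus = EX_BUS_PCI;

	if (driver_name != NULL) {
		if (strcmp(driver_name, "vfio-pci") == 0)
			c.kdrv = EX_KDRV_VFIO;
		else if (strcmp(driver_name, "igb_uio") == 0 ||
			 strcmp(driver_name, "uio_pci_generic") == 0)
			c.kdrv = EX_KDRV_UIO;
	}
	return c;
}
```

With this layout a PCI device bound to vfio-pci is reported as
{ EX_BUS_PCI, EX_KDRV_VFIO }, which a single enum value cannot express.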

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V10 1/2] eal: add uevent monitor api and callback func
  2018-01-15 10:55                                     ` Guo, Jia
@ 2018-01-15 11:32                                       ` Thomas Monjalon
  2018-01-15 15:29                                         ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-15 11:32 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

15/01/2018 11:55, Guo, Jia:
> On 1/15/2018 7:16 AM, Thomas Monjalon wrote:
> > Hi,
> >
> > 11/01/2018 15:05, Jeff Guo:
> >> +/* A genaral callback for all registerd devices */
> > Typos: genaral, registerd
> >
> > So the callback is only for registered devices?
> > What about hotplugged devices?
> 
> the hotplugged devices is managed by the application, if it prior 
> registered and in the app's hotplug device list, will always be monitor 
> whenever it plug in and out.  the eal only care about the registered 
> devices.

I disagree. The application needs the EAL service to get notified
of a new device plugged in.
We should find a way to register the callback for all devices,
including new ones.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V10 1/2] eal: add uevent monitor api and callback func
  2018-01-15 11:32                                       ` Thomas Monjalon
@ 2018-01-15 15:29                                         ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-15 15:29 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/15/2018 7:32 PM, Thomas Monjalon wrote:
> 15/01/2018 11:55, Guo, Jia:
>> On 1/15/2018 7:16 AM, Thomas Monjalon wrote:
>>> Hi,
>>>
>>> 11/01/2018 15:05, Jeff Guo:
>>>> +/* A genaral callback for all registerd devices */
>>> Typos: genaral, registerd
>>>
>>> So the callback is only for registered devices?
>>> What about hotplugged devices?
>> the hotplugged devices is managed by the application, if it prior
>> registered and in the app's hotplug device list, will always be monitor
>> whenever it plug in and out.  the eal only care about the registered
>> devices.
> I disagree. The application needs the EAL service to get notified
> of a new device plugged in.
> We should find a way to register the callback for all devices,
> including new ones.
I think the current mechanism lets the EAL service detect every new
device, but it does not notify the application about all of them. If the
application needs to monitor a specific device, it knows the device name
beforehand. Notifying everything from the EAL service and letting the
application filter on the user side is another story. Is that what you
want? Have you considered whether the user needs to know about every
device plug-in, or whether it would be better or safer to let only the
EAL know all of that information?

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V10 2/2] eal: add uevent pass and process function
  2018-01-15 11:29                                         ` Thomas Monjalon
@ 2018-01-15 15:33                                           ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-15 15:33 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/15/2018 7:29 PM, Thomas Monjalon wrote:
> 15/01/2018 11:52, Guo, Jia:
>> On 1/15/2018 7:24 AM, Thomas Monjalon wrote:
>>> 11/01/2018 15:05, Jeff Guo:
>>>> +enum rte_dev_subsystem {
>>>> +	RTE_DEV_SUBSYSTEM_UIO,
>>>> +	RTE_DEV_SUBSYSTEM_VFIO,
>>>> +	RTE_DEV_SUBSYSTEM_PCI,
>>>> +	RTE_DEV_SUBSYSTEM_MAX
>>>> +};
>>> I don't think PCI and UIO/VFIO should be described at the same level.
>>> Can you re-use the enum rte_kernel_driver?
>> rte_kernel_driver might be not qualify for that use, since that is the event sumsystem, it include pci/uio/vfio, such strings to identify each subsystem. i will modify it to be rte_dev_event_subsystem.
>
> I don't understand this classification.
> A device can be both PCI and VFIO.
Yes, I think that might look a little strange, but what I saw in the
uevent messages is that the subsystem item reported by the kernel is
sometimes pci and sometimes uio. I don't know whether it is better to
differentiate them by "subsystem", by "driver", or by something else.
Let me think about it more.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V11 1/3] eal: add uevent monitor api and callback func
  2018-01-15 10:48                                     ` [PATCH V11 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-15 10:48                                       ` [PATCH V11 2/3] eal: add uevent pass and process function Jeff Guo
  2018-01-15 10:48                                       ` [PATCH V11 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-17 21:59                                       ` Thomas Monjalon
  2018-01-18  4:23                                         ` Guo, Jia
  2 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-17 21:59 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

15/01/2018 11:48, Jeff Guo:
> + * It registers the callback for the specific event. Multiple
> + * callbacks can be registered at the same time.
> + *
> + * @param device_name
> + *  The device name.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
> +                                       void *cb_arg);

What is the device name?

I think we should register a callback for a rte_device or NULL (all devices).
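
A minimal sketch of the NULL-means-all-devices behaviour suggested above,
keyed on the device name as in the posted patch rather than on a
struct rte_device pointer. Every identifier in it (ev_cb, ev_cb_register,
ev_cb_process, count_cb) is hypothetical and is not DPDK API; locking is
left out to keep it short.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical callback type, shaped like rte_dev_event_cb_fn. */
typedef int (*ev_cb_fn)(const char *dev_name, int event, void *arg);

struct ev_cb {
	TAILQ_ENTRY(ev_cb) next;
	ev_cb_fn fn;
	void *arg;
	char *dev_name;		/* owned copy, NULL means all devices */
};

TAILQ_HEAD(ev_cb_list, ev_cb);
/* Static initializer, so no runtime TAILQ_INIT check is needed. */
static struct ev_cb_list ev_cbs = TAILQ_HEAD_INITIALIZER(ev_cbs);

/* Register fn for one device, or for all devices when dev_name is NULL. */
static int
ev_cb_register(const char *dev_name, ev_cb_fn fn, void *arg)
{
	struct ev_cb *cb;

	if (fn == NULL)
		return -1;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	if (dev_name != NULL) {
		size_t len = strlen(dev_name) + 1;

		cb->dev_name = malloc(len);
		if (cb->dev_name == NULL) {
			free(cb);
			return -1;
		}
		memcpy(cb->dev_name, dev_name, len);
	}
	cb->fn = fn;
	cb->arg = arg;
	/* Insert only after the entry is fully set up. */
	TAILQ_INSERT_TAIL(&ev_cbs, cb, next);
	return 0;
}

/* Run every matching callback; return how many were called. */
static int
ev_cb_process(const char *dev_name, int event)
{
	struct ev_cb *cb;
	int called = 0;

	if (dev_name == NULL)
		return -1;
	TAILQ_FOREACH(cb, &ev_cbs, next) {
		/* Check for the NULL wildcard before calling strcmp(). */
		if (cb->dev_name != NULL &&
		    strcmp(cb->dev_name, dev_name) != 0)
			continue;
		cb->fn(dev_name, event, cb->arg);
		called++;
	}
	return called;
}

/* Sample callback: counts invocations through its argument. */
static int
count_cb(const char *dev_name, int event, void *arg)
{
	(void)dev_name;
	(void)event;
	++*(int *)arg;
	return 0;
}
```

Registering count_cb once with NULL and once with a specific name makes
an event on that name reach both entries, while an event on any other
name reaches only the wildcard entry.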


* Re: [PATCH V11 2/3] eal: add uevent pass and process function
  2018-01-15 10:48                                       ` [PATCH V11 2/3] eal: add uevent pass and process function Jeff Guo
@ 2018-01-17 22:00                                         ` Thomas Monjalon
  2018-01-18  4:17                                           ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-17 22:00 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

15/01/2018 11:48, Jeff Guo:
> +enum rte_dev_event_subsystem {
> +       RTE_DEV_EVENT_SUBSYSTEM_UIO,
> +       RTE_DEV_EVENT_SUBSYSTEM_VFIO,
> +       RTE_DEV_EVENT_SUBSYSTEM_PCI,
> +       RTE_DEV_EVENT_SUBSYSTEM_MAX
> +};

I still don't understand this classification, mixing PCI and VFIO
at the same level.


* [PATCH V12 1/3] eal: add uevent monitor api and callback func
  2018-01-15 10:48                                       ` [PATCH V11 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-18  4:12                                         ` Jeff Guo
  2018-01-18  4:12                                           ` [PATCH V12 2/3] eal: add uevent pass and process function Jeff Guo
                                                             ` (3 more replies)
  0 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-18  4:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

This patch adds a general uevent mechanism to the EAL device layer to
enable monitoring of uevents for all Linux kernel objects. Users can use
these APIs to monitor and read the device status information sent from
the kernel side and then handle it accordingly. For example, on detecting
a hotplug uevent, the user can detach or attach the device, which also
helps to implement smooth fail-safe handling.

About uevent monitoring:
a: add an epoll loop that polls the netlink socket to monitor device
   uevents.
b: add the enum rte_dev_event_type and the struct rte_dev_event.
c: add the APIs below to the rte eal device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_event_monitor_start
   rte_dev_event_monitor_stop

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
handle a NULL device name in the callback to monitor uevents of all devices
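
For reference, a hedged sketch of opening the uevent netlink socket
outside of DPDK. It assumes that kernel-originated uevents are multicast
on netlink group 1, so binding to that group alone (instead of
0xffffffff) would avoid receiving the messages rebroadcast by udev, and
it assumes the 64 KB value is meant as a receive buffer size, so it is
applied with SO_RCVBUF. uevent_socket_open is a hypothetical name.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: open a non-blocking uevent socket.
 * Returns the file descriptor, or -1 on failure.
 */
static int
uevent_socket_open(void)
{
	struct sockaddr_nl addr;
	int rcvbuf = 64 * 1024;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	/* Best effort: a larger buffer only reduces the risk of drops. */
	(void)setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;	/* let the kernel pick the port id */
	addr.nl_groups = 1;	/* assumption: group 1 is kernel uevents */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);	/* do not leak the fd on failure */
		return -1;
	}
	return fd;
}
```

Because the socket is created with SOCK_NONBLOCK, a separate
ioctl(FIONBIO) call would not be needed in this variant.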
---
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  38 ++++++
 lib/librte_eal/common/eal_common_dev.c  | 128 ++++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h | 119 +++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 223 ++++++++++++++++++++++++++++++++
 5 files changed, 509 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..83ffdee
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Not support event monitor for FreeBSD\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Not support event monitor for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..2a196dc 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,32 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct rte_dev_event_callback {
+	TAILQ_ENTRY(rte_dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;				/**< Callback device name, NULL
+							is for all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callbacks list for all callback of devices */
+static struct rte_dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +257,108 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback *event_cb = NULL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn &&
+			event_cb->cb_arg == cb_arg &&
+			!strcmp(event_cb->dev_name, device_name))
+			break;
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		/* allocate a new user callback entity */
+		event_cb = rte_zmalloc("dev_event_cb", sizeof(*event_cb), 0);
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = device_name;
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return (event_cb == NULL) ? -1 : 0;
+}
+
+int
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret;
+	struct rte_dev_event_callback *event_cb, *next;
+
+	if (!cb_fn || device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	ret = 0;
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+	      event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (event_cb->cb_fn != cb_fn ||
+				(event_cb->cb_arg != (void *)-1 &&
+				event_cb->cb_arg != cb_arg) ||
+				strcmp(event_cb->dev_name, device_name))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			rte_free(event_cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback dev_cb;
+	struct rte_dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
+		if (cb_lst->cb_fn == NULL || (cb_lst->dev_name &&
+			strcmp(cb_lst->dev_name, device_name)))
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		if (cb_arg)
+			dev_cb.cb_arg = cb_arg;
+		rc = dev_cb.cb_fn(device_name, event,
+				dev_cb.cb_arg);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 9342e0c..25e6747 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,30 @@ extern "C" {
 
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
+struct rte_dev_event_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_dev_event_cb_list, rte_dev_event_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -293,4 +317,99 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * @internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type,
+ *  see enum rte_dev_event_type.
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 588c0bd..43b00e5 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..f243c2e
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,223 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+
+bool service_no_init = true;
+
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the netlink event.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	uint32_t id;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore_1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32,
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service");
+		return ret;
+	}
+	ret = rte_service_component_runstate_set(id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service");
+		return ret;
+	}
+	rte_service_lcore_start(slcore_1);
+	service_no_init = false;
+	return 0;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	service_exit = true;
+	service_no_init = true;
+	return 0;
+}
-- 
2.7.4


* [PATCH V12 2/3] eal: add uevent pass and process function
  2018-01-18  4:12                                         ` [PATCH V12 1/3] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-18  4:12                                           ` Jeff Guo
  2018-01-24 15:00                                             ` Wu, Jingjing
  2018-01-18  4:12                                           ` [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
                                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-18  4:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

In order to handle the uevents detected from the kernel side, add a
uevent processing function. The hotplug event is used as an example to
show how the uevent mechanism passes and processes a uevent.

For uevent passing and processing, add the functions below to the Linux
EAL device layer. FreeBSD does not support uevents, so the functions are
left unimplemented there.
a.dev_uev_parse
b.dev_uev_receive
c.dev_uev_process

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
based on the newer kernel driver, delete the unused items in the event subsystem
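
A small standalone sketch of the parsing step, for illustration only. It
assumes the message is a sequence of NUL-terminated "KEY=value" records
and that the caller passes the byte count returned by recv(). The values
are copied into a caller-owned struct, so no pointer to a local buffer
outlives the parser. The names uev_fields, uev_copy_if and uev_parse are
hypothetical and are not part of this patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128	/* mirrors RTE_EAL_UEV_MSG_ELEM_LEN */

/* Hypothetical result type: the strings live inside the struct. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* Copy the value into dst when the record starts with key. */
static void
uev_copy_if(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) == 0)
		snprintf(dst, dst_len, "%s", rec + klen);
}

/*
 * Walk a uevent message of len bytes, record by record.
 * Returns 0 on success, -1 on bad arguments.
 */
static int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	if (buf == NULL || out == NULL)
		return -1;
	memset(out, 0, sizeof(*out));

	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break;	/* unterminated tail, ignore it */
		uev_copy_if(rec, "ACTION=", out->action,
			    sizeof(out->action));
		uev_copy_if(rec, "SUBSYSTEM=", out->subsystem,
			    sizeof(out->subsystem));
		uev_copy_if(rec, "PCI_SLOT_NAME=", out->pci_slot_name,
			    sizeof(out->pci_slot_name));
		i += (size_t)(end - rec) + 1;	/* skip the NUL as well */
	}
	return 0;
}
```

Bounding the walk by the received length rather than by the buffer
capacity keeps the parser from scanning bytes that recv() never wrote.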
---
 lib/librte_eal/common/include/rte_dev.h |  16 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 110 +++++++++++++++++++++++++++++++-
 2 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 25e6747..b3733bf 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -51,6 +51,22 @@ extern "C" {
 
 #include <rte_log.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been scanned on the bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum rte_dev_event_subsystem {
+	RTE_DEV_EVENT_SUBSYSTEM_UIO,
+	RTE_DEV_EVENT_SUBSYSTEM_VFIO,
+	RTE_DEV_EVENT_SUBSYSTEM_MAX
+};
+
 /**
  * The device event type.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index f243c2e..31c7da8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -79,10 +79,116 @@ dev_monitor_enable(int netlink_fd)
 	return -1;
 }
 
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = pci_slot_name;
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	event->devname = pci_slot_name;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
 static int
-dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+dev_uev_process(struct epoll_event *events, int nfds)
 {
-	/* TODO: device uevent processing */
+	struct rte_dev_event uevent;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (dev_uev_receive(events[i].data.fd, &uevent))
+			return 0;
+
+		/* by default, handle all pci devices being hot plugged */
+		if (uevent.subsystem == RTE_DEV_EVENT_SUBSYSTEM_UIO) {
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_REMOVE, NULL));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_ADD, NULL));
+			}
+		}
+	}
 	return 0;
 }
 
-- 
2.7.4


* [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-18  4:12                                         ` [PATCH V12 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-18  4:12                                           ` [PATCH V12 2/3] eal: add uevent pass and process function Jeff Guo
@ 2018-01-18  4:12                                           ` Jeff Guo
  2018-01-24 15:21                                             ` Wu, Jingjing
                                                               ` (2 more replies)
  2018-01-19  1:13                                           ` [PATCH V12 " Thomas Monjalon
  2018-01-24 14:52                                           ` Wu, Jingjing
  3 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-18  4:12 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, gaetan.rivet
  Cc: konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu, dev,
	jia.guo, thomas, helin.zhang, motih

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle hot removal and hot insertion
events.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
no change
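
A hedged sketch of a request list that keeps its own copy of the device
name. hotplug_list_add() in this patch stores the caller's pointer, so
the entry is only valid while that buffer is; copying the string would
remove that dependency. hp_req, hp_req_add and hp_req_find are
hypothetical stand-ins for the testpmd helpers, not the posted code.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct hotplug_request. */
struct hp_req {
	TAILQ_ENTRY(hp_req) next;
	char *dev_name;		/* heap copy owned by the entry */
	int event;		/* stands in for enum rte_dev_event_type */
};

TAILQ_HEAD(hp_req_list, hp_req);
static struct hp_req_list hp_reqs = TAILQ_HEAD_INITIALIZER(hp_reqs);

/* Add a request; returns 0 on success, -1 on error. */
static int
hp_req_add(const char *dev_name, int event)
{
	struct hp_req *req;
	size_t len;

	if (dev_name == NULL)
		return -1;
	req = calloc(1, sizeof(*req));
	if (req == NULL)
		return -1;
	len = strlen(dev_name) + 1;
	req->dev_name = malloc(len);
	if (req->dev_name == NULL) {
		free(req);
		return -1;
	}
	memcpy(req->dev_name, dev_name, len);	/* own the name */
	req->event = event;
	TAILQ_INSERT_TAIL(&hp_reqs, req, next);
	return 0;
}

/* Return 1 if dev_name has a pending request, 0 otherwise. */
static int
hp_req_find(const char *dev_name)
{
	struct hp_req *req;

	if (dev_name == NULL)
		return 0;
	TAILQ_FOREACH(req, &hp_reqs, next) {
		if (strcmp(req->dev_name, dev_name) == 0)
			return 1;
	}
	return 0;
}
```

With the copy in place, the lookup still succeeds after the caller's
buffer has been overwritten or gone out of scope.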
---
 app/test-pmd/testpmd.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 177 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9414d0e..87e3753 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -373,6 +373,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -380,6 +382,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(char *device_name, enum rte_dev_event_type type,
+			      void *param);
+static int eth_uevent_callback_register(portid_t port_id);
+static int in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1729,6 +1738,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the uevent callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_uevent_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup uevent callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1745,6 +1775,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1756,6 +1788,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1782,6 +1816,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1805,6 +1842,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1889,6 +1929,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -1931,6 +2014,82 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 }
 
 static int
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return 1;
+
+	return 0;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(char *device_name, enum rte_dev_event_type type, void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, (void *)device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
 	uint16_t i;
@@ -2415,6 +2574,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_dev_event_monitor_start();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 2a266fd..64254e6 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;                /* request device name */
+	enum rte_dev_event_type event;      /**< device event type */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4


* Re: [PATCH V11 2/3] eal: add uevent pass and process function
  2018-01-17 22:00                                         ` Thomas Monjalon
@ 2018-01-18  4:17                                           ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-18  4:17 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/18/2018 6:00 AM, Thomas Monjalon wrote:
> 15/01/2018 11:48, Jeff Guo:
>> +enum rte_dev_event_subsystem {
>> +       RTE_DEV_EVENT_SUBSYSTEM_UIO,
>> +       RTE_DEV_EVENT_SUBSYSTEM_VFIO,
>> +       RTE_DEV_EVENT_SUBSYSTEM_PCI,
>> +       RTE_DEV_EVENT_SUBSYSTEM_MAX
>> +};
> I still don't understand this classification, mixing PCI and VFIO
> at the same level.
OK, let me explain explicitly: different kernel versions report
different subsystem info in the netlink message. Focusing on the newer
kernel versions should be fine, so I will delete the PCI item.


* Re: [PATCH V11 1/3] eal: add uevent monitor api and callback func
  2018-01-17 21:59                                       ` [PATCH V11 " Thomas Monjalon
@ 2018-01-18  4:23                                         ` Guo, Jia
  2018-01-19  1:10                                           ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-01-18  4:23 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/18/2018 5:59 AM, Thomas Monjalon wrote:
> 15/01/2018 11:48, Jeff Guo:
>> + * It registers the callback for the specific event. Multiple
>> + * callbacks cal be registered at the same time.
>> + *
>> + * @param device_name
>> + *  The device name.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
>> +                                       void *cb_arg);
> What is the device name?
>
> I think we should register a callback for a rte_device or NULL (all devices).
Please see my v12 patch; the device name is now described to the user.
I think a device name per callback should be sufficient, because if
NULL is used to mean all devices, a callback cannot belong to a NULL
rte_device pointer. If there is any advantage to registering a callback
for a rte_device, please outline it explicitly; if it is an improvement,
I will try to make it better.
Either way, whether the callback is keyed on a rte_device or on a device
name is, I think, not the real gap. I guess what you care about is
monitoring a new device that is hot-plugged for the first time, so I
would add a NULL check to identify callbacks for such new devices. Am I
right?


* Re: [PATCH V11 1/3] eal: add uevent monitor api and callback func
  2018-01-18  4:23                                         ` Guo, Jia
@ 2018-01-19  1:10                                           ` Thomas Monjalon
  0 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-19  1:10 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

18/01/2018 05:23, Guo, Jia:
> 
> On 1/18/2018 5:59 AM, Thomas Monjalon wrote:
> > 15/01/2018 11:48, Jeff Guo:
> >> + * It registers the callback for the specific event. Multiple
> >> + * callbacks cal be registered at the same time.
> >> + *
> >> + * @param device_name
> >> + *  The device name.
> >> + * @param cb_fn
> >> + *  callback address.
> >> + * @param cb_arg
> >> + *  address of parameter for callback.
> >> + *
> >> + * @return
> >> + *  - On success, zero.
> >> + *  - On failure, a negative value.
> >> + */
> >> +int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
> >> +                                       void *cb_arg);
> > What is the device name?
> >
> > I think we should register a callback for a rte_device or NULL (all devices).
> Please see my v12 patch; the device name is now described to the user.
> I think a device name per callback should be sufficient, because if
> NULL is used to mean all devices, a callback cannot belong to a NULL
> rte_device pointer. If there is any advantage to registering a callback
> for a rte_device, please outline it explicitly; if it is an improvement,
> I will try to make it better.
> Either way, whether the callback is keyed on a rte_device or on a device
> name is, I think, not the real gap. I guess what you care about is
> monitoring a new device that is hot-plugged for the first time, so I
> would add a NULL check to identify callbacks for such new devices. Am I
> right?

Yes I am looking for new device event.


* Re: [PATCH V12 1/3] eal: add uevent monitor api and callback func
  2018-01-18  4:12                                         ` [PATCH V12 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-18  4:12                                           ` [PATCH V12 2/3] eal: add uevent pass and process function Jeff Guo
  2018-01-18  4:12                                           ` [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-19  1:13                                           ` Thomas Monjalon
  2018-01-19  2:51                                             ` Guo, Jia
  2018-01-24 14:52                                           ` Wu, Jingjing
  3 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-19  1:13 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih

18/01/2018 05:12, Jeff Guo:
> + * It registers the callback for the specific device.
> + * Multiple callbacks cal be registered at the same time.
> + *
> + * @param device_name
> + *  The device name, that is the param name of the struct rte_device,

Why not using rte_device pointer?

> + *  null value means for all devices.

I don't see any management of NULL value.
On the contrary, I see
+       if (device_name == NULL)
+               return -EINVAL;

> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
> +                                       void *cb_arg);
> +


* Re: [PATCH V12 1/3] eal: add uevent monitor api and callback func
  2018-01-19  1:13                                           ` [PATCH V12 " Thomas Monjalon
@ 2018-01-19  2:51                                             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-19  2:51 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, gaetan.rivet,
	konstantin.ananyev, jblunck, shreyansh.jain, jingjing.wu,
	helin.zhang, motih



On 1/19/2018 9:13 AM, Thomas Monjalon wrote:
> 18/01/2018 05:12, Jeff Guo:
>> + * It registers the callback for the specific device.
>> + * Multiple callbacks cal be registered at the same time.
>> + *
>> + * @param device_name
>> + *  The device name, that is the param name of the struct rte_device,
> Why not using rte_device pointer?
Sorry, maybe I already gave the reason in another mail thread, but I
will explain again: if NULL is used to mean all devices, a callback
cannot belong to a NULL rte_device pointer.
>> + *  null value means for all devices.
> I don't see any management of NULL value.
> On the contrary, I see
> +       if (device_name == NULL)
> +               return -EINVAL;
The device_name comes from the uevent message, so it should never be
NULL. The NULL value meaning all devices applies to the dev_name field
in struct rte_dev_event_callback, and is handled by the code below: if
the callback's dev_name is NULL, it does not matter whether device_name
has been registered or not. I think that would cover monitoring of all
new devices.

	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
		if (cb_lst->cb_fn == NULL || (strcmp(cb_lst->dev_name,
			device_name) && cb_lst->dev_name))
			continue;
		dev_cb = *cb_lst;
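
The matching rule described above (a callback registered with a NULL
name fires for every device) can be sketched as a small helper. This is
only an illustration: dev_name_matches() is a hypothetical name that does
not appear in the patch, and the sketch checks for NULL before calling
strcmp(), since passing a NULL pointer to strcmp() is undefined
behaviour in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns true when a callback
 * registered under `registered` should fire for an event on `event_dev`.
 * A NULL `registered` name is treated as "all devices".
 */
static bool
dev_name_matches(const char *registered, const char *event_dev)
{
	if (registered == NULL)
		return true;	/* wildcard registration */
	if (event_dev == NULL)
		return false;	/* nothing to compare against */
	return strcmp(registered, event_dev) == 0;
}
```

With such a helper, the loop condition could reduce to skipping entries
for which dev_name_matches(cb_lst->dev_name, device_name) is false.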


>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
>> +                                       void *cb_arg);
>> +
>
>


* Re: [PATCH V12 1/3] eal: add uevent monitor api and callback func
  2018-01-18  4:12                                         ` [PATCH V12 1/3] eal: add uevent monitor api and callback func Jeff Guo
                                                             ` (2 preceding siblings ...)
  2018-01-19  1:13                                           ` [PATCH V12 " Thomas Monjalon
@ 2018-01-24 14:52                                           ` Wu, Jingjing
  2018-01-25 14:57                                             ` Guo, Jia
  3 siblings, 1 reply; 494+ messages in thread
From: Wu, Jingjing @ 2018-01-24 14:52 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, gaetan.rivet
  Cc: Ananyev, Konstantin, jblunck, shreyansh.jain, dev, thomas, Zhang,
	Helin, motih



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, January 18, 2018 12:12 PM
> To: stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; jblunck@infradead.org;
> shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; thomas@monjalon.net; Zhang, Helin <helin.zhang@intel.com>;
> motih@mellanox.com
> Subject: [PATCH V12 1/3] eal: add uevent monitor api and callback func
> 
> This patch aims to add a general uevent mechanism in the EAL device layer
> to enable monitoring of all Linux kernel object uevents. Users can use
> these APIs to monitor and read out the device status info sent from the
> kernel side and then handle it accordingly. For example, on detecting a
> hotplug uevent, the user can detach or attach the device; it also helps
> to do fail-safe work smoothly.
> 
> About uevent monitoring:
> a: add one epoll instance to poll the netlink socket and monitor the
>    uevents of the device.
> b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
> c: add the APIs below in the rte eal device layer.
>    rte_dev_callback_register
>    rte_dev_callback_unregister
>    _rte_dev_callback_process
>    rte_dev_event_monitor_start
>    rte_dev_event_monitor_stop
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v12->v11:
> identify null param in callback for monitor all devices uevent
> ---
>  lib/librte_eal/bsdapp/eal/eal_dev.c     |  38 ++++++
>  lib/librte_eal/common/eal_common_dev.c  | 128 ++++++++++++++++++
>  lib/librte_eal/common/include/rte_dev.h | 119 +++++++++++++++++
>  lib/librte_eal/linuxapp/eal/Makefile    |   1 +
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 223 ++++++++++++++++++++++++++++++++
>  5 files changed, 509 insertions(+)
>  create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
> 

[......]

> +int
> +rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg)
> +{
> +	struct rte_dev_event_callback *event_cb = NULL;
> +
> +	rte_spinlock_lock(&rte_dev_event_lock);
> +
> +	if (TAILQ_EMPTY(&(dev_event_cbs)))
> +		TAILQ_INIT(&(dev_event_cbs));
> +
> +	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
> +		if (event_cb->cb_fn == cb_fn &&
> +			event_cb->cb_arg == cb_arg &&
> +			!strcmp(event_cb->dev_name, device_name))
device_name = NULL means for all devices, right? Can strcmp accept NULL arguments?

> +			break;
> +	}
> +
> +	/* create a new callback. */
> +	if (event_cb == NULL) {
> +		/* allocate a new user callback entity */
> +		event_cb = malloc(sizeof(struct rte_dev_event_callback));
> +		if (event_cb != NULL) {
> +			event_cb->cb_fn = cb_fn;
> +			event_cb->cb_arg = cb_arg;
> +			event_cb->dev_name = device_name;
> +		}
Is it OK to call TAILQ_INSERT_TAIL below if event_cb == NULL?

> +		TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
> +	}
> +
> +	rte_spinlock_unlock(&rte_dev_event_lock);
> +	return (event_cb == NULL) ? -1 : 0;
> +}
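
The two review points on this function (NULL names reaching strcmp, and
inserting a NULL entry when allocation fails) can be illustrated with a
stand-alone sketch. It is not the patch's code: the types and function
names below are hypothetical, it uses plain malloc/free and <sys/queue.h>
instead of the EAL, it keeps an owned copy of the name, and it leaves out
the spinlock for brevity.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical callback type for this sketch (event is a plain int here). */
typedef int (*event_cb_fn)(const char *device_name, int event, void *cb_arg);

struct event_callback {
	TAILQ_ENTRY(event_callback) next;
	event_cb_fn cb_fn;
	void *cb_arg;
	char *dev_name;		/* owned copy; NULL means all devices */
};

TAILQ_HEAD(event_cb_list, event_callback);
static struct event_cb_list cb_list = TAILQ_HEAD_INITIALIZER(cb_list);

/* NULL-safe comparison: two names match if both are NULL or both equal. */
static int
same_name(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

static int
callback_register(const char *device_name, event_cb_fn cb_fn, void *cb_arg)
{
	struct event_callback *cb;

	if (cb_fn == NULL)
		return -EINVAL;

	/* An identical registration already exists: nothing to do. */
	TAILQ_FOREACH(cb, &cb_list, next) {
		if (cb->cb_fn == cb_fn && cb->cb_arg == cb_arg &&
		    same_name(cb->dev_name, device_name))
			return 0;
	}

	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -ENOMEM;	/* never insert a NULL entry */

	if (device_name != NULL) {
		size_t len = strlen(device_name) + 1;

		cb->dev_name = malloc(len);
		if (cb->dev_name == NULL) {
			free(cb);
			return -ENOMEM;
		}
		memcpy(cb->dev_name, device_name, len);
	}
	cb->cb_fn = cb_fn;
	cb->cb_arg = cb_arg;
	TAILQ_INSERT_TAIL(&cb_list, cb, next);
	return 0;
}

/* Number of registered callbacks, used only to exercise the sketch. */
static unsigned int
callback_count(void)
{
	struct event_callback *cb;
	unsigned int n = 0;

	TAILQ_FOREACH(cb, &cb_list, next)
		n++;
	return n;
}

/* Example callback: counts invocations through cb_arg. */
static int
count_event(const char *device_name, int event, void *cb_arg)
{
	(void)device_name;
	(void)event;
	++*(int *)cb_arg;
	return 0;
}
```

Copying the name means the caller's string does not have to outlive the
registration; whether that is wanted in the real API is a design decision
for the patch author.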
> +
> +int
> +rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg)
> +{
> +	int ret;
> +	struct rte_dev_event_callback *event_cb, *next;
> +
> +	if (!cb_fn || device_name == NULL)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&rte_dev_event_lock);
> +
> +	ret = 0;
> +
> +	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
> +	      event_cb = next) {
> +
> +		next = TAILQ_NEXT(event_cb, next);
> +
> +		if (event_cb->cb_fn != cb_fn ||
> +				(event_cb->cb_arg != (void *)-1 &&
> +				event_cb->cb_arg != cb_arg) ||
> +				strcmp(event_cb->dev_name, device_name))

The same comments as above.

> +			continue;
> +
> +		/*
> +		 * if this callback is not executing right now,
> +		 * then remove it.
> +		 */
> +		if (event_cb->active == 0) {
> +			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
> +			rte_free(event_cb);
> +		} else {
> +			ret = -EAGAIN;
> +		}
> +	}
> +
> +	rte_spinlock_unlock(&rte_dev_event_lock);
> +	return ret;
> +}
> +

[......]

> +int
> +rte_dev_event_monitor_start(void)
> +{
> +	int ret;
> +	struct rte_service_spec service;
> +	uint32_t id;
> +	const uint32_t sid = 0;
> +
> +	if (!service_no_init)
> +		return 0;
> +
> +	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
> +					       /* skip master */ 1,
> +					       /* wrap */ 0);
> +
> +	ret = rte_service_lcore_add(slcore_1);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
> +		return ret;
> +	}
> +
> +	memset(&service, 0, sizeof(service));
> +	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
> +
> +	service.socket_id = rte_socket_id();
> +	service.callback = dev_uev_monitoring;
> +	service.callback_userdata = NULL;
> +	service.capabilities = 0;
> +	ret = rte_service_component_register(&service, &id);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to register service %s "
> +			"err = %" PRId32,
> +			service.name, ret);
> +		return ret;
> +	}
> +	ret = rte_service_runstate_set(sid, 1);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
> +			"the service");
Does any rollback need to be done on failure?

> +		return ret;
> +	}
> +	ret = rte_service_component_runstate_set(id, 1);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
> +			" of a component");
> +		return ret;
> +	}
> +	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
> +			"dev event monitor service");
> +		return ret;
> +	}
> +	rte_service_lcore_start(slcore_1);
> +	service_no_init = false;
> +	return 0;
> +}
> +
> +int
> +rte_dev_event_monitor_stop(void)
> +{
> +	service_exit = true;
> +	service_no_init = true;
> +	return 0;

Are start and stop meant to be called as a pair? If we call rte_dev_event_monitor_start to start the monitor and then rte_dev_event_monitor_stop to stop it, how do we start it again?

> +}
> --
> 2.7.4
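
The rollback and restart questions raised above can be illustrated with
a generic pattern: undo completed steps in reverse order when a later
step fails, and have stop release everything so that start can run
again. The sketch below is only an assumption about one possible shape.
The step_*/undo_* functions are hypothetical stand-ins for the real
setup (netlink socket, service registration), and fail_service_register
exists purely to exercise the failure path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real resources held by the monitor. */
static bool socket_open;
static bool service_registered;
static bool monitor_running;
static bool fail_service_register;	/* test hook: force step 2 to fail */

static int step_open_socket(void) { socket_open = true; return 0; }
static void undo_open_socket(void) { socket_open = false; }

static int
step_register_service(void)
{
	if (fail_service_register)
		return -1;
	service_registered = true;
	return 0;
}

static void undo_register_service(void) { service_registered = false; }

static int
monitor_start(void)
{
	int ret;

	if (monitor_running)
		return 0;	/* already started: idempotent */

	ret = step_open_socket();
	if (ret != 0)
		return ret;

	ret = step_register_service();
	if (ret != 0)
		goto undo_socket;	/* roll back step 1 */

	monitor_running = true;
	return 0;

undo_socket:
	undo_open_socket();
	return ret;
}

static int
monitor_stop(void)
{
	if (!monitor_running)
		return 0;

	/* Release in reverse order so a later start begins from scratch. */
	undo_register_service();
	undo_open_socket();
	monitor_running = false;
	return 0;
}
```

Because stop returns every resource and clears the running flag, a later
monitor_start() takes the same path as the first call.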


* Re: [PATCH V12 2/3] eal: add uevent pass and process function
  2018-01-18  4:12                                           ` [PATCH V12 2/3] eal: add uevent pass and process function Jeff Guo
@ 2018-01-24 15:00                                             ` Wu, Jingjing
  0 siblings, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2018-01-24 15:00 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, gaetan.rivet
  Cc: Ananyev, Konstantin, jblunck, shreyansh.jain, dev, thomas, Zhang,
	Helin, motih



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, January 18, 2018 12:12 PM
> To: stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>;
> Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; jblunck@infradead.org;
> shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; thomas@monjalon.net; Zhang, Helin <helin.zhang@intel.com>;
> motih@mellanox.com
> Subject: [PATCH V12 2/3] eal: add uevent pass and process function
> 
> In order to handle the uevents detected from the kernel side, add a
> uevent process function, using the hot plug event as an example to
> show how the uevent mechanism passes and processes a uevent.
> 
> About uevent passing and processing, add the functions below in the
> Linux EAL dev layer. FreeBSD does not support uevents, so the
> functions are left unimplemented there.
> a.dev_uev_parse
> b.dev_uev_receive
> c.dev_uev_process
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>


Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
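
For readers unfamiliar with the uevent payload that a parse step such as
dev_uev_parse has to handle: a kernel uevent message is an
"action@devpath" header followed by NUL-separated KEY=VALUE fields. The
sketch below shows one way such a buffer could be walked. It is not the
patch's implementation; uev_parse, struct uev_info and enum uev_action
are hypothetical names chosen for this illustration.

```c
#include <stddef.h>
#include <string.h>

enum uev_action { UEV_UNKNOWN, UEV_ADD, UEV_REMOVE };

struct uev_info {
	enum uev_action action;
	const char *subsystem;	/* points into the buffer, or NULL */
	const char *devpath;	/* points into the buffer, or NULL */
};

/*
 * Walk a uevent buffer of `len` bytes made of NUL-terminated fields and
 * pick out ACTION, SUBSYSTEM and DEVPATH. Unknown fields are ignored.
 */
static void
uev_parse(const char *buf, size_t len, struct uev_info *out)
{
	size_t i = 0;

	out->action = UEV_UNKNOWN;
	out->subsystem = NULL;
	out->devpath = NULL;

	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);

		if (end == NULL)
			break;	/* unterminated trailing field: stop */

		if (strncmp(field, "ACTION=", 7) == 0) {
			if (strcmp(field + 7, "add") == 0)
				out->action = UEV_ADD;
			else if (strcmp(field + 7, "remove") == 0)
				out->action = UEV_REMOVE;
		} else if (strncmp(field, "SUBSYSTEM=", 10) == 0) {
			out->subsystem = field + 10;
		} else if (strncmp(field, "DEVPATH=", 8) == 0) {
			out->devpath = field + 8;
		}

		i += (size_t)(end - field) + 1;	/* skip field and its NUL */
	}
}
```

The returned pointers alias the receive buffer, so a caller that needs
them after the next recv() would have to copy the strings.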


* Re: [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-18  4:12                                           ` [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-24 15:21                                             ` Wu, Jingjing
  2018-01-25 14:58                                               ` Guo, Jia
  2018-01-25 14:46                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-26  3:49                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Wu, Jingjing @ 2018-01-24 15:21 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, gaetan.rivet
  Cc: Ananyev, Konstantin, jblunck, shreyansh.jain, dev, thomas, Zhang,
	Helin, motih

> +
> +static void
> +add_uevent_callback(void *arg)
> +{
> +	char *dev_name = (char *)arg;
> +
> +	rte_eal_alarm_cancel(add_uevent_callback, arg);
> +
> +	if (!in_hotplug_list(dev_name))
> +		return;
> +
> +	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);

It's not an error; replace with printf?

> +	attach_port(dev_name);
> +}
> +
>  /* This function is used by the interrupt thread */
>  static int
>  eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
> @@ -1931,6 +2014,82 @@ eth_event_callback(portid_t port_id, enum
> rte_eth_event_type type, void *param,
>  }
> 
>  static int
> +in_hotplug_list(const char *dev_name)
> +{
> +	struct hotplug_request *hp_request = NULL;
> +
> +	TAILQ_FOREACH(hp_request, &hp_list, next) {
> +		if (!strcmp(hp_request->dev_name, dev_name))
> +			break;
> +	}
> +
> +	if (hp_request)
> +		return 1;
> +
Is it better to use TRUE and FALSE?


* [PATCH V13 1/3] eal: add uevent monitor api and callback func
  2018-01-18  4:12                                           ` [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-01-24 15:21                                             ` Wu, Jingjing
@ 2018-01-25 14:46                                             ` Jeff Guo
  2018-01-25 14:46                                               ` [PATCH V13 2/3] eal: add uevent pass and process function Jeff Guo
  2018-01-25 14:46                                               ` [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-01-26  3:49                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-25 14:46 UTC (permalink / raw)
  To: stephen, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, jblunck,
	shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a general uevent mechanism in the EAL device layer
to enable monitoring of all Linux kernel object uevents. Users can use
these APIs to monitor and read out the device status info sent from the
kernel side and then handle it accordingly. For example, on detecting a
hotplug uevent, the user can detach or attach the device; it also helps
to do fail-safe work smoothly.

About uevent monitoring:
a: add one epoll instance to poll the netlink socket and monitor the
   uevents of the device.
b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
c: add the APIs below in the rte eal device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_event_monitor_start
   rte_dev_event_monitor_stop

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v13->v12:
fix some logic issue and null check issue
fix monitor stop func issue
---
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  38 +++++
 lib/librte_eal/common/eal_common_dev.c  | 132 +++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h | 119 ++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 243 ++++++++++++++++++++++++++++++++
 5 files changed, 533 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..83ffdee
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Not support event monitor for FreeBSD\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Not support event monitor for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..a8393d9 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,32 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct rte_dev_event_callback {
+	TAILQ_ENTRY(rte_dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;				/**< Callback device name, NULL
+							is for all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callbacks list for all callback of devices */
+static struct rte_dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +257,112 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback *event_cb = NULL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn &&
+			event_cb->cb_arg == cb_arg &&
+			(!strcmp(event_cb->dev_name, device_name) ||
+			(!device_name && !event_cb->dev_name)))
+			break;
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		/* allocate a new user callback entity */
+		event_cb = malloc(sizeof(struct rte_dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			strcpy(event_cb->dev_name, device_name);
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		} else
+			free(event_cb);
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return (event_cb == NULL) ? -1 : 0;
+}
+
+int
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret;
+	struct rte_dev_event_callback *event_cb, *next;
+
+	if (!cb_fn || device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	ret = 0;
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+	      event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (event_cb->cb_fn != cb_fn ||
+				(event_cb->cb_arg != (void *)-1 &&
+				event_cb->cb_arg != cb_arg) ||
+				strcmp(event_cb->dev_name, device_name) ||
+				(!device_name && event_cb->dev_name) ||
+				(device_name && !event_cb->dev_name))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			rte_free(event_cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback dev_cb;
+	struct rte_dev_event_callback *cb_lst;
+	int rc = 0;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
+		if (cb_lst->cb_fn == NULL || (strcmp(cb_lst->dev_name,
+			device_name) && cb_lst->dev_name))
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		if (cb_arg)
+			dev_cb.cb_arg = cb_arg;
+		rc = dev_cb.cb_fn(device_name, event,
+				dev_cb.cb_arg);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 8088dcc..88fbb2d 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -52,6 +52,30 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
+struct rte_dev_event_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_dev_event_cb_list, rte_dev_event_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -294,4 +318,99 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  The device event type
+ *  being processed.
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring .
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7bf278f..a167401 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..72371c9
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+bool service_no_init = true;
+uint32_t slcore;
+uint32_t sevice_id;
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the netlink event.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	slcore = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail\n");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &sevice_id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32 "\n",
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service\n");
+		goto err_done;
+	}
+	ret = rte_service_component_runstate_set(sevice_id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component\n");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service\n");
+		return ret;
+	}
+	rte_service_lcore_start(slcore);
+	service_no_init = false;
+	return 0;
+
+err_done:
+	rte_service_component_unregister(sevice_id);
+	return ret;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	int ret;
+
+	ret = rte_service_lcore_stop(slcore);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to stop lcore on "
+			"dev event monitor service\n");
+		return ret;
+	}
+
+	ret = rte_service_component_unregister(sevice_id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to unregister service "
+			"err = %" PRId32 "\n", ret);
+		return ret;
+	}
+
+	service_exit = true;
+	service_no_init = true;
+
+	return 0;
+}
-- 
2.7.4

* [PATCH V13 2/3] eal: add uevent pass and process function
  2018-01-25 14:46                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-25 14:46                                               ` Jeff Guo
  2018-01-25 14:46                                               ` [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-25 14:46 UTC (permalink / raw)
  To: stephen, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, jblunck,
	shreyansh.jain, dev, jia.guo, helin.zhang

To handle the uevents that the kernel reports, add uevent processing
functions. Hot plug events serve as the example that shows how a uevent
is passed along and processed.

For uevent passing and processing, add the functions below to the Linux
EAL device layer. FreeBSD does not support uevents, so nothing is
implemented for it.
a.dev_uev_parse
b.dev_uev_receive
c.dev_uev_process
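
As a rough illustration of the parsing step (a sketch only, not the code
in this patch): a kernel uevent message is a sequence of NUL-terminated
"KEY=value" strings, so a parser can walk the buffer one string at a
time. The names uev_fields, uev_match and uev_parse are hypothetical,
and passing the received length explicitly is an assumption of the
sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128 /* hypothetical; same size as RTE_EAL_UEV_MSG_ELEM_LEN */

/* Hypothetical result type, not the struct rte_dev_event from the patch. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If the n-byte string s starts with key, copy the rest into dst. */
static int
uev_match(const char *s, size_t n, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (n < klen || memcmp(s, key, klen) != 0)
		return 0;
	/* snprintf truncates to dst_len - 1 bytes and always terminates dst */
	snprintf(dst, dst_len, "%.*s", (int)(n - klen), s + klen);
	return 1;
}

/* Walk len bytes of NUL-separated "KEY=value" strings and fill out. */
static void
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *s = buf + i;
		/* bound the search so we never read past the message */
		const char *end = memchr(s, '\0', len - i);
		size_t n = end ? (size_t)(end - s) : len - i;

		if (!uev_match(s, n, "ACTION=", out->action,
				sizeof(out->action)) &&
		    !uev_match(s, n, "SUBSYSTEM=", out->subsystem,
				sizeof(out->subsystem)))
			uev_match(s, n, "PCI_SLOT_NAME=", out->pci_slot_name,
				sizeof(out->pci_slot_name));

		i += n + 1; /* step over the string and its terminator */
	}
}
```

The fields are copied into caller-owned storage, so the result stays
valid after the receive buffer goes out of scope.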

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
v13->v12:
fix some event parse issue
---
 lib/librte_eal/common/include/rte_dev.h |  16 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 105 +++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 88fbb2d..9396a9f 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -52,6 +52,22 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_log.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been scanned on bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum rte_dev_event_subsystem {
+	RTE_DEV_EVENT_SUBSYSTEM_UIO,
+	RTE_DEV_EVENT_SUBSYSTEM_VFIO,
+	RTE_DEV_EVENT_SUBSYSTEM_MAX
+};
+
 /**
  * The device event type.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 72371c9..9dda195 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -79,10 +79,111 @@ dev_monitor_enable(int netlink_fd)
 	return -1;
 }
 
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+
+	if ((!strncmp(subsystem, "uio", 3)) ||
+		(!strncmp(subsystem, "pci", 3))) {
+		event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3))
+			event->type = RTE_DEV_EVENT_ADD;
+		if (!strncmp(action, "remove", 6))
+			event->type = RTE_DEV_EVENT_REMOVE;
+		event->devname = pci_slot_name;
+	}
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
 static int
-dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+dev_uev_process(struct epoll_event *events, int nfds)
 {
-	/* TODO: device uevent processing */
+	struct rte_dev_event uevent;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		if (dev_uev_receive(events[i].data.fd, &uevent))
+			return 0;
+
+		/* by default, handle all hot-plugged PCI devices */
+		if (uevent.subsystem == RTE_DEV_EVENT_SUBSYSTEM_UIO &&
+			uevent.devname) {
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_REMOVE, NULL));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_ADD, NULL));
+			}
+		}
+	}
 	return 0;
 }
 
-- 
2.7.4

* [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-25 14:46                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-25 14:46                                               ` [PATCH V13 2/3] eal: add uevent pass and process function Jeff Guo
@ 2018-01-25 14:46                                               ` Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-25 14:46 UTC (permalink / raw)
  To: stephen, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, jblunck,
	shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle hot removal and hot insertion events.
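
One detail an application has to decide is who owns the device name kept
in its hotplug request list. A minimal sketch, assuming the list stores
its own copy of each name so the caller's buffer can be reused
afterwards. The names hp_req, hp_req_add and hp_req_find are
hypothetical, and plain libc allocation stands in for rte_zmalloc here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical request entry; it owns a private copy of the name. */
struct hp_req {
	TAILQ_ENTRY(hp_req) next;
	char *dev_name;
};
TAILQ_HEAD(hp_req_list, hp_req);

static struct hp_req_list hp_reqs = TAILQ_HEAD_INITIALIZER(hp_reqs);

/* Append a request for dev_name; returns 0 or -ENOMEM. */
static int
hp_req_add(const char *dev_name)
{
	size_t len = strlen(dev_name) + 1;
	struct hp_req *req = calloc(1, sizeof(*req));

	if (req == NULL)
		return -ENOMEM;
	req->dev_name = malloc(len);
	if (req->dev_name == NULL) {
		free(req);
		return -ENOMEM;
	}
	memcpy(req->dev_name, dev_name, len); /* includes the terminator */
	TAILQ_INSERT_TAIL(&hp_reqs, req, next);
	return 0;
}

/* Return true if a request for dev_name is on the list. */
static bool
hp_req_find(const char *dev_name)
{
	struct hp_req *req;

	TAILQ_FOREACH(req, &hp_reqs, next) {
		if (strcmp(req->dev_name, dev_name) == 0)
			return true;
	}
	return false;
}
```

With a private copy, a later lookup still works even if the string that
was passed to hp_req_add has been overwritten or freed.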

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v13->v12:
refine some code style
---
 app/test-pmd/testpmd.c | 169 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 178 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5dc8cca..b29c9d5 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -367,6 +368,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -374,6 +377,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(char *device_name, enum rte_dev_event_type type,
+			      void *param);
+static int eth_uevent_callback_register(portid_t port_id);
+static bool in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1835,6 +1845,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the uevent callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_uevent_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup uevent callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1851,6 +1882,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1862,6 +1895,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1888,6 +1923,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1911,6 +1949,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1995,6 +2036,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		TESTPMD_LOG(ERR, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	printf("add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2038,6 +2122,82 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(char *device_name, enum rte_dev_event_type type, void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, (void *)device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2519,6 +2679,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_dev_event_monitor_start();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 47f8fa8..c797667 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< next request in the list */
+	const char *dev_name;              /**< requested device name */
+	enum rte_dev_event_type event;     /**< device event type */
+};
+
+/** @internal List of hotplug requests */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4

* Re: [PATCH V12 1/3] eal: add uevent monitor api and callback func
  2018-01-24 14:52                                           ` Wu, Jingjing
@ 2018-01-25 14:57                                             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-25 14:57 UTC (permalink / raw)
  To: Wu, Jingjing, stephen, Richardson, Bruce, Yigit, Ferruh, gaetan.rivet
  Cc: Ananyev, Konstantin, jblunck, shreyansh.jain, dev, thomas, Zhang,
	Helin, motih


Thanks for your review. Please check v13.
On 1/24/2018 10:52 PM, Wu, Jingjing wrote:
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Thursday, January 18, 2018 12:12 PM
>> To: stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>;
>> Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com
>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; jblunck@infradead.org;
>> shreyansh.jain@nxp.com; Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org; Guo, Jia
>> <jia.guo@intel.com>; thomas@monjalon.net; Zhang, Helin <helin.zhang@intel.com>;
>> motih@mellanox.com
>> Subject: [PATCH V12 1/3] eal: add uevent monitor api and callback func
>>
>> This patch aim to add a general uevent mechanism in eal device layer,
>> to enable all linux kernel object uevent monitoring, user could use these
>> APIs to monitor and read out the device status info that sent from the
>> kernel side, then corresponding to handle it, such as when detect hotplug
>> uevent type, user could detach or attach the device, and more it benefit
>> to use to do smoothly fail safe work.
>>
>> About uevent monitoring:
>> a: add one epolling to poll the netlink socket, to monitor the uevent of
>>     the device.
>> b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
>> c: add below APIs in rte eal device layer.
>>     rte_dev_callback_register
>>     rte_dev_callback_unregister
>>     _rte_dev_callback_process
>>     rte_dev_event_monitor_start
>>     rte_dev_event_monitor_stop
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v12->v11:
>> identify null param in callback for monitor all devices uevent
>> ---
>>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  38 ++++++
>>   lib/librte_eal/common/eal_common_dev.c  | 128 ++++++++++++++++++
>>   lib/librte_eal/common/include/rte_dev.h | 119 +++++++++++++++++
>>   lib/librte_eal/linuxapp/eal/Makefile    |   1 +
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 223 ++++++++++++++++++++++++++++++++
>>   5 files changed, 509 insertions(+)
>>   create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>>
> [......]
>
>> +int
>> +rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
>> +				void *cb_arg)
>> +{
>> +	struct rte_dev_event_callback *event_cb = NULL;
>> +
>> +	rte_spinlock_lock(&rte_dev_event_lock);
>> +
>> +	if (TAILQ_EMPTY(&(dev_event_cbs)))
>> +		TAILQ_INIT(&(dev_event_cbs));
>> +
>> +	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
>> +		if (event_cb->cb_fn == cb_fn &&
>> +			event_cb->cb_arg == cb_arg &&
>> +			!strcmp(event_cb->dev_name, device_name))
> device_name = NULL means means for all devices, right? Can strcmp accept NULL arguments?
got it.
>> +			break;
>> +	}
>> +
>> +	/* create a new callback. */
>> +	if (event_cb == NULL) {
>> +		/* allocate a new user callback entity */
>> +		event_cb = malloc(sizeof(struct rte_dev_event_callback));
>> +		if (event_cb != NULL) {
>> +			event_cb->cb_fn = cb_fn;
>> +			event_cb->cb_arg = cb_arg;
>> +			event_cb->dev_name = device_name;
>> +		}
> Is that OK to call TAILQ_INSERT_TAIL below if event_cb == NULL?
yes, that might be wrong.
>> +		TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
>> +	}
>> +
>> +	rte_spinlock_unlock(&rte_dev_event_lock);
>> +	return (event_cb == NULL) ? -1 : 0;
>> +}
>> +
>> +int
>> +rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
>> +				void *cb_arg)
>> +{
>> +	int ret;
>> +	struct rte_dev_event_callback *event_cb, *next;
>> +
>> +	if (!cb_fn || device_name == NULL)
>> +		return -EINVAL;
>> +
>> +	rte_spinlock_lock(&rte_dev_event_lock);
>> +
>> +	ret = 0;
>> +
>> +	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
>> +	      event_cb = next) {
>> +
>> +		next = TAILQ_NEXT(event_cb, next);
>> +
>> +		if (event_cb->cb_fn != cb_fn ||
>> +				(event_cb->cb_arg != (void *)-1 &&
>> +				event_cb->cb_arg != cb_arg) ||
>> +				strcmp(event_cb->dev_name, device_name))
> The same comments as above.
ok.
>> +			continue;
>> +
>> +		/*
>> +		 * if this callback is not executing right now,
>> +		 * then remove it.
>> +		 */
>> +		if (event_cb->active == 0) {
>> +			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
>> +			rte_free(event_cb);
>> +		} else {
>> +			ret = -EAGAIN;
>> +		}
>> +	}
>> +
>> +	rte_spinlock_unlock(&rte_dev_event_lock);
>> +	return ret;
>> +}
>> +
> [......]
>
>> +int
>> +rte_dev_event_monitor_start(void)
>> +{
>> +	int ret;
>> +	struct rte_service_spec service;
>> +	uint32_t id;
>> +	const uint32_t sid = 0;
>> +
>> +	if (!service_no_init)
>> +		return 0;
>> +
>> +	uint32_t slcore_1 = rte_get_next_lcore(/* start core */ -1,
>> +					       /* skip master */ 1,
>> +					       /* wrap */ 0);
>> +
>> +	ret = rte_service_lcore_add(slcore_1);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
>> +		return ret;
>> +	}
>> +
>> +	memset(&service, 0, sizeof(service));
>> +	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
>> +
>> +	service.socket_id = rte_socket_id();
>> +	service.callback = dev_uev_monitoring;
>> +	service.callback_userdata = NULL;
>> +	service.capabilities = 0;
>> +	ret = rte_service_component_register(&service, &id);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to register service %s "
>> +			"err = %" PRId32,
>> +			service.name, ret);
>> +		return ret;
>> +	}
>> +	ret = rte_service_runstate_set(sid, 1);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
>> +			"the service");
> Any rollback need to be done when fails?
Yes, the failures should be handled.
>> +		return ret;
>> +	}
>> +	ret = rte_service_component_runstate_set(id, 1);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
>> +			" of a component");
>> +		return ret;
>> +	}
>> +	ret = rte_service_map_lcore_set(sid, slcore_1, 1);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
>> +			"dev event monitor service");
>> +		return ret;
>> +	}
>> +	rte_service_lcore_start(slcore_1);
>> +	service_no_init = false;
>> +	return 0;
>> +}
>> +
>> +int
>> +rte_dev_event_monitor_stop(void)
>> +{
>> +	service_exit = true;
>> +	service_no_init = true;
>> +	return 0;
> Are start and stop peer functions to call? If we call rte_dev_event_monitor_start to start monitor and then call rte_dev_event_monitor_stop to stop it, and then how to start again?
Sure, start and stop should be paired.
>> +}
>> --
>> 2.7.4

* Re: [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-24 15:21                                             ` Wu, Jingjing
@ 2018-01-25 14:58                                               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-25 14:58 UTC (permalink / raw)
  To: Wu, Jingjing, stephen, Richardson, Bruce, Yigit, Ferruh, gaetan.rivet
  Cc: Ananyev, Konstantin, jblunck, shreyansh.jain, dev, thomas, Zhang,
	Helin, motih



On 1/24/2018 11:21 PM, Wu, Jingjing wrote:
>> +
>> +static void
>> +add_uevent_callback(void *arg)
>> +{
>> +	char *dev_name = (char *)arg;
>> +
>> +	rte_eal_alarm_cancel(add_uevent_callback, arg);
>> +
>> +	if (!in_hotplug_list(dev_name))
>> +		return;
>> +
>> +	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
> It's not an error, replace by printf?
sure.
>> +	attach_port(dev_name);
>> +}
>> +
>>   /* This function is used by the interrupt thread */
>>   static int
>>   eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
>> @@ -1931,6 +2014,82 @@ eth_event_callback(portid_t port_id, enum
>> rte_eth_event_type type, void *param,
>>   }
>>
>>   static int
>> +in_hotplug_list(const char *dev_name)
>> +{
>> +	struct hotplug_request *hp_request = NULL;
>> +
>> +	TAILQ_FOREACH(hp_request, &hp_list, next) {
>> +		if (!strcmp(hp_request->dev_name, dev_name))
>> +			break;
>> +	}
>> +
>> +	if (hp_request)
>> +		return 1;
>> +
> Is it better to use TRUE and FALSE?
OK, makes sense.

* [PATCH V13 1/3] eal: add uevent monitor api and callback func
  2018-01-18  4:12                                           ` [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-01-24 15:21                                             ` Wu, Jingjing
  2018-01-25 14:46                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-26  3:49                                             ` Jeff Guo
  2018-01-26  3:49                                               ` [PATCH V13 2/3] eal: add uevent pass and process function Jeff Guo
                                                                 ` (2 more replies)
  2 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-26  3:49 UTC (permalink / raw)
  To: stephen, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, jblunck,
	shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general uevent mechanism to the EAL device layer to
enable monitoring of all Linux kernel object uevents. Users can use these
APIs to monitor and read the device status information sent from the
kernel side and then handle it accordingly. For example, on detecting a
hotplug uevent, the user can detach or attach the device. This also
helps fail-safe handling work smoothly.

About uevent monitoring:
a: add an epoll loop on the netlink socket to monitor the uevents of
   the device.
b: add the enum rte_eal_dev_event_type and the struct rte_eal_uevent.
c: add the APIs below to the RTE EAL device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_event_monitor_start
   rte_dev_event_monitor_stop
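
For readers unfamiliar with item a, here is a minimal standalone sketch
of opening the kernel uevent netlink socket and waiting on it with
epoll. It is not the code in this patch: uev_socket_open and
uev_wait_one are hypothetical names, and the sketch subscribes only to
multicast group 1 (the kernel's own uevent group) as an assumption,
whereas the patch binds with nl_groups set to 0xffffffff.

```c
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Open a non-blocking uevent netlink socket; returns the fd or -1. */
static int
uev_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
			NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1; /* assumption: kernel uevent group only */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Wait up to timeout_ms for one message on fd.
 * Returns the number of bytes read, 0 on timeout, or -1 on error.
 */
static ssize_t
uev_wait_one(int fd, char *buf, size_t len, int timeout_ms)
{
	struct epoll_event ev, out;
	ssize_t ret = 0;
	int ep, n;

	ep = epoll_create1(EPOLL_CLOEXEC);
	if (ep < 0)
		return -1;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = fd;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(ep);
		return -1;
	}

	n = epoll_wait(ep, &out, 1, timeout_ms);
	if (n < 0)
		ret = -1;
	else if (n > 0)
		ret = recv(fd, buf, len, MSG_DONTWAIT);

	close(ep);
	return ret;
}
```

Using a finite timeout instead of -1 lets a monitoring loop re-check its
exit flag periodically, which may make a stop request easier to honour.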

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v13->v12:
fix some logic issues and NULL check issues
fix the monitor stop function issue and the BSD build issue
---
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  38 +++++
 lib/librte_eal/common/eal_common_dev.c  | 133 +++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h | 119 ++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 243 ++++++++++++++++++++++++++++++++
 6 files changed, 535 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index c694076..5ff74d0 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -32,6 +32,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..83ffdee
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index dda8f58..dbeb670 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -42,9 +42,31 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct rte_dev_event_callback {
+	TAILQ_ENTRY(rte_dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callbacks list for all callback of devices */
+static struct rte_dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -234,3 +256,114 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback *event_cb = NULL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn &&
+			event_cb->cb_arg == cb_arg &&
+			((!device_name && !event_cb->dev_name) ? 1 :
+			(!strcmp(event_cb->dev_name, device_name))))
+			break;
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		/* allocate a new user callback entity */
+		event_cb = malloc(sizeof(struct rte_dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = !device_name ? NULL :
+				strdup(device_name);
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		} else
+			free(event_cb);
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return (event_cb == NULL) ? -1 : 0;
+}
+
+int
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret;
+	struct rte_dev_event_callback *event_cb, *next;
+
+	if (!cb_fn || device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	ret = 0;
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+	      event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (event_cb->cb_fn != cb_fn ||
+				(event_cb->cb_arg != (void *)-1 &&
+				event_cb->cb_arg != cb_arg) ||
+				(((!device_name && event_cb->dev_name) ||
+				(device_name && !event_cb->dev_name)) ? 1 :
+				strcmp(event_cb->dev_name, device_name)))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			free(event_cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return ret;
+}
+
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback dev_cb;
+	struct rte_dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
+		if (cb_lst->cb_fn == NULL || (!cb_lst->dev_name ? 0 :
+			strcmp(cb_lst->dev_name,
+			device_name) && cb_lst->dev_name))
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		if (cb_arg)
+			dev_cb.cb_arg = cb_arg;
+		rc = dev_cb.cb_fn(device_name, event,
+				dev_cb.cb_arg);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 8088dcc..88fbb2d 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -52,6 +52,30 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
+struct rte_dev_event_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_dev_event_cb_list, rte_dev_event_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -294,4 +318,99 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * @internal Executes all the callbacks registered by the user
+ * application for the specific device. It is for DPDK internal use
+ * only. User applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  The device event type that is passed
+ *  to the callbacks.
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7bf278f..a167401 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..887884c
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+bool service_no_init = true;
+uint32_t slcore;
+uint32_t sevice_id;
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the netlink event.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int
+rte_dev_event_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	slcore = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &sevice_id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32,
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service");
+		goto err_done;
+	}
+	ret = rte_service_component_runstate_set(sevice_id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component");
+		return ret;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service");
+		return ret;
+	}
+	rte_service_lcore_start(slcore);
+	service_no_init = false;
+	return 0;
+
+err_done:
+	rte_service_component_unregister(sevice_id);
+	return ret;
+}
+
+int
+rte_dev_event_monitor_stop(void)
+{
+	int ret;
+
+	ret = rte_service_lcore_stop(slcore);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to stop lcore on "
+			"dev event monitor service");
+		return ret;
+	}
+
+	ret = rte_service_component_unregister(sevice_id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to unregister service "
+			"err = %" PRId32, ret);
+		return ret;
+	}
+
+	service_exit = true;
+	service_no_init = true;
+
+	return 0;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V13 2/3] eal: add uevent pass and process function
  2018-01-26  3:49                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-26  3:49                                               ` Jeff Guo
  2018-01-26  3:49                                               ` [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-01-26 16:53                                               ` [PATCH V13 " Bruce Richardson
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-26  3:49 UTC (permalink / raw)
  To: stephen, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, jblunck,
	shreyansh.jain, dev, jia.guo, helin.zhang

To handle the uevents detected from the kernel side, add uevent
processing functions. The hot plug event serves as the example that
shows how the uevent mechanism passes and processes a uevent.

For uevent passing and processing, add the functions below to the Linux
EAL dev layer. FreeBSD does not support uevents, so nothing is
implemented for it.
a. dev_uev_parse
b. dev_uev_receive
c. dev_uev_process

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
v13->v12:
fix some event parse issue
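
Illustration only, not part of the patch: a kernel uevent payload is a
sequence of NUL-separated "KEY=value" records, which is what
dev_uev_parse walks. The standalone sketch below shows one bounded way to
do that walk. All names here (UEV_ELEM_LEN, struct uev_fields, uev_match,
uev_parse) are hypothetical and do not exist in DPDK, and the sketch
assumes the caller passes the exact number of valid bytes in the buffer.

```c
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128 /* hypothetical; mirrors RTE_EAL_UEV_MSG_ELEM_LEN */

/* Hypothetical result structure; the fields own their storage. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If rec starts with key, copy the rest of the record into dst. */
static int
uev_match(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", rec + klen);
	return 1;
}

/* Walk the NUL-separated records stored in buf[0..len). */
static void
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break; /* unterminated tail: ignore it */
		if (!uev_match(rec, "ACTION=", out->action,
				sizeof(out->action)) &&
		    !uev_match(rec, "SUBSYSTEM=", out->subsystem,
				sizeof(out->subsystem)))
			(void)uev_match(rec, "PCI_SLOT_NAME=",
					out->pci_slot_name,
					sizeof(out->pci_slot_name));
		/* step over the record and its terminating NUL */
		i += (size_t)(end - rec) + 1;
	}
}
```

Because the parsed strings live in the caller's struct uev_fields, they
stay valid after the parse function returns.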
---
 lib/librte_eal/common/include/rte_dev.h |  16 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 105 +++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 88fbb2d..9396a9f 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -52,6 +52,22 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_log.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been scanned on bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum rte_dev_event_subsystem {
+	RTE_DEV_EVENT_SUBSYSTEM_UIO,
+	RTE_DEV_EVENT_SUBSYSTEM_VFIO,
+	RTE_DEV_EVENT_SUBSYSTEM_MAX
+};
+
 /**
  * The device event type.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 887884c..9ada746 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -79,10 +79,111 @@ dev_monitor_enable(int netlink_fd)
 	return -1;
 }
 
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+
+	if ((!strncmp(subsystem, "uio", 3)) ||
+		(!strncmp(subsystem, "pci", 3))) {
+		event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3))
+			event->type = RTE_DEV_EVENT_ADD;
+		if (!strncmp(action, "remove", 6))
+			event->type = RTE_DEV_EVENT_REMOVE;
+		event->devname = pci_slot_name;
+	}
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
 static int
-dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+dev_uev_process(struct epoll_event *events, int nfds)
 {
-	/* TODO: device uevent processing */
+	struct rte_dev_event uevent;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		if (dev_uev_receive(events[i].data.fd, &uevent))
+			return 0;
+
+		/* by default, handle all pci devices that are being hot plugged */
+		if (uevent.subsystem == RTE_DEV_EVENT_SUBSYSTEM_UIO &&
+			uevent.devname) {
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_REMOVE, NULL));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_ADD, NULL));
+			}
+		}
+	}
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-26  3:49                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-26  3:49                                               ` [PATCH V13 2/3] eal: add uevent pass and process function Jeff Guo
@ 2018-01-26  3:49                                               ` Jeff Guo
  2018-01-30 12:20                                                 ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-26 16:53                                               ` [PATCH V13 " Bruce Richardson
  2 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-01-26  3:49 UTC (permalink / raw)
  To: stephen, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, jblunck,
	shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle hot removal and hot insertion events.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v13->v12:
refine some code style
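
Illustration only, not part of the patch: testpmd keeps a list of device
names for which a hotplug event should be acted on, and
hotplug_list_add stores the caller's pointer as the device name. The
standalone sketch below shows the same list with an owned copy of the
name, which may be safer when the caller's string does not outlive the
request. All names here (hp_event, hp_request, hp_list_add,
hp_list_contains, hp_list_free) are hypothetical stand-ins, and plain
malloc is used instead of rte_zmalloc so that it builds without DPDK.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for enum rte_dev_event_type. */
enum hp_event { HP_EVENT_ADD, HP_EVENT_REMOVE };

struct hp_request {
	TAILQ_ENTRY(hp_request) next;
	char *dev_name;      /* owned copy of the device name */
	enum hp_event event; /* event the application expects */
};
TAILQ_HEAD(hp_request_list, hp_request);

/* Append a request; the name is copied, so the caller may reuse its buffer. */
static int
hp_list_add(struct hp_request_list *list, const char *dev_name,
	    enum hp_event event)
{
	size_t len = strlen(dev_name) + 1;
	struct hp_request *req = calloc(1, sizeof(*req));

	if (req == NULL)
		return -1;
	req->dev_name = malloc(len);
	if (req->dev_name == NULL) {
		free(req);
		return -1;
	}
	memcpy(req->dev_name, dev_name, len);
	req->event = event;
	TAILQ_INSERT_TAIL(list, req, next);
	return 0;
}

/* Return true when a request for dev_name is on the list. */
static bool
hp_list_contains(struct hp_request_list *list, const char *dev_name)
{
	struct hp_request *req;

	TAILQ_FOREACH(req, list, next) {
		if (strcmp(req->dev_name, dev_name) == 0)
			return true;
	}
	return false;
}

/* Release every request together with its copied name. */
static void
hp_list_free(struct hp_request_list *list)
{
	struct hp_request *req;

	while ((req = TAILQ_FIRST(list)) != NULL) {
		TAILQ_REMOVE(list, req, next);
		free(req->dev_name);
		free(req);
	}
}
```

The list head must be set up with TAILQ_INIT before the first call, as
main() does for hp_list in this patch.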
---
 app/test-pmd/testpmd.c | 169 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 178 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5dc8cca..b29c9d5 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -367,6 +368,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -374,6 +377,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(char *device_name, enum rte_dev_event_type type,
+			      void *param);
+static int eth_uevent_callback_register(portid_t port_id);
+static bool in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1835,6 +1845,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the uevent callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_uevent_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup uevent callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1851,6 +1882,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1862,6 +1895,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1888,6 +1923,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1911,6 +1949,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1995,6 +2036,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		TESTPMD_LOG(ERR, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	printf("add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2038,6 +2122,82 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(char *device_name, enum rte_dev_event_type type, void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, (void *)device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2519,6 +2679,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_dev_event_monitor_start();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 47f8fa8..c797667 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Requests list */
+	const char *dev_name;              /**< requested device name */
+	enum rte_dev_event_type event;     /**< device event type */
+};
+
+/** @internal Structure to keep track of hotplug requests */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V13 1/3] eal: add uevent monitor api and callback func
  2018-01-26  3:49                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-26  3:49                                               ` [PATCH V13 2/3] eal: add uevent pass and process function Jeff Guo
  2018-01-26  3:49                                               ` [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-26 16:53                                               ` Bruce Richardson
  2018-01-27  3:48                                                 ` Guo, Jia
  2 siblings, 1 reply; 494+ messages in thread
From: Bruce Richardson @ 2018-01-26 16:53 UTC (permalink / raw)
  To: Jeff Guo
  Cc: stephen, gaetan.rivet, jingjing.wu, thomas, motih, ferruh.yigit,
	konstantin.ananyev, jblunck, shreyansh.jain, dev, helin.zhang

On Fri, Jan 26, 2018 at 11:49:35AM +0800, Jeff Guo wrote:
> This patch aim to add a general uevent mechanism in eal device layer,
> to enable all linux kernel object uevent monitoring, user could use these
> APIs to monitor and read out the device status info that sent from the
> kernel side, then corresponding to handle it, such as when detect hotplug
> uevent type, user could detach or attach the device, and more it benefit
> to use to do smoothly fail safe work.
> 
> About uevent monitoring:
> a: add one epolling to poll the netlink socket, to monitor the uevent of
>    the device.
> b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
> c: add below APIs in rte eal device layer.
>    rte_dev_callback_register
>    rte_dev_callback_unregister
>    _rte_dev_callback_process
>    rte_dev_event_monitor_start
>    rte_dev_event_monitor_stop
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Hi Jeff,


> ---
> v13->v12:
> fix some logic issue and null check issue
> fix monitor stop func issue and bsp build issue

<snip>

> +int
> +rte_dev_event_monitor_start(void)
> +{
> +	int ret;
> +	struct rte_service_spec service;
> +	const uint32_t sid = 0;
> +
> +	if (!service_no_init)
> +		return 0;
> +
> +	slcore = rte_get_next_lcore(/* start core */ -1,
> +					       /* skip master */ 1,
> +					       /* wrap */ 0);
> +
> +	ret = rte_service_lcore_add(slcore);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
> +		return ret;
> +	}
> +
I don't think you should be taking another service core for this purpose
without the user asking for it. I also don't think service cores is the
right "tool" for monitoring the epoll. Rather than using a non-blocking
poll on a service core, I think you should look to reuse the existing
infrastructure for handling interrupts in the EAL, which relies on a
separate thread blocked on fd's awaiting input.
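
Illustration only, not code from this series or from the EAL: the
"separate thread blocked on fd's" idea can be sketched in plain Linux C
as below. The thread sleeps in epoll_wait() until either the uevent
socket or a stop eventfd becomes readable, so it needs no dedicated
lcore. All names (uev_monitor, uev_socket_open, uev_monitor_thread,
uev_count_handler) are hypothetical, and subscribing to netlink group 1
assumes that only kernel-originated uevents are wanted.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

typedef void (*uev_handler_t)(const char *buf, ssize_t len, void *arg);

struct uev_monitor {
	int data_fd;           /* uevent socket, or any readable fd */
	int stop_fd;           /* eventfd written to request shutdown */
	uev_handler_t handler; /* called once per received message */
	void *arg;             /* opaque pointer passed to handler */
};

/* Open a netlink socket subscribed to kernel uevents (group 1). */
static int
uev_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
			NETLINK_KOBJECT_UEVENT);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Example handler: counts messages; arg must point to an int. */
static void
uev_count_handler(const char *buf, ssize_t len, void *arg)
{
	(void)buf;
	(void)len;
	++*(int *)arg;
}

/* Thread body: block in epoll_wait until data or a stop request arrives. */
static void *
uev_monitor_thread(void *p)
{
	struct uev_monitor *m = p;
	struct epoll_event ev, ready[2];
	char buf[4096];
	int ep = epoll_create1(EPOLL_CLOEXEC);

	if (ep < 0)
		return NULL;
	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = m->data_fd;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, m->data_fd, &ev) < 0)
		goto done;
	ev.data.fd = m->stop_fd;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, m->stop_fd, &ev) < 0)
		goto done;

	for (;;) {
		int i, stop = 0;
		int n = epoll_wait(ep, ready, 2, -1);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			break;
		}
		for (i = 0; i < n; i++) {
			ssize_t len;

			if (ready[i].data.fd == m->stop_fd) {
				stop = 1;
				continue;
			}
			len = recv(m->data_fd, buf, sizeof(buf) - 1,
				   MSG_DONTWAIT);
			if (len > 0) {
				buf[len] = '\0';
				m->handler(buf, len, m->arg);
			}
		}
		if (stop)
			break; /* pending data was handled before exiting */
	}
done:
	close(ep);
	return NULL;
}
```

A caller would fill a struct uev_monitor with the fd returned by
uev_socket_open() and an eventfd, start the thread with pthread_create(),
and later write an 8-byte value to the eventfd and join the thread to
stop monitoring.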

> +	memset(&service, 0, sizeof(service));
> +	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
> +
> +	service.socket_id = rte_socket_id();
> +	service.callback = dev_uev_monitoring;
> +	service.callback_userdata = NULL;
> +	service.capabilities = 0;
> +	ret = rte_service_component_register(&service, &sevice_id);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to register service %s "
> +			"err = %" PRId32,
> +			service.name, ret);
> +		return ret;
> +	}
> +	ret = rte_service_runstate_set(sid, 1);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
> +			"the service");
> +		goto err_done;
> +	}
> +	ret = rte_service_component_runstate_set(sevice_id, 1);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
> +			" of a component");
> +		return ret;
> +	}
> +	ret = rte_service_map_lcore_set(sid, slcore, 1);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
> +			"dev event monitor service");
> +		return ret;
> +	}
> +	rte_service_lcore_start(slcore);
> +	service_no_init = false;
> +	return 0;
> +
> +err_done:
> +	rte_service_component_unregister(sevice_id);
> +	return ret;
> +}
> +

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V13 1/3] eal: add uevent monitor api and callback func
  2018-01-26 16:53                                               ` [PATCH V13 " Bruce Richardson
@ 2018-01-27  3:48                                                 ` Guo, Jia
  2018-01-30  0:14                                                   ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-01-27  3:48 UTC (permalink / raw)
  To: Bruce Richardson, thomas
  Cc: stephen, gaetan.rivet, jingjing.wu, motih, ferruh.yigit,
	konstantin.ananyev, jblunck, shreyansh.jain, dev, helin.zhang



On 1/27/2018 12:53 AM, Bruce Richardson wrote:
> On Fri, Jan 26, 2018 at 11:49:35AM +0800, Jeff Guo wrote:
>> This patch adds a general uevent mechanism to the EAL device layer so
>> that uevents from any Linux kernel object can be monitored. Users can
>> call these APIs to monitor and read the device status information sent
>> from the kernel and handle it accordingly. For example, on a hotplug
>> uevent the user can detach or attach the device, which also helps
>> fail-safe handling work smoothly.
>>
>> About uevent monitoring:
>> a: add an epoll loop on the netlink socket to monitor the uevents of
>>     the device.
>> b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
>> c: add the APIs below to the rte eal device layer.
>>     rte_dev_callback_register
>>     rte_dev_callback_unregister
>>     _rte_dev_callback_process
>>     rte_dev_event_monitor_start
>>     rte_dev_event_monitor_stop
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Hi Jeff,
>
>
>> ---
>> v13->v12:
>> fix some logic issue and null check issue
>> fix monitor stop func issue and bsp build issue
> <snip>
>
>> +int
>> +rte_dev_event_monitor_start(void)
>> +{
>> +	int ret;
>> +	struct rte_service_spec service;
>> +	const uint32_t sid = 0;
>> +
>> +	if (!service_no_init)
>> +		return 0;
>> +
>> +	slcore = rte_get_next_lcore(/* start core */ -1,
>> +					       /* skip master */ 1,
>> +					       /* wrap */ 0);
>> +
>> +	ret = rte_service_lcore_add(slcore);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
>> +		return ret;
>> +	}
>> +
> I don't think you should be taking another service core for this purpose
> without the user asking for it. I also don't think service cores is the
> right "tool" for monitoring the epoll. Rather than using a non-blocking
> poll on a service core, I think you should look to reuse the existing
> infrastructure for handling interrupts in the EAL, which relies on a
> separate thread blocked on fd's awaiting input.
Bruce, it seems you are looking at this from a different angle. If
service cores are by design something the user must know about and
control, and this case does not justify asking the user to manage a
service core, then I think we also should not reuse the existing
interrupt infrastructure, because a device uevent is not an interrupt
and falls outside that functional scope. We could instead use a separate
thread dedicated to polling the uevent, as I did in v7. Please check the
v7 patch to see whether that approach is acceptable.

@Thomas, do you agree with the above, or do you have another suggestion?
Can we reach agreement on this now, or should it be left as a later
improvement?
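
To make the two options concrete, here is a minimal standalone sketch of
the blocking style discussed above: one epoll instance that sleeps in
epoll_wait() until an fd has input, rather than a non-blocking poll on a
service core. It is only an illustration under assumptions. The names
monitor_create, monitor_wait, demo_source_new and demo_source_fire are
hypothetical and are not EAL APIs, and an eventfd stands in for the
netlink uevent socket.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an epoll instance watching fd for input; returns epoll fd or -1. */
static int
monitor_create(int fd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
	int epfd;

	assert(fd >= 0);
	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd < 0)
		return -1;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}
	return epfd;
}

/* Block up to timeout_ms (-1 = forever); return the ready fd, or -1. */
static int
monitor_wait(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n;

	do {
		n = epoll_wait(epfd, &ev, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);	/* retry if a signal woke us */
	return n == 1 ? ev.data.fd : -1;
}

/* Stand-in event source for the demo: an eventfd instead of netlink. */
static int
demo_source_new(void)
{
	return eventfd(0, EFD_CLOEXEC);
}

/* Make the stand-in source readable, as an arriving uevent would. */
static int
demo_source_fire(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```

With the netlink fd in place of the eventfd, a dedicated thread could
call monitor_wait(epfd, -1) in a loop and pass each ready fd on to the
uevent processing step, so no lcore spins while the system is idle.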
>> +	memset(&service, 0, sizeof(service));
>> +	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
>> +
>> +	service.socket_id = rte_socket_id();
>> +	service.callback = dev_uev_monitoring;
>> +	service.callback_userdata = NULL;
>> +	service.capabilities = 0;
>> +	ret = rte_service_component_register(&service, &sevice_id);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to register service %s "
>> +			"err = %" PRId32,
>> +			service.name, ret);
>> +		return ret;
>> +	}
>> +	ret = rte_service_runstate_set(sid, 1);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
>> +			"the service");
>> +		goto err_done;
>> +	}
>> +	ret = rte_service_component_runstate_set(sevice_id, 1);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
>> +			" of a component");
>> +		return ret;
>> +	}
>> +	ret = rte_service_map_lcore_set(sid, slcore, 1);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
>> +			"dev event monitor service");
>> +		return ret;
>> +	}
>> +	rte_service_lcore_start(slcore);
>> +	service_no_init = false;
>> +	return 0;
>> +
>> +err_done:
>> +	rte_service_component_unregister(sevice_id);
>> +	return ret;
>> +}
>> +
> Regards,
> /Bruce

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V13 1/3] eal: add uevent monitor api and callback func
  2018-01-27  3:48                                                 ` Guo, Jia
@ 2018-01-30  0:14                                                   ` Thomas Monjalon
  2018-01-30 12:20                                                     ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-01-30  0:14 UTC (permalink / raw)
  To: Guo, Jia, Bruce Richardson, harry.van.haaren
  Cc: dev, stephen, gaetan.rivet, jingjing.wu, motih, ferruh.yigit,
	konstantin.ananyev, shreyansh.jain, helin.zhang

27/01/2018 04:48, Guo, Jia:
> On 1/27/2018 12:53 AM, Bruce Richardson wrote:
> > On Fri, Jan 26, 2018 at 11:49:35AM +0800, Jeff Guo wrote:
> >> +	ret = rte_service_lcore_add(slcore);
> >> +	if (ret) {
> >> +		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
> >> +		return ret;
> >> +	}
> >> +
> > I don't think you should be taking another service core for this purpose
> > without the user asking for it. I also don't think service cores is the
> > right "tool" for monitoring the epoll. Rather than using a non-blocking
> > poll on a service core, I think you should look to reuse the existing
> > infrastructure for handling interrupts in the EAL, which relies on a
> > separate thread blocked on fd's awaiting input.
> 
> Bruce, it seems you are looking at this from a different angle. If
> service cores are by design something the user must know about and
> control, and this case does not justify asking the user to manage a
> service core, then I think we also should not reuse the existing
> interrupt infrastructure, because a device uevent is not an interrupt
> and falls outside that functional scope. We could instead use a separate
> thread dedicated to polling the uevent, as I did in v7. Please check the
> v7 patch to see whether that approach is acceptable.

The v7 was using pthread_create, so it was not the right solution.

> @Thomas, do you agree with the above, or do you have another suggestion?
> Can we reach agreement on this now, or should it be left as a later
> improvement?

I have no issue about using rte_service.
I think the other events processing in EAL could use rte_service.
Maybe Harry has a different view?

My main concerns are:
1/ There is not enough review
2/ The callback lookup is using device name from uevent
3/ There is no reference to the rte_device struct

Minor extra requirement: the new __rte_experimental should be used,
see http://dpdk.org/commit/77b7b81e32e

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V13 1/3] eal: add uevent monitor api and callback func
  2018-01-30  0:14                                                   ` Thomas Monjalon
@ 2018-01-30 12:20                                                     ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-01-30 12:20 UTC (permalink / raw)
  To: Thomas Monjalon, Bruce Richardson, harry.van.haaren
  Cc: dev, stephen, gaetan.rivet, jingjing.wu, motih, ferruh.yigit,
	konstantin.ananyev, shreyansh.jain, helin.zhang



On 1/30/2018 8:14 AM, Thomas Monjalon wrote:
> 27/01/2018 04:48, Guo, Jia:
>> On 1/27/2018 12:53 AM, Bruce Richardson wrote:
>>> On Fri, Jan 26, 2018 at 11:49:35AM +0800, Jeff Guo wrote:
>>>> +	ret = rte_service_lcore_add(slcore);
>>>> +	if (ret) {
>>>> +		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
>>>> +		return ret;
>>>> +	}
>>>> +
>>> I don't think you should be taking another service core for this purpose
>>> without the user asking for it. I also don't think service cores is the
>>> right "tool" for monitoring the epoll. Rather than using a non-blocking
>>> poll on a service core, I think you should look to reuse the existing
>>> infrastructure for handling interrupts in the EAL, which relies on a
>>> separate thread blocked on fd's awaiting input.
>> Bruce, it seems you are looking at this from a different angle. If
>> service cores are by design something the user must know about and
>> control, and this case does not justify asking the user to manage a
>> service core, then I think we also should not reuse the existing
>> interrupt infrastructure, because a device uevent is not an interrupt
>> and falls outside that functional scope. We could instead use a separate
>> thread dedicated to polling the uevent, as I did in v7. Please check the
>> v7 patch to see whether that approach is acceptable.
> The v7 was using pthread_create, so it was not the right solution.
>
>> @Thomas, do you agree with the above, or do you have another suggestion?
>> Can we reach agreement on this now, or should it be left as a later
>> improvement?
> I have no issue about using rte_service.
> I think the other events processing in EAL could use rte_service.
> Maybe Harry has a different view?
>
> My main concerns are:
> 1/ There is not enough review
> 2/ The callback lookup is using device name from uevent
> 3/ There is no reference to the rte_device struct
>
> Minor extra requirement: the new __rte_experimental should be used,
> see http://dpdk.org/commit/77b7b81e32e
Please review my v14 patch; I hope it addresses all of your concerns.
Regarding the rte_device struct, I think that unless there is a better
idea for handling the null struct issue, the device name should be used
for now as an experimental approach. I have verified that it works.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V14 1/3] eal: add uevent monitor api and callback func
  2018-01-26  3:49                                               ` [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-30 12:20                                                 ` Jeff Guo
  2018-01-30 12:20                                                   ` [PATCH V14 2/3] eal: add uevent pass and process function Jeff Guo
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-30 12:20 UTC (permalink / raw)
  To: stephen, bruce.richardson, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: ferruh.yigit, konstantin.ananyev, jblunck, shreyansh.jain, dev,
	jia.guo, helin.zhang, harry.van.haaren, jianfeng.tan

This patch adds a general uevent mechanism to the EAL device layer so
that uevents from any Linux kernel object can be monitored. Users can
call these APIs to monitor and read the device status information sent
from the kernel and handle it accordingly. For example, on a hotplug
uevent the user can detach or attach the device, which also helps
fail-safe handling work smoothly.

About uevent monitoring:
a: add an epoll loop on the netlink socket to monitor the uevents of
   the device.
b: add enum rte_dev_event_type and struct rte_dev_event.
c: add the APIs below to the rte eal device layer.
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_event_monitor_start
   rte_dev_event_monitor_stop

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v14->v13:
add __rte_experimental to the function definitions and fix the BSD build issue
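
Note for reviewers: the callback bookkeeping in this patch can be
pictured with the standalone sketch below. It is not the patch code,
only a simplified illustration under assumptions: the names
ev_cb_register, ev_cb_process and count_cb are hypothetical, the
spinlock and the "active" flag are left out, and the device name is
copied rather than stored by pointer. A NULL name matches every device,
as in rte_dev_callback_register.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

enum ev_type { EV_ADD, EV_REMOVE };

typedef int (*ev_cb_fn)(const char *dev_name, enum ev_type ev, void *arg);

struct ev_cb {
	TAILQ_ENTRY(ev_cb) next;
	ev_cb_fn fn;
	void *arg;
	char *dev_name;		/* owned copy; NULL matches every device */
};

TAILQ_HEAD(ev_cb_list, ev_cb);
static struct ev_cb_list cbs = TAILQ_HEAD_INITIALIZER(cbs);

/* Register fn for dev_name (NULL = all devices). Returns 0 or -1. */
static int
ev_cb_register(const char *dev_name, ev_cb_fn fn, void *arg)
{
	struct ev_cb *cb;

	assert(fn != NULL);
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	if (dev_name != NULL) {
		size_t len = strlen(dev_name) + 1;

		cb->dev_name = malloc(len);
		if (cb->dev_name == NULL) {
			free(cb);
			return -1;
		}
		memcpy(cb->dev_name, dev_name, len);
	}
	cb->fn = fn;
	cb->arg = arg;
	TAILQ_INSERT_TAIL(&cbs, cb, next);
	return 0;
}

/* Invoke every callback whose name matches; returns how many ran. */
static int
ev_cb_process(const char *dev_name, enum ev_type ev)
{
	struct ev_cb *cb;
	int n = 0;

	assert(dev_name != NULL);
	TAILQ_FOREACH(cb, &cbs, next) {
		if (cb->dev_name != NULL &&
		    strcmp(cb->dev_name, dev_name) != 0)
			continue;
		cb->fn(dev_name, ev, cb->arg);
		n++;
	}
	return n;
}

/* Example callback: counts invocations through its argument. */
static int
count_cb(const char *dev_name, enum ev_type ev, void *arg)
{
	(void)dev_name;
	(void)ev;
	(*(int *)arg)++;
	return 0;
}
```

Registering count_cb once for a PCI address and once with a NULL name
means an event on that address runs both entries, while an event on any
other device runs only the NULL entry.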
---
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  33 +++++
 lib/librte_eal/common/eal_common_dev.c  | 132 +++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h | 121 ++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 243 ++++++++++++++++++++++++++++++++
 6 files changed, 531 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..c0921dd 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..3b7bbf2
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 0de1c5d..8e3934c 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -43,9 +43,31 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct rte_dev_event_callback {
+	TAILQ_ENTRY(rte_dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL means all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callbacks list for all callback of devices */
+static struct rte_dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -236,3 +258,113 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback *event_cb = NULL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn &&
+			event_cb->cb_arg == cb_arg &&
+			((!device_name && !event_cb->dev_name) ? 1 :
+			(!strcmp(event_cb->dev_name, device_name))))
+			break;
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		/* allocate a new user callback entity */
+		event_cb = calloc(1, sizeof(*event_cb)); /* zeroes 'active' */
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = device_name;
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		} else
+			free(event_cb);
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return (event_cb == NULL) ? -1 : 0;
+}
+
+int __rte_experimental
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret;
+	struct rte_dev_event_callback *event_cb, *next;
+
+	if (!cb_fn || device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	ret = 0;
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+	      event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (event_cb->cb_fn != cb_fn ||
+				(event_cb->cb_arg != (void *)-1 &&
+				event_cb->cb_arg != cb_arg) ||
+				(((!device_name && event_cb->dev_name) ||
+				(device_name && !event_cb->dev_name)) ? 1 :
+				strcmp(event_cb->dev_name, device_name)))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			free(event_cb); /* allocated with libc, not rte_malloc */
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback dev_cb;
+	struct rte_dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
+		if (cb_lst->cb_fn == NULL || (!cb_lst->dev_name ? 0 :
+			strcmp(cb_lst->dev_name,
+			device_name) && cb_lst->dev_name))
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		if (cb_arg)
+			dev_cb.cb_arg = cb_arg;
+		rc = dev_cb.cb_fn(device_name, event,
+				dev_cb.cb_arg);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index d1598fd..82082d8 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -53,6 +53,30 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
+struct rte_dev_event_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_dev_event_cb_list, rte_dev_event_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -296,4 +320,101 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+			void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * @internal Executes all the callbacks registered by the user application
+ * for the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  The type of device event that
+ *  triggered the callbacks.
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..8578796 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..22ef85e
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_service.h>
+#include <rte_service_component.h>
+
+#include "eal_thread.h"
+
+bool service_exit = true;
+bool service_no_init = true;
+uint32_t slcore;
+uint32_t sevice_id;
+#define DEV_EV_MNT_SERVICE_NAME "device_event_monitor_service"
+
+static int
+dev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_monitor_enable(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static int
+dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+{
+	/* TODO: device uevent processing */
+	return 0;
+}
+
+/**
+ * It builds/rebuilds up the epoll file descriptor with all the
+ * file descriptors being waited on. Then handles the netlink event.
+ *
+ * @param arg
+ *  pointer. (unused)
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+static int32_t dev_uev_monitoring(__rte_unused void *arg)
+{
+	int netlink_fd = -1;
+	struct epoll_event ep_kernel;
+	int fd_ep = -1;
+
+	service_exit = false;
+
+	fd_ep = epoll_create1(EPOLL_CLOEXEC);
+	if (fd_ep < 0) {
+		RTE_LOG(ERR, EAL, "error creating epoll fd: %m\n");
+		goto out;
+	}
+
+	netlink_fd = dev_monitor_fd_new();
+
+	if (dev_monitor_enable(netlink_fd) < 0) {
+		RTE_LOG(ERR, EAL, "error subscribing to kernel events\n");
+		goto out;
+	}
+
+	memset(&ep_kernel, 0, sizeof(struct epoll_event));
+	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
+	ep_kernel.data.fd = netlink_fd;
+	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
+		&ep_kernel) < 0) {
+		RTE_LOG(ERR, EAL, "error adding fd to epoll: %m\n");
+		goto out;
+	}
+
+	while (!service_exit) {
+		int fdcount;
+		struct epoll_event ev[1];
+
+		fdcount = epoll_wait(fd_ep, ev, 1, -1);
+		if (fdcount < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error receiving uevent "
+					"message: %m\n");
+			continue;
+		}
+
+		/* epoll_wait has at least one fd ready to read */
+		if (dev_uev_process(ev, fdcount) < 0) {
+			if (errno != EINTR)
+				RTE_LOG(ERR, EAL, "error processing uevent "
+					"message: %m\n");
+		}
+	}
+	return 0;
+out:
+	if (fd_ep >= 0)
+		close(fd_ep);
+	if (netlink_fd >= 0)
+		close(netlink_fd);
+	rte_panic("uev monitoring fail\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	int ret;
+	struct rte_service_spec service;
+	const uint32_t sid = 0;
+
+	if (!service_no_init)
+		return 0;
+
+	slcore = rte_get_next_lcore(/* start core */ -1,
+					       /* skip master */ 1,
+					       /* wrap */ 0);
+
+	ret = rte_service_lcore_add(slcore);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "dev event monitor lcore add fail");
+		return ret;
+	}
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, sizeof(service.name), DEV_EV_MNT_SERVICE_NAME);
+
+	service.socket_id = rte_socket_id();
+	service.callback = dev_uev_monitoring;
+	service.callback_userdata = NULL;
+	service.capabilities = 0;
+	ret = rte_service_component_register(&service, &sevice_id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to register service %s "
+			"err = %" PRId32,
+			service.name, ret);
+		return ret;
+	}
+	ret = rte_service_runstate_set(sid, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the runstate of "
+			"the service");
+		goto err_done;
+	}
+	ret = rte_service_component_runstate_set(sevice_id, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to set the backend runstate"
+			" of a component");
+		goto err_done;
+	}
+	ret = rte_service_map_lcore_set(sid, slcore, 1);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to enable lcore 1 on "
+			"dev event monitor service");
+		goto err_done;
+	}
+	rte_service_lcore_start(slcore);
+	service_no_init = false;
+	return 0;
+
+err_done:
+	rte_service_component_unregister(sevice_id);
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	int ret;
+
+	ret = rte_service_lcore_stop(slcore);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to stop lcore on "
+			"dev event monitor service");
+		return ret;
+	}
+
+	ret = rte_service_component_unregister(sevice_id);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to unregister service "
+			"err = %" PRId32, ret);
+		return ret;
+	}
+
+	service_exit = true;
+	service_no_init = true;
+
+	return 0;
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V14 2/3] eal: add uevent pass and process function
  2018-01-30 12:20                                                 ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-01-30 12:20                                                   ` Jeff Guo
  2018-01-30 12:21                                                   ` [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-01-31  0:44                                                   ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Stephen Hemminger
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-30 12:20 UTC (permalink / raw)
  To: stephen, bruce.richardson, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: ferruh.yigit, konstantin.ananyev, jblunck, shreyansh.jain, dev,
	jia.guo, helin.zhang, harry.van.haaren, jianfeng.tan

To handle the uevents detected from the kernel side, add uevent
processing functions, using the hotplug event as an example of how the
uevent mechanism passes and processes a uevent.

For uevent passing and processing, add the functions below to the Linux
EAL dev layer. FreeBSD does not support uevents, so nothing is
implemented there.
a.dev_uev_parse
b.dev_uev_receive
c.dev_uev_process

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jingjing Wu <jingjing.wu@intel.com>
---
v14->v13:
no change.
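
Note for reviewers: a kernel uevent message is a sequence of
NUL-separated "KEY=value" records, which is what dev_uev_parse walks.
The standalone sketch below shows that walk in isolation. It is an
illustration under assumptions, not the patch code: uev_fields,
uev_match and uev_parse are hypothetical names, the caller passes the
received length explicitly, and a final record without a terminating NUL
is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128

struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* Copy the value into dst if rec starts with key; returns 1 on match. */
static int
uev_match(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", rec + klen);
	return 1;
}

/* Walk the NUL-separated records in buf[0..len) and fill out. */
static void
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break;	/* truncated record: stop, do not overread */
		if (!uev_match(rec, "ACTION=", out->action,
				sizeof(out->action)) &&
		    !uev_match(rec, "SUBSYSTEM=", out->subsystem,
				sizeof(out->subsystem)))
			uev_match(rec, "PCI_SLOT_NAME=", out->pci_slot_name,
				sizeof(out->pci_slot_name));
		i += (size_t)(end - rec) + 1;	/* skip record and its NUL */
	}
}
```

Because the parsed values live in the caller's struct, they stay valid
after the parse function returns, which matters when the device name is
later handed to a callback.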
---
 lib/librte_eal/common/include/rte_dev.h |  16 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 105 +++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 82082d8..bd883a7 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -53,6 +53,22 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been scanned on bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum rte_dev_event_subsystem {
+	RTE_DEV_EVENT_SUBSYSTEM_UIO,
+	RTE_DEV_EVENT_SUBSYSTEM_VFIO,
+	RTE_DEV_EVENT_SUBSYSTEM_MAX
+};
+
 /**
  * The device event type.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 22ef85e..7b01f53 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -79,10 +79,111 @@ dev_monitor_enable(int netlink_fd)
 	return -1;
 }
 
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+
+	if ((!strncmp(subsystem, "uio", 3)) ||
+		(!strncmp(subsystem, "pci", 3))) {
+		event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
+		if (!strncmp(action, "add", 3))
+			event->type = RTE_DEV_EVENT_ADD;
+		if (!strncmp(action, "remove", 6))
+			event->type = RTE_DEV_EVENT_REMOVE;
+		event->devname = pci_slot_name;
+	}
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
 static int
-dev_uev_process(__rte_unused struct epoll_event *events, __rte_unused int nfds)
+dev_uev_process(struct epoll_event *events, int nfds)
 {
-	/* TODO: device uevent processing */
+	struct rte_dev_event uevent;
+	int i;
+
+	for (i = 0; i < nfds; i++) {
+		if (dev_uev_receive(events[i].data.fd, &uevent))
+			return 0;
+
+		/* by default, handle all PCI devices being hot-plugged */
+		if (uevent.subsystem == RTE_DEV_EVENT_SUBSYSTEM_UIO &&
+			uevent.devname) {
+			if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_REMOVE, NULL));
+			} else if (uevent.type == RTE_DEV_EVENT_ADD) {
+				return(_rte_dev_callback_process(
+					uevent.devname,
+					RTE_DEV_EVENT_ADD, NULL));
+			}
+		}
+	}
 	return 0;
 }
 
-- 
2.7.4
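
The parsing step in dev_uev_parse() above can be sketched in isolation as
follows. This is only a minimal illustration, not the code from the patch: the
names uev_fields, uev_take and uev_parse and the constant UEV_ELEM_LEN are
hypothetical. It assumes the kernel uevent payload is a sequence of
NUL-separated "KEY=value" records, and it copies the values into buffers owned
by the result struct so that they remain valid after the parser returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128 /* hypothetical, mirrors RTE_EAL_UEV_MSG_ELEM_LEN */

/* Hypothetical result type: the strings live inside the struct. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/*
 * If rec starts with key, copy the text after the key into dst
 * (truncating to dst_len - 1 bytes). Returns 1 on a match, 0 otherwise.
 */
static int
uev_take(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", rec + klen);
	return 1;
}

/* Walk the NUL-separated records in buf[0..len) and fill out. */
static void
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break; /* unterminated tail: ignore it */

		if (!uev_take(rec, "ACTION=", out->action,
			      sizeof(out->action)) &&
		    !uev_take(rec, "SUBSYSTEM=", out->subsystem,
			      sizeof(out->subsystem)))
			uev_take(rec, "PCI_SLOT_NAME=", out->pci_slot_name,
				 sizeof(out->pci_slot_name));

		i += (size_t)(end - rec) + 1; /* skip record and its NUL */
	}
}
```

A caller would pass the receive buffer and the byte count returned by recv(),
then compare the action field against "add" or "remove". Each destination is
sized with sizeof on that same destination, and no pointer to a local array
leaves the function.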


* [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-30 12:20                                                 ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-30 12:20                                                   ` [PATCH V14 2/3] eal: add uevent pass and process function Jeff Guo
@ 2018-01-30 12:21                                                   ` Jeff Guo
  2018-01-31  5:21                                                     ` Wu, Jingjing
  2018-03-21  5:27                                                     ` [PATCH V15 1/5] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-31  0:44                                                   ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Stephen Hemminger
  2 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-01-30 12:21 UTC (permalink / raw)
  To: stephen, bruce.richardson, gaetan.rivet, jingjing.wu, thomas, motih
  Cc: ferruh.yigit, konstantin.ananyev, jblunck, shreyansh.jain, dev,
	jia.guo, helin.zhang, harry.van.haaren, jianfeng.tan

Use testpmd as an example to show applications how to request and use
uevent monitoring to handle hot removal and hot insertion events.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v14->v13:
no change
---
 app/test-pmd/testpmd.c | 169 +++++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/testpmd.h |   9 +++
 2 files changed, 178 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index d8ac432..36b7325 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -368,6 +369,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -375,6 +378,13 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(char *device_name, enum rte_dev_event_type type,
+			      void *param);
+static int eth_uevent_callback_register(portid_t port_id);
+static bool in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(const char *dev_name,
+			    enum rte_dev_event_type event);
 
 /*
  * Check if all the ports are started.
@@ -1838,6 +1848,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the uevent callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_uevent_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup uevent callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1854,6 +1885,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1865,6 +1898,8 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	hotplug_list_add(identifier, RTE_DEV_EVENT_REMOVE);
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1891,6 +1926,9 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_ADD);
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1914,6 +1952,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -1998,6 +2039,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		TESTPMD_LOG(ERR, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	printf("add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2041,6 +2125,82 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static int
+hotplug_list_add(const char *dev_name, enum rte_dev_event_type event)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = dev_name;
+	hp_request->event = event;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(char *device_name, enum rte_dev_event_type type, void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(500000,
+			add_uevent_callback, (void *)device_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2522,6 +2682,15 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	/* enable hot plug monitoring */
+	TAILQ_INIT(&hp_list);
+	rte_dev_event_monitor_start();
+	RTE_ETH_FOREACH_DEV(port_id) {
+		hotplug_list_add(rte_eth_devices[port_id].device->name,
+			 RTE_DEV_EVENT_REMOVE);
+		eth_uevent_callback_register(port_id);
+	}
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 47f8fa8..c797667 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;                /* request device name */
+	enum rte_dev_event_type event;      /**< device event type */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
-- 
2.7.4
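
The hotplug request list used by testpmd above can be sketched on its own as
follows. This is an illustration under stated assumptions, not the patch's
code: hp_event, hp_request, hp_list_add, hp_list_contains and hp_list_clear
are hypothetical names, and plain malloc/free stand in for the rte_malloc
family. The one deliberate difference from the patch is that each entry keeps
its own copy of the device name instead of storing the caller's pointer, so
the entry stays valid if the caller's buffer is reused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

enum hp_event { HP_EVENT_ADD, HP_EVENT_REMOVE }; /* hypothetical stand-in */

struct hp_request {
	TAILQ_ENTRY(hp_request) next;
	char *dev_name;      /* owned copy of the device name */
	enum hp_event event; /* event the application waits for */
};
TAILQ_HEAD(hp_request_list, hp_request);

static struct hp_request_list hp_list = TAILQ_HEAD_INITIALIZER(hp_list);

/* Append a request; returns 0 on success or -ENOMEM. */
static int
hp_list_add(const char *dev_name, enum hp_event event)
{
	struct hp_request *req;
	size_t n;

	assert(dev_name != NULL);
	req = calloc(1, sizeof(*req));
	if (req == NULL)
		return -ENOMEM;

	n = strlen(dev_name) + 1;
	req->dev_name = malloc(n);
	if (req->dev_name == NULL) {
		free(req);
		return -ENOMEM;
	}
	memcpy(req->dev_name, dev_name, n);
	req->event = event;

	TAILQ_INSERT_TAIL(&hp_list, req, next);
	return 0;
}

/* True if a request for dev_name is queued. */
static bool
hp_list_contains(const char *dev_name)
{
	struct hp_request *req;

	TAILQ_FOREACH(req, &hp_list, next) {
		if (strcmp(req->dev_name, dev_name) == 0)
			return true;
	}
	return false;
}

/* Release every queued request together with its name. */
static void
hp_list_clear(void)
{
	struct hp_request *req;

	while ((req = TAILQ_FIRST(&hp_list)) != NULL) {
		TAILQ_REMOVE(&hp_list, req, next);
		free(req->dev_name);
		free(req);
	}
}
```

The list is not protected by a lock here. In testpmd the callbacks run from the
interrupt thread, so a real version would likely need one.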


* Re: [PATCH V14 1/3] eal: add uevent monitor api and callback func
  2018-01-30 12:20                                                 ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Jeff Guo
  2018-01-30 12:20                                                   ` [PATCH V14 2/3] eal: add uevent pass and process function Jeff Guo
  2018-01-30 12:21                                                   ` [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-31  0:44                                                   ` Stephen Hemminger
  2018-02-02 10:45                                                     ` Guo, Jia
  2 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-01-31  0:44 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, gaetan.rivet, jingjing.wu, thomas, motih,
	ferruh.yigit, konstantin.ananyev, jblunck, shreyansh.jain, dev,
	helin.zhang, harry.van.haaren, jianfeng.tan

On Tue, 30 Jan 2018 20:20:58 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +	memset(&ep_kernel, 0, sizeof(struct epoll_event));
> +	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
> +	ep_kernel.data.fd = netlink_fd;
> +	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
> +		&ep_kernel) < 0) {
> +		RTE_LOG(ERR, EAL, "error addding fd to epoll: %m\n");
> +		goto out;
> +	}
> +
> +	while (!service_exit) {
> +		int fdcount;
> +		struct epoll_event ev[1];
> +
> +		fdcount = epoll_wait(fd_ep, ev, 1, -1);
> +		if (fdcount < 0) {
> +			if (errno != EINTR)
> +				RTE_LOG(ERR, EAL, "error receiving uevent "
> +					"message: %m\n");
> +				continue;
> +			}
> +
> +		/* epoll_wait has at least one fd ready to read */
> +		if (dev_uev_process(ev, fdcount) < 0) {
> +			if (errno != EINTR)
> +				RTE_LOG(ERR, EAL, "error processing uevent "
> +					"message: %m\n");
> +		}
> +	}

What is the point of the extra epoll here?
Why not just make netlink_fd blocking and do recv?
Rather than having two syscalls per event.
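
As an illustration of the alternative described above, here is a minimal
sketch of a blocking receive. It assumes the netlink socket was created
without SOCK_NONBLOCK and without the FIONBIO ioctl; the helper name
uev_recv_blocking is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Block until one datagram arrives on fd, then NUL-terminate it.
 * Returns the message length, or -1 on error (errno is set by recv).
 * A receive interrupted by a signal is simply retried.
 */
static ssize_t
uev_recv_blocking(int fd, char *buf, size_t buf_len)
{
	ssize_t n;

	assert(buf != NULL && buf_len > 1);
	do {
		n = recv(fd, buf, buf_len - 1, 0); /* no MSG_DONTWAIT */
	} while (n < 0 && errno == EINTR);

	if (n >= 0)
		buf[n] = '\0';
	return n;
}
```

The monitor thread would then call this helper in a loop and parse each
message, which costs one syscall per event. The trade-off is that stopping
the thread needs some other wake-up, for example closing or shutting down the
socket, because there is no epoll timeout to return to.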


* Re: [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug
  2018-01-30 12:21                                                   ` [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-01-31  5:21                                                     ` Wu, Jingjing
  2018-03-21  5:27                                                     ` [PATCH V15 1/5] eal: add uevent monitor api and callback func Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Wu, Jingjing @ 2018-01-31  5:21 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, gaetan.rivet, thomas, motih
  Cc: Yigit, Ferruh, Ananyev, Konstantin, jblunck, shreyansh.jain, dev,
	Zhang, Helin, Van Haaren, Harry, Tan, Jianfeng



> -----Original Message-----
> From: Guo, Jia
> Sent: Tuesday, January 30, 2018 8:21 PM
> To: stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>;
> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; jblunck@infradead.org; shreyansh.jain@nxp.com;
> dev@dpdk.org; Guo, Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>;
> Van Haaren, Harry <harry.van.haaren@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug
> 
> use testpmd for example, to show app how to request and use
> uevent monitoring to handle the hot removal event and the
> hot insertion event.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Jingjing Wu <jingjing.wu@intel.com>


* Re: [PATCH V14 1/3] eal: add uevent monitor api and callback func
  2018-01-31  0:44                                                   ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Stephen Hemminger
@ 2018-02-02 10:45                                                     ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-02-02 10:45 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, gaetan.rivet, jingjing.wu, thomas, motih,
	ferruh.yigit, konstantin.ananyev, jblunck, shreyansh.jain, dev,
	helin.zhang, harry.van.haaren, jianfeng.tan



On 1/31/2018 8:44 AM, Stephen Hemminger wrote:
> On Tue, 30 Jan 2018 20:20:58 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> +	memset(&ep_kernel, 0, sizeof(struct epoll_event));
>> +	ep_kernel.events = EPOLLIN | EPOLLPRI | EPOLLRDHUP | EPOLLHUP;
>> +	ep_kernel.data.fd = netlink_fd;
>> +	if (epoll_ctl(fd_ep, EPOLL_CTL_ADD, netlink_fd,
>> +		&ep_kernel) < 0) {
>> +		RTE_LOG(ERR, EAL, "error addding fd to epoll: %m\n");
>> +		goto out;
>> +	}
>> +
>> +	while (!service_exit) {
>> +		int fdcount;
>> +		struct epoll_event ev[1];
>> +
>> +		fdcount = epoll_wait(fd_ep, ev, 1, -1);
>> +		if (fdcount < 0) {
>> +			if (errno != EINTR)
>> +				RTE_LOG(ERR, EAL, "error receiving uevent "
>> +					"message: %m\n");
>> +				continue;
>> +			}
>> +
>> +		/* epoll_wait has at least one fd ready to read */
>> +		if (dev_uev_process(ev, fdcount) < 0) {
>> +			if (errno != EINTR)
>> +				RTE_LOG(ERR, EAL, "error processing uevent "
>> +					"message: %m\n");
>> +		}
>> +	}
> What is the point of the extra epoll here?
> Why not just make netlink_fd blocking and do recv?
> Rather than having two syscalls per event.
If the device event monitor only monitors a single netlink fd, you are
probably right that the extra epoll is not needed. Let me think about whether
to keep it for future extension or just make it simpler. Thanks, Stephen.


* [PATCH V15 1/5] eal: add uevent monitor api and callback func
  2018-01-30 12:21                                                   ` [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-01-31  5:21                                                     ` Wu, Jingjing
@ 2018-03-21  5:27                                                     ` Jeff Guo
  2018-03-21  5:27                                                       ` [PATCH V15 2/5] eal: add uevent pass and process function Jeff Guo
  2018-03-21  5:27                                                       ` [PATCH V15 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
  1 sibling, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-21  5:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general uevent mechanism to the EAL device layer so that
uevents from any Linux kernel object can be monitored. Users can call these
APIs to monitor and read the device status information sent from the kernel
and then handle it accordingly; for example, when a hotplug uevent is
detected, the user can detach or attach the device.

About uevent monitoring:
a: add a new device event handle type to the EAL interrupt layer, so that
   it can be registered for the uevent interrupt
b: add enum rte_dev_event_type and struct rte_dev_event.
c: add the APIs below to the EAL device layer to enable monitoring and
   callback registration
   rte_dev_callback_register
   rte_dev_callback_unregister
   _rte_dev_callback_process
   rte_dev_event_monitor_start
   rte_dev_event_monitor_stop

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v15->v14:
use the existing EAL interrupt epoll instead of an rte service for the
monitor thread, and add a new device event handle type to the EAL interrupt
layer
---
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  33 +++++
 lib/librte_eal/common/eal_common_dev.c             | 132 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h            | 121 +++++++++++++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 134 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |   5 +-
 8 files changed, 427 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..c0921dd 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..3b7bbf2
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+
+#include "eal_thread.h"
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Event monitor is not supported on FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..9365118 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,31 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t rte_dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The user application callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct rte_dev_event_callback {
+	TAILQ_ENTRY(rte_dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* A general callbacks list for all callback of devices */
+static struct rte_dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +229,113 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback *event_cb = NULL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn &&
+			event_cb->cb_arg == cb_arg &&
+			((!device_name && !event_cb->dev_name) ? 1 :
+			(!strcmp(event_cb->dev_name, device_name))))
+			break;
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		/* allocate a new user callback entity */
+		event_cb = calloc(1, sizeof(struct rte_dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			/* own a copy of the name; NULL matches all devices */
+			event_cb->dev_name = device_name ? strdup(device_name) : NULL;
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return (event_cb == NULL) ? -1 : 0;
+}
+
+int __rte_experimental
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret;
+	struct rte_dev_event_callback *event_cb, *next;
+
+	if (!cb_fn || device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	ret = 0;
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+	      event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (event_cb->cb_fn != cb_fn ||
+				(event_cb->cb_arg != (void *)-1 &&
+				event_cb->cb_arg != cb_arg) ||
+				(((!device_name && event_cb->dev_name) ||
+				(device_name && !event_cb->dev_name)) ? 1 :
+				strcmp(event_cb->dev_name, device_name)))
+			continue;
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			free(event_cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct rte_dev_event_callback dev_cb;
+	struct rte_dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&rte_dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &(dev_event_cbs), next) {
+		if (cb_lst->cb_fn == NULL || (!cb_lst->dev_name ? 0 :
+			strcmp(cb_lst->dev_name,
+			device_name) && cb_lst->dev_name))
+			continue;
+		dev_cb = *cb_lst;
+		cb_lst->active = 1;
+		if (cb_arg)
+			dev_cb.cb_arg = cb_arg;
+		rc = dev_cb.cb_fn(device_name, event,
+				dev_cb.cb_arg);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&rte_dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..d2fcbc9 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,30 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
+struct rte_dev_event_callback;
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(rte_dev_event_cb_list, rte_dev_event_callback);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +291,101 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_register(char *device_name, rte_dev_event_cb_fn cb_fn,
+			void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_unregister(char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * @internal Executes all the callbacks registered by the user application
+ * for the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type,
+ *  see enum rte_dev_event_type.
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+_rte_dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..8578796 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9d9e088
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <fcntl.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
+#include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_interrupts.h>
+
+#include "eal_private.h"
+#include "eal_thread.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+bool monitor_no_started = true;
+
+static int
+dev_uev_monitor_fd_new(void)
+{
+
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_uev_monitor_create(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static void
+dev_uev_process(__rte_unused void *param)
+{
+	/* TODO: device uevent processing */
+}
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	int ret;
+
+	if (!monitor_no_started)
+		return 0;
+
+	intr_handle.fd = dev_uev_monitor_fd_new();
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+
+	ret = dev_uev_monitor_create(intr_handle.fd);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event monitor\n");
+		return -1;
+	}
+
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_process, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback\n");
+		return -1;
+	}
+
+	monitor_no_started = false;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	int ret;
+
+	if (monitor_no_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_process, NULL);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_no_started = true;
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..842acaa 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -674,7 +674,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
-- 
2.7.4
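
The callback registration and dispatch logic added to eal_common_dev.c above
can be sketched on its own as follows. This is a simplified illustration under
stated assumptions, not the patch's implementation: the names dev_event_type,
dev_event_cb, dev_cb_register, dev_cb_process and count_cb are hypothetical,
plain malloc/free replace the EAL allocators, and the spinlock and the
"active" flag are left out. It shows the matching rule described in the API
comments, where a callback registered with a NULL device name receives events
for every device.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE }; /* stand-in */

typedef int (*dev_event_cb_fn)(const char *device_name,
			       enum dev_event_type event, void *cb_arg);

struct dev_event_cb {
	TAILQ_ENTRY(dev_event_cb) next;
	dev_event_cb_fn cb_fn;
	void *cb_arg;
	char *dev_name; /* owned copy; NULL matches every device */
};
TAILQ_HEAD(dev_event_cb_list, dev_event_cb);

static struct dev_event_cb_list dev_cbs = TAILQ_HEAD_INITIALIZER(dev_cbs);

/* Two names are equal when both are NULL or both hold the same string. */
static int
name_equal(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

/* Register cb_fn for device_name; a duplicate registration is a no-op. */
static int
dev_cb_register(const char *device_name, dev_event_cb_fn cb_fn, void *cb_arg)
{
	struct dev_event_cb *cb;

	if (cb_fn == NULL)
		return -1;

	TAILQ_FOREACH(cb, &dev_cbs, next) {
		if (cb->cb_fn == cb_fn && cb->cb_arg == cb_arg &&
		    name_equal(cb->dev_name, device_name))
			return 0; /* already registered */
	}

	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	if (device_name != NULL) {
		size_t n = strlen(device_name) + 1;

		cb->dev_name = malloc(n);
		if (cb->dev_name == NULL) {
			free(cb);
			return -1;
		}
		memcpy(cb->dev_name, device_name, n);
	}
	cb->cb_fn = cb_fn;
	cb->cb_arg = cb_arg;
	TAILQ_INSERT_TAIL(&dev_cbs, cb, next);
	return 0;
}

/*
 * Run every callback registered for device_name or for all devices.
 * Returns the number of callbacks invoked.
 */
static int
dev_cb_process(const char *device_name, enum dev_event_type event)
{
	struct dev_event_cb *cb;
	int called = 0;

	assert(device_name != NULL);
	TAILQ_FOREACH(cb, &dev_cbs, next) {
		if (cb->dev_name != NULL &&
		    strcmp(cb->dev_name, device_name) != 0)
			continue;
		cb->cb_fn(device_name, event, cb->cb_arg);
		called++;
	}
	return called;
}

/* Example callback: counts invocations through cb_arg. */
static int
count_cb(const char *device_name, enum dev_event_type event, void *cb_arg)
{
	(void)device_name;
	(void)event;
	++*(int *)cb_arg;
	return 0;
}
```

The sketch validates its arguments before touching the list, so no early
return can leave shared state half-updated, and the list head is initialised
statically rather than on first use.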


* [PATCH V15 2/5] eal: add uevent pass and process function
  2018-03-21  5:27                                                     ` [PATCH V15 1/5] eal: add uevent monitor api and callback func Jeff Guo
@ 2018-03-21  5:27                                                       ` Jeff Guo
  2018-03-21 14:20                                                         ` Tan, Jianfeng
  2018-03-21  5:27                                                       ` [PATCH V15 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-03-21  5:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

In order to handle the uevents that have been detected from the kernel
side, add a uevent process function, using the hot plug event as an example
to show how the uevent mechanism passes and processes a uevent.

For uevent passing and processing, add the functions below to the Linux EAL
dev layer. FreeBSD does not support uevents, so the functions are left
unimplemented there.
a.dev_uev_parse
b.dev_uev_receive
c.dev_uev_process
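
The parsing step can be sketched as below. This is only an illustration,
assuming the kernel uevent payload is a sequence of NUL-terminated
KEY=VALUE records, which is how dev_uev_parse treats it. The names
uev_fields, uev_match and uev_parse are hypothetical and not part of this
patch. Unlike the patch, the sketch takes the received length explicitly
and copies the slot name into caller-owned storage.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128 /* same size as RTE_EAL_UEV_MSG_ELEM_LEN */

/* Hypothetical result type; the patch fills struct rte_dev_event. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If rec starts with key ("NAME="), copy the value into dst. */
int
uev_match(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", rec + klen);
	return 1;
}

/*
 * Walk len bytes of NUL-terminated records.
 * Returns 0 on success, -1 for a message sent by udev.
 */
int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break; /* unterminated tail: ignore it */
		if (end != rec) {
			if (strncmp(rec, "libudev", 7) == 0)
				return -1;
			if (!uev_match(rec, "ACTION=", out->action,
				       sizeof(out->action)) &&
			    !uev_match(rec, "SUBSYSTEM=", out->subsystem,
				       sizeof(out->subsystem)))
				uev_match(rec, "PCI_SLOT_NAME=",
					  out->pci_slot_name,
					  sizeof(out->pci_slot_name));
		}
		i += (size_t)(end - rec) + 1;
	}
	return 0;
}
```

Bounding the walk by the number of bytes actually received, rather than
by the buffer capacity, keeps the parser from scanning stale bytes.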

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v15->v14:
remove the uevent type check and any policy from eal;
leave checking and management to the user's callback.
---
 lib/librte_eal/common/include/rte_dev.h | 17 ++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 95 ++++++++++++++++++++++++++++++++-
 2 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index d2fcbc9..98ea12b 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,23 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+#define RTE_EAL_UEV_MSG_LEN 4096
+#define RTE_EAL_UEV_MSG_ELEM_LEN 128
+
+enum rte_dev_state {
+	RTE_DEV_UNDEFINED,	/**< unknown device state */
+	RTE_DEV_FAULT,	/**< device fault or error */
+	RTE_DEV_PARSED,	/**< device has been scanned on bus */
+	RTE_DEV_PROBED,	/**< device has been probed by a driver */
+};
+
+enum rte_dev_event_subsystem {
+	RTE_DEV_EVENT_SUBSYSTEM_UNKNOWN,
+	RTE_DEV_EVENT_SUBSYSTEM_UIO,
+	RTE_DEV_EVENT_SUBSYSTEM_VFIO,
+	RTE_DEV_EVENT_SUBSYSTEM_MAX
+};
+
 /**
  * The device event type.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9d9e088..2b34e2c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -78,9 +78,102 @@ dev_uev_monitor_create(int netlink_fd)
 }
 
 static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event)
+{
+	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < RTE_EAL_UEV_MSG_LEN) {
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = pci_slot_name;
+		}
+		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if ((!strncmp(subsystem, "uio", 3)) ||
+		(!strncmp(subsystem, "pci", 3)))
+		event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[RTE_EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent);
+
+	return 0;
+}
+
+static void
 dev_uev_process(__rte_unused void *param)
 {
-	/* TODO: device uevent processing */
+	struct rte_dev_event uevent;
+
+	if (dev_uev_receive(intr_handle.fd, &uevent))
+		return;
+
+	if (uevent.devname)
+		_rte_dev_callback_process(uevent.devname, uevent.type, NULL);
 }
 
 int __rte_experimental
-- 
2.7.4

* [PATCH V15 3/5] app/testpmd: use uevent to monitor hotplug
  2018-03-21  5:27                                                     ` [PATCH V15 1/5] eal: add uevent monitor api and callback func Jeff Guo
  2018-03-21  5:27                                                       ` [PATCH V15 2/5] eal: add uevent pass and process function Jeff Guo
@ 2018-03-21  5:27                                                       ` Jeff Guo
  2018-03-26 10:55                                                         ` [PATCH V16 0/3] add device event monitor framework Jeff Guo
  2018-03-26 11:20                                                         ` [PATCH V16 0/4] add device event monitor framework Jeff Guo
  1 sibling, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-21  5:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can request and
use uevent monitoring to handle hot removal and hot insertion events.
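
The bookkeeping of hotplug-enabled devices can be sketched as below. This
is only an illustration using the plain <sys/queue.h> TAILQ macros; the
names hp_entry, hp_list_contains and hp_list_add are hypothetical and are
not the testpmd symbols. As assumptions that differ from the patch, the
sketch initializes the list head statically, stores a private copy of the
device name, and skips names that are already listed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

struct hp_entry {
	TAILQ_ENTRY(hp_entry) next;
	char *dev_name; /* private copy, owned by the list */
};
TAILQ_HEAD(hp_entry_list, hp_entry);

/* Static initializer, so no run-time TAILQ_INIT is needed. */
static struct hp_entry_list hp_entries = TAILQ_HEAD_INITIALIZER(hp_entries);

/* Return true if dev_name was registered for hotplug handling. */
bool
hp_list_contains(const char *dev_name)
{
	struct hp_entry *e;

	TAILQ_FOREACH(e, &hp_entries, next) {
		if (strcmp(e->dev_name, dev_name) == 0)
			return true;
	}
	return false;
}

/* Register dev_name once; returns 0 on success, -1 on allocation failure. */
int
hp_list_add(const char *dev_name)
{
	struct hp_entry *e;
	size_t len = strlen(dev_name) + 1;

	if (hp_list_contains(dev_name))
		return 0; /* already listed, nothing to do */
	e = malloc(sizeof(*e));
	if (e == NULL)
		return -1;
	e->dev_name = malloc(len);
	if (e->dev_name == NULL) {
		free(e);
		return -1;
	}
	memcpy(e->dev_name, dev_name, len);
	TAILQ_INSERT_TAIL(&hp_entries, e, next);
	return 0;
}
```

Copying the name keeps the entry valid after the device structure it came
from has been released, which matters when the entry is consulted again
on a later insertion event.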

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v15->v14:
add the "--hot-plug" command-line parameter in testpmd to enable the hotplug feature
---
 app/test-pmd/parameters.c |   5 +-
 app/test-pmd/testpmd.c    | 194 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |  11 +++
 3 files changed, 208 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b8..825d602 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..915532e 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,9 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -384,6 +388,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -391,6 +397,14 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_uevent_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_uevent_callback_register(portid_t port_id);
+static bool in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(struct rte_device *device,
+				enum rte_kernel_driver device_kdrv);
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1867,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_uevent_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the uevent callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_uevent_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup uevent callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1869,6 +1904,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_uevent_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1880,6 +1917,12 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[pi].device,
+				 rte_eth_devices[pi].data->kdrv);
+		eth_uevent_callback_register(pi);
+	}
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1906,6 +1949,12 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[port_id].device,
+				 rte_eth_devices[port_id].data->kdrv);
+		eth_uevent_callback_register(port_id);
+	}
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1929,6 +1978,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -2013,6 +2065,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_uevent_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_uevent_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_uevent_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_uevent_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2059,6 +2154,85 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static int
+hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = device->name;
+	hp_request->dev_kdrv = device_kdrv;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_uevent_callback(char *device_name, enum rte_dev_event_type type, void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+	char *dev_name = malloc(strlen(device_name) + 1);
+
+	strcpy(dev_name, device_name);
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_uevent_callback, arg))
+			fprintf(stderr, "Could not set up deferred "
+				"device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(100000,
+			add_uevent_callback, dev_name))
+			fprintf(stderr, "Could not set up deferred "
+				"device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2648,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2718,23 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		if (TAILQ_EMPTY(&hp_list))
+			TAILQ_INIT(&hp_list);
+		RTE_ETH_FOREACH_DEV(port_id) {
+			hotplug_list_add(rte_eth_devices[port_id].device,
+					 rte_eth_devices[port_id].data->kdrv);
+			eth_uevent_callback_register(port_id);
+		}
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..c619e11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;            /* request device name */
+	enum rte_kernel_driver dev_kdrv;            /* kernel driver bound */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
@@ -319,6 +328,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
-- 
2.7.4

* [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio
  2018-01-10  9:12                               ` [PATCH V9 5/5] pci: add driver auto bind for hot insertion Jeff Guo
@ 2018-03-21  6:11                                 ` Jeff Guo
  2018-03-21  6:11                                   ` [PATCH V15 2/2] pci: add driver auto bind for hot insertion Jeff Guo
  2018-03-30  3:35                                   ` [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio Tan, Jianfeng
  0 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-21  6:11 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a hot removal uevent is detected for a device, the device's
resources become invalid. To avoid unexpected use of those resources,
remap them to fake memory so that the application keeps running
instead of crashing with a core dump.
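
The remapping idea can be sketched as below. This is only an
illustration, not the code in this patch: remap_as_dummy is a
hypothetical helper, and it assumes MAP_FIXED is acceptable so that the
anonymous mapping replaces the old one at the same virtual address in a
single mmap call. The patch itself unmaps first and passes MAP_ANONYMOUS
as an additional flag to pci_map_resource.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Replace the mapping at [addr, addr + len) with private anonymous
 * memory that reads back as all-ones, which is what a read from a
 * removed PCI device typically returns.
 * Returns the mapped address, or NULL on failure.
 */
void *
remap_as_dummy(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0xFF, len);
	return p;
}
```

Keeping the virtual address unchanged means pointers already held by the
driver stay dereferenceable while the application tears the port down.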

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v15->v14:
delete find_by_name in bus ops, as it is not used. use an additional
flag with the original pci map resource function instead of adding a
new fixed memory mapping function.
---
 app/test-pmd/testpmd.c                    |  4 +++
 drivers/bus/pci/bsd/pci.c                 | 23 +++++++++++++++++
 drivers/bus/pci/linux/pci.c               | 33 +++++++++++++++++++++++++
 drivers/bus/pci/pci_common.c              | 21 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c          | 41 +++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h                 | 12 +++++++++
 drivers/bus/pci/rte_bus_pci.h             |  9 +++++++
 drivers/bus/vdev/vdev.c                   |  7 ++++++
 lib/librte_eal/bsdapp/eal/eal_dev.c       |  8 ++++++
 lib/librte_eal/common/eal_common_bus.c    |  1 +
 lib/librte_eal/common/include/rte_bus.h   | 15 +++++++++++
 lib/librte_eal/common/include/rte_dev.h   | 18 ++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c     | 34 +++++++++++++++++++++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  5 ++++
 14 files changed, 231 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 915532e..1c4afea 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2079,6 +2079,10 @@ rmv_uevent_callback(void *arg)
 	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
 		return;
 
+	/* run the failure handler before stopping and closing the device */
+	rte_dev_failure_handler(rte_eth_devices[port_id].device,
+				rte_eth_devices[port_id].data->kdrv);
+
 	stop_packet_forwarding();
 
 	stop_port(port_id);
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 655b34b..d7165b9 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -97,6 +97,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(dev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void
 pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index abde641..a7dfec7 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -116,6 +116,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 	}
 }
 
+/* Re-map pci device */
+int
+rte_pci_remap_device(struct rte_pci_device *dev)
+{
+	int ret = -1;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	switch (dev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* nothing to do */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(dev);
+		}
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = 1;
+		break;
+	}
+
+	return ret;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -357,6 +389,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		rte_pci_add_device(dev);
 	}
 
+	dev->device.state = RTE_DEV_PARSED;
 	return 0;
 }
 
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 2a00f36..46921a4 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -253,6 +253,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		if (rc > 0)
 			/* positive value means driver doesn't support it */
 			continue;
+		dev->device.state = RTE_DEV_PROBED;
 		return 0;
 	}
 	return 1;
@@ -474,6 +475,25 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_remap_device(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+
+	/* remap resources for devices that use igb_uio */
+	ret = rte_pci_remap_device(pdev);
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to remap device %s\n",
+			dev->name);
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -503,6 +523,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.remap_device = pci_remap_device,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..3a0a2bb 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,47 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in private virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	uint64_t phaddr;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	close(dev->intr_handle.fd);
+	if (dev->intr_handle.uio_cfg_fd >= 0) {
+		close(dev->intr_handle.uio_cfg_fd);
+		dev->intr_handle.uio_cfg_fd = -1;
+	}
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* Map all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
+			continue;
+		pci_unmap_resource(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len);
+		map_address = pci_map_resource(
+				dev->mem_resource[i].addr, -1, 0,
+				(size_t)dev->mem_resource[i].len,
+				MAP_ANONYMOUS);
+		if (map_address == MAP_FAILED)
+			return -1;
+		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
+		dev->mem_resource[i].addr = map_address;
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..7a862ef 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the pci uio resource.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 357afb9..6f9cd8b 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -168,6 +168,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
 void rte_pci_unmap_device(struct rte_pci_device *dev);
 
 /**
+ * Remap this device
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ */
+int rte_pci_remap_device(struct rte_pci_device *dev);
+
+/**
  * Dump the content of the PCI bus.
  *
  * @param f
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index e4bc724..efc348b 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -400,6 +400,12 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+vdev_remap_device(struct rte_device *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+static int
 vdev_plug(struct rte_device *dev)
 {
 	return vdev_probe_all_drivers(RTE_DEV_TO_VDEV(dev));
@@ -418,6 +424,7 @@ static struct rte_bus rte_vdev_bus = {
 	.plug = vdev_plug,
 	.unplug = vdev_unplug,
 	.parse = vdev_parse,
+	.remap_device = vdev_remap_device,
 };
 
 RTE_REGISTER_BUS(vdev, rte_vdev_bus);
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 3b7bbf2..a076ec7 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -31,3 +31,11 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Not support event monitor for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_failure_handler(struct rte_device *dev,
+					enum rte_kernel_driver kdrv_type)
+{
+	RTE_LOG(ERR, EAL, "Not support device failure handler for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 3e022d5..5510bbe 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -53,6 +53,7 @@ rte_bus_register(struct rte_bus *bus)
 	RTE_VERIFY(bus->find_device);
 	/* Buses supporting driver plug also require unplug. */
 	RTE_VERIFY(!bus->plug || bus->unplug);
+	RTE_VERIFY(bus->remap_device);
 
 	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
 	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..1f3c09b 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific remap function which is responsible for remapping
+ * devices on that bus from the original shared memory resource to an anonymous
+ * memory resource after the device has been removed.
+ *
+ * @param dev
+ *	Device pointer that was returned by a previous call to find_device.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -209,6 +223,7 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_remap_device_t remap_device;       /**< remap a device */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 98ea12b..10a5fcf 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -180,6 +180,7 @@ struct rte_device {
 	const struct rte_driver *driver;/**< Associated driver */
 	int numa_node;                /**< NUMA node connection */
 	struct rte_devargs *devargs;  /**< Device user arguments */
+	enum rte_dev_state state;  /**< Device state */
 };
 
 /**
@@ -405,4 +406,21 @@ rte_dev_event_monitor_start(void);
  */
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
+
+/**
+ * It can be used to do device failure handler to avoid
+ * system core dump when failure occur.
+ *
+ * @param dev
+ *  The pointer to the device structure.
+ * @param kdrv_type
+ *  The specific kernel driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_failure_handler(struct rte_device *dev,
+			     enum rte_kernel_driver kdrv_type);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 2b34e2c..fa63105 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -225,3 +225,37 @@ rte_dev_event_monitor_stop(void)
 	monitor_no_started = true;
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_failure_handler(struct rte_device *dev,
+					enum rte_kernel_driver kdrv_type)
+{
+	struct rte_bus *bus;
+	int ret;
+
+	switch (kdrv_type) {
+	case RTE_KDRV_IGB_UIO:
+		if ((!dev) || dev->state == RTE_DEV_UNDEFINED)
+			return -1;
+		dev->state = RTE_DEV_FAULT;
+		/* remap the resource to be fake before user's removal */
+		bus = rte_bus_find_by_device_name(dev->name);
+		if (bus == NULL)
+			return -1;
+		ret = bus->remap_device(dev);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Driver cannot remap the "
+				"device (%s)\n",
+				dev->name);
+			return -1;
+		}
+		break;
+	case RTE_KDRV_VFIO:
+		break;
+	case RTE_KDRV_UIO_GENERIC:
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 4cae4dd..9c50876 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -350,6 +350,11 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 		return 0;
 	}
 
+	/* check if device has been removed before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
+		pr_info("The device has been removed\n");
+		return -1;
+	}
 	/* disable interrupts */
 	igbuio_pci_disable_interrupts(udev);
 
-- 
2.7.4

* [PATCH V15 2/2] pci: add driver auto bind for hot insertion
  2018-03-21  6:11                                 ` [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio Jeff Guo
@ 2018-03-21  6:11                                   ` Jeff Guo
  2018-03-30  3:35                                   ` [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio Tan, Jianfeng
  1 sibling, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-21  6:11 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Normally driverctl or dpdk-devbind.py is used to bind the kernel driver
before the application runs. To bind a driver automatically while the
application is running, an auto bind function is needed. This benefits
the hot insertion case: when a hot insertion uevent is detected for a
device, the user can bind the driver according to some user policy and
then attach the device, so the app keeps running smoothly across
hotplug events.
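
The driver_override step can be sketched as below. This is only an
illustration: build_override_path and write_driver_override are
hypothetical helpers, and the sysfs_root parameter is an assumption added
so the sketch can be exercised against a scratch directory; on a real
system it would be "/sys". Writing driver_override only records the
preferred driver. As far as the kernel sysfs ABI goes, the device still
has to be probed afterwards, which the sketch does not do.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<root>/bus/pci/devices/<dev>/driver_override"; -1 if truncated. */
int
build_override_path(char *dst, size_t dst_len, const char *sysfs_root,
		    const char *dev_name)
{
	int n = snprintf(dst, dst_len, "%s/bus/pci/devices/%s/driver_override",
			 sysfs_root, dev_name);

	return (n < 0 || (size_t)n >= dst_len) ? -1 : 0;
}

/* Write the kernel driver name into the device's driver_override file. */
int
write_driver_override(const char *sysfs_root, const char *dev_name,
		      const char *kdrv_name)
{
	char path[1024];
	size_t len = strlen(kdrv_name);
	ssize_t written;
	int fd;

	if (build_override_path(path, sizeof(path), sysfs_root, dev_name))
		return -1;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	written = write(fd, kdrv_name, len);
	close(fd);
	/* a short write means the override was not fully recorded */
	return (written == (ssize_t)len) ? 0 : -1;
}
```

A call such as write_driver_override("/sys", "0000:01:00.0", "vfio-pci")
would need sufficient privileges to write to sysfs.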

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v15->v14:
delete bind_driver in bus ops; add a bind_driver api in eal dev
instead, so the app can call it directly
---
 app/test-pmd/testpmd.c                  | 17 ++++++++++++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  7 +++++
 lib/librte_eal/common/include/rte_dev.h | 15 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 49 +++++++++++++++++++++++++++++++++
 4 files changed, 88 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1c4afea..7eb9c48 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2174,6 +2174,18 @@ in_hotplug_list(const char *dev_name)
 	return false;
 }
 
+static enum rte_kernel_driver
+get_hotplug_driver(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			return hp_request->dev_kdrv;
+	}
+	return -1;
+}
+
 static int
 hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
 {
@@ -2226,6 +2238,11 @@ eth_uevent_callback(char *device_name, enum rte_dev_event_type type, void *arg)
 				"device removal\n");
 		break;
 	case RTE_DEV_EVENT_ADD:
+		/**
+		 * bind the driver to the device
+		 * before processing the hot plug device addition
+		 */
+		rte_dev_bind_driver(dev_name, get_hotplug_driver(dev_name));
 		if (rte_eal_alarm_set(100000,
 			add_uevent_callback, dev_name))
 			fprintf(stderr, "Could not set up deferred "
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index a076ec7..7f8175b 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -33,6 +33,13 @@ rte_dev_event_monitor_stop(void)
 }
 
 int __rte_experimental
+rte_dev_bind_driver(const char *dev_name, enum rte_kernel_driver kdrv_type)
+{
+	RTE_LOG(ERR, EAL, "Not support device bind driver for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
 rte_dev_failure_handler(struct rte_device *dev,
 					enum rte_kernel_driver kdrv_type)
 {
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 10a5fcf..e87639f 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -408,6 +408,21 @@ int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
 /**
+ * It can be used to bind a device to a specific type of driver.
+ *
+ * @param dev_name
+ *  The device name.
+ * @param kdrv_type
+ *  The specific kernel driver's type.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_bind_driver(const char *dev_name, enum rte_kernel_driver kdrv_type);
+
+/**
  * It can be used to do device failure handler to avoid
  * system core dump when failure occur.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index fa63105..9b4adc6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -227,6 +227,55 @@ rte_dev_event_monitor_stop(void)
 }
 
 int __rte_experimental
+rte_dev_bind_driver(const char *dev_name, enum rte_kernel_driver kdrv_type)
+{
+	const char *kdrv_name;
+	char drv_override_path[1024];
+	int drv_override_fd;
+
+	if (!dev_name || !kdrv_type)
+		return -1;
+
+	switch (kdrv_type) {
+	case RTE_KDRV_IGB_UIO:
+		kdrv_name = "igb_uio";
+		break;
+	case RTE_KDRV_VFIO:
+		kdrv_name = "vfio-pci";
+		break;
+	case RTE_KDRV_UIO_GENERIC:
+		kdrv_name = "uio_pci_generic";
+		break;
+	default:
+		return -1;
+	}
+
+	snprintf(drv_override_path, sizeof(drv_override_path),
+		"/sys/bus/pci/devices/%s/driver_override", dev_name);
+
+	/* specify the driver for a device by writing to driver_override */
+	drv_override_fd = open(drv_override_path, O_WRONLY);
+	if (drv_override_fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			drv_override_path, strerror(errno));
+		return -1;
+	}
+
+	if (write(drv_override_fd, kdrv_name, strlen(kdrv_name)) < 0) {
+		RTE_LOG(ERR, EAL,
+			"Error: bind failed - Cannot write "
+			"driver %s to device %s\n", kdrv_name, dev_name);
+		goto err;
+	}
+
+	close(drv_override_fd);
+	return 0;
+err:
+	close(drv_override_fd);
+	return -1;
+}
+
+int __rte_experimental
 rte_dev_failure_handler(struct rte_device *dev,
 					enum rte_kernel_driver kdrv_type)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V15 2/5] eal: add uevent pass and process function
  2018-03-21  5:27                                                       ` [PATCH V15 2/5] eal: add uevent pass and process function Jeff Guo
@ 2018-03-21 14:20                                                         ` Tan, Jianfeng
  2018-03-22  8:20                                                           ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-03-21 14:20 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 3/21/2018 1:27 PM, Jeff Guo wrote:
> In order to handle the uevent which have been detected from the kernel
> side, add uevent process function, let hot plug event to be example to
> show uevent mechanism how to pass the uevent and process the uevent.

In fact, passing the uevent to eal/linux for processing is already 
done by the last patch, by registering a callback in the interrupt thread.

In this patch, we are actually showing how to process the uevent and 
translate it into RTE_DEV_EVENT_ADD, RTE_DEV_EVENT_REMOVE, etc.

So the title would be something like:

eal/linux: translate uevent to dev event


>
> About uevent passing and processing, add below functions in linux eal
> dev layer. FreeBSD not support uevent ,so let it to be void and do not
> implement in function.
> a.dev_uev_parse
> b.dev_uev_receive
> c.dev_uev_process

We already have dummy rte_dev_event_monitor_start and 
rte_dev_event_monitor_stop, we don't need to have those dummy internal 
functions any more. Actually, you did not implement those dummy functions.

>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v15->v14:
> remove the uevent type check and any policy from eal,
> let it check and management in user's callback.
> ---
>   lib/librte_eal/common/include/rte_dev.h | 17 ++++++

And if you agree with me in the above, we shall not touch this file. 
Move the definition into the previous patch.

>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 95 ++++++++++++++++++++++++++++++++-
>   2 files changed, 111 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
> index d2fcbc9..98ea12b 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -24,6 +24,23 @@ extern "C" {
>   #include <rte_compat.h>
>   #include <rte_log.h>
>   
> +#define RTE_EAL_UEV_MSG_LEN 4096
> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128

These macros are Linux uevent specific, so put them in the linuxapp folder.

> +
> +enum rte_dev_state {
> +	RTE_DEV_UNDEFINED,	/**< unknown device state */
> +	RTE_DEV_FAULT,	/**< device fault or error */
> +	RTE_DEV_PARSED,	/**< device has been scanned on bus*/
> +	RTE_DEV_PROBED,	/**< device has been probed driver  */
> +};

This enum is not used in this patch series, though I do see it used in the 
other series. So put the definition there.

> +
> +enum rte_dev_event_subsystem {
> +	RTE_DEV_EVENT_SUBSYSTEM_UNKNOWN,

I don't see where we use this value. It seems that we only implement 
UIO for now, so I suppose we shall map all other cases to UNKNOWN.

> +	RTE_DEV_EVENT_SUBSYSTEM_UIO,
> +	RTE_DEV_EVENT_SUBSYSTEM_VFIO,

If we don't support VFIO now, I prefer not defining it now.

> +	RTE_DEV_EVENT_SUBSYSTEM_MAX
> +};

> +
>   /**
>    * The device event type.
>    */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9d9e088..2b34e2c 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -78,9 +78,102 @@ dev_uev_monitor_create(int netlink_fd)
>   }
>   
>   static void
> +dev_uev_parse(const char *buf, struct rte_dev_event *event)
> +{
> +	char action[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +	memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +	memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +	memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
> +

Maybe we can put an example here for better understanding.

And can this buf contain multiple events? If yes, the implementation 
is not correct, since we will only record one event; if no, we can 
simplify it a little bit.

> +	while (i < RTE_EAL_UEV_MSG_LEN) {
> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}

If we pass in the length of the buf, we don't have to skip "\0"?

> +		/**
> +		 * check device uevent from kernel side, no need to check
> +		 * uevent from udev.
> +		 */
> +		if (!strncmp(buf, "libudev", 7)) {

Using strcmp is enough. And we actually need to check that the remaining 
length is enough for strlen("libudev").

> +			buf += 7;
> +			i += 7;
> +			return;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
> +			buf += 8;
> +			i += 8;
> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
> +			event->devname = pci_slot_name;
> +		}
> +		for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}

As we already check for '\0' at the beginning of the loop, we don't need 
it at the end any more.

> +	}
> +
> +	if ((!strncmp(subsystem, "uio", 3)) ||
> +		(!strncmp(subsystem, "pci", 3)))
> +		event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_DEV_EVENT_ADD;
> +	if (!strncmp(action, "remove", 6))
> +		event->type = RTE_DEV_EVENT_REMOVE;
> +}
> +
> +static int
> +dev_uev_receive(int fd, struct rte_dev_event *uevent)
> +{
> +	int ret;
> +	char buf[RTE_EAL_UEV_MSG_LEN];
> +
> +	memset(uevent, 0, sizeof(struct rte_dev_event));
> +	memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
> +
> +	ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL,
> +		"Socket read error(%d): %s\n",
> +		errno, strerror(errno));
> +		return -1;
> +	} else if (ret == 0)
> +		/* connection closed */
> +		return -1;

So we know how many bytes shall be parsed, and we can pass the length 
into dev_uev_parse().

> +
> +	dev_uev_parse(buf, uevent);
> +
> +	return 0;
> +}
> +
> +static void
>   dev_uev_process(__rte_unused void *param)
>   {
> -	/* TODO: device uevent processing */
> +	struct rte_dev_event uevent;
> +
> +	if (dev_uev_receive(intr_handle.fd, &uevent))
> +		return;

We don't use uevent->subsystem below, so why do we have to define it in 
the first place?

> +
> +	if (uevent.devname)
> +		_rte_dev_callback_process(uevent.devname, uevent.type, NULL);
>   }
>   
>   int __rte_experimental

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V15 2/5] eal: add uevent pass and process function
  2018-03-21 14:20                                                         ` Tan, Jianfeng
@ 2018-03-22  8:20                                                           ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-03-22  8:20 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Jianfeng, thanks for your review. Most of it makes sense; my comments are below.


On 3/21/2018 10:20 PM, Tan, Jianfeng wrote:
>
>
> On 3/21/2018 1:27 PM, Jeff Guo wrote:
>> In order to handle the uevent which have been detected from the kernel
>> side, add uevent process function, let hot plug event to be example to
>> show uevent mechanism how to pass the uevent and process the uevent.
>
> In fact, how to pass the uevent to eal/linux for processing, is 
> already done by last patch, by registering a callback into interrupt 
> thread.
>
> In this patch, we are actually showing how to process the uevent, and 
> translate it into RTE_DEV_EVENT_ADD, RTE_DEV_EVENT_DEL, etc.
>
> So the title would be something like:
>
> eal/linux: translate uevent to dev event
>
>
Sorry, what I meant was uevent message "parse", not "pass"; what you say 
makes sense.
>>
>> About uevent passing and processing, add below functions in linux eal
>> dev layer. FreeBSD not support uevent ,so let it to be void and do not
>> implement in function.
>> a.dev_uev_parse
>> b.dev_uev_receive
>> c.dev_uev_process
>
> We already have dummy rte_dev_event_monitor_start and 
> rte_dev_event_monitor_stop, we don't need to have those dummy internal 
> functions any more. Actually, you did not implement those dummy 
> functions.
>
yes, they are not dummy functions, just internal functions.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v15->v14:
>> remove the uevent type check and any policy from eal,
>> let it check and management in user's callback.
>> ---
>>   lib/librte_eal/common/include/rte_dev.h | 17 ++++++
>
> And if you agree with me in the above, we shall not touch this file. 
> Move the definition into the previous patch.
>
will check and split the definitions more explicitly.
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 95 
>> ++++++++++++++++++++++++++++++++-
>>   2 files changed, 111 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/common/include/rte_dev.h 
>> b/lib/librte_eal/common/include/rte_dev.h
>> index d2fcbc9..98ea12b 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -24,6 +24,23 @@ extern "C" {
>>   #include <rte_compat.h>
>>   #include <rte_log.h>
>>   +#define RTE_EAL_UEV_MSG_LEN 4096
>> +#define RTE_EAL_UEV_MSG_ELEM_LEN 128
>
> Such macro shall be linux uevent specific, so put them in linuxapp 
> folder.
>
agree.
>> +
>> +enum rte_dev_state {
>> +    RTE_DEV_UNDEFINED,    /**< unknown device state */
>> +    RTE_DEV_FAULT,    /**< device fault or error */
>> +    RTE_DEV_PARSED,    /**< device has been scanned on bus*/
>> +    RTE_DEV_PROBED,    /**< device has been probed driver */
>> +};
>
> This enum is not used in this patch series, I do see it's used in the 
> other series. So put the definition there.
>
yes.
>> +
>> +enum rte_dev_event_subsystem {
>> +    RTE_DEV_EVENT_SUBSYSTEM_UNKNOWN,
>
> I don't see where we use this macro. Seems that we now only implement 
> UIO, so I suppose, we shall set the other cases to this UNKNOWN.
>
ok.
>> +    RTE_DEV_EVENT_SUBSYSTEM_UIO,
>> +    RTE_DEV_EVENT_SUBSYSTEM_VFIO,
>
> If we don't support VFIO now, I prefer not defining it now.
>
will remove it at this stage and add later.
>> +    RTE_DEV_EVENT_SUBSYSTEM_MAX
>> +};
>
>> +
>>   /**
>>    * The device event type.
>>    */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9d9e088..2b34e2c 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -78,9 +78,102 @@ dev_uev_monitor_create(int netlink_fd)
>>   }
>>     static void
>> +dev_uev_parse(const char *buf, struct rte_dev_event *event)
>> +{
>> +    char action[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +    char subsystem[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +    char dev_path[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +    char pci_slot_name[RTE_EAL_UEV_MSG_ELEM_LEN];
>> +    int i = 0;
>> +
>> +    memset(action, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +    memset(subsystem, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +    memset(dev_path, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +    memset(pci_slot_name, 0, RTE_EAL_UEV_MSG_ELEM_LEN);
>> +
>
> Maybe we can put an example here for better understanding.
>
> And if this buf can contain multiple events? If yes, the 
> implementation is not correct, we will only record one event; if no, 
> we can simplify it a little bit.
>
the buf does not contain multiple events, but it does contain several 
strings separated by "\0", so the code below needs to check for that.
>> +    while (i < RTE_EAL_UEV_MSG_LEN) {
>> +        for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
>> +            if (*buf)
>> +                break;
>> +            buf++;
>> +        }
>
> If we pass in the length of the buf, we don't have to skip "\0"?
>
the reason is as shown above.
>> +        /**
>> +         * check device uevent from kernel side, no need to check
>> +         * uevent from udev.
>> +         */
>> +        if (!strncmp(buf, "libudev", 7)) {
>
> Use strcmp is enough. And we actually need to check left length enough 
> for strlen("libudev").
>
>> +            buf += 7;
>> +            i += 7;
>> +            return;
>> +        }
>> +        if (!strncmp(buf, "ACTION=", 7)) {
>> +            buf += 7;
>> +            i += 7;
>> +            snprintf(action, sizeof(action), "%s", buf);
>> +        } else if (!strncmp(buf, "DEVPATH=", 8)) {
>> +            buf += 8;
>> +            i += 8;
>> +            snprintf(dev_path, sizeof(dev_path), "%s", buf);
>> +        } else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +            buf += 10;
>> +            i += 10;
>> +            snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +        } else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +            buf += 14;
>> +            i += 14;
>> +            snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +            event->devname = pci_slot_name;
>> +        }
>> +        for (; i < RTE_EAL_UEV_MSG_LEN; i++) {
>> +            if (*buf == '\0')
>> +                break;
>> +            buf++;
>> +        }
>
> As we already check '\0' in the begin of the loop, we don't need it at 
> the end any more.
>
the reason is as shown above.
>> +    }
>> +
>> +    if ((!strncmp(subsystem, "uio", 3)) ||
>> +        (!strncmp(subsystem, "pci", 3)))
>> +        event->subsystem = RTE_DEV_EVENT_SUBSYSTEM_UIO;
>> +    if (!strncmp(action, "add", 3))
>> +        event->type = RTE_DEV_EVENT_ADD;
>> +    if (!strncmp(action, "remove", 6))
>> +        event->type = RTE_DEV_EVENT_REMOVE;
>> +}
>> +
>> +static int
>> +dev_uev_receive(int fd, struct rte_dev_event *uevent)
>> +{
>> +    int ret;
>> +    char buf[RTE_EAL_UEV_MSG_LEN];
>> +
>> +    memset(uevent, 0, sizeof(struct rte_dev_event));
>> +    memset(buf, 0, RTE_EAL_UEV_MSG_LEN);
>> +
>> +    ret = recv(fd, buf, RTE_EAL_UEV_MSG_LEN - 1, MSG_DONTWAIT);
>> +    if (ret < 0) {
>> +        RTE_LOG(ERR, EAL,
>> +        "Socket read error(%d): %s\n",
>> +        errno, strerror(errno));
>> +        return -1;
>> +    } else if (ret == 0)
>> +        /* connection closed */
>> +        return -1;
>
> So we are sure how many bytes shall be parsed, we can pass the length 
> into dev_uev_parse().
>
what you suggest might be better.
>> +
>> +    dev_uev_parse(buf, uevent);
>> +
>> +    return 0;
>> +}
>> +
>> +static void
>>   dev_uev_process(__rte_unused void *param)
>>   {
>> -    /* TODO: device uevent processing */
>> +    struct rte_dev_event uevent;
>> +
>> +    if (dev_uev_receive(intr_handle.fd, &uevent))
>> +        return;
>
> We don't use uevent->subsystem below, why we have to define it in 
> first place?
>
it could be checked here; I will add that check, for uio only for now.
>> +
>> +    if (uevent.devname)
>> +        _rte_dev_callback_process(uevent.devname, uevent.type, NULL);
>>   }
>>     int __rte_experimental
>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V16 0/3] add device event monitor framework
  2018-03-21  5:27                                                       ` [PATCH V15 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
@ 2018-03-26 10:55                                                         ` Jeff Guo
  2018-03-26 10:55                                                           ` [PATCH V16 1/3] eal: add device event handle in interrupt thread Jeff Guo
                                                                             ` (2 more replies)
  2018-03-26 11:20                                                         ` [PATCH V16 0/4] add device event monitor framework Jeff Guo
  1 sibling, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 10:55 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Regarding hot plug in DPDK: we already have a proactive way to add/remove
devices through APIs (rte_eal_hotplug_add/remove), and we also have the
fail-safe driver to offload the fail-safe work from the app user. But we
still lack a general mechanism to monitor hotplug events for all drivers;
currently the hotplug interrupt event differs between devices and drivers,
such as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so in order to make the hot plug feature easy to use
for pci drivers, something must be done to detect the remove event at the
kernel level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor the
hot plug status of devices, not only uio/vfio devices on the pci bus,
but also others, such as cpu/usb/pci-express bus devices.

The idea is as below.

a. The uevent message from FD monitoring, which will be useful:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. add uevent monitoring mechanism:
add several general APIs to enable uevent monitoring.

c. add common uevent handler and uevent failure handler:
the uevent of a device should be handled at the bus or device layer, and memory
read and write failures on hot removal should be handled correctly before the
detach behavior.

d. show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show usage.
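
To illustrate item a, here is a minimal sketch of how such a message could
be split into fields. It assumes the records arrive as NUL-separated
KEY=VALUE strings in one buffer and that the received length is passed in;
the names uev_fields, uev_take, uev_parse and UEV_ELEM_LEN are hypothetical
and are not the EAL implementation:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128	/* hypothetical per-field size limit */

struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char devpath[UEV_ELEM_LEN];
};

/* If the record starts with key, copy the rest into dst and return 1. */
static int
uev_take(const char *rec, size_t rlen, const char *key,
	 char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (rlen < klen || memcmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%.*s", (int)(rlen - klen), rec + klen);
	return 1;
}

/*
 * Walk the NUL-separated records of one message, never reading past len.
 * Returns 0 on success, -1 for a message that starts with "libudev".
 */
static int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	if (len >= 7 && memcmp(buf, "libudev", 7) == 0)
		return -1;

	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);
		size_t rlen = end ? (size_t)(end - rec) : len - i;

		if (!uev_take(rec, rlen, "ACTION=", out->action,
			      sizeof(out->action)) &&
		    !uev_take(rec, rlen, "DEVPATH=", out->devpath,
			      sizeof(out->devpath)))
			uev_take(rec, rlen, "SUBSYSTEM=", out->subsystem,
				 sizeof(out->subsystem));
		i += rlen + 1;	/* step over the record and its terminator */
	}
	return 0;
}
```

Because the length bounds every read, the parser needs neither a zeroed
buffer nor separate loops to skip terminators; empty records simply advance
the index by one.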

patchset history:
v16->v15:
1.move some linux related code out of the eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll instead of rte service for the monitor thread.
2.add a new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
and let the user's callback do the checking and management.
4.add the "--hot-plug" parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental to function definitions and fix bsd build issue

v13->v12:
1.fix some logic issues and null check issues
2.fix monitor stop func issue

v12->v11:
1.identify a null param in the callback as monitoring the uevents of all devices

v11->v10:
1.fix some typos and add experimental tag in new file.
2.modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback list for all devices and all types instead of
adding a callback parameter to the device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patches

v8->v7:
1.use rte_service to replace pthread management.
2.fix definition issue and copyright issue
3.fix some lock issues

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some issues.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, by default prepare hot plug work for
all pci devices, then let the app decide which devices need to
hot plug.
2.modify to manage event callback in each device.
3.fix some system hang issues when igb_uio is released.
4.move the pci part to bus-pci, based on the bus rework.
5.add hot plug policy in app; show an example that uses a hotplug list
to decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer. Add function to find device by name.
4.Instead of binding an individual fd to each device, use a common fd to poll all devices.
5.Add hot insertion monitoring registration and processing; add function to auto bind driver before the user adds the device
6.Fix some coding style issues and typos
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some error returns
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove the global variable hotplug_fd and add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix the dual device fd issue.
2.fix some typos.


Jeff Guo (3):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 195 ++++++++++++++++++++-
 app/test-pmd/testpmd.h                             |  11 ++
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  19 ++
 lib/librte_eal/common/eal_common_dev.c             | 145 +++++++++++++++
 lib/librte_eal/common/eal_private.h                |  24 +++
 lib/librte_eal/common/include/rte_dev.h            |  92 ++++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              |  20 +++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |   5 +-
 12 files changed, 516 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V16 1/3] eal: add device event handle in interrupt thread
  2018-03-26 10:55                                                         ` [PATCH V16 0/3] add device event monitor framework Jeff Guo
@ 2018-03-26 10:55                                                           ` Jeff Guo
  2018-03-26 10:55                                                           ` [PATCH V16 2/3] eal: add device event monitor framework Jeff Guo
  2018-03-26 10:55                                                           ` [PATCH V16 3/3] app/testpmd: enable device hotplug monitoring Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 10:55 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add a new interrupt handle type, RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitoring.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
split into small patches based on function
---
 lib/librte_eal/common/include/rte_eal_interrupts.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..842acaa 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -674,7 +674,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V16 2/3] eal: add device event monitor framework
  2018-03-26 10:55                                                         ` [PATCH V16 0/3] add device event monitor framework Jeff Guo
  2018-03-26 10:55                                                           ` [PATCH V16 1/3] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-03-26 10:55                                                           ` Jeff Guo
  2018-03-26 10:55                                                           ` [PATCH V16 3/3] app/testpmd: enable device hotplug monitoring Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 10:55 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a general device event monitor mechanism at
the EAL device layer, for device hotplug awareness and for taking actions
accordingly. It could also be expanded to all other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first register or unregister callbacks through
the newly added APIs. Callbacks can be device specific, or apply to all
devices.
  - rte_dev_callback_register
  - rte_dev_callback_unregister

Then the application shall call the newly added APIs below to
enable/disable the mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Take the hotplug case as an example: on device hotplug insertion or hotplug
removal, we get notified by the kernel and then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device from the
bus; this could also benefit further fail-safe or live-migration work.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
1.move some linux related code out of the eal common layer
2.fix some readability issues.
---
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  19 +++++
 lib/librte_eal/common/eal_common_dev.c  | 145 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  24 ++++++
 lib/librte_eal/common/include/rte_dev.h |  92 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  20 +++++
 7 files changed, 302 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..c0921dd 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..ad606b3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..3a1bbb6 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all device */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,123 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+static struct dev_event_callback * __rte_experimental
+dev_event_cb_find(const char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb = NULL;
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+	return event_cb;
+}
+
+int __rte_experimental
+rte_dev_callback_register(const char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb = NULL;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = strdup(device_name);
+			if (event_cb->dev_name == NULL)
+				return -EINVAL;
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device event callback");
+			return -ENOMEM;
+		}
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return (event_cb == NULL) ? -EEXIST : 0;
+}
+
+int __rte_experimental
+rte_dev_callback_unregister(const char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret = -1;
+	struct dev_event_callback *event_cb = NULL;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
+
+	/*
+	 * if this callback is not executing right now,
+	 * then remove it.
+	 */
+	if (event_cb != NULL) {
+		ret = -EBUSY;
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			rte_free(event_cb);
+			ret = 0;
+		}
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (!cb_lst->dev_name)
+			break;
+		else if (!strcmp(cb_lst->dev_name, device_name))
+			break;
+	}
+	if (cb_lst) {
+		cb_lst->active = 1;
+		if (cb_arg)
+			cb_lst->cb_arg = cb_arg;
+		rte_spinlock_unlock(&dev_event_lock);
+		rc = cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..d55cd68 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,26 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..8867de6 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,26 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +287,76 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_register(const char *device_name, rte_dev_event_cb_fn cb_fn,
+			void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_unregister(const char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..8578796 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..5ab5830
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
-- 
2.7.4


* [PATCH V16 3/3] app/testpmd: enable device hotplug monitoring
  2018-03-26 10:55                                                         ` [PATCH V16 0/3] add device event monitor framework Jeff Guo
  2018-03-26 10:55                                                           ` [PATCH V16 1/3] eal: add device event handle in interrupt thread Jeff Guo
  2018-03-26 10:55                                                           ` [PATCH V16 2/3] eal: add device event monitor framework Jeff Guo
@ 2018-03-26 10:55                                                           ` Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 10:55 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can use the device
event mechanism to monitor hotplug events, covering both hot removal and
hot insertion.

testpmd first enables hotplug monitoring and registers the user's
callback. When a device is hot-inserted or hot-removed, the EAL detects
the event and calls the user's callbacks, and the application then
detaches the device from the bus or attaches it, according to its
hotplug policy.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
1.modify log and patch description.
---
 app/test-pmd/parameters.c |   5 +-
 app/test-pmd/testpmd.c    | 195 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |  11 +++
 3 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b8..825d602 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..bb1ac8f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,9 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -384,6 +388,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -391,6 +397,14 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(portid_t port_id);
+static bool in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(struct rte_device *device,
+				enum rte_kernel_driver device_kdrv);
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1867,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the dev_event callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup dev_event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1869,6 +1904,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_dev_event_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1880,6 +1917,12 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[pi].device,
+				 rte_eth_devices[pi].data->kdrv);
+		eth_dev_event_callback_register(pi);
+	}
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1906,6 +1949,12 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[port_id].device,
+				 rte_eth_devices[port_id].data->kdrv);
+		eth_dev_event_callback_register(port_id);
+	}
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1929,6 +1978,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -2013,6 +2065,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_dev_event_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_dev_event_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_dev_event_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_dev_event_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2059,6 +2154,86 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static int
+hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = device->name;
+	hp_request->dev_kdrv = device_kdrv;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+	char *dev_name = malloc(strlen(device_name) + 1);
+
+	strcpy(dev_name, device_name);
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_dev_event_callback, arg))
+			fprintf(stderr,
+				"Could not set up deferred device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(100000,
+			add_dev_event_callback, dev_name))
+			fprintf(stderr,
+				"Could not set up deferred device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2649,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2719,23 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		if (TAILQ_EMPTY(&hp_list))
+			TAILQ_INIT(&hp_list);
+		RTE_ETH_FOREACH_DEV(port_id) {
+			hotplug_list_add(rte_eth_devices[port_id].device,
+					 rte_eth_devices[port_id].data->kdrv);
+			eth_dev_event_callback_register(port_id);
+		}
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..c619e11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Hotplug request list */
+	const char *dev_name;            /* request device name */
+	enum rte_kernel_driver dev_kdrv;            /* kernel driver bound */
+};
+
+/** @internal Structure to keep track of hotplug requests */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
@@ -319,6 +328,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
-- 
2.7.4


* [PATCH V16 0/4] add device event monitor framework
  2018-03-21  5:27                                                       ` [PATCH V15 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
  2018-03-26 10:55                                                         ` [PATCH V16 0/3] add device event monitor framework Jeff Guo
@ 2018-03-26 11:20                                                         ` Jeff Guo
  2018-03-26 11:20                                                           ` [PATCH V16 1/4] eal: add device event handle in interrupt thread Jeff Guo
                                                                             ` (3 more replies)
  1 sibling, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 11:20 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

For hotplug in DPDK, we already have a proactive way to add/remove devices
through APIs (rte_eal_hotplug_add/remove), and also have the fail-safe driver
to offload the fail-safe work from the application. But there is still no
general mechanism to monitor hotplug events for all drivers; currently the
hotplug interrupt event differs between devices and drivers, such as mlx4,
pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so in order to make the hotplug feature easy to use for
pci drivers, something must be done to detect the remove event at the
kernel level and offer a new line of interrupt to user land.

The kernel's kobject uevent mechanism can be used to monitor the hotplug
status of devices: not only uio/vfio devices on the pci bus, but also
others, such as cpu/usb/pci-express bus devices.

The idea is as below.

a.The uevent message received from the monitored FD, which carries the useful fields:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b.add uevent monitoring mechanism:
add several general APIs to enable uevent monitoring.

c.add common uevent handler and uevent failure handler:
device uevents should be handled at the bus or device layer, and memory read
and write failures during hot removal should be handled correctly before the
device is detached.

d.show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd or fail-safe to show usage.
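
As a rough illustration of item a: a uevent arrives as a buffer of
NUL-separated "KEY=value" records that follow the "action@devpath" header.
The sketch below is not the parser from this patch set. The names
uevent_info, uevent_action and parse_uevent are hypothetical, and only the
ACTION, SUBSYSTEM and DEVPATH fields from the sample message are extracted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types for this sketch, not the ones added by the patches. */
enum uevent_action { UEVENT_UNKNOWN, UEVENT_ADD, UEVENT_REMOVE };

struct uevent_info {
	enum uevent_action action;
	char subsystem[32];
	char devpath[256];
};

/*
 * Walk a raw uevent buffer record by record. Records are assumed to be
 * NUL-terminated strings packed back to back; a trailing record without
 * a terminator is ignored. Returns 0 if an ACTION record was seen, else -1.
 */
static int
parse_uevent(const char *buf, size_t len, struct uevent_info *out)
{
	size_t i = 0;
	int have_action = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));	/* action starts as UEVENT_UNKNOWN */

	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break;	/* truncated record, stop parsing */

		if (strncmp(rec, "ACTION=", 7) == 0) {
			if (strcmp(rec + 7, "add") == 0)
				out->action = UEVENT_ADD;
			else if (strcmp(rec + 7, "remove") == 0)
				out->action = UEVENT_REMOVE;
			have_action = 1;
		} else if (strncmp(rec, "SUBSYSTEM=", 10) == 0) {
			/* snprintf truncates if needed and always terminates */
			snprintf(out->subsystem, sizeof(out->subsystem),
				 "%s", rec + 10);
		} else if (strncmp(rec, "DEVPATH=", 8) == 0) {
			snprintf(out->devpath, sizeof(out->devpath),
				 "%s", rec + 8);
		}
		i += (size_t)(end - rec) + 1;
	}
	return have_action ? 0 : -1;
}
```

Mapping the parsed DEVPATH to a DPDK device name is left out here; per
item c that belongs to the bus or device layer.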

patchset history:
v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll instead of rte service for the monitor thread.
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
leave the check and management to the user's callback.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify a null param in the callback as monitoring uevents of all devices

v11->v10:
1:fix some typos and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback list for all devices and all event types, instead of
adding a callback parameter into the device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patches

v8->v7:
1.use rte_service to replace pthread management.
2.fix definition issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, by default prepare the hot plug work for
all pci devices, then let the app decide which devices need
hot plug.
2.modify to manage event callback in each device.
3.fix some system hang issues on igb_uio release.
4.modify the pci part to the bus-pci based on the bus rework.
5.add hot plug policy in app, show an example that uses a hotplug list to
decide which devices need hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Instead of binding an individual fd to each device, use a common fd to poll all devices.
5.Add registration of hot insertion monitoring and processing, add function to auto bind the driver before the user adds the device
6.Refine some coding style and typos issue
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variables of hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix dual device fd issue.
2.refine some typo error.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 195 +++++++++++++++++-
 app/test-pmd/testpmd.h                             |  11 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  19 ++
 lib/librte_eal/common/eal_common_dev.c             | 145 +++++++++++++
 lib/librte_eal/common/eal_private.h                |  24 +++
 lib/librte_eal/common/include/rte_dev.h            |  92 +++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 228 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |   5 +-
 12 files changed, 724 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4


* [PATCH V16 1/4] eal: add device event handle in interrupt thread
  2018-03-26 11:20                                                         ` [PATCH V16 0/4] add device event monitor framework Jeff Guo
@ 2018-03-26 11:20                                                           ` Jeff Guo
  2018-03-27  9:26                                                             ` Tan, Jianfeng
  2018-03-26 11:20                                                           ` [PATCH V16 2/4] eal: add device event monitor framework Jeff Guo
                                                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 11:20 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add a new interrupt handle type, RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitoring.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
split into small patches based on function
---
 lib/librte_eal/common/include/rte_eal_interrupts.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..842acaa 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -674,7 +674,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
-- 
2.7.4


* [PATCH V16 2/4] eal: add device event monitor framework
  2018-03-26 11:20                                                         ` [PATCH V16 0/4] add device event monitor framework Jeff Guo
  2018-03-26 11:20                                                           ` [PATCH V16 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-03-26 11:20                                                           ` Jeff Guo
  2018-03-28  3:39                                                             ` Tan, Jianfeng
  2018-03-26 11:20                                                           ` [PATCH V16 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-03-26 11:20                                                           ` [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 11:20 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor mechanism at the EAL
device layer, for device hotplug awareness and the actions taken
accordingly. It could also be extended to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first register or unregister callbacks through
the newly added APIs. Callbacks can be device specific, or apply to all
devices.
  - rte_dev_callback_register
  - rte_dev_callback_unregister

Then the application shall call the newly added APIs below to
enable/disable the mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Take hotplug as an example: when a device is hot-inserted or hot-removed,
the EAL gets notified by the kernel and then calls the user's callbacks
to handle it, for example by detaching the device from the bus or
attaching it. This could also benefit future fail-safe or live-migration
work.
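
The register-then-dispatch flow described above can be modelled outside of
DPDK. The sketch below is a simplified stand-alone model, not the code in
this patch: cb_register, cb_dispatch, count_event and the dev_event enum are
hypothetical names, locking is omitted, and every matching callback is
invoked. A NULL device name stands for "all devices", as in the proposed API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the event type and callback prototype. */
enum dev_event { DEV_EVENT_ADD, DEV_EVENT_REMOVE };
typedef void (*dev_event_cb)(const char *dev_name, enum dev_event ev,
			     void *arg);

struct cb_entry {
	TAILQ_ENTRY(cb_entry) next;
	dev_event_cb fn;
	void *arg;
	char *dev_name;	/* NULL means "all devices" */
};
TAILQ_HEAD(cb_list, cb_entry);
static struct cb_list cbs = TAILQ_HEAD_INITIALIZER(cbs);

/* Add a callback for one device, or for all devices if dev_name is NULL. */
static int
cb_register(const char *dev_name, dev_event_cb fn, void *arg)
{
	struct cb_entry *e;

	if (fn == NULL)
		return -EINVAL;
	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return -ENOMEM;
	if (dev_name != NULL) {
		size_t n = strlen(dev_name) + 1;

		e->dev_name = malloc(n);
		if (e->dev_name == NULL) {
			free(e);
			return -ENOMEM;
		}
		memcpy(e->dev_name, dev_name, n);
	}
	e->fn = fn;
	e->arg = arg;
	TAILQ_INSERT_TAIL(&cbs, e, next);
	return 0;
}

/* Invoke every callback whose name matches; returns how many ran. */
static int
cb_dispatch(const char *dev_name, enum dev_event ev)
{
	struct cb_entry *e;
	int called = 0;

	assert(dev_name != NULL);
	TAILQ_FOREACH(e, &cbs, next) {
		if (e->dev_name == NULL ||
		    strcmp(e->dev_name, dev_name) == 0) {
			e->fn(dev_name, ev, e->arg);
			called++;
		}
	}
	return called;
}

/* Example callback: counts events through the int passed as arg. */
static void
count_event(const char *dev_name, enum dev_event ev, void *arg)
{
	(void)dev_name;
	(void)ev;
	(*(int *)arg)++;
}
```

In an application built against this patch, the corresponding calls would
be rte_dev_callback_register() followed by rte_dev_event_monitor_start().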

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.
---
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  19 +++++
 lib/librte_eal/common/eal_common_dev.c  | 145 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  24 ++++++
 lib/librte_eal/common/include/rte_dev.h |  92 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  20 +++++
 7 files changed, 302 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..c0921dd 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..ad606b3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..3a1bbb6 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,123 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+static struct dev_event_callback * __rte_experimental
+dev_event_cb_find(const char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb = NULL;
+
+	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+	return event_cb;
+}
+
+int __rte_experimental
+rte_dev_callback_register(const char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb = NULL;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&(dev_event_cbs)))
+		TAILQ_INIT(&(dev_event_cbs));
+
+	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = strdup(device_name);
+			if (event_cb->dev_name == NULL)
+				return -EINVAL;
+			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device event callback\n");
+			return -ENOMEM;
+		}
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return (event_cb == NULL) ? -EEXIST : 0;
+}
+
+int __rte_experimental
+rte_dev_callback_unregister(const char *device_name, rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	int ret = -1;
+	struct dev_event_callback *event_cb = NULL;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
+
+	/*
+	 * if this callback is not executing right now,
+	 * then remove it.
+	 */
+	if (event_cb != NULL) {
+		ret = -EBUSY;
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			free(event_cb);
+			ret = 0;
+		}
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg)
+{
+	struct dev_event_callback *cb_lst;
+	int rc = 0;
+
+	if (device_name == NULL)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (!cb_lst->dev_name)
+			break;
+		else if (!strcmp(cb_lst->dev_name, device_name))
+			break;
+	}
+	if (cb_lst) {
+		cb_lst->active = 1;
+		if (cb_arg)
+			cb_lst->cb_arg = cb_arg;
+		rte_spinlock_unlock(&dev_event_lock);
+		rc = cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return rc;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..d55cd68 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,26 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type
+ * @param cb_arg
+ *  callback parameter.
+ *
+ * @return
+ *  - On success, return zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+dev_callback_process(char *device_name, enum rte_dev_event_type event,
+				void *cb_arg);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..8867de6 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,26 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +287,76 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_register(const char *device_name, rte_dev_event_cb_fn cb_fn,
+			void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_callback_unregister(const char *device_name, rte_dev_event_cb_fn cb_fn,
+					void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..8578796 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..5ab5830
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
-- 
2.7.4


* [PATCH V16 3/4] eal/linux: uevent parse and process
  2018-03-26 11:20                                                         ` [PATCH V16 0/4] add device event monitor framework Jeff Guo
  2018-03-26 11:20                                                           ` [PATCH V16 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-03-26 11:20                                                           ` [PATCH V16 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-03-26 11:20                                                           ` Jeff Guo
  2018-03-28 16:15                                                             ` Tan, Jianfeng
  2018-03-26 11:20                                                           ` [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 11:20 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

To handle the uevents detected on the kernel side, add uevent parse
and process functions that translate each uevent into a device event,
which the user has subscribed to monitor.
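
For illustration, a uevent payload is a sequence of NUL-separated
"KEY=value" strings (ACTION, DEVPATH, SUBSYSTEM, PCI_SLOT_NAME and so
on), which is what the parser below scans for. A minimal sketch of a
bounded lookup over such a buffer follows; uevent_get() is a
hypothetical helper written for this example and is not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up "key" in a buffer of NUL-separated "KEY=value" strings of
 * total size "len". Returns a pointer to the NUL-terminated value
 * inside buf, or NULL when the key is absent.
 * Illustrative helper, not part of the patch.
 */
static const char *
uevent_get(const char *buf, size_t len, const char *key)
{
	size_t key_len = strlen(key);
	size_t pos = 0;

	assert(buf != NULL);

	while (pos < len) {
		const char *field = buf + pos;
		/* Find the end of this field without leaving the buffer. */
		const char *end = memchr(field, '\0', len - pos);
		size_t field_len;

		if (end == NULL)
			break; /* unterminated trailing field: ignore it */
		field_len = (size_t)(end - field);

		if (field_len > key_len && field[key_len] == '=' &&
		    memcmp(field, key, key_len) == 0)
			return field + key_len + 1;

		pos += field_len + 1; /* skip the field and its NUL */
	}
	return NULL;
}
```

Bounding every scan by the received length means the lookup never reads
past the end of the buffer, even if the last field is not
NUL-terminated.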

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
1. move all Linux-specific code together
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 214 +++++++++++++++++++++++++++++++++-
 1 file changed, 211 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 5ab5830..90094c0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,19 +2,227 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
-#include <rte_log.h>
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+#include <sys/signalfd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+#include <sys/epoll.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <fcntl.h>
+
+#include <rte_malloc.h>
+#include <rte_bus.h>
 #include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_debug.h>
+#include <rte_log.h>
+#include <rte_interrupts.h>
+
+#include "eal_private.h"
+#include "eal_thread.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_not_started = true;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+/* identify the subsystem layer that the event originates from */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_monitor_fd_new(void)
+{
+	int uevent_fd;
+
+	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (uevent_fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
+		return -1;
+	}
+	return uevent_fd;
+}
+
+static int
+dev_uev_monitor_create(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "bind failed\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char dev_path[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(dev_path, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "DEVPATH=", 8)) {
+			buf += 8;
+			i += 8;
+			snprintf(dev_path, sizeof(dev_path), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = pci_slot_name;
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+		"Socket read error(%d): %s\n",
+		errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent, EAL_UEV_MSG_LEN);
+
+	return 0;
+}
+
+static void
+dev_uev_process(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+
+	if (dev_uev_receive(intr_handle.fd, &uevent))
+		return;
+
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type, NULL);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (!monitor_not_started)
+		return 0;
+
+	intr_handle.fd = dev_uev_monitor_fd_new();
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+
+	ret = dev_uev_monitor_create(intr_handle.fd);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event monitor\n");
+		return -1;
+	}
+
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_process, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback\n");
+		return -1;
+	}
+
+	monitor_not_started = false;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (monitor_not_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_process, NULL);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_not_started = true;
 	return 0;
 }
-- 
2.7.4


* [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring
  2018-03-26 11:20                                                         ` [PATCH V16 0/4] add device event monitor framework Jeff Guo
                                                                             ` (2 preceding siblings ...)
  2018-03-26 11:20                                                           ` [PATCH V16 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-03-26 11:20                                                           ` Jeff Guo
  2018-03-28 16:41                                                             ` Tan, Jianfeng
  2018-03-29 16:00                                                             ` [PATCH V17 0/4] add device event monitor framework Jeff Guo
  3 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-26 11:20 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can use the device
event mechanism to monitor hotplug events, covering both hot removal and
hot insertion.

The process is as follows: testpmd first enables hotplug monitoring and
registers the user's callback. When a device is hot-inserted or
hot-removed, the EAL detects the event and calls the user's callbacks,
and the application attaches the device to or detaches it from the bus
according to its hotplug policy.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
1.modify log and patch description.
---
 app/test-pmd/parameters.c |   5 +-
 app/test-pmd/testpmd.c    | 195 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |  11 +++
 3 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b8..825d602 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..bb1ac8f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,9 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -384,6 +388,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -391,6 +397,14 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(portid_t port_id);
+static bool in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(struct rte_device *device,
+				enum rte_kernel_driver device_kdrv);
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1867,27 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(portid_t port_id)
+{
+	int diag;
+	char device_name[128];
+
+	snprintf(device_name, sizeof(device_name),
+		"%s", rte_eth_devices[port_id].device->name);
+
+	/* register the dev_event callback */
+
+	diag = rte_dev_callback_register(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup dev_event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1869,6 +1904,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_dev_event_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1880,6 +1917,12 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[pi].device,
+				 rte_eth_devices[pi].data->kdrv);
+		eth_dev_event_callback_register(pi);
+	}
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1906,6 +1949,12 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[port_id].device,
+				 rte_eth_devices[port_id].data->kdrv);
+		eth_dev_event_callback_register(port_id);
+	}
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1929,6 +1978,9 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	rte_dev_event_monitor_stop();
+
 	printf("\nBye...\n");
 }
 
@@ -2013,6 +2065,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_dev_event_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	portid_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_dev_event_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_dev_event_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_dev_event_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2059,6 +2154,86 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static int
+hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = device->name;
+	hp_request->dev_kdrv = device_kdrv;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+	char *dev_name = malloc(strlen(device_name) + 1);
+
+	strcpy(dev_name, device_name);
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_dev_event_callback, arg))
+			fprintf(stderr,
+				"Could not set up deferred device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(100000,
+			add_dev_event_callback, dev_name))
+			fprintf(stderr,
+				"Could not set up deferred device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2649,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2719,23 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		if (TAILQ_EMPTY(&hp_list))
+			TAILQ_INIT(&hp_list);
+		RTE_ETH_FOREACH_DEV(port_id) {
+			hotplug_list_add(rte_eth_devices[port_id].device,
+					 rte_eth_devices[port_id].data->kdrv);
+			eth_dev_event_callback_register(port_id);
+		}
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..c619e11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;            /* request device name */
+	enum rte_kernel_driver dev_kdrv;            /* kernel driver bound */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
@@ -319,6 +328,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
-- 
2.7.4


* Re: [PATCH V16 1/4] eal: add device event handle in interrupt thread
  2018-03-26 11:20                                                           ` [PATCH V16 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-03-27  9:26                                                             ` Tan, Jianfeng
  2018-03-28  8:14                                                               ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-03-27  9:26 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi,


On 3/26/2018 7:20 PM, Jeff Guo wrote:
> Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
> device event interrupt monitor.

A simple search for RTE_INTR_HANDLE_ALARM shows that we still need
to update rte_intr_enable()/rte_intr_disable() and test_interrupt_init().

Thanks,
Jianfeng

>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v16->v15:
> split into small patch base on the function
> ---
>   lib/librte_eal/common/include/rte_eal_interrupts.h | 1 +
>   lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 5 ++++-
>   2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
> index 3f792a9..6eb4932 100644
> --- a/lib/librte_eal/common/include/rte_eal_interrupts.h
> +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
> @@ -34,6 +34,7 @@ enum rte_intr_handle_type {
>   	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
>   	RTE_INTR_HANDLE_EXT,          /**< external handler */
>   	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
> +	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
>   	RTE_INTR_HANDLE_MAX           /**< count of elements */
>   };
>   
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index f86f22f..842acaa 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -674,7 +674,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>   			bytes_read = 0;
>   			call = true;
>   			break;
> -
> +		case RTE_INTR_HANDLE_DEV_EVENT:
> +			bytes_read = 0;
> +			call = true;
> +			break;
>   		default:
>   			bytes_read = 1;
>   			break;


* Re: [PATCH V16 2/4] eal: add device event monitor framework
  2018-03-26 11:20                                                           ` [PATCH V16 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-03-28  3:39                                                             ` Tan, Jianfeng
  2018-03-28  8:12                                                               ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-03-28  3:39 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Monday, March 26, 2018 7:21 PM
> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> Jianfeng
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia; Zhang, Helin
> Subject: [PATCH V16 2/4] eal: add device event monitor framework
> 
> This patch aims to add a general device event monitor mechanism at

mechanism -> framework?
Linux uevent is a mechanism.

> EAL device layer, for device hotplug awareness and actions adopted
> accordingly. It could also expand for all other type of device event
> monitor, but not in this scope at the stage.
> 
> To get started, users firstly register or unregister callbacks through
> the new added APIs. Callbacks can be some device specific, or for all
> devices.
>   -rte_dev_callback_register
>   -rte_dev_callback_unregister
> 

New APIs should be added to rte_eal_version.map.

The release notes should also be updated.

> Then application shall call below new added APIs to enable/disable the
> mechanism:
>   - rte_dev_event_monitor_start
>   - rte_dev_event_monitor_stop

Do we really have a use case for keeping the callbacks registered but stopping the monitoring? I don't think we need these two APIs to enable/disable it.

Instead, monitoring could be enabled whenever at least one callback is registered, and disabled when there are none.
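
A minimal sketch of that idea, assuming a simple counter of registered callbacks drives the monitor. All names here (cb_count, monitor_running, sketch_callback_register and so on) are hypothetical and not part of the patch; monitor_start()/monitor_stop() stand in for whatever the platform layer does, and the lock is omitted for brevity.

```c
#include <stdbool.h>

/* Hypothetical state, not taken from the patch. */
static unsigned int cb_count;   /* number of registered callbacks */
static bool monitor_running;    /* whether the event monitor is active */

/* Stand-ins for the platform-specific start/stop routines. */
static int monitor_start(void) { monitor_running = true; return 0; }
static int monitor_stop(void)  { monitor_running = false; return 0; }

/* Registering the first callback starts the monitor. */
int sketch_callback_register(void)
{
	if (cb_count == 0 && monitor_start() != 0)
		return -1;
	cb_count++;
	return 0;
}

/* Unregistering the last callback stops the monitor. */
int sketch_callback_unregister(void)
{
	if (cb_count == 0)
		return -1;      /* nothing registered */
	if (--cb_count == 0)
		return monitor_stop();
	return 0;
}
```

With this shape the application never calls start/stop itself; whether that is more readable than two explicit APIs is a design choice.
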

> 
> Use hotplug case for example, when device hotplug insertion or hotplug
> removal, we will get notified from kernel, then call user's callbacks
> accordingly to handle it, such as detach or attach the device from the
> bus, and could be benifit for futher fail-safe or live-migration.

Typo: "be benifit " -> "benefit"
Typo: " futher" -> "further"

> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v16->v15:
> 1.remove some linux related code out of eal common layer
> 2.fix some uneasy readble issue.
> ---
>  lib/librte_eal/bsdapp/eal/Makefile      |   1 +
>  lib/librte_eal/bsdapp/eal/eal_dev.c     |  19 +++++
>  lib/librte_eal/common/eal_common_dev.c  | 145
> ++++++++++++++++++++++++++++++++
>  lib/librte_eal/common/eal_private.h     |  24 ++++++
>  lib/librte_eal/common/include/rte_dev.h |  92 ++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/Makefile    |   1 +
>  lib/librte_eal/linuxapp/eal/eal_dev.c   |  20 +++++
>  7 files changed, 302 insertions(+)
>  create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
> 
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile
> b/lib/librte_eal/bsdapp/eal/Makefile
> index dd455e6..c0921dd 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) +=
> eal_lcore.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
> 
>  # from common dir
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c
> b/lib/librte_eal/bsdapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..ad606b3
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <rte_log.h>
> +
> +int __rte_experimental
> +rte_dev_event_monitor_start(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> +
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> index cd07144..3a1bbb6 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -14,9 +14,34 @@
>  #include <rte_devargs.h>
>  #include <rte_debug.h>
>  #include <rte_log.h>
> +#include <rte_spinlock.h>
> +#include <rte_malloc.h>
> 
>  #include "eal_private.h"
> 
> +/* spinlock for device callbacks */
> +static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
> +
> +/**
> + * The device event callback description.
> + *
> + * It contains callback address to be registered by user application,
> + * the pointer to the parameters for callback, and the device name.
> + */
> +struct dev_event_callback {
> +	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
> +	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
> +	void *cb_arg;                           /**< Callback parameter */
> +	char *dev_name;	 /**< Callback devcie name, NULL is for all
> device */

Typo: " devcie" -> "device" 

> +	uint32_t active;                        /**< Callback is executing */
> +};
> +
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
> +
> +/* The device event callback list for all registered callbacks. */
> +static struct dev_event_cb_list dev_event_cbs;
> +
>  static int cmp_detached_dev_name(const struct rte_device *dev,
>  	const void *_name)
>  {
> @@ -207,3 +232,123 @@ rte_eal_hotplug_remove(const char *busname,
> const char *devname)
>  	rte_eal_devargs_remove(busname, devname);
>  	return ret;
>  }
> +
> +static struct dev_event_callback * __rte_experimental

We don't need to flag a static internal function as "__rte_experimental".

> +dev_event_cb_find(const char *device_name, rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg)
> +{
> +	struct dev_event_callback *event_cb = NULL;
> +
> +	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
> +		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
> +			if (device_name == NULL && event_cb->dev_name == NULL)
> +				break;
> +			if (device_name == NULL || event_cb->dev_name == NULL)
> +				continue;
> +			if (!strcmp(event_cb->dev_name, device_name))
> +				break;
> +		}
> +	}
> +	return event_cb;
> +}
> +
> +int __rte_experimental
> +rte_dev_callback_register(const char *device_name, rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg)

"rte_dev_event_callback_register" sounds more reasonable?

> +{
> +	struct dev_event_callback *event_cb = NULL;

We don't need to initialize it to NULL.

> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	if (TAILQ_EMPTY(&(dev_event_cbs)))
> +		TAILQ_INIT(&(dev_event_cbs));
> +
> +	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
> +
> +	/* create a new callback. */
> +	if (event_cb == NULL) {
> +		event_cb = malloc(sizeof(struct dev_event_callback));
> +		if (event_cb != NULL) {
> +			event_cb->cb_fn = cb_fn;
> +			event_cb->cb_arg = cb_arg;
> +			event_cb->dev_name = strdup(device_name);
> +			if (event_cb->dev_name == NULL)
> +				return -EINVAL;
> +			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
> +		} else {
> +			RTE_LOG(ERR, EAL,
> +				"Failed to allocate memory for device event callback");

The unlock is missing here: this error path returns with dev_event_lock still held.

> +			return -ENOMEM;
> +		}
> +	}
> +
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return (event_cb == NULL) ? -EEXIST : 0;
> +}
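
One way to avoid the missed unlock is a single exit label that every path taken under the lock goes through. The sketch below is only an illustration under stated assumptions: sketch_lock()/sketch_unlock() are a counting stand-in for rte_spinlock, all sketch_* names are hypothetical, returning -EEXIST for a duplicate is my assumption about the intended behaviour, and a NULL device name is stored as NULL rather than copied.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Stand-in for rte_spinlock: counts holders so a missed unlock shows up. */
static int lock_depth;
static void sketch_lock(void)   { lock_depth++; }
static void sketch_unlock(void) { lock_depth--; }

typedef int (*sketch_cb_fn)(const char *name, int event, void *arg);

struct sketch_cb {
	TAILQ_ENTRY(sketch_cb) next;
	sketch_cb_fn fn;
	void *arg;
	char *dev_name;         /* NULL means "all devices" */
};

static TAILQ_HEAD(, sketch_cb) sketch_cbs =
	TAILQ_HEAD_INITIALIZER(sketch_cbs);

/* Example callback used only to exercise the sketch. */
int sketch_noop_cb(const char *name, int event, void *arg)
{
	(void)name; (void)event; (void)arg;
	return 0;
}

/* Two names match if both are NULL or both are equal strings. */
static int same_name(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

int sketch_register(const char *dev_name, sketch_cb_fn fn, void *arg)
{
	struct sketch_cb *cb;
	int ret = 0;

	if (fn == NULL)
		return -EINVAL; /* checked before taking the lock */

	sketch_lock();

	TAILQ_FOREACH(cb, &sketch_cbs, next) {
		if (cb->fn == fn && cb->arg == arg &&
		    same_name(cb->dev_name, dev_name)) {
			ret = -EEXIST;
			goto unlock;
		}
	}

	cb = calloc(1, sizeof(*cb));
	if (cb == NULL) {
		ret = -ENOMEM;
		goto unlock;
	}
	cb->fn = fn;
	cb->arg = arg;
	if (dev_name != NULL) {
		size_t len = strlen(dev_name) + 1;

		cb->dev_name = malloc(len);
		if (cb->dev_name == NULL) {
			free(cb);
			ret = -ENOMEM;
			goto unlock;
		}
		memcpy(cb->dev_name, dev_name, len);
	}
	TAILQ_INSERT_TAIL(&sketch_cbs, cb, next);

unlock:
	sketch_unlock();        /* the only place the lock is released */
	return ret;
}
```
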
> +
> +int __rte_experimental
> +rte_dev_callback_unregister(const char *device_name,
> rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg)
> +{
> +	int ret = -1;
> +	struct dev_event_callback *event_cb = NULL;
> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
> +
> +	/*
> +	 * if this callback is not executing right now,
> +	 * then remove it.
> +	 */

This comment is not in the right place.

> +	if (event_cb != NULL) {
> +		if (event_cb->active == 0) {
> +			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
> +			rte_free(event_cb);
> +			ret = 0;
> +		}
> +		ret = -EBUSY;

An "else" is missing for the busy callback case; as written, ret is always overwritten with -EBUSY.

> +	}

An "else" is also missing for a callback that does not exist. Print an error log in that case.

> +
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return ret;
> +}
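
The three outcomes discussed above (not found, busy, removed) can be sketched as one if / else if / else chain. This is a simplified illustration, not the patch: the sketch_* names are hypothetical, device-name matching and the lock are left out, -ENOENT for the "not found" case is an assumption, and free() is used because the register path in the patch allocates with malloc().

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

typedef int (*sketch_cb_fn)(const char *name, int event, void *arg);

struct sketch_cb {
	TAILQ_ENTRY(sketch_cb) next;
	sketch_cb_fn fn;
	void *arg;
	unsigned int active;    /* non-zero while the callback is executing */
};

static TAILQ_HEAD(, sketch_cb) sketch_cbs =
	TAILQ_HEAD_INITIALIZER(sketch_cbs);

/* Example callback used only to exercise the sketch. */
int sketch_noop_cb(const char *name, int event, void *arg)
{
	(void)name; (void)event; (void)arg;
	return 0;
}

/* Helper for the example only: append an entry and return it. */
struct sketch_cb *sketch_add(sketch_cb_fn fn, void *arg)
{
	struct sketch_cb *cb = calloc(1, sizeof(*cb));

	if (cb == NULL)
		return NULL;
	cb->fn = fn;
	cb->arg = arg;
	TAILQ_INSERT_TAIL(&sketch_cbs, cb, next);
	return cb;
}

int sketch_unregister(sketch_cb_fn fn, void *arg)
{
	struct sketch_cb *cb;
	int ret;

	if (fn == NULL)
		return -EINVAL;

	/* The real code would take dev_event_lock here. */
	TAILQ_FOREACH(cb, &sketch_cbs, next)
		if (cb->fn == fn && cb->arg == arg)
			break;

	if (cb == NULL) {
		/* No such callback: log it and report the error. */
		fprintf(stderr, "device event callback not registered\n");
		ret = -ENOENT;
	} else if (cb->active != 0) {
		/* Callback is executing right now: leave it in place. */
		ret = -EBUSY;
	} else {
		/* Callback is idle: remove and free it. */
		TAILQ_REMOVE(&sketch_cbs, cb, next);
		free(cb);
		ret = 0;
	}
	/* ... and release the lock here, on the single exit path. */
	return ret;
}
```
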
> +
> +int __rte_experimental
> +dev_callback_process(char *device_name, enum rte_dev_event_type event,
> +				void *cb_arg)

From the interrupt thread, there is no cb_arg.

> +{
> +	struct dev_event_callback *cb_lst;
> +	int rc = 0;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	if (device_name == NULL)
> +		return -EINVAL;

Put this check before taking the lock. Otherwise it is very easy to miss the unlock, which is what happens here.

> +
> +	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
> +		if (!cb_lst->dev_name)
> +			break;
> +		else if (!strcmp(cb_lst->dev_name, device_name))
> +			break;
> +	}

This invokes only one callback, but a device could have many callbacks registered.

> +	if (cb_lst) {
> +		cb_lst->active = 1;
> +		if (cb_arg)
> +			cb_lst->cb_arg = cb_arg;

What's the reason for overwriting this cb_arg?

> +		rte_spinlock_unlock(&dev_event_lock);
> +		rc = cb_lst->cb_fn(device_name, event,
> +				cb_lst->cb_arg);
> +		rte_spinlock_lock(&dev_event_lock);
> +		cb_lst->active = 0;
> +	}
> +
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return rc;

What's the reason for returning the callback's return value? I don't think we need to return anything here.

> +}
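
Putting the comments on this function together, a possible shape is sketched below: the NULL check happens before the lock, every matching callback is invoked (entries with a NULL name match all devices), no cb_arg is passed in, and nothing is returned. This is only a sketch; the sketch_* names and the counting lock stand-in are hypothetical. Dropping the lock around the call relies on unregister refusing to remove an entry whose active flag is set, so the current entry stays on the list while its callback runs.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Stand-in for rte_spinlock: counts holders so a missed unlock shows up. */
static int lock_depth;
static void sketch_lock(void)   { lock_depth++; }
static void sketch_unlock(void) { lock_depth--; }

typedef int (*sketch_cb_fn)(const char *name, int event, void *arg);

struct sketch_cb {
	TAILQ_ENTRY(sketch_cb) next;
	sketch_cb_fn fn;
	void *arg;
	const char *dev_name;   /* NULL means "all devices" */
	unsigned int active;    /* non-zero while the callback is executing */
};

static TAILQ_HEAD(, sketch_cb) sketch_cbs =
	TAILQ_HEAD_INITIALIZER(sketch_cbs);

/* Helper for the example only: append a caller-owned entry. */
void sketch_add(struct sketch_cb *cb)
{
	TAILQ_INSERT_TAIL(&sketch_cbs, cb, next);
}

/* Example callback: counts invocations through its argument. */
int sketch_count_cb(const char *name, int event, void *arg)
{
	(void)name; (void)event;
	++*(int *)arg;
	return 0;
}

void sketch_process(const char *dev_name, int event)
{
	struct sketch_cb *cb;

	if (dev_name == NULL)
		return;         /* checked before taking the lock */

	sketch_lock();
	TAILQ_FOREACH(cb, &sketch_cbs, next) {
		/* Skip callbacks registered for a different device. */
		if (cb->dev_name != NULL &&
		    strcmp(cb->dev_name, dev_name) != 0)
			continue;

		cb->active = 1;
		sketch_unlock();        /* do not hold the lock in user code */
		(void)cb->fn(dev_name, event, cb->arg);
		sketch_lock();
		cb->active = 0;
	}
	sketch_unlock();
}
```
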
> diff --git a/lib/librte_eal/common/eal_private.h
> b/lib/librte_eal/common/eal_private.h
> index 0b28770..d55cd68 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -9,6 +9,8 @@
>  #include <stdint.h>
>  #include <stdio.h>
> 
> +#include <rte_dev.h>
> +
>  /**
>   * Initialize the memzone subsystem (private to eal).
>   *
> @@ -205,4 +207,26 @@ struct rte_bus
> *rte_bus_find_by_device_name(const char *str);
> 
>  int rte_mp_channel_init(void);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice

It's not an API. We don't need this.

> + *
> + * internal Executes all the user application registered callbacks for
> + * the specific device. It is for DPDK internal user only. User
> + * application should not call it directly.
> + *
> + * @param device_name
> + *  The device name.
> + * @param event
> + *  the device event type
> + * @param cb_arg
> + *  callback parameter.
> + *
> + * @return
> + *  - On success, return zero.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental

It's not an API, so don't add "__rte_experimental" flag.

> +dev_callback_process(char *device_name, enum rte_dev_event_type
> event,
> +				void *cb_arg);

As mentioned above, we don't have cb_arg from interrupt thread.

>  #endif /* _EAL_PRIVATE_H_ */
> diff --git a/lib/librte_eal/common/include/rte_dev.h
> b/lib/librte_eal/common/include/rte_dev.h
> index b688f1e..8867de6 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -24,6 +24,26 @@ extern "C" {
>  #include <rte_compat.h>
>  #include <rte_log.h>
> 
> +/**
> + * The device event type.
> + */
> +enum rte_dev_event_type {
> +	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */

Again, why do we report an "unknown" event to applications?

> +	RTE_DEV_EVENT_ADD,	/**< device being added */
> +	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
> +};
> +
> +struct rte_dev_event {
> +	enum rte_dev_event_type type;	/**< device event type */
> +	int subsystem;			/**< subsystem id */
> +	char *devname;			/**< device name */

I'd prefer to remove such comments when the variable name already conveys the information.

> +};
> +
> +typedef int (*rte_dev_event_cb_fn)(char *device_name,
> +					enum rte_dev_event_type event,
> +					void *cb_arg);

"rte_dev_event_callback" sounds better than "rte_dev_event_cb_fn" to me.

> +
>  __attribute__((format(printf, 2, 0)))
>  static inline void
>  rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
> @@ -267,4 +287,76 @@ __attribute__((used)) = str
>  }
>  #endif
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It registers the callback for the specific device.

"the" -> "a"

Besides, "It registers a callback for the specific device or all devices"

> + * Multiple callbacks cal be registered at the same time.

"cal" -> "can"

Besides, above sentence sounds like this API can register multiple callbacks. Change to:
"Users can call this API multiple times to register multiple callbacks."

> + *
> + * @param device_name
> + *  The device name, that is the param name of the struct rte_device,
> + *  null value means for all devices.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_callback_register(const char *device_name,
> rte_dev_event_cb_fn cb_fn,
> +			void *cb_arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It unregisters the callback according to the specified device.
> + *
> + * @param device_name
> + *  The device name, that is the param name of the struct rte_device,
> + *  null value means for all devices.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_callback_unregister(const char *device_name,
> rte_dev_event_cb_fn cb_fn,
> +					void *cb_arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Start the device event monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_monitor_start(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Stop the device event monitoring .
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile
> b/lib/librte_eal/linuxapp/eal/Makefile
> index 7e5bbe8..8578796 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) +=
> eal_lcore.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
> 
>  # from common dir
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..5ab5830
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <rte_log.h>
> +#include <rte_dev.h>
> +
> +int __rte_experimental
> +rte_dev_event_monitor_start(void)
> +{
> +	/* TODO: start uevent monitor for linux */
> +	return 0;
> +}
> +
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void)
> +{
> +	/* TODO: stop uevent monitor for linux */
> +	return 0;
> +}
> --
> 2.7.4


* Re: [PATCH V16 2/4] eal: add device event monitor framework
  2018-03-28  3:39                                                             ` Tan, Jianfeng
@ 2018-03-28  8:12                                                               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-03-28  8:12 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Jianfeng,

I will correct every typo; my other comments are inline.

On 3/28/2018 11:39 AM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Monday, March 26, 2018 7:21 PM
>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>> Jianfeng
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>> Jia; Zhang, Helin
>> Subject: [PATCH V16 2/4] eal: add device event monitor framework
>>
>> This patch aims to add a general device event monitor mechanism at
> mechanism -> framework?
> Linux uevent is a mechanism.
>> EAL device layer, for device hotplug awareness and actions adopted
>> accordingly. It could also expand for all other type of device event
>> monitor, but not in this scope at the stage.
>>
>> To get started, users firstly register or unregister callbacks through
>> the new added APIs. Callbacks can be some device specific, or for all
>> devices.
>>    -rte_dev_callback_register
>>    -rte_dev_callback_unregister
>>
> New APIs shall be added into rte_eal_version.map.
>
> And also, we shall update the release note.
>
>> Then application shall call below new added APIs to enable/disable the
>> mechanism:
>>    - rte_dev_event_monitor_start
>>    - rte_dev_event_monitor_stop
> Do we really have the use case to keep the callbacks, but stop monitoring? I don't think we really need these two APIs to enable/disable.
>
> Instead, if we have a callback registered, then enable it; if we don't have any callbacks, then it's definitely disabled.
You raise a good question. Whether it is more readable to fold enabling the
monitor into the register function, or registration into the enable
function, is something we should think about.
>> Use hotplug case for example, when device hotplug insertion or hotplug
>> removal, we will get notified from kernel, then call user's callbacks
>> accordingly to handle it, such as detach or attach the device from the
>> bus, and could be benifit for futher fail-safe or live-migration.
> Typo: "be benifit " -> "benefit"
> Typo: " futher" -> "further"
>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v16->v15:
>> 1.remove some linux related code out of eal common layer
>> 2.fix some uneasy readble issue.
>> ---
>>   lib/librte_eal/bsdapp/eal/Makefile      |   1 +
>>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  19 +++++
>>   lib/librte_eal/common/eal_common_dev.c  | 145
>> ++++++++++++++++++++++++++++++++
>>   lib/librte_eal/common/eal_private.h     |  24 ++++++
>>   lib/librte_eal/common/include/rte_dev.h |  92 ++++++++++++++++++++
>>   lib/librte_eal/linuxapp/eal/Makefile    |   1 +
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   |  20 +++++
>>   7 files changed, 302 insertions(+)
>>   create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/Makefile
>> b/lib/librte_eal/bsdapp/eal/Makefile
>> index dd455e6..c0921dd 100644
>> --- a/lib/librte_eal/bsdapp/eal/Makefile
>> +++ b/lib/librte_eal/bsdapp/eal/Makefile
>> @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) +=
>> eal_lcore.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
>> +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
>>
>>   # from common dir
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c
>> b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..ad606b3
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> @@ -0,0 +1,19 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2018 Intel Corporation
>> + */
>> +
>> +#include <rte_log.h>
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_start(void)
>> +{
>> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>> +	return -1;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_stop(void)
>> +{
>> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>> +	return -1;
>> +}
>> diff --git a/lib/librte_eal/common/eal_common_dev.c
>> b/lib/librte_eal/common/eal_common_dev.c
>> index cd07144..3a1bbb6 100644
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -14,9 +14,34 @@
>>   #include <rte_devargs.h>
>>   #include <rte_debug.h>
>>   #include <rte_log.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_malloc.h>
>>
>>   #include "eal_private.h"
>>
>> +/* spinlock for device callbacks */
>> +static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
>> +
>> +/**
>> + * The device event callback description.
>> + *
>> + * It contains callback address to be registered by user application,
>> + * the pointer to the parameters for callback, and the device name.
>> + */
>> +struct dev_event_callback {
>> +	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
>> +	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
>> +	void *cb_arg;                           /**< Callback parameter */
>> +	char *dev_name;	 /**< Callback devcie name, NULL is for all
>> device */
> Typo: " devcie" -> "device"
>
>> +	uint32_t active;                        /**< Callback is executing */
>> +};
>> +
>> +/** @internal Structure to keep track of registered callbacks */
>> +TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
>> +
>> +/* The device event callback list for all registered callbacks. */
>> +static struct dev_event_cb_list dev_event_cbs;
>> +
>>   static int cmp_detached_dev_name(const struct rte_device *dev,
>>   	const void *_name)
>>   {
>> @@ -207,3 +232,123 @@ rte_eal_hotplug_remove(const char *busname,
>> const char *devname)
>>   	rte_eal_devargs_remove(busname, devname);
>>   	return ret;
>>   }
>> +
>> +static struct dev_event_callback * __rte_experimental
> We don't have to flag an internal function as " __rte_experimental ".
>
>> +dev_event_cb_find(const char *device_name, rte_dev_event_cb_fn cb_fn,
>> +				void *cb_arg)
>> +{
>> +	struct dev_event_callback *event_cb = NULL;
>> +
>> +	TAILQ_FOREACH(event_cb, &(dev_event_cbs), next) {
>> +		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
>> +			if (device_name == NULL && event_cb->dev_name == NULL)
>> +				break;
>> +			if (device_name == NULL || event_cb->dev_name == NULL)
>> +				continue;
>> +			if (!strcmp(event_cb->dev_name, device_name))
>> +				break;
>> +		}
>> +	}
>> +	return event_cb;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_callback_register(const char *device_name, rte_dev_event_cb_fn cb_fn,
>> +				void *cb_arg)
> "rte_dev_event_callback_register" sounds more reasonable?
>
>> +{
>> +	struct dev_event_callback *event_cb = NULL;
> We don't need to initialize it to NULL.
>
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
>> +	rte_spinlock_lock(&dev_event_lock);
>> +
>> +	if (TAILQ_EMPTY(&(dev_event_cbs)))
>> +		TAILQ_INIT(&(dev_event_cbs));
>> +
>> +	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
>> +
>> +	/* create a new callback. */
>> +	if (event_cb == NULL) {
>> +		event_cb = malloc(sizeof(struct dev_event_callback));
>> +		if (event_cb != NULL) {
>> +			event_cb->cb_fn = cb_fn;
>> +			event_cb->cb_arg = cb_arg;
>> +			event_cb->dev_name = strdup(device_name);
>> +			if (event_cb->dev_name == NULL)
>> +				return -EINVAL;
>> +			TAILQ_INSERT_TAIL(&(dev_event_cbs), event_cb, next);
>> +		} else {
>> +			RTE_LOG(ERR, EAL,
>> +				"Failed to allocate memory for device event callback");
> Miss the unlock here.
>
>> +			return -ENOMEM;
>> +		}
>> +	}
>> +
>> +	rte_spinlock_unlock(&dev_event_lock);
>> +	return (event_cb == NULL) ? -EEXIST : 0;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_callback_unregister(const char *device_name,
>> rte_dev_event_cb_fn cb_fn,
>> +				void *cb_arg)
>> +{
>> +	int ret = -1;
>> +	struct dev_event_callback *event_cb = NULL;
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
>> +	rte_spinlock_lock(&dev_event_lock);
>> +
>> +	event_cb = dev_event_cb_find(device_name, cb_fn, cb_arg);
>> +
>> +	/*
>> +	 * if this callback is not executing right now,
>> +	 * then remove it.
>> +	 */
> This note is not in right place.
>
>> +	if (event_cb != NULL) {
>> +		if (event_cb->active == 0) {
>> +			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
>> +			rte_free(event_cb);
>> +			ret = 0;
>> +		}
>> +		ret = -EBUSY;
> Miss "else" for busy cb.
>
>> +	}
> Miss "else" for a cb which is not existed. And print an error log here.
>
>> +
>> +	rte_spinlock_unlock(&dev_event_lock);
>> +	return ret;
>> +}
>> +
>> +int __rte_experimental
>> +dev_callback_process(char *device_name, enum rte_dev_event_type event,
>> +				void *cb_arg)
>  From interrupt thread, there is no cb_arg.
>> +{
>> +	struct dev_event_callback *cb_lst;
>> +	int rc = 0;
>> +
>> +	rte_spinlock_lock(&dev_event_lock);
>> +
>> +	if (device_name == NULL)
>> +		return -EINVAL;
> Put such check out of lock. Or it's very easy to miss the unlock which is happening now.
>
>> +
>> +	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
>> +		if (!cb_lst->dev_name)
>> +			break;
>> +		else if (!strcmp(cb_lst->dev_name, device_name))
>> +			break;
>> +	}
> We invoke only one callback. But for this device, we could have many callback to call.
>
>> +	if (cb_lst) {
>> +		cb_lst->active = 1;
>> +		if (cb_arg)
>> +			cb_lst->cb_arg = cb_arg;
> What's the reason of overwriting this cb_arg?
It is no longer used here.
>> +		rte_spinlock_unlock(&dev_event_lock);
>> +		rc = cb_lst->cb_fn(device_name, event,
>> +				cb_lst->cb_arg);
>> +		rte_spinlock_lock(&dev_event_lock);
>> +		cb_lst->active = 0;
>> +	}
>> +
>> +	rte_spinlock_unlock(&dev_event_lock);
>> +	return rc;
> What's the reason of returning the ret of a callback? I don't think we need to return anything here.
>
>> +}
>> diff --git a/lib/librte_eal/common/eal_private.h
>> b/lib/librte_eal/common/eal_private.h
>> index 0b28770..d55cd68 100644
>> --- a/lib/librte_eal/common/eal_private.h
>> +++ b/lib/librte_eal/common/eal_private.h
>> @@ -9,6 +9,8 @@
>>   #include <stdint.h>
>>   #include <stdio.h>
>>
>> +#include <rte_dev.h>
>> +
>>   /**
>>    * Initialize the memzone subsystem (private to eal).
>>    *
>> @@ -205,4 +207,26 @@ struct rte_bus
>> *rte_bus_find_by_device_name(const char *str);
>>
>>   int rte_mp_channel_init(void);
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
> It's not an API. We don't need this.
>
>> + *
>> + * internal Executes all the user application registered callbacks for
>> + * the specific device. It is for DPDK internal user only. User
>> + * application should not call it directly.
>> + *
>> + * @param device_name
>> + *  The device name.
>> + * @param event
>> + *  the device event type
>> + * @param cb_arg
>> + *  callback parameter.
>> + *
>> + * @return
>> + *  - On success, return zero.
>> + *  - On failure, a negative value.
>> + */
>> +int __rte_experimental
> It's not an API, so don't add "__rte_experimental" flag.
>
>> +dev_callback_process(char *device_name, enum rte_dev_event_type
>> event,
>> +				void *cb_arg);
> As mentioned above, we don't have cb_arg from interrupt thread.
>
>>   #endif /* _EAL_PRIVATE_H_ */
>> diff --git a/lib/librte_eal/common/include/rte_dev.h
>> b/lib/librte_eal/common/include/rte_dev.h
>> index b688f1e..8867de6 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -24,6 +24,26 @@ extern "C" {
>>   #include <rte_compat.h>
>>   #include <rte_log.h>
>>
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_dev_event_type {
>> +	RTE_DEV_EVENT_UNKNOWN,	/**< unknown event type */
> Again, why do we report an "unknown" event to applications?
>
>> +	RTE_DEV_EVENT_ADD,	/**< device being added */
>> +	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
>> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
>> +};
>> +
>> +struct rte_dev_event {
>> +	enum rte_dev_event_type type;	/**< device event type */
>> +	int subsystem;			/**< subsystem id */
>> +	char *devname;			/**< device name */
> I prefer to remove such note if we can already get the information from the variable name.
>
>> +};
>> +
>> +typedef int (*rte_dev_event_cb_fn)(char *device_name,
>> +					enum rte_dev_event_type event,
>> +					void *cb_arg);
> "rte_dev_event_callback" sounds better than "rte_dev_event_cb_fn" to me.
>
>> +
>>   __attribute__((format(printf, 2, 0)))
>>   static inline void
>>   rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
>> @@ -267,4 +287,76 @@ __attribute__((used)) = str
>>   }
>>   #endif
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It registers the callback for the specific device.
> "the" -> "a"
>
> Besides, "It registers a callback for the specific device or all devices"
>
>> + * Multiple callbacks cal be registered at the same time.
> "cal" -> "can"
>
> Besides, above sentence sounds like this API can register multiple callbacks. Change to:
> "Users can call this API multiple times to register multiple callbacks."
>
>> + *
>> + * @param device_name
>> + *  The device name, that is the param name of the struct rte_device,
>> + *  null value means for all devices.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_callback_register(const char *device_name,
>> rte_dev_event_cb_fn cb_fn,
>> +			void *cb_arg);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It unregisters the callback according to the specified device.
>> + *
>> + * @param device_name
>> + *  The device name, that is the param name of the struct rte_device,
>> + *  null value means for all devices.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, return the number of callback entities removed.
>> + *  - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_callback_unregister(const char *device_name,
>> rte_dev_event_cb_fn cb_fn,
>> +					void *cb_arg);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Start the device event monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_event_monitor_start(void);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Stop the device event monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_event_monitor_stop(void);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/Makefile
>> b/lib/librte_eal/linuxapp/eal/Makefile
>> index 7e5bbe8..8578796 100644
>> --- a/lib/librte_eal/linuxapp/eal/Makefile
>> +++ b/lib/librte_eal/linuxapp/eal/Makefile
>> @@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) +=
>> eal_lcore.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
>> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
>>
>>   # from common dir
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..5ab5830
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -0,0 +1,20 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2018 Intel Corporation
>> + */
>> +
>> +#include <rte_log.h>
>> +#include <rte_dev.h>
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_start(void)
>> +{
>> +	/* TODO: start uevent monitor for linux */
>> +	return 0;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_stop(void)
>> +{
>> +	/* TODO: stop uevent monitor for linux */
>> +	return 0;
>> +}
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V16 1/4] eal: add device event handle in interrupt thread
  2018-03-27  9:26                                                             ` Tan, Jianfeng
@ 2018-03-28  8:14                                                               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-03-28  8:14 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

jianfeng


On 3/27/2018 5:26 PM, Tan, Jianfeng wrote:
> Hi,
>
>
> On 3/26/2018 7:20 PM, Jeff Guo wrote:
>> Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
>> device event interrupt monitor.
>
> A simple search of RTE_INTR_HANDLE_ALARM, we can see that we still 
> need to update rte_intr_enable()/rte_intr_disable(), and 
> test_interrupt_init().
>
you are right about that.
> Thanks,
> Jianfeng
>
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v16->v15:
>> split into small patch base on the function
>> ---
>>   lib/librte_eal/common/include/rte_eal_interrupts.h | 1 +
>>   lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 5 ++++-
>>   2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
>> index 3f792a9..6eb4932 100644
>> --- a/lib/librte_eal/common/include/rte_eal_interrupts.h
>> +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
>> @@ -34,6 +34,7 @@ enum rte_intr_handle_type {
>>       RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
>>       RTE_INTR_HANDLE_EXT,          /**< external handler */
>>       RTE_INTR_HANDLE_VDEV,         /**< virtual device */
>> +    RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
>>       RTE_INTR_HANDLE_MAX           /**< count of elements */
>>   };
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>> index f86f22f..842acaa 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>> @@ -674,7 +674,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
>>               bytes_read = 0;
>>               call = true;
>>               break;
>> -
>> +        case RTE_INTR_HANDLE_DEV_EVENT:
>> +            bytes_read = 0;
>> +            call = true;
>> +            break;
>>           default:
>>               bytes_read = 1;
>>               break;
>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V16 3/4] eal/linux: uevent parse and process
  2018-03-26 11:20                                                           ` [PATCH V16 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-03-28 16:15                                                             ` Tan, Jianfeng
  2018-03-29 13:32                                                               ` Van Haaren, Harry
  2018-03-29 15:08                                                               ` Guo, Jia
  0 siblings, 2 replies; 494+ messages in thread
From: Tan, Jianfeng @ 2018-03-28 16:15 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

BTW, adding a new .c file now requires updating meson.build as well.

On 3/26/2018 7:20 PM, Jeff Guo wrote:
> In order to handle the uevent which have been detected from the kernel
> side, add uevent parse and process function to translate the uevent into
> device event, which user has subscribe to monitor.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> 1.move all linux specific together
> ---
>   lib/librte_eal/linuxapp/eal/eal_dev.c | 214 +++++++++++++++++++++++++++++++++-
>   1 file changed, 211 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 5ab5830..90094c0 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -2,19 +2,227 @@
>    * Copyright(c) 2018 Intel Corporation
>    */
>   
> -#include <rte_log.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <inttypes.h>
> +#include <sys/queue.h>
> +#include <sys/signalfd.h>
> +#include <sys/ioctl.h>
> +#include <sys/socket.h>
> +#include <linux/netlink.h>
> +#include <sys/epoll.h>
> +#include <unistd.h>
> +#include <signal.h>

Some of these header files are not necessary; the one above, <signal.h>, for example.

> +#include <stdbool.h>
> +#include <fcntl.h>
> +
> +#include <rte_malloc.h>
> +#include <rte_bus.h>
>   #include <rte_dev.h>
> +#include <rte_devargs.h>

We don't need this one either.

> +#include <rte_debug.h>
> +#include <rte_log.h>
> +#include <rte_interrupts.h>
> +
> +#include "eal_private.h"
> +#include "eal_thread.h"

Ditto.

> +
> +static struct rte_intr_handle intr_handle = {.fd = -1 };

I don't think we need a static intr_handle; what we need is the monitor fd.

> +static bool monitor_not_started = true;
> +
> +#define EAL_UEV_MSG_LEN 4096
> +#define EAL_UEV_MSG_ELEM_LEN 128
> +
> +/* identify the system layer which event exposure from */
> +enum eal_dev_event_subsystem {
> +	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_MAX
> +};
> +
> +static int
> +dev_uev_monitor_fd_new(void)
> +{
> +	int uevent_fd;
> +
> +	uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
> +			SOCK_NONBLOCK,
> +			NETLINK_KOBJECT_UEVENT);
> +	if (uevent_fd < 0) {
> +		RTE_LOG(ERR, EAL, "create uevent fd failed\n");
> +		return -1;
> +	}
> +	return uevent_fd;
> +}
> +
> +static int
> +dev_uev_monitor_create(int netlink_fd)

I think we should merge this function with the function above. I don't see a 
reason to split this into two functions.
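
A rough sketch of what a merged helper could look like. The name
uev_socket_create is made up for illustration, and plain fprintf() stands
in for RTE_LOG so that the snippet builds without DPDK; the socket and bind
parameters are the ones used in the patch. SOCK_NONBLOCK is already passed
to socket(), so the sketch assumes the later ioctl(FIONBIO) is not needed.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical merged helper (name is an assumption, not from the patch):
 * create the uevent netlink socket and bind it in one step.
 * Returns the fd on success, or -1 on failure with nothing left open.
 */
static int
uev_socket_create(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0) {
		fprintf(stderr, "uevent socket() failed: %s\n",
			strerror(errno));
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;
	addr.nl_groups = 0xffffffff;

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		fprintf(stderr, "uevent bind() failed: %s\n",
			strerror(errno));
		close(fd);	/* do not leak the fd on the error path */
		return -1;
	}

	return fd;
}
```

With a single function there is one place that owns the fd on every error
path, so the caller only has to check for a negative return value.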

> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +	int size = 64 * 1024;
> +	int nonblock = 1;
> +
> +	memset(&addr, 0, sizeof(addr));
> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;
> +
> +	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
> +		RTE_LOG(ERR, EAL, "bind failed\n");

Please print more information here, so that we don't have to check the 
code if we really encounter such an error.

> +		goto err;
> +	}
> +
> +	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
> +
> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
> +		goto err;
> +	}
> +	return 0;
> +err:
> +	close(netlink_fd);
> +	return -1;
> +}
> +
> +static void
> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)

Do we always get an event we care about? If not, we need to return something 
so that the caller can skip this event.
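
For illustration, a minimal sketch of a parser that reports whether the
message is interesting. The enum and function names are invented for this
example and are not part of the patch; it only looks at the ACTION= field
and treats everything other than add/remove as "skip".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical action codes, used only in this sketch. */
enum uev_action { UEV_ACTION_UNKNOWN, UEV_ACTION_ADD, UEV_ACTION_REMOVE };

/*
 * Scan a uevent payload (NUL-separated "KEY=value" strings) for ACTION=.
 * Returns 0 and sets *action for add/remove; returns -1 for anything else
 * so that the caller can skip the message.
 */
static int
uev_parse_action(const char *buf, size_t len, enum uev_action *action)
{
	size_t i = 0;

	*action = UEV_ACTION_UNKNOWN;
	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);
		size_t flen = end ? (size_t)(end - field) : len - i;

		if (flen >= 7 && strncmp(field, "ACTION=", 7) == 0) {
			const char *val = field + 7;
			size_t vlen = flen - 7;

			if (vlen == 3 && memcmp(val, "add", 3) == 0)
				*action = UEV_ACTION_ADD;
			else if (vlen == 6 && memcmp(val, "remove", 6) == 0)
				*action = UEV_ACTION_REMOVE;
		}
		i += flen + 1;	/* step over the field and its terminator */
	}

	return *action == UEV_ACTION_UNKNOWN ? -1 : 0;
}
```

The receive path could then return early on a non-zero result instead of
invoking callbacks with a half-filled event.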

> +{
> +	char action[EAL_UEV_MSG_ELEM_LEN];
> +	char subsystem[EAL_UEV_MSG_ELEM_LEN];
> +	char dev_path[EAL_UEV_MSG_ELEM_LEN];
> +	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(dev_path, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);

I don't see dev_path being used anywhere, so why do we parse it?

> +
> +	while (i < length) {
> +		for (; i < length; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "DEVPATH=", 8)) {
> +			buf += 8;
> +			i += 8;
> +			snprintf(dev_path, sizeof(dev_path), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
> +			event->devname = pci_slot_name;

You are handing the caller a pointer to a stack buffer (pci_slot_name), which 
is invalid once this function returns; this is dangerous, and we should never do that.
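
One possible way around it, sketched under the assumption that the event
record can carry its own storage for the name. struct uev_event and
uev_copy_devname are hypothetical and only serve to show the idea; the
real struct rte_dev_event layout is not changed here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_NAME_LEN 128

/* Hypothetical event record: the name lives inside the struct itself. */
struct uev_event {
	char devname[UEV_NAME_LEN];
};

/*
 * Copy the PCI_SLOT_NAME value out of a NUL-separated uevent payload into
 * storage owned by the caller. Returns 0 if found, -1 otherwise.
 */
static int
uev_copy_devname(const char *buf, size_t len, struct uev_event *ev)
{
	static const char key[] = "PCI_SLOT_NAME=";
	const size_t klen = sizeof(key) - 1;
	size_t i = 0;

	ev->devname[0] = '\0';
	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);
		size_t flen = end ? (size_t)(end - field) : len - i;

		if (flen > klen && strncmp(field, key, klen) == 0) {
			/* %.*s bounds the read even for an unterminated field */
			snprintf(ev->devname, sizeof(ev->devname), "%.*s",
				 (int)(flen - klen), field + klen);
			return 0;
		}
		i += flen + 1;
	}

	return -1;
}
```

Because the bytes are copied into the caller's struct, the name stays valid
after the parser returns and nothing has to be freed.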

> +		}
> +		for (; i < length; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	if (!strncmp(subsystem, "uio", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
> +	else if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_DEV_EVENT_ADD;
> +	if (!strncmp(action, "remove", 6))
> +		event->type = RTE_DEV_EVENT_REMOVE;
> +}
> +
> +static int
> +dev_uev_receive(int fd, struct rte_dev_event *uevent)
> +{
> +	int ret;
> +	char buf[EAL_UEV_MSG_LEN];
> +
> +	memset(uevent, 0, sizeof(struct rte_dev_event));
> +	memset(buf, 0, EAL_UEV_MSG_LEN);
> +
> +	ret = recv(fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL,
> +		"Socket read error(%d): %s\n",
> +		errno, strerror(errno));

The above three lines are badly formatted; the continuation lines should be indented.

> +		return -1;
> +	} else if (ret == 0)
> +		/* connection closed */
> +		return -1;
> +
> +	dev_uev_parse(buf, uevent, EAL_UEV_MSG_LEN);
> +
> +	return 0;
> +}
> +
> +static void
> +dev_uev_process(__rte_unused void *param)
> +{
> +	struct rte_dev_event uevent;
> +
> +	if (dev_uev_receive(intr_handle.fd, &uevent))
> +		return;
> +
> +	if (uevent.devname)
> +		dev_callback_process(uevent.devname, uevent.type, NULL);
> +}
>   
>   int __rte_experimental
>   rte_dev_event_monitor_start(void)
>   {
> -	/* TODO: start uevent monitor for linux */
> +	int ret;
> +
> +	if (!monitor_not_started)
> +		return 0;
> +
> +	intr_handle.fd = dev_uev_monitor_fd_new();
> +	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
> +
> +	ret = dev_uev_monitor_create(intr_handle.fd);
> +
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "error create device event monitor\n");
> +		return -1;
> +	}
> +
> +	ret = rte_intr_callback_register(&intr_handle, dev_uev_process, NULL);
> +
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "fail to register uevent callback\n");
> +		return -1;
> +	}
> +
> +	monitor_not_started = false;
> +
>   	return 0;
>   }
>   
>   int __rte_experimental
>   rte_dev_event_monitor_stop(void)
>   {
> -	/* TODO: stop uevent monitor for linux */
> +	int ret;
> +
> +	if (monitor_not_started)
> +		return 0;
> +
> +	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_process, NULL);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "fail to unregister uevent callback");
> +		return ret;
> +	}
> +
> +	close(intr_handle.fd);
> +	intr_handle.fd = -1;
> +	monitor_not_started = true;
>   	return 0;
>   }

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring
  2018-03-26 11:20                                                           ` [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
@ 2018-03-28 16:41                                                             ` Tan, Jianfeng
  2018-03-29 16:00                                                             ` [PATCH V17 0/4] add device event monitor framework Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Tan, Jianfeng @ 2018-03-28 16:41 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 3/26/2018 7:20 PM, Jeff Guo wrote:
> Use testpmd for example, to show an application how to use device event
> mechanism to monitor the hotplug event, involve both hot removal event

involve -> including

> and the hot insertion event.
>
> The process is that, testpmd first enable hotplug monitoring and register
> the user's callback, when device being hotplug insertion or hotplug
> removal, the eal monitor the event and call user's callbacks, the
> application according their hot plug policy to detach or attach the device
> from the bus.

You are not exactly describing what's done in this example. From what I 
see, you only implement hot-unplug. For hot-plug, we will need to 
register a callback with a dev_name of NULL.

And without the other patch series, this only works if the datapath has 
not been started.


>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> 1.modify log and patch description.
> ---
>   app/test-pmd/parameters.c |   5 +-
>   app/test-pmd/testpmd.c    | 195 +++++++++++++++++++++++++++++++++++++++++++++-
>   app/test-pmd/testpmd.h    |  11 +++
>   3 files changed, 209 insertions(+), 2 deletions(-)
>
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 97d22b8..825d602 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -186,6 +186,7 @@ usage(char* progname)
>   	printf("  --flow-isolate-all: "
>   	       "requests flow API isolated mode on all ports at initialization time.\n");
>   	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
> +	printf("  --hot-plug: enalbe hot plug for device.\n");
>   }
>   
>   #ifdef RTE_LIBRTE_CMDLINE
> @@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
>   		{ "print-event",		1, 0, 0 },
>   		{ "mask-event",			1, 0, 0 },
>   		{ "tx-offloads",		1, 0, 0 },
> +		{ "hot-plug",			0, 0, 0 },
>   		{ 0, 0, 0, 0 },
>   	};
>   
> @@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
>   					rte_exit(EXIT_FAILURE,
>   						 "invalid mask-event argument\n");
>   				}
> -
> +			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
> +				hot_plug = 1;
>   			break;
>   		case 'h':
>   			usage(argv[0]);
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 4c0e258..bb1ac8f 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -12,6 +12,7 @@
>   #include <sys/mman.h>
>   #include <sys/types.h>
>   #include <errno.h>
> +#include <stdbool.h>
>   
>   #include <sys/queue.h>
>   #include <sys/stat.h>
> @@ -284,6 +285,9 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
>    */
>   uint8_t rmv_interrupt = 1; /* enabled by default */
>   
> +
> +uint8_t hot_plug = 0; /**< hotplug disabled by default. */
> +
>   /*
>    * Display or mask ether events
>    * Default to all events except VF_MBOX
> @@ -384,6 +388,8 @@ uint8_t bitrate_enabled;
>   struct gro_status gro_ports[RTE_MAX_ETHPORTS];
>   uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
>   
> +static struct hotplug_request_list hp_list;
> +
>   /* Forward function declarations */
>   static void map_port_queue_stats_mapping_registers(portid_t pi,
>   						   struct rte_port *port);
> @@ -391,6 +397,14 @@ static void check_all_ports_link_status(uint32_t port_mask);
>   static int eth_event_callback(portid_t port_id,
>   			      enum rte_eth_event_type type,
>   			      void *param, void *ret_param);
> +static int eth_dev_event_callback(char *device_name,
> +				enum rte_dev_event_type type,
> +				void *param);
> +static int eth_dev_event_callback_register(portid_t port_id);
> +static bool in_hotplug_list(const char *dev_name);
> +
> +static int hotplug_list_add(struct rte_device *device,
> +				enum rte_kernel_driver device_kdrv);
>   
>   /*
>    * Check if all the ports are started.
> @@ -1853,6 +1867,27 @@ reset_port(portid_t pid)
>   	printf("Done\n");
>   }
>   
> +static int
> +eth_dev_event_callback_register(portid_t port_id)
> +{
> +	int diag;
> +	char device_name[128];
> +
> +	snprintf(device_name, sizeof(device_name),
> +		"%s", rte_eth_devices[port_id].device->name);
> +
> +	/* register the dev_event callback */
> +
> +	diag = rte_dev_callback_register(device_name,
> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
> +	if (diag) {
> +		printf("Failed to setup dev_event callback\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>   void
>   attach_port(char *identifier)
>   {
> @@ -1869,6 +1904,8 @@ attach_port(char *identifier)
>   	if (rte_eth_dev_attach(identifier, &pi))
>   		return;
>   
> +	eth_dev_event_callback_register(pi);

What's the difference between this call and the one below?

> +
>   	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
>   	/* if socket_id is invalid, set to 0 */
>   	if (check_socket_id(socket_id) < 0)
> @@ -1880,6 +1917,12 @@ attach_port(char *identifier)
>   
>   	ports[pi].port_status = RTE_PORT_STOPPED;
>   
> +	if (hot_plug) {
> +		hotplug_list_add(rte_eth_devices[pi].device,
> +				 rte_eth_devices[pi].data->kdrv);
> +		eth_dev_event_callback_register(pi);
> +	}
> +
>   	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
>   	printf("Done\n");
>   }
> @@ -1906,6 +1949,12 @@ detach_port(portid_t port_id)
>   
>   	nb_ports = rte_eth_dev_count();
>   
> +	if (hot_plug) {
> +		hotplug_list_add(rte_eth_devices[port_id].device,
> +				 rte_eth_devices[port_id].data->kdrv);
> +		eth_dev_event_callback_register(port_id);
> +	}
> +
>   	printf("Port '%s' is detached. Now total ports is %d\n",
>   			name, nb_ports);
>   	printf("Done\n");
> @@ -1929,6 +1978,9 @@ pmd_test_exit(void)
>   			close_port(pt_id);
>   		}
>   	}
> +
> +	rte_dev_event_monitor_stop();
> +
>   	printf("\nBye...\n");
>   }
>   
> @@ -2013,6 +2065,49 @@ rmv_event_callback(void *arg)
>   			dev->device->name);
>   }
>   
> +static void
> +rmv_dev_event_callback(void *arg)
> +{
> +	char name[RTE_ETH_NAME_MAX_LEN];
> +	uint8_t port_id = (intptr_t)arg;
> +
> +	rte_eal_alarm_cancel(rmv_dev_event_callback, arg);
> +
> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> +	printf("removing port id:%u\n", port_id);
> +
> +	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
> +		return;
> +
> +	stop_packet_forwarding();
> +
> +	stop_port(port_id);
> +	close_port(port_id);
> +	if (rte_eth_dev_detach(port_id, name)) {
> +		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
> +		return;
> +	}
> +
> +	nb_ports = rte_eth_dev_count();
> +
> +	printf("Port '%s' is detached. Now total ports is %d\n",
> +			name, nb_ports);
> +}
> +
> +static void
> +add_dev_event_callback(void *arg)
> +{
> +	char *dev_name = (char *)arg;
> +
> +	rte_eal_alarm_cancel(add_dev_event_callback, arg);
> +
> +	if (!in_hotplug_list(dev_name))
> +		return;
> +
> +	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
> +	attach_port(dev_name);
> +}
> +
>   /* This function is used by the interrupt thread */
>   static int
>   eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
> @@ -2059,6 +2154,86 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
>   	return 0;
>   }
>   
> +static bool
> +in_hotplug_list(const char *dev_name)
> +{
> +	struct hotplug_request *hp_request = NULL;
> +
> +	TAILQ_FOREACH(hp_request, &hp_list, next) {
> +		if (!strcmp(hp_request->dev_name, dev_name))
> +			break;
> +	}
> +
> +	if (hp_request)
> +		return true;
> +
> +	return false;
> +}
> +
> +static int
> +hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
> +{
> +	struct hotplug_request *hp_request;
> +
> +	hp_request = rte_zmalloc("hoplug request",
> +			sizeof(*hp_request), 0);
> +	if (hp_request == NULL) {
> +		fprintf(stderr, "%s can not alloc memory\n",
> +			__func__);
> +		return -ENOMEM;
> +	}
> +
> +	hp_request->dev_name = device->name;
> +	hp_request->dev_kdrv = device_kdrv;
> +
> +	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
> +
> +	return 0;
> +}
> +
> +/* This function is used by the interrupt thread */
> +static int
> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
> +			     void *arg)
> +{
> +	static const char * const event_desc[] = {
> +		[RTE_DEV_EVENT_UNKNOWN] = "Unknown",
> +		[RTE_DEV_EVENT_ADD] = "add",
> +		[RTE_DEV_EVENT_REMOVE] = "remove",
> +	};
> +	char *dev_name = malloc(strlen(device_name) + 1);
> +
> +	strcpy(dev_name, device_name);

Wouldn't strdup() be easier?

> +
> +	if (type >= RTE_DEV_EVENT_MAX) {
> +		fprintf(stderr, "%s called upon invalid event %d\n",
> +			__func__, type);
> +		fflush(stderr);
> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
> +		printf("%s event\n",
> +			event_desc[type]);
> +		fflush(stdout);
> +	}
> +
> +	switch (type) {
> +	case RTE_DEV_EVENT_REMOVE:
> +		if (rte_eal_alarm_set(100000,
> +			rmv_dev_event_callback, arg))

Why not call rmv_dev_event_callback directly?

Besides, the name of rmv_dev_event_callback is really confusing. It's 
not a dev event callback.

> +			fprintf(stderr,
> +				"Could not set up deferred device removal\n");
> +		break;
> +	case RTE_DEV_EVENT_ADD:
> +		if (rte_eal_alarm_set(100000,
> +			add_dev_event_callback, dev_name))
> +			fprintf(stderr,
> +				"Could not set up deferred device add\n");

Ditto.

> +		break;
> +	default:
> +		break;
> +	}
> +	return 0;
> +}
> +
>   static int
>   set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
>   {
> @@ -2474,8 +2649,9 @@ signal_handler(int signum)
>   int
>   main(int argc, char** argv)
>   {
> -	int  diag;
> +	int diag;
>   	portid_t port_id;
> +	int ret;
>   
>   	signal(SIGINT, signal_handler);
>   	signal(SIGTERM, signal_handler);
> @@ -2543,6 +2719,23 @@ main(int argc, char** argv)
>   		       nb_rxq, nb_txq);
>   
>   	init_config();
> +
> +	if (hot_plug) {
> +		/* enable hot plug monitoring */
> +		ret = rte_dev_event_monitor_start();
> +		if (ret) {
> +			rte_errno = EINVAL;
> +			return -1;
> +		}
> +		if (TAILQ_EMPTY(&hp_list))
> +			TAILQ_INIT(&hp_list);
> +		RTE_ETH_FOREACH_DEV(port_id) {
> +			hotplug_list_add(rte_eth_devices[port_id].device,
> +					 rte_eth_devices[port_id].data->kdrv);
> +			eth_dev_event_callback_register(port_id);
> +		}

Why not monitor all devices by registering with a dev_name of NULL? It makes 
things much easier, and we can then also monitor hot-plug events.
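
To illustrate the idea only: the sketch below is a stand-alone toy
dispatcher, not the EAL implementation. It assumes the convention from the
API comments in this series, that a NULL device name means "all devices";
the struct and function names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

typedef void (*dev_event_cb)(const char *device_name, void *arg);

/* Hypothetical callback record; a NULL name is treated as "all devices". */
struct dev_cb_entry {
	const char *device_name;
	dev_event_cb cb;
	void *arg;
};

/*
 * Invoke every entry whose name matches device_name, plus every entry
 * registered with a NULL name. Returns the number of callbacks invoked.
 */
static int
dev_cb_dispatch(const struct dev_cb_entry *list, size_t n,
		const char *device_name)
{
	int called = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i].device_name != NULL &&
		    strcmp(list[i].device_name, device_name) != 0)
			continue;
		list[i].cb(device_name, list[i].arg);
		called++;
	}

	return called;
}

/* Example callback: counts how many events it has seen. */
static void
count_event(const char *device_name, void *arg)
{
	(void)device_name;
	(*(int *)arg)++;
}
```

With a single NULL-name registration the application is told about devices
it has never seen before, which is what a hot-plug (add) event needs; the
per-port registrations and the hotplug list would then no longer be
required just to receive events.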

> +	}
> +
>   	if (start_port(RTE_PORT_ALL) != 0)
>   		rte_exit(EXIT_FAILURE, "Start ports failed\n");
>   
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 153abea..c619e11 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
>   #define TM_MODE			0
>   #endif
>   
> +struct hotplug_request {
> +	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
> +	const char *dev_name;            /* request device name */
> +	enum rte_kernel_driver dev_kdrv;            /* kernel driver binded */
> +};
> +
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(hotplug_request_list, hotplug_request);
> +
>   enum {
>   	PORT_TOPOLOGY_PAIRED,
>   	PORT_TOPOLOGY_CHAINED,
> @@ -319,6 +328,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
>   extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
>   extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
>   extern uint32_t event_print_mask;
> +extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
> +
>   /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
>   
>   #ifdef RTE_LIBRTE_IXGBE_BYPASS

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V16 3/4] eal/linux: uevent parse and process
  2018-03-28 16:15                                                             ` Tan, Jianfeng
@ 2018-03-29 13:32                                                               ` Van Haaren, Harry
  2018-03-29 15:03                                                                 ` Guo, Jia
  2018-03-29 15:08                                                               ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Van Haaren, Harry @ 2018-03-29 13:32 UTC (permalink / raw)
  To: Tan, Jianfeng, Guo, Jia, stephen, Richardson, Bruce, Yigit,
	Ferruh, Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas,
	motih
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Two additional inputs along with Jianfeng's existing comments:

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Wednesday, March 28, 2018 5:16 PM
> To: Guo, Jia <jia.guo@intel.com>; stephen@networkplumber.org; Richardson,
> Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com;
> Wu, Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com; Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang,
> Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH V16 3/4] eal/linux: uevent parse and process
> 
> BTW, adding new .c file needs to update meson.build now.
> 
> On 3/26/2018 7:20 PM, Jeff Guo wrote:
> > In order to handle the uevent which have been detected from the kernel
> > side, add uevent parse and process function to translate the uevent into
> > device event, which user has subscribe to monitor.
> >
> > Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > ---
> > 1.move all linux specific together
> > ---
> >   lib/librte_eal/linuxapp/eal/eal_dev.c | 214 +++++++++++++++++++++++++++++++++-
> >   1 file changed, 211 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> 
> > +static bool monitor_not_started = true;

This variable should be named "monitor_started": as a static variable it is zero (false) by default,
and the following code becomes easier to read:

if ( !not_started )   becomes    if (started)



> >   int __rte_experimental
> >   rte_dev_event_monitor_start(void)
> >   {
> > -	/* TODO: start uevent monitor for linux */
> > +	int ret;
> > +
> > +	if (!monitor_not_started)
> > +		return 0;
> > +
> > +	intr_handle.fd = dev_uev_monitor_fd_new();
> > +	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;

dev_uev_monitor_fd_new() can return -1 on error; we should check for that case here.
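
Both points could be combined roughly as follows. This is a stand-alone
sketch: open_fd is a stand-in for dev_uev_monitor_fd_new() and is passed
in as a parameter only so that the snippet can be exercised without a
netlink socket; the interrupt callback registration is left out.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

static bool monitor_started;	/* static storage: false until start succeeds */
static int monitor_fd = -1;

/*
 * Start the monitor once. open_fd is an assumption of this sketch and
 * stands in for the function that creates the uevent socket.
 */
static int
monitor_start(int (*open_fd)(void))
{
	int fd;

	if (monitor_started)
		return 0;

	fd = open_fd();
	if (fd < 0)
		return -1;	/* nothing was opened, nothing to clean up */

	monitor_fd = fd;
	monitor_started = true;
	return 0;
}

static int
monitor_stop(void)
{
	if (!monitor_started)
		return 0;

	close(monitor_fd);
	monitor_fd = -1;
	monitor_started = false;
	return 0;
}

/* Stand-ins for a failing and a succeeding fd factory. */
static int
open_fail(void)
{
	return -1;
}

static int
open_ok(void)
{
	return open("/dev/null", O_RDONLY);
}
```

The flag is only set after the fd has been obtained, so a failed start
leaves the state untouched and a later start can try again.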

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V16 3/4] eal/linux: uevent parse and process
  2018-03-29 13:32                                                               ` Van Haaren, Harry
@ 2018-03-29 15:03                                                                 ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-03-29 15:03 UTC (permalink / raw)
  To: Van Haaren, Harry, Tan, Jianfeng, stephen, Richardson, Bruce,
	Yigit, Ferruh, Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing,
	thomas, motih
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

hi, harry

thanks for your review.
On 3/29/2018 9:32 PM, Van Haaren, Harry wrote:
> Two additional input along with Jianfeng's existing comments;
>
>> -----Original Message-----
>> From: Tan, Jianfeng
>> Sent: Wednesday, March 28, 2018 5:16 PM
>> To: Guo, Jia <jia.guo@intel.com>; stephen@networkplumber.org; Richardson,
>> Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> Ananyev, Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com;
>> Wu, Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
>> motih@mellanox.com; Van Haaren, Harry <harry.van.haaren@intel.com>
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang,
>> Helin <helin.zhang@intel.com>
>> Subject: Re: [PATCH V16 3/4] eal/linux: uevent parse and process
>>
>> BTW, adding new .c file needs to update meson.build now.
>>
>> On 3/26/2018 7:20 PM, Jeff Guo wrote:
>>> In order to handle the uevent which have been detected from the kernel
>>> side, add uevent parse and process function to translate the uevent into
>>> device event, which user has subscribe to monitor.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>> ---
>>> 1.move all linux specific together
>>> ---
>>>    lib/librte_eal/linuxapp/eal/eal_dev.c | 214 +++++++++++++++++++++++++++++++++-
>>>    1 file changed, 211 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>>
>>> +static bool monitor_not_started = true;
> This variable should be named "monitor_started", as it is a static var it will be zero by default,
> and the following code is easier to read:
>
> if ( !not_started )   becomes    if (started)
>
Makes sense.
>
>>>    int __rte_experimental
>>>    rte_dev_event_monitor_start(void)
>>>    {
>>> -	/* TODO: start uevent monitor for linux */
>>> +	int ret;
>>> +
>>> +	if (!monitor_not_started)
>>> +		return 0;
>>> +
>>> +	intr_handle.fd = dev_uev_monitor_fd_new();
>>> +	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
> dev_uev_monitor_fd_new() can return -1 on error, we should check for that case here.
>
you are right.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V16 3/4] eal/linux: uevent parse and process
  2018-03-28 16:15                                                             ` Tan, Jianfeng
  2018-03-29 13:32                                                               ` Van Haaren, Harry
@ 2018-03-29 15:08                                                               ` Guo, Jia
  1 sibling, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-03-29 15:08 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

jianfeng


On 3/29/2018 12:15 AM, Tan, Jianfeng wrote:
> BTW, adding new .c file needs to update meson.build now.
>
Thanks for the info.
> On 3/26/2018 7:20 PM, Jeff Guo wrote:
>> In order to handle the uevent which have been detected from the kernel
>> side, add uevent parse and process function to translate the uevent into
>> device event, which user has subscribe to monitor.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> 1.move all linux specific together
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 214 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 211 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 5ab5830..90094c0 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -2,19 +2,227 @@
>>    * Copyright(c) 2018 Intel Corporation
>>    */
>>
>> -#include <rte_log.h>
>> +#include <stdio.h>
>> +#include <string.h>
>> +#include <inttypes.h>
>> +#include <sys/queue.h>
>> +#include <sys/signalfd.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +#include <sys/epoll.h>
>> +#include <unistd.h>
>> +#include <signal.h>
>
> Some header files are not necessary, the above one for example.
>
>> +#include <stdbool.h>
>> +#include <fcntl.h>
>> +
>> +#include <rte_malloc.h>
>> +#include <rte_bus.h>
>>   #include <rte_dev.h>
>> +#include <rte_devargs.h>
>
> We don't need this one neither.
>
>> +#include <rte_debug.h>
>> +#include <rte_log.h>
>> +#include <rte_interrupts.h>
>> +
>> +#include "eal_private.h"
>> +#include "eal_thread.h"
>
> Ditto.
>
>> +
>> +static struct rte_intr_handle intr_handle = {.fd = -1 };
>
> I don't think we need a static intr_handle, what we need is the 
> monitor fd.
>
>> +static bool monitor_not_started = true;
>> +
>> +#define EAL_UEV_MSG_LEN 4096
>> +#define EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +/* identify the system layer which event exposure from */
>> +enum eal_dev_event_subsystem {
>> +    EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_MAX
>> +};
>> +
>> +static int
>> +dev_uev_monitor_fd_new(void)
>> +{
>> +    int uevent_fd;
>> +
>> +    uevent_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
>> +            SOCK_NONBLOCK,
>> +            NETLINK_KOBJECT_UEVENT);
>> +    if (uevent_fd < 0) {
>> +        RTE_LOG(ERR, EAL, "create uevent fd failed\n");
>> +        return -1;
>> +    }
>> +    return uevent_fd;
>> +}
>> +
>> +static int
>> +dev_uev_monitor_create(int netlink_fd)
>
> I think we should merge this function with above function. I don't see 
> a reason to split into two functions.
>
Makes sense.
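A rough sketch of what the merged helper could look like. The name
uev_socket_open and the stderr logging are placeholders for illustration,
not code from the patch; nl_groups is kept as in the patch above. Since
SOCK_NONBLOCK is already passed to socket(), a separate ioctl(FIONBIO)
step should not be needed.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a non-blocking NETLINK_KOBJECT_UEVENT socket and bind it in one
 * step. Returns the fd on success, -1 on failure; the failing call and
 * errno text are printed so the log alone tells what went wrong.
 * (Hypothetical helper name, Linux only.)
 */
static int
uev_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0) {
		fprintf(stderr, "uevent socket() failed: %s\n",
			strerror(errno));
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;
	addr.nl_groups = 0xffffffff;	/* same value as the patch */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		fprintf(stderr, "uevent bind() failed: %s\n",
			strerror(errno));
		close(fd);
		return -1;
	}

	return fd;
}
```

The caller would only need to check the return value and store the fd
where the monitor keeps it.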
>> +{
>> +    struct sockaddr_nl addr;
>> +    int ret;
>> +    int size = 64 * 1024;
>> +    int nonblock = 1;
>> +
>> +    memset(&addr, 0, sizeof(addr));
>> +    addr.nl_family = AF_NETLINK;
>> +    addr.nl_pid = 0;
>> +    addr.nl_groups = 0xffffffff;
>> +
>> +    if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 
>> 0) {
>> +        RTE_LOG(ERR, EAL, "bind failed\n");
>
> Please print more information here, so that we don't have to check the 
> code if we really encounter such an error.
>
>> +        goto err;
>> +    }
>> +
>> +    setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, 
>> sizeof(size));
>> +
>> +    ret = ioctl(netlink_fd, FIONBIO, &nonblock);
>> +    if (ret != 0) {
>> +        RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed\n");
>> +        goto err;
>> +    }
>> +    return 0;
>> +err:
>> +    close(netlink_fd);
>> +    return -1;
>> +}
>> +
>> +static void
>> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
>
> Do we always get an event we care about? If not, we need to return 
> something so that the caller can skip this event.
>
This function does not filter the events.
>> +{
>> +    char action[EAL_UEV_MSG_ELEM_LEN];
>> +    char subsystem[EAL_UEV_MSG_ELEM_LEN];
>> +    char dev_path[EAL_UEV_MSG_ELEM_LEN];
>> +    char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
>> +    int i = 0;
>> +
>> +    memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
>> +    memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
>> +    memset(dev_path, 0, EAL_UEV_MSG_ELEM_LEN);
>> +    memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
>
> I don't see where dev_path is needed; why do we parse it?
>
>> +
>> +    while (i < length) {
>> +        for (; i < length; i++) {
>> +            if (*buf)
>> +                break;
>> +            buf++;
>> +        }
>> +        if (!strncmp(buf, "ACTION=", 7)) {
>> +            buf += 7;
>> +            i += 7;
>> +            snprintf(action, sizeof(action), "%s", buf);
>> +        } else if (!strncmp(buf, "DEVPATH=", 8)) {
>> +            buf += 8;
>> +            i += 8;
>> +            snprintf(dev_path, sizeof(dev_path), "%s", buf);
>> +        } else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +            buf += 10;
>> +            i += 10;
>> +            snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +        } else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +            buf += 14;
>> +            i += 14;
>> +            snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +            event->devname = pci_slot_name;
>
> You are assigning a stack pointer for the caller to use; this is 
> dangerous, we should never do that.
>
Makes sense.
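One way to avoid handing out the stack pointer, as a minimal sketch: let
the event own its name buffer, so the parsed value is copied rather than
referenced. The struct and function names below are hypothetical, and
turning devname from a char * into an array is an assumption, not
something the patch does.

```c
#include <stddef.h>
#include <stdio.h>

#define DEVNAME_LEN 128	/* mirrors EAL_UEV_MSG_ELEM_LEN in the patch */

/* Hypothetical event layout: the name lives inside the event itself. */
struct owned_dev_event {
	int type;
	char devname[DEVNAME_LEN];
};

/*
 * Copy a parsed value into the event. Returns 0 on success, -1 if the
 * value did not fit (the stored name is then truncated but still
 * NUL-terminated).
 */
static int
owned_dev_event_set_name(struct owned_dev_event *ev, const char *value)
{
	int n = snprintf(ev->devname, sizeof(ev->devname), "%s", value);

	if (n < 0 || (size_t)n >= sizeof(ev->devname))
		return -1;
	return 0;
}
```

With this layout the event stays valid after the parse function returns,
at the cost of a fixed upper bound on the name length.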
>> +        }
>> +        for (; i < length; i++) {
>> +            if (*buf == '\0')
>> +                break;
>> +            buf++;
>> +        }
>> +    }
>> +
>> +    if (!strncmp(subsystem, "uio", 3))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
>> +    else if (!strncmp(subsystem, "pci", 3))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
>> +    if (!strncmp(action, "add", 3))
>> +        event->type = RTE_DEV_EVENT_ADD;
>> +    if (!strncmp(action, "remove", 6))
>> +        event->type = RTE_DEV_EVENT_REMOVE;
>> +}
>> +
>> +static int
>> +dev_uev_receive(int fd, struct rte_dev_event *uevent)
>> +{
>> +    int ret;
>> +    char buf[EAL_UEV_MSG_LEN];
>> +
>> +    memset(uevent, 0, sizeof(struct rte_dev_event));
>> +    memset(buf, 0, EAL_UEV_MSG_LEN);
>> +
>> +    ret = recv(fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
>> +    if (ret < 0) {
>> +        RTE_LOG(ERR, EAL,
>> +        "Socket read error(%d): %s\n",
>> +        errno, strerror(errno));
>
> The above three lines are in bad format.
>
>> +        return -1;
>> +    } else if (ret == 0)
>> +        /* connection closed */
>> +        return -1;
>> +
>> +    dev_uev_parse(buf, uevent, EAL_UEV_MSG_LEN);
>> +
>> +    return 0;
>> +}
>> +
>> +static void
>> +dev_uev_process(__rte_unused void *param)
>> +{
>> +    struct rte_dev_event uevent;
>> +
>> +    if (dev_uev_receive(intr_handle.fd, &uevent))
>> +        return;
>> +
>> +    if (uevent.devname)
>> +        dev_callback_process(uevent.devname, uevent.type, NULL);
>> +}
>>     int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>> -    /* TODO: start uevent monitor for linux */
>> +    int ret;
>> +
>> +    if (!monitor_not_started)
>> +        return 0;
>> +
>> +    intr_handle.fd = dev_uev_monitor_fd_new();
>> +    intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
>> +
>> +    ret = dev_uev_monitor_create(intr_handle.fd);
>> +
>> +    if (ret) {
>> +        RTE_LOG(ERR, EAL, "error create device event monitor\n");
>> +        return -1;
>> +    }
>> +
>> +    ret = rte_intr_callback_register(&intr_handle, dev_uev_process, 
>> NULL);
>> +
>> +    if (ret) {
>> +        RTE_LOG(ERR, EAL, "fail to register uevent callback\n");
>> +        return -1;
>> +    }
>> +
>> +    monitor_not_started = false;
>> +
>>       return 0;
>>   }
>>     int __rte_experimental
>>   rte_dev_event_monitor_stop(void)
>>   {
>> -    /* TODO: stop uevent monitor for linux */
>> +    int ret;
>> +
>> +    if (monitor_not_started)
>> +        return 0;
>> +
>> +    ret = rte_intr_callback_unregister(&intr_handle, 
>> dev_uev_process, NULL);
>> +    if (ret) {
>> +        RTE_LOG(ERR, EAL, "fail to unregister uevent callback");
>> +        return ret;
>> +    }
>> +
>> +    close(intr_handle.fd);
>> +    intr_handle.fd = -1;
>> +    monitor_not_started = true;
>>       return 0;
>>   }
>


* [PATCH V17 0/4] add device event monitor framework
  2018-03-26 11:20                                                           ` [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  2018-03-28 16:41                                                             ` Tan, Jianfeng
@ 2018-03-29 16:00                                                             ` Jeff Guo
  2018-03-29 16:00                                                               ` [PATCH V17 1/4] eal: add device event handle in interrupt thread Jeff Guo
                                                                                 ` (3 more replies)
  1 sibling, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-29 16:00 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

About hot plug in DPDK: we already have a proactive way to add/remove devices
through APIs (rte_eal_hotplug_add/remove), and we also have the fail-safe driver
to offload the fail-safe work from the app user. But there is still no
general mechanism to monitor hotplug events for all drivers; today the
hotplug interrupt event differs between devices and drivers, such
as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so to make the hot plug feature easy to use
for pci drivers, something must be done to detect the remove event
at the kernel level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor the
hot plug status of devices, not only uio/vfio devices on the pci bus
but also others, such as cpu/usb/pci-express bus devices.

The idea is as below.

a.The uevent message read from the monitoring FD, which will be useful:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b.add uevent monitoring mechanism:
add several general APIs to enable uevent monitoring.

c.add common uevent handler and uevent failure handler
uevents of a device should be handled at the bus or device layer, and memory read
and write failures on hot removal should be handled correctly before the detach behaviors.

d.show an example of how to use the uevent monitor
enable uevent monitoring in testpmd or fail-safe to show usage.
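
The message in (a) arrives as one buffer of NUL-terminated strings: first
"action@devpath", then "KEY=value" pairs. As a minimal sketch of walking
such a buffer (not the parser from this patch set; struct uev_fields,
uev_copy_if_key and uev_parse are names made up for illustration):

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_FIELD_LEN 128

/* Hypothetical container for the fields this sketch extracts. */
struct uev_fields {
	char action[UEV_FIELD_LEN];
	char subsystem[UEV_FIELD_LEN];
	char devname[UEV_FIELD_LEN];
};

/* If the n-byte string s starts with key, copy the rest into dst. */
static int
uev_copy_if_key(const char *s, size_t n, const char *key,
		char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (n < klen || strncmp(s, key, klen) != 0)
		return 0;
	/* the precision bounds the read to the current string */
	snprintf(dst, dst_len, "%.*s", (int)(n - klen), s + klen);
	return 1;
}

/*
 * Walk the payload string by string, never reading past len.
 * Returns the number of recognised keys copied into out.
 */
static int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t pos = 0;
	int found = 0;

	memset(out, 0, sizeof(*out));
	while (pos < len) {
		const char *s = buf + pos;
		const char *nul = memchr(s, '\0', len - pos);
		size_t n = nul ? (size_t)(nul - s) : len - pos;

		found += uev_copy_if_key(s, n, "ACTION=",
				out->action, sizeof(out->action));
		found += uev_copy_if_key(s, n, "SUBSYSTEM=",
				out->subsystem, sizeof(out->subsystem));
		found += uev_copy_if_key(s, n, "DEVNAME=",
				out->devname, sizeof(out->devname));
		pos += n + 1;	/* skip the string and its terminator */
	}
	return found;
}
```

A caller can treat a return value of 0 as "nothing of interest" and skip
the event. The values are copied into caller-owned storage, so nothing
points back into the receive buffer.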

patchset history:
v17->v16:
1.add the parts related to adding the interrupt handle type.
2.add new APIs into the map, fix typos, add (void *)-1 value to unregister all callbacks
3.add new file into meson.build, modify coding style and add print info, delete unused parts.
4.unregister all user's callback when stop event monitor

v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll to replace the rte service usage for the monitor thread,
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
let the user's callback do the checking and management.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:modify some typo and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix define issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy, in eal, default handle to prepare hot plug work for
all pci devices, then let the app decide which devices need to
hot plug.
2.modify to manage event callback in each device.
3.fix some system hung issue when igb_uio release, and some typo error.
4.modify the pci part to the bus-pci base on the bus rework.
5.add hot plug policy in app, show example of using a hotplug list to
decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Replace the individual fd bound to a single device with a common fd that polls all devices.
5.Add registration of hot insertion monitoring and processing, add function to auto bind driver before the user adds a device
6.Refine some coding style and typo issues
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove the global variable hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix dual device fd issue.
2.refine some typo error.


Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 242 ++++++++++++++++++++-
 app/test-pmd/testpmd.h                             |  11 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  19 ++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 149 +++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 ++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 205 +++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |   2 +
 test/test/test_interrupts.c                        |  39 +++-
 16 files changed, 792 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4


* [PATCH V17 1/4] eal: add device event handle in interrupt thread
  2018-03-29 16:00                                                             ` [PATCH V17 0/4] add device event monitor framework Jeff Guo
@ 2018-03-29 16:00                                                               ` Jeff Guo
  2018-03-29 16:00                                                               ` [PATCH V17 2/4] eal: add device event monitor framework Jeff Guo
                                                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-29 16:00 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add a new interrupt handle type, RTE_INTR_HANDLE_DEV_EVENT, for
monitoring device event interrupts.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v17->v16:
add the parts related to adding the interrupt handle type.
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
 test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..58e9328 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index 31a70a0..7f4f1b4 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
 	TEST_INTERRUPT_HANDLE_VALID,
 	TEST_INTERRUPT_HANDLE_VALID_UIO,
 	TEST_INTERRUPT_HANDLE_VALID_ALARM,
+	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
 	TEST_INTERRUPT_HANDLE_CASE1,
 	TEST_INTERRUPT_HANDLE_MAX
 };
@@ -80,6 +81,10 @@ test_interrupt_init(void)
 	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
 					RTE_INTR_HANDLE_ALARM;
 
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd = pfds.readfd;
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
+					RTE_INTR_HANDLE_DEV_EVENT;
+
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -250,6 +255,14 @@ test_interrupt_enable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_enable(&test_intr_handle) == 0) {
+		printf("unexpectedly enable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -306,6 +319,14 @@ test_interrupt_disable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_disable(&test_intr_handle) == 0) {
+		printf("unexpectedly disable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -393,9 +414,17 @@ test_interrupt(void)
 		goto out;
 	}
 
+	printf("Check valid device event interrupt full path\n");
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+		printf("failure occurred during checking valid device event "
+						"interrupt full path\n");
+		goto out;
+	}
+
 	printf("Check valid alarm interrupt full path\n");
-	if (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
-									< 0) {
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_ALARM) < 0) {
 		printf("failure occurred during checking valid alarm "
 						"interrupt full path\n");
 		goto out;
@@ -513,6 +542,12 @@ test_interrupt(void)
 	rte_intr_callback_unregister(&test_intr_handle,
 			test_interrupt_callback_1, (void *)-1);
 
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback, (void *)-1);
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback_1, (void *)-1);
+
 	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
 	/* deinit */
 	test_interrupt_deinit();
-- 
2.7.4


* [PATCH V17 2/4] eal: add device event monitor framework
  2018-03-29 16:00                                                             ` [PATCH V17 0/4] add device event monitor framework Jeff Guo
  2018-03-29 16:00                                                               ` [PATCH V17 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-03-29 16:00                                                               ` Jeff Guo
  2018-03-29 16:00                                                               ` [PATCH V17 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-03-29 16:00                                                               ` [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-29 16:00 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor framework at the
EAL device layer, for device hotplug awareness and the actions taken
accordingly. It could also be extended to monitor other types of device
events, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to enable/disable
the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shall register or unregister callbacks through the newly added
APIs. Callbacks can be device specific, or for all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Take hotplug as an example: on hotplug insertion or hotplug removal of a
device, we get notified by the kernel and then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device from the
bus; this could also benefit further fail-safe or live-migration work.
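
The per-device versus all-devices matching described above can be modelled
as follows. This is a simplified sketch, assuming a single thread (the
locking is left out) and using hypothetical names (cb_register,
cb_process, count_cb); it is not the implementation in this patch. A NULL
device name stands for "all devices" and is handled explicitly on both
registration and dispatch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef int (*dev_event_cb_fn)(const char *device_name,
			       enum dev_event_type event, void *cb_arg);

struct dev_event_cb {
	TAILQ_ENTRY(dev_event_cb) next;
	dev_event_cb_fn cb_fn;
	void *cb_arg;
	char *dev_name;		/* NULL means "all devices" */
};

TAILQ_HEAD(dev_event_cb_list, dev_event_cb);
static struct dev_event_cb_list cb_list = TAILQ_HEAD_INITIALIZER(cb_list);

/* Two names match if both are NULL or both hold the same string. */
static int
same_name(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

static int
cb_register(const char *device_name, dev_event_cb_fn cb_fn, void *cb_arg)
{
	struct dev_event_cb *cb;

	if (cb_fn == NULL)
		return -EINVAL;

	TAILQ_FOREACH(cb, &cb_list, next) {
		if (cb->cb_fn == cb_fn && cb->cb_arg == cb_arg &&
		    same_name(cb->dev_name, device_name))
			return -EEXIST;
	}

	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -ENOMEM;
	if (device_name != NULL) {
		size_t n = strlen(device_name) + 1;

		cb->dev_name = malloc(n);
		if (cb->dev_name == NULL) {
			free(cb);
			return -ENOMEM;
		}
		memcpy(cb->dev_name, device_name, n);
	}
	cb->cb_fn = cb_fn;
	cb->cb_arg = cb_arg;
	TAILQ_INSERT_TAIL(&cb_list, cb, next);
	return 0;
}

/* Run every callback that matches device_name; returns how many ran. */
static int
cb_process(const char *device_name, enum dev_event_type event)
{
	struct dev_event_cb *cb;
	int called = 0;

	TAILQ_FOREACH(cb, &cb_list, next) {
		if (cb->dev_name != NULL &&
		    strcmp(cb->dev_name, device_name) != 0)
			continue;
		cb->cb_fn(device_name, event, cb->cb_arg);
		called++;
	}
	return called;
}

/* Example callback: counts its invocations through cb_arg. */
static int
count_cb(const char *device_name, enum dev_event_type event, void *cb_arg)
{
	(void)device_name;
	(void)event;
	++*(int *)cb_arg;
	return 0;
}
```

A callback registered with a NULL name sees events for every device, while
one registered with a device name only sees events for that device.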

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v17->v16:
add new APIs into the map, fix typos, add (void *)-1 value to unregister all callbacks
---
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  19 ++++
 lib/librte_eal/bsdapp/eal/meson.build   |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 149 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  15 ++++
 lib/librte_eal/common/include/rte_dev.h |  94 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  20 +++++
 lib/librte_eal/rte_eal_version.map      |   2 +
 9 files changed, 302 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..c0921dd 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..ad606b3
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
index e83fc91..6dfc533 100644
--- a/lib/librte_eal/bsdapp/eal/meson.build
+++ b/lib/librte_eal/bsdapp/eal/meson.build
@@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
 		'eal_timer.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c'
 )
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..c94df48 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all device */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,127 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&dev_event_cbs))
+		TAILQ_INIT(&dev_event_cbs);
+
+	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->dev_name = strdup(device_name);
+			if (event_cb->dev_name == NULL) {
+				free(event_cb);
+				return -EINVAL;
+			}
+			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device event callback.");
+			rte_spinlock_unlock(&dev_event_lock);
+			free(event_cb);
+			return -ENOMEM;
+		}
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return (event_cb == NULL) ? -EEXIST : 0;
+}
+
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg)
+{
+	int ret = 0;
+	struct dev_event_callback *event_cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	for (event_cb = TAILQ_FIRST(&(dev_event_cbs)); event_cb != NULL;
+		event_cb = next) {
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (device_name != NULL && event_cb->dev_name != NULL) {
+			if (!strcmp(event_cb->dev_name, device_name)) {
+				if (event_cb->cb_fn != cb_fn ||
+				    (cb_arg != (void *)-1 &&
+				    event_cb->cb_arg != cb_arg))
+					continue;
+			}
+		} else if (event_cb->dev_name == NULL) {
+			continue;
+		}
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&(dev_event_cbs), event_cb, next);
+			rte_free(event_cb);
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event)
+{
+	struct dev_event_callback *cb_lst;
+	int rc;
+
+	if (device_name == NULL)
+		return;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (cb_lst->dev_name) {
+			if (strcmp(cb_lst->dev_name, device_name))
+				continue;
+		}
+		cb_lst->active = 1;
+		rte_spinlock_unlock(&dev_event_lock);
+		rc = cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		if (rc) {
+			RTE_LOG(ERR, EAL,
+				"Failed to process callback function.");
+		}
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..88e5a59 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * Internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ *
+ */
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..4c78938 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,25 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +286,79 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring .
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..8578796 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..5ab5830
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d123602..71c9560 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -254,5 +254,7 @@ EXPERIMENTAL {
 	rte_service_set_runstate_mapped_check;
 	rte_service_set_stats_enable;
 	rte_service_start_with_defaults;
+	rte_dev_event_callback_register;
+	rte_dev_event_callback_unregister;
 
 } DPDK_18.02;
-- 
2.7.4


* [PATCH V17 3/4] eal/linux: uevent parse and process
  2018-03-29 16:00                                                             ` [PATCH V17 0/4] add device event monitor framework Jeff Guo
  2018-03-29 16:00                                                               ` [PATCH V17 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-03-29 16:00                                                               ` [PATCH V17 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-03-29 16:00                                                               ` Jeff Guo
  2018-03-29 16:59                                                                 ` Stephen Hemminger
  2018-03-29 17:00                                                                 ` Stephen Hemminger
  2018-03-29 16:00                                                               ` [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-29 16:00 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

To handle the uevents detected on the kernel side, add uevent parse and
process functions that translate each uevent into a device event, which
the user has subscribed to monitor.
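
For illustration only: a kernel uevent arrives as a run of NUL-terminated
"KEY=VALUE" records, so the parse step amounts to walking those records and
picking out the keys the monitor cares about. The sketch below is a
standalone approximation, not the code in this patch; struct uev_fields,
uev_match() and uev_parse() are hypothetical names, and it assumes the
caller passes the byte count returned by recv() rather than the buffer
capacity.

```c
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128

/* Hypothetical container for the fields of interest; not a DPDK type. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* Copy the value into dst if the record starts with key; return 1 on match. */
static int
uev_match(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", rec + klen);
	return 1;
}

/*
 * Walk a buffer of NUL-terminated "KEY=VALUE" records.
 * len is the number of bytes actually received.
 */
static void
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t pos = 0;

	memset(out, 0, sizeof(*out));
	while (pos < len) {
		const char *rec = buf + pos;
		const char *end = memchr(rec, '\0', len - pos);

		/* Ignore a trailing record that is not NUL-terminated. */
		if (end == NULL)
			break;

		if (!uev_match(rec, "ACTION=", out->action,
			       sizeof(out->action)) &&
		    !uev_match(rec, "SUBSYSTEM=", out->subsystem,
			       sizeof(out->subsystem)))
			uev_match(rec, "PCI_SLOT_NAME=", out->pci_slot_name,
				  sizeof(out->pci_slot_name));

		/* Step over this record and its terminating NUL. */
		pos += (size_t)(end - rec) + 1;
	}
}
```

Fed the payload of an "add" uevent for a PCI device, uev_parse() would
leave the action, the subsystem and the slot name in the three fields; an
unterminated trailing record is skipped rather than read past the end of
the received data.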

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v17->v16:
add new file into meson.build, modify coding style and add print info,
delete unused part.
---
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 191 +++++++++++++++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/meson.build |   1 +
 2 files changed, 189 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 5ab5830..6466329 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,19 +2,204 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
-#include <rte_log.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+
 #include <rte_dev.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_interrupts.h>
+
+#include "eal_private.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_started;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+/* identify the subsystem layer that the event was raised from */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_monitor_create(int netlink_fd)
+{
+	struct sockaddr_nl addr;
+	int ret;
+	int size = 64 * 1024;
+	int nonblock = 1;
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket for netlink fd.\n");
+		goto err;
+	}
+
+	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
+
+	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed.\n");
+		goto err;
+	}
+	return 0;
+err:
+	close(netlink_fd);
+	return -1;
+}
+
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = strdup(pci_slot_name);
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+			"Socket read error(%d): %s.\n",
+			errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent, EAL_UEV_MSG_LEN);
+
+	return 0;
+}
+
+static void
+dev_uev_process(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+
+	if (dev_uev_receive(intr_handle.fd, &uevent))
+		return;
+
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (monitor_started)
+		return 0;
+
+	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
+		return -1;
+	}
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+
+	ret = dev_uev_monitor_create(intr_handle.fd);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event monitor.\n");
+		return -1;
+	}
+
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_process, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
+		return -1;
+	}
+
+	monitor_started = true;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (!monitor_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_process,
+					   (void *)-1);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_started = false;
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
index 03974ff..b222571 100644
--- a/lib/librte_eal/linuxapp/eal/meson.build
+++ b/lib/librte_eal/linuxapp/eal/meson.build
@@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
 		'eal_vfio_mp_sync.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c',
 )
 
 if has_libnuma == 1
-- 
2.7.4


* [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring
  2018-03-29 16:00                                                             ` [PATCH V17 0/4] add device event monitor framework Jeff Guo
                                                                                 ` (2 preceding siblings ...)
  2018-03-29 16:00                                                               ` [PATCH V17 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-03-29 16:00                                                               ` Jeff Guo
  2018-03-29 17:00                                                                 ` Stephen Hemminger
                                                                                   ` (2 more replies)
  3 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-03-29 16:00 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can use the device
event mechanism to monitor hotplug events, covering both hot removal and
hot insertion.

testpmd first enables hotplug monitoring and registers the user's
callback. When a device is hot-inserted or hot-removed, the EAL detects
the event and calls the user's callbacks, and the application then
attaches or detaches the device on the bus according to its hotplug
policy.
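
To make the register/dispatch/unregister flow described above concrete,
here is a small standalone model of a per-device callback list. It is a
sketch under assumptions, not the EAL implementation: ev_cb_register(),
ev_cb_unregister(), ev_dispatch() and count_cb() are hypothetical names
that only mirror the documented semantics of the new API (a NULL device
name subscribes to all devices, and (void *)-1 as the argument removes
every entry with the same callback address).

```c
#include <stdlib.h>
#include <string.h>

enum ev_type { EV_ADD, EV_REMOVE };
typedef void (*ev_cb_fn)(const char *dev_name, enum ev_type type, void *arg);

/* One registered callback; dev_name == NULL subscribes to every device. */
struct ev_cb {
	struct ev_cb *next;
	char *dev_name;
	ev_cb_fn fn;
	void *arg;
};

static struct ev_cb *cb_head;

static int
same_name(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

/* Returns 0 on success, a negative value on failure. */
static int
ev_cb_register(const char *dev_name, ev_cb_fn fn, void *arg)
{
	struct ev_cb *cb;

	if (fn == NULL)
		return -1;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	if (dev_name != NULL) {
		cb->dev_name = malloc(strlen(dev_name) + 1);
		if (cb->dev_name == NULL) {
			free(cb);
			return -1;
		}
		strcpy(cb->dev_name, dev_name);
	}
	cb->fn = fn;
	cb->arg = arg;
	cb->next = cb_head;
	cb_head = cb;
	return 0;
}

/* Returns the number of entries removed; arg == (void *)-1 matches any arg. */
static int
ev_cb_unregister(const char *dev_name, ev_cb_fn fn, void *arg)
{
	struct ev_cb **pp = &cb_head;
	int removed = 0;

	while (*pp != NULL) {
		struct ev_cb *cb = *pp;

		if (cb->fn == fn && same_name(cb->dev_name, dev_name) &&
		    (arg == (void *)-1 || cb->arg == arg)) {
			*pp = cb->next;
			free(cb->dev_name);
			free(cb);
			removed++;
		} else {
			pp = &cb->next;
		}
	}
	return removed;
}

/* Invoke every callback subscribed to dev_name; returns how many ran. */
static int
ev_dispatch(const char *dev_name, enum ev_type type)
{
	struct ev_cb *cb;
	int called = 0;

	for (cb = cb_head; cb != NULL; cb = cb->next) {
		if (cb->dev_name == NULL ||
		    strcmp(cb->dev_name, dev_name) == 0) {
			cb->fn(dev_name, type, cb->arg);
			called++;
		}
	}
	return called;
}

/* Example callback: counts invocations through its argument. */
static void
count_cb(const char *dev_name, enum ev_type type, void *arg)
{
	(void)dev_name;
	(void)type;
	++*(int *)arg;
}
```

Note that in this model a successful unregister returns a positive count,
so a caller has to test for a negative value rather than for non-zero to
detect failure.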

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v17->v16:
unregister all user's callback when stop event monitor
---
 app/test-pmd/parameters.c |   5 +-
 app/test-pmd/testpmd.c    | 242 +++++++++++++++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h    |  11 +++
 3 files changed, 256 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b8..825d602 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enalbe hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..d3b28bf 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,9 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+#define HOT_PLUG_FOR_ALL_DEVICE -1
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -384,6 +388,8 @@ uint8_t bitrate_enabled;
 struct gro_status gro_ports[RTE_MAX_ETHPORTS];
 uint8_t gro_flush_cycles = GRO_DEFAULT_FLUSH_CYCLES;
 
+static struct hotplug_request_list hp_list;
+
 /* Forward function declarations */
 static void map_port_queue_stats_mapping_registers(portid_t pi,
 						   struct rte_port *port);
@@ -391,6 +397,14 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(portid_t port_id);
+static bool in_hotplug_list(const char *dev_name);
+
+static int hotplug_list_add(struct rte_device *device,
+				enum rte_kernel_driver device_kdrv);
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1867,64 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(portid_t port_id)
+{
+	int diag;
+	char *device_name;
+
+	/* if port id equals -1, register the callback for all devices */
+	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
+		device_name = NULL;
+	} else {
+		device_name = strdup(rte_eth_devices[port_id].device->name);
+		if (!device_name) {
+			free(device_name);
+			return -1;
+		}
+	}
+	/* register the dev_event callback */
+	diag = rte_dev_event_callback_register(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
+	if (diag) {
+		printf("Failed to setup dev_event callback\n");
+		return -1;
+	}
+
+	free(device_name);
+	return 0;
+}
+
+
+static int
+eth_dev_event_callback_unregister(portid_t port_id)
+{
+	int diag;
+	char *device_name;
+
+	/* if port id equals -1, unregister all device event callbacks */
+	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
+		device_name = NULL;
+	} else {
+		device_name = strdup(rte_eth_devices[port_id].device->name);
+		if (!device_name) {
+			free(device_name);
+			return -1;
+		}
+	}
+
+	/* unregister the dev_event callback */
+	diag = rte_dev_event_callback_unregister(device_name,
+		eth_dev_event_callback, (void *)(intptr_t)port_id);
+	if (diag < 0) {
+		printf("Failed to unregister dev_event callback\n");
+		return -1;
+	}
+
+	free(device_name);
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1869,6 +1941,8 @@ attach_port(char *identifier)
 	if (rte_eth_dev_attach(identifier, &pi))
 		return;
 
+	eth_dev_event_callback_register(pi);
+
 	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
 	/* if socket_id is invalid, set to 0 */
 	if (check_socket_id(socket_id) < 0)
@@ -1880,6 +1954,12 @@ attach_port(char *identifier)
 
 	ports[pi].port_status = RTE_PORT_STOPPED;
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[pi].device,
+				 rte_eth_devices[pi].data->kdrv);
+		eth_dev_event_callback_register(pi);
+	}
+
 	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
 	printf("Done\n");
 }
@@ -1906,6 +1986,12 @@ detach_port(portid_t port_id)
 
 	nb_ports = rte_eth_dev_count();
 
+	if (hot_plug) {
+		hotplug_list_add(rte_eth_devices[port_id].device,
+				 rte_eth_devices[port_id].data->kdrv);
+		eth_dev_event_callback_register(port_id);
+	}
+
 	printf("Port '%s' is detached. Now total ports is %d\n",
 			name, nb_ports);
 	printf("Done\n");
@@ -1916,6 +2002,7 @@ void
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	int ret;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
@@ -1929,6 +2016,19 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	ret = rte_dev_event_monitor_stop();
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to stop device event monitor.\n");
+		return;
+	}
+
+	ret = eth_dev_event_callback_unregister(HOT_PLUG_FOR_ALL_DEVICE);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to unregister all event callbacks.\n");
+		return;
+	}
+
 	printf("\nBye...\n");
 }
 
@@ -2013,6 +2113,49 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_dev_event_callback(void *arg)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint8_t port_id = (intptr_t)arg;
+
+	rte_eal_alarm_cancel(rmv_dev_event_callback, arg);
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+
+	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
+		return;
+
+	stop_packet_forwarding();
+
+	stop_port(port_id);
+	close_port(port_id);
+	if (rte_eth_dev_detach(port_id, name)) {
+		RTE_LOG(ERR, USER1, "Failed to detach port '%s'\n", name);
+		return;
+	}
+
+	nb_ports = rte_eth_dev_count();
+
+	printf("Port '%s' is detached. Now total ports is %d\n",
+			name, nb_ports);
+}
+
+static void
+add_dev_event_callback(void *arg)
+{
+	char *dev_name = (char *)arg;
+
+	rte_eal_alarm_cancel(add_dev_event_callback, arg);
+
+	if (!in_hotplug_list(dev_name))
+		return;
+
+	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
+	attach_port(dev_name);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2059,6 +2202,85 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+static bool
+in_hotplug_list(const char *dev_name)
+{
+	struct hotplug_request *hp_request = NULL;
+
+	TAILQ_FOREACH(hp_request, &hp_list, next) {
+		if (!strcmp(hp_request->dev_name, dev_name))
+			break;
+	}
+
+	if (hp_request)
+		return true;
+
+	return false;
+}
+
+static int
+hotplug_list_add(struct rte_device *device, enum rte_kernel_driver device_kdrv)
+{
+	struct hotplug_request *hp_request;
+
+	hp_request = rte_zmalloc("hotplug request",
+			sizeof(*hp_request), 0);
+	if (hp_request == NULL) {
+		fprintf(stderr, "%s can not alloc memory\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	hp_request->dev_name = device->name;
+	hp_request->dev_kdrv = device_kdrv;
+
+	TAILQ_INSERT_TAIL(&hp_list, hp_request, next);
+
+	return 0;
+}
+
+/* This function is used by the interrupt thread */
+static int
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     void *arg)
+{
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+	char *dev_name = malloc(strlen(device_name) + 1);
+
+	strcpy(dev_name, device_name);
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		if (rte_eal_alarm_set(100000,
+			rmv_dev_event_callback, arg))
+			fprintf(stderr,
+				"Could not set up deferred device removal\n");
+		break;
+	case RTE_DEV_EVENT_ADD:
+		if (rte_eal_alarm_set(100000,
+			add_dev_event_callback, dev_name))
+			fprintf(stderr,
+				"Could not set up deferred device add\n");
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2696,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2766,23 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		if (TAILQ_EMPTY(&hp_list))
+			TAILQ_INIT(&hp_list);
+		RTE_ETH_FOREACH_DEV(port_id) {
+			hotplug_list_add(rte_eth_devices[port_id].device,
+					 rte_eth_devices[port_id].data->kdrv);
+			eth_dev_event_callback_register(port_id);
+		}
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..c619e11 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -63,6 +63,15 @@ typedef uint16_t streamid_t;
 #define TM_MODE			0
 #endif
 
+struct hotplug_request {
+	TAILQ_ENTRY(hotplug_request) next; /**< Callbacks list */
+	const char *dev_name;            /* request device name */
+	enum rte_kernel_driver dev_kdrv;            /* kernel driver bound */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(hotplug_request_list, hotplug_request);
+
 enum {
 	PORT_TOPOLOGY_PAIRED,
 	PORT_TOPOLOGY_CHAINED,
@@ -319,6 +328,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
-- 
2.7.4


* Re: [PATCH V17 3/4] eal/linux: uevent parse and process
  2018-03-29 16:00                                                               ` [PATCH V17 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-03-29 16:59                                                                 ` Stephen Hemminger
  2018-04-02  4:20                                                                   ` Guo, Jia
  2018-03-29 17:00                                                                 ` Stephen Hemminger
  1 sibling, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-03-29 16:59 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, harry.van.haaren, jianfeng.tan,
	jblunck, shreyansh.jain, dev, helin.zhang

On Fri, 30 Mar 2018 00:00:04 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed.\n");
> +		goto err;
> +	}
> +	retu

Since you use NOWAIT option, this is unnecessary.
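
As a rough illustration of the point: the patch already asks for
non-blocking behaviour twice, with SOCK_NONBLOCK at socket() time and
MSG_DONTWAIT on recv(), so a later ioctl(FIONBIO) adds nothing. The sketch
below checks this on an arbitrary socket; open_nonblocking_socket() and
fd_is_nonblocking() are hypothetical helpers, and SOCK_NONBLOCK is assumed
to be available (it is Linux-specific).

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if fd has O_NONBLOCK set, 0 if not, -1 on error. */
static int
fd_is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_NONBLOCK) != 0;
}

/* Open a socket that is non-blocking from the moment it is created. */
static int
open_nonblocking_socket(int domain, int type, int protocol)
{
	return socket(domain, type | SOCK_NONBLOCK, protocol);
}
```

A descriptor returned by open_nonblocking_socket() should report
O_NONBLOCK through fd_is_nonblocking() without any further ioctl() or
fcntl() call.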


* Re: [PATCH V17 3/4] eal/linux: uevent parse and process
  2018-03-29 16:00                                                               ` [PATCH V17 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-03-29 16:59                                                                 ` Stephen Hemminger
@ 2018-03-29 17:00                                                                 ` Stephen Hemminger
  2018-04-02  4:19                                                                   ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-03-29 17:00 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, harry.van.haaren, jianfeng.tan,
	jblunck, shreyansh.jain, dev, helin.zhang

On Fri, 30 Mar 2018 00:00:04 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> +dev_uev_monitor_create(int netlink_fd)
> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +	int size = 64 * 1024;
> +	int nonblock = 1;
> +
> +	memset(&addr, 0, sizeof(addr));
> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;
> +
> +	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
> +		RTE_LOG(ERR, EAL, "Failed to bind socket for netlink fd.\n");
> +		goto err;
> +	}
> +
> +	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
> +
> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed.\n");
> +		goto err;
> +	}
> +	return 0;
> +err:

You should set close on exec for this fd (with fcntl).
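
A minimal sketch of the fcntl() approach, for reference. fd_set_cloexec()
and fd_has_cloexec() are hypothetical helper names. Note that the socket()
call in rte_dev_event_monitor_start() in this patch already passes
SOCK_CLOEXEC, which sets the same flag atomically at creation time, so the
helper would mainly matter for descriptors obtained some other way.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set FD_CLOEXEC while preserving other descriptor flags; 0 on success. */
static int
fd_set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Return 1 if FD_CLOEXEC is set, 0 if not, -1 on error. */
static int
fd_has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) != 0;
}
```

Reading the current flags first keeps the helper safe to call on a
descriptor that already has other descriptor flags set.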


* Re: [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring
  2018-03-29 16:00                                                               ` [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
@ 2018-03-29 17:00                                                                 ` Stephen Hemminger
  2018-04-02  4:18                                                                   ` Guo, Jia
  2018-04-02  5:49                                                                 ` Wu, Jingjing
  2018-04-03 10:33                                                                 ` [PATCH V18 0/4] add device event monitor framework Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-03-29 17:00 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, harry.van.haaren, jianfeng.tan,
	jblunck, shreyansh.jain, dev, helin.zhang

On Fri, 30 Mar 2018 00:00:05 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> Use testpmd for example, to show an application how to use device event
> mechanism to monitor the hotplug event, involve both hot removal event
> and the hot insertion event.
> 
> The process is that, testpmd first enable hotplug monitoring and register
> the user's callback, when device being hotplug insertion or hotplug
> removal, the eal monitor the event and call user's callbacks, the
> application according their hot plug policy to detach or attach the device
> from the bus.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v17->v16:
> unregister all user's callback when stop event monitor
> ---
>  app/test-pmd/parameters.c |   5 +-
>  app/test-pmd/testpmd.c    | 242 +++++++++++++++++++++++++++++++++++++++++++++-
>  app/test-pmd/testpmd.h    |  11 +++
>  3 files changed, 256 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 97d22b8..825d602 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -186,6 +186,7 @@ usage(char* progname)
>  	printf("  --flow-isolate-all: "
>  	       "requests flow API isolated mode on all ports at initialization time.\n");
>  	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
> +	printf("  --hot-plug: enalbe hot plug for device.\n")

s/enalbe/enable/


* Re: [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio
  2018-03-21  6:11                                 ` [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio Jeff Guo
  2018-03-21  6:11                                   ` [PATCH V15 2/2] pci: add driver auto bind for hot insertion Jeff Guo
@ 2018-03-30  3:35                                   ` Tan, Jianfeng
  1 sibling, 0 replies; 494+ messages in thread
From: Tan, Jianfeng @ 2018-03-30  3:35 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang


We should split this patch into multiple patches.

- This helper function does not need to be exposed to users. Whenever
there is a device remove event, we can do the remap before invoking the
callbacks.
- PCI-related code should go into librte_pci.
- Personally, I don't think we should add an op to the bus. We don't know
if this is necessary for all bus types; at least, vdev does not "remap".
Even if we need to support new bus types in the future, that is just an
internal change, which is acceptable.
- If there is an issue in igb_uio, fix it in a separate patch.

Thanks,
Jianfeng

On 3/21/2018 2:11 PM, Jeff Guo wrote:
> when detect hot removal uevent of device, the device resource become
> invalid, in order to avoid unexpected usage of this resource, remap
> the device resource to be a fake memory, that would lead the application
> keep running well but not encounter system core dump.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v15->v14:
> delete find_by_name in bus ops, it is no used. use additional flag and
> original pci map resource function to replace of adding a new fixed
> memory mapping function.
> ---
>   app/test-pmd/testpmd.c                    |  4 +++
>   drivers/bus/pci/bsd/pci.c                 | 23 +++++++++++++++++
>   drivers/bus/pci/linux/pci.c               | 33 +++++++++++++++++++++++++
>   drivers/bus/pci/pci_common.c              | 21 ++++++++++++++++
>   drivers/bus/pci/pci_common_uio.c          | 41 +++++++++++++++++++++++++++++++
>   drivers/bus/pci/private.h                 | 12 +++++++++
>   drivers/bus/pci/rte_bus_pci.h             |  9 +++++++
>   drivers/bus/vdev/vdev.c                   |  7 ++++++
>   lib/librte_eal/bsdapp/eal/eal_dev.c       |  8 ++++++
>   lib/librte_eal/common/eal_common_bus.c    |  1 +
>   lib/librte_eal/common/include/rte_bus.h   | 15 +++++++++++
>   lib/librte_eal/common/include/rte_dev.h   | 18 ++++++++++++++
>   lib/librte_eal/linuxapp/eal/eal_dev.c     | 34 +++++++++++++++++++++++++
>   lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  5 ++++
>   14 files changed, 231 insertions(+)
>
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 915532e..1c4afea 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2079,6 +2079,10 @@ rmv_uevent_callback(void *arg)
>   	if (!in_hotplug_list(rte_eth_devices[port_id].device->name))
>   		return;
>   
> +	/* do failure handler before stop and close the device */
> +	rte_dev_failure_handler(rte_eth_devices[port_id].device,
> +				rte_eth_devices[port_id].data->kdrv);
> +
>   	stop_packet_forwarding();
>   
>   	stop_port(port_id);
> diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
> index 655b34b..d7165b9 100644
> --- a/drivers/bus/pci/bsd/pci.c
> +++ b/drivers/bus/pci/bsd/pci.c
> @@ -97,6 +97,29 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>   	}
>   }
>   
> +/* re-map pci device */
> +int
> +rte_pci_remap_device(struct rte_pci_device *dev)
> +{
> +	int ret;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	switch (dev->kdrv) {
> +	case RTE_KDRV_NIC_UIO:
> +		ret = pci_uio_remap_resource(dev);
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = 1;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>   void
>   pci_uio_free_resource(struct rte_pci_device *dev,
>   		struct mapped_pci_resource *uio_res)
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index abde641..a7dfec7 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -116,6 +116,38 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
>   	}
>   }
>   
> +/* Map pci device */
> +int
> +rte_pci_remap_device(struct rte_pci_device *dev)
> +{
> +	int ret = -1;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	switch (dev->kdrv) {
> +	case RTE_KDRV_VFIO:
> +#ifdef VFIO_PRESENT
> +		/* no thing to do */
> +#endif
> +		break;
> +	case RTE_KDRV_IGB_UIO:
> +	case RTE_KDRV_UIO_GENERIC:
> +		if (rte_eal_using_phys_addrs()) {
> +			/* map resources for devices that use uio */
> +			ret = pci_uio_remap_resource(dev);
> +		}
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = 1;
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
>   void *
>   pci_find_max_end_va(void)
>   {
> @@ -357,6 +389,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
>   		rte_pci_add_device(dev);
>   	}
>   
> +	dev->device.state = RTE_DEV_PARSED;
>   	return 0;
>   }
>   
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 2a00f36..46921a4 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -253,6 +253,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
>   		if (rc > 0)
>   			/* positive value means driver doesn't support it */
>   			continue;
> +		dev->device.state = RTE_DEV_PARSED;
>   		return 0;
>   	}
>   	return 1;
> @@ -474,6 +475,25 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>   }
>   
>   static int
> +pci_remap_device(struct rte_device *dev)
> +{
> +	struct rte_pci_device *pdev;
> +	int ret;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	pdev = RTE_DEV_TO_PCI(dev);
> +
> +	/* remap resources for devices that use igb_uio */
> +	ret = rte_pci_remap_device(pdev);
> +	if (ret != 0)
> +		RTE_LOG(ERR, EAL, "failed to remap device %s",
> +			dev->name);
> +	return ret;
> +}
> +
> +static int
>   pci_plug(struct rte_device *dev)
>   {
>   	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
> @@ -503,6 +523,7 @@ struct rte_pci_bus rte_pci_bus = {
>   		.unplug = pci_unplug,
>   		.parse = pci_parse,
>   		.get_iommu_class = rte_pci_get_iommu_class,
> +		.remap_device = pci_remap_device,
>   	},
>   	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>   	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
> index 54bc20b..3a0a2bb 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -146,6 +146,47 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>   	}
>   }
>   
> +/* remap the PCI resource of a PCI device in private virtual memory */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev)
> +{
> +	int i;
> +	uint64_t phaddr;
> +	void *map_address;
> +
> +	if (dev == NULL)
> +		return -1;
> +
> +	close(dev->intr_handle.fd);
> +	if (dev->intr_handle.uio_cfg_fd >= 0) {
> +		close(dev->intr_handle.uio_cfg_fd);
> +		dev->intr_handle.uio_cfg_fd = -1;
> +	}
> +
> +	dev->intr_handle.fd = -1;
> +	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
> +
> +	/* Map all BARs */
> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
> +		/* skip empty BAR */
> +		phaddr = dev->mem_resource[i].phys_addr;
> +		if (phaddr == 0)
> +			continue;
> +		pci_unmap_resource(dev->mem_resource[i].addr,
> +				(size_t)dev->mem_resource[i].len);
> +		map_address = pci_map_resource(
> +				dev->mem_resource[i].addr, -1, 0,
> +				(size_t)dev->mem_resource[i].len,
> +				MAP_ANONYMOUS);
> +		if (map_address == MAP_FAILED)
> +			return -1;
> +		memset(map_address, 0xFF, (size_t)dev->mem_resource[i].len);
> +		dev->mem_resource[i].addr = map_address;
> +	}
> +
> +	return 0;
> +}
> +
>   static struct mapped_pci_resource *
>   pci_uio_find_resource(struct rte_pci_device *dev)
>   {
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 88fa587..7a862ef 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
>   		struct mapped_pci_resource *uio_res);
>   
>   /**
> + * remap the pci uio resource..
> + *
> + * @param dev
> + *   Point to the struct rte pci device.
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev);
> +
> +/**
>    * Map device memory to uio resource
>    *
>    * This function is private to EAL.
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index 357afb9..6f9cd8b 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -168,6 +168,15 @@ int rte_pci_map_device(struct rte_pci_device *dev);
>   void rte_pci_unmap_device(struct rte_pci_device *dev);
>   
>   /**
> + * Remap this device
> + *
> + * @param dev
> + *   A pointer to a rte_pci_device structure describing the device
> + *   to use
> + */
> +int rte_pci_remap_device(struct rte_pci_device *dev);
> +
> +/**
>    * Dump the content of the PCI bus.
>    *
>    * @param f
> diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
> index e4bc724..efc348b 100644
> --- a/drivers/bus/vdev/vdev.c
> +++ b/drivers/bus/vdev/vdev.c
> @@ -400,6 +400,12 @@ vdev_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>   }
>   
>   static int
> +vdev_remap_device(struct rte_device *dev)
> +{
> +	RTE_SET_USED(dev);
> +	return 0;
> +}
> +static int
>   vdev_plug(struct rte_device *dev)
>   {
>   	return vdev_probe_all_drivers(RTE_DEV_TO_VDEV(dev));
> @@ -418,6 +424,7 @@ static struct rte_bus rte_vdev_bus = {
>   	.plug = vdev_plug,
>   	.unplug = vdev_unplug,
>   	.parse = vdev_parse,
> +	.remap_device = vdev_remap_device,
>   };
>   
>   RTE_REGISTER_BUS(vdev, rte_vdev_bus);
> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
> index 3b7bbf2..a076ec7 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_dev.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> @@ -31,3 +31,11 @@ rte_dev_event_monitor_stop(void)
>   	RTE_LOG(ERR, EAL, "Not support event monitor for FreeBSD\n");
>   	return -1;
>   }
> +
> +int __rte_experimental
> +rte_dev_failure_handler(struct rte_device *dev,
> +					enum rte_kernel_driver kdrv_type)
> +{
> +	RTE_LOG(ERR, EAL, "Not support device failure handler for FreeBSD\n");
> +	return -1;
> +}
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 3e022d5..5510bbe 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -53,6 +53,7 @@ rte_bus_register(struct rte_bus *bus)
>   	RTE_VERIFY(bus->find_device);
>   	/* Buses supporting driver plug also require unplug. */
>   	RTE_VERIFY(!bus->plug || bus->unplug);
> +	RTE_VERIFY(bus->remap_device);
>   
>   	TAILQ_INSERT_TAIL(&rte_bus_list, bus, next);
>   	RTE_LOG(DEBUG, EAL, "Registered [%s] bus.\n", bus->name);
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6fb0834..1f3c09b 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>   
>   /**
> + * Implementation specific remap function which is responsible for remmaping
> + * devices on that bus from original share memory resource to a anonymous
> + * memory resource for the sake of device has been removal.
> + *
> + * @param dev
> + *	Device pointer that was returned by a previous call to find_device.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_remap_device_t)(struct rte_device *dev);
> +
> +/**
>    * Bus scan policies
>    */
>   enum rte_bus_scan_mode {
> @@ -209,6 +223,7 @@ struct rte_bus {
>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>   	rte_bus_parse_t parse;       /**< Parse a device name */
> +	rte_bus_remap_device_t remap_device;       /**< remap a device */
>   	struct rte_bus_conf conf;    /**< Bus configuration */
>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>   };
> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
> index 98ea12b..10a5fcf 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -180,6 +180,7 @@ struct rte_device {
>   	const struct rte_driver *driver;/**< Associated driver */
>   	int numa_node;                /**< NUMA node connection */
>   	struct rte_devargs *devargs;  /**< Device user arguments */
> +	enum rte_dev_state state;  /**< Device state */
>   };
>   
>   /**
> @@ -405,4 +406,21 @@ rte_dev_event_monitor_start(void);
>    */
>   int __rte_experimental
>   rte_dev_event_monitor_stop(void);
> +
> +/**
> + * It can be used to do device failure handler to avoid
> + * system core dump when failure occur.
> + *
> + * @param dev
> + *  The prointer to device structure.
> + * @param kdrv_type
> + *  The specific kernel driver's type.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_failure_handler(struct rte_device *dev,
> +			     enum rte_kernel_driver kdrv_type);
>   #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 2b34e2c..fa63105 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -225,3 +225,37 @@ rte_dev_event_monitor_stop(void)
>   	monitor_no_started = true;
>   	return 0;
>   }
> +
> +int __rte_experimental
> +rte_dev_failure_handler(struct rte_device *dev,
> +					enum rte_kernel_driver kdrv_type)
> +{
> +	struct rte_bus *bus = rte_bus_find_by_device_name(dev->name);
> +	int ret;
> +
> +	switch (kdrv_type) {
> +	case RTE_KDRV_IGB_UIO:
> +		if ((!dev) || dev->state == RTE_DEV_UNDEFINED)
> +			return -1;
> +		dev->state = RTE_DEV_FAULT;
> +		/**
> +		 * remap the resource to be fake
> +		 * before user's removal processing
> +		 */
> +		ret = bus->remap_device(dev);
> +		if (ret) {
> +			RTE_LOG(ERR, EAL, "Driver cannot remap the "
> +				"device (%s)\n",
> +				dev->name);
> +			return -1;
> +		}
> +		break;
> +	case RTE_KDRV_VFIO:
> +		break;
> +	case RTE_KDRV_UIO_GENERIC:
> +		break;
> +	default:
> +		break;
> +	}
> +	return 0;
> +}
> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> index 4cae4dd..9c50876 100644
> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> @@ -350,6 +350,11 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
>   		return 0;
>   	}
>   
> +	/* check if device has been remove before release */
> +	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1) {
> +		pr_info("The device has been removed\n");
> +		return -1;
> +	}
>   	/* disable interrupts */
>   	igbuio_pci_disable_interrupts(udev);
>   


* Re: [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring
  2018-03-29 17:00                                                                 ` Stephen Hemminger
@ 2018-04-02  4:18                                                                   ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-02  4:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, harry.van.haaren, jianfeng.tan,
	jblunck, shreyansh.jain, dev, helin.zhang



On 3/30/2018 1:00 AM, Stephen Hemminger wrote:
> On Fri, 30 Mar 2018 00:00:05 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> Use testpmd for example, to show an application how to use device event
>> mechanism to monitor the hotplug event, involve both hot removal event
>> and the hot insertion event.
>>
>> The process is that, testpmd first enable hotplug monitoring and register
>> the user's callback, when device being hotplug insertion or hotplug
>> removal, the eal monitor the event and call user's callbacks, the
>> application according their hot plug policy to detach or attach the device
>> from the bus.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v17->v16:
>> unregister all user's callback when stop event monitor
>> ---
>>   app/test-pmd/parameters.c |   5 +-
>>   app/test-pmd/testpmd.c    | 242 +++++++++++++++++++++++++++++++++++++++++++++-
>>   app/test-pmd/testpmd.h    |  11 +++
>>   3 files changed, 256 insertions(+), 2 deletions(-)
>>
>> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
>> index 97d22b8..825d602 100644
>> --- a/app/test-pmd/parameters.c
>> +++ b/app/test-pmd/parameters.c
>> @@ -186,6 +186,7 @@ usage(char* progname)
>>   	printf("  --flow-isolate-all: "
>>   	       "requests flow API isolated mode on all ports at initialization time.\n");
>>   	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
>> +	printf("  --hot-plug: enalbe hot plug for device.\n")
> s/enalbe/enable/
will correct it.


* Re: [PATCH V17 3/4] eal/linux: uevent parse and process
  2018-03-29 17:00                                                                 ` Stephen Hemminger
@ 2018-04-02  4:19                                                                   ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-02  4:19 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, harry.van.haaren, jianfeng.tan,
	jblunck, shreyansh.jain, dev, helin.zhang



On 3/30/2018 1:00 AM, Stephen Hemminger wrote:
> On Fri, 30 Mar 2018 00:00:04 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> +dev_uev_monitor_create(int netlink_fd)
>> +{
>> +	struct sockaddr_nl addr;
>> +	int ret;
>> +	int size = 64 * 1024;
>> +	int nonblock = 1;
>> +
>> +	memset(&addr, 0, sizeof(addr));
>> +	addr.nl_family = AF_NETLINK;
>> +	addr.nl_pid = 0;
>> +	addr.nl_groups = 0xffffffff;
>> +
>> +	if (bind(netlink_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
>> +		RTE_LOG(ERR, EAL, "Failed to bind socket for netlink fd.\n");
>> +		goto err;
>> +	}
>> +
>> +	setsockopt(netlink_fd, SOL_SOCKET, SO_PASSCRED, &size, sizeof(size));
>> +
>> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
>> +	if (ret != 0) {
>> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed.\n");
>> +		goto err;
>> +	}
>> +	return 0;
>> +err:
> You should set close on exec for this fd (with fcntl).
Yes, but I have already set it at fd creation time with SOCK_CLOEXEC.
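
For illustration, a minimal sketch of setting the flags at creation time rather than with a later fcntl() or ioctl(). The helper names (open_event_socket, fd_has_cloexec, fd_is_nonblocking) are hypothetical and not part of the patch; the family, type and protocol are left as parameters so the helper can be tried on any socket family.

```c
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical helper: create a socket that is close-on-exec and
 * non-blocking from the start, so no later fcntl()/ioctl() is needed. */
static int
open_event_socket(int family, int type, int protocol)
{
	return socket(family, type | SOCK_CLOEXEC | SOCK_NONBLOCK, protocol);
}

/* Return 1 if FD_CLOEXEC is set on fd, 0 otherwise. */
static int
fd_has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

/* Return 1 if O_NONBLOCK is set on fd, 0 otherwise. */
static int
fd_is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	return flags != -1 && (flags & O_NONBLOCK) != 0;
}
```

For the uevent monitor the call would presumably be open_event_socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT), with NETLINK_KOBJECT_UEVENT coming from <linux/netlink.h>.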


* Re: [PATCH V17 3/4] eal/linux: uevent parse and process
  2018-03-29 16:59                                                                 ` Stephen Hemminger
@ 2018-04-02  4:20                                                                   ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-02  4:20 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, harry.van.haaren, jianfeng.tan,
	jblunck, shreyansh.jain, dev, helin.zhang



On 3/30/2018 12:59 AM, Stephen Hemminger wrote:
> On Fri, 30 Mar 2018 00:00:04 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> +	ret = ioctl(netlink_fd, FIONBIO, &nonblock);
>> +	if (ret != 0) {
>> +		RTE_LOG(ERR, EAL, "ioctl(FIONBIO) failed.\n");
>> +		goto err;
>> +	}
>> +	retu
> Since you use NOWAIT option, this is unnecessary.
I think you are right.
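
A small sketch of why the ioctl(FIONBIO) is redundant, assuming the receive path passes MSG_DONTWAIT as the comment implies: the flag makes that one recv() call non-blocking regardless of how the socket itself is configured. The helper name recv_nowait is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Try to read one message without blocking, even if fd itself is a
 * blocking socket. Returns the byte count, 0 when nothing is queued
 * (or the peer closed the connection), or -1 on a real error. */
static ssize_t
recv_nowait(int fd, void *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return n;
}
```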


* Re: [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring
  2018-03-29 16:00                                                               ` [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  2018-03-29 17:00                                                                 ` Stephen Hemminger
@ 2018-04-02  5:49                                                                 ` Wu, Jingjing
  2018-04-02 11:31                                                                   ` Guo, Jia
  2018-04-03 10:33                                                                 ` [PATCH V18 0/4] add device event monitor framework Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Wu, Jingjing @ 2018-04-02  5:49 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, thomas, motih, Van Haaren, Harry, Tan,
	Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

> 
> +static int
> +eth_dev_event_callback_register(portid_t port_id)
> +{
> +	int diag;
> +	char *device_name;
> +
> +	/* if port id equal -1, unregister all device event callbacks */
> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
> +		device_name = NULL;
> +	} else {
> +		device_name = strdup(rte_eth_devices[port_id].device->name);
> +		if (!device_name) {
> +			free(device_name);
If device_name is NULL, no memory allocated, why free?

> +			return -1;
> +		}
> +	}
> +	/* register the dev_event callback */
> +	diag = rte_dev_event_callback_register(device_name,
> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
> +	if (diag) {
> +		printf("Failed to setup dev_event callback\n");
> +		return -1;
> +	}
> +
> +	free(device_name);
> +	return 0;
> +}
> +
> +
> +static int
> +eth_dev_event_callback_unregister(portid_t port_id)
> +{
> +	int diag;
> +	char *device_name;
> +
> +	/* if port id equal -1, unregister all device event callbacks */
> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
> +		device_name = NULL;
> +	} else {
> +		device_name = strdup(rte_eth_devices[port_id].device->name);
> +		if (!device_name) {
> +			free(device_name);
Same as above.

> +			return -1;
> +		}
> +	}
> +
> +	/* unregister the dev_event callback */
> +	diag = rte_dev_event_callback_unregister(device_name,
> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
> +	if (diag) {
> +		printf("Failed to setup dev_event callback\n");
> +		return -1;
> +	}
> +
> +	free(device_name);
> +	return 0;
> +}
> +
>  void
>  attach_port(char *identifier)
>  {
> @@ -1869,6 +1941,8 @@ attach_port(char *identifier)
>  	if (rte_eth_dev_attach(identifier, &pi))
>  		return;
> 
> +	eth_dev_event_callback_register(pi);
> +
>  	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
>  	/* if socket_id is invalid, set to 0 */
>  	if (check_socket_id(socket_id) < 0)
> @@ -1880,6 +1954,12 @@ attach_port(char *identifier)
> 
>  	ports[pi].port_status = RTE_PORT_STOPPED;
> 
> +	if (hot_plug) {
> +		hotplug_list_add(rte_eth_devices[pi].device,
> +				 rte_eth_devices[pi].data->kdrv);
> +		eth_dev_event_callback_register(pi);
> +	}
> +
>  	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
>  	printf("Done\n");
>  }
> @@ -1906,6 +1986,12 @@ detach_port(portid_t port_id)
> 
>  	nb_ports = rte_eth_dev_count();
> 
> +	if (hot_plug) {
> +		hotplug_list_add(rte_eth_devices[port_id].device,
> +				 rte_eth_devices[port_id].data->kdrv);
> +		eth_dev_event_callback_register(port_id);
> +	}
> +
>  	printf("Port '%s' is detached. Now total ports is %d\n",
>  			name, nb_ports);
>  	printf("Done\n");
> @@ -1916,6 +2002,7 @@ void
>  pmd_test_exit(void)
>  {
>  	portid_t pt_id;
> +	int ret;
> 
>  	if (test_done == 0)
>  		stop_packet_forwarding();
> @@ -1929,6 +2016,19 @@ pmd_test_exit(void)
>  			close_port(pt_id);
>  		}
>  	}
> +

Need to check if hot_plug is enabled?

> +	ret = rte_dev_event_monitor_stop();
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "fail to stop device event monitor.");
> +		return;
> +	}
> +
> +	ret = eth_dev_event_callback_unregister(HOT_PLUG_FOR_ALL_DEVICE);
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "fail to unregister all event callbacks.");
> +		return;
> +	}


<...>

> +static void
> +add_dev_event_callback(void *arg)
> +{
> +	char *dev_name = (char *)arg;
> +
> +	rte_eal_alarm_cancel(add_dev_event_callback, arg);
> +
> +	if (!in_hotplug_list(dev_name))

Remove "!" in the check

> +		return;
> +
> +	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
It is not ERR, please make the log aligned with remove device.

> +	attach_port(dev_name);
> +}
> +

<...>
> +
> +/* This function is used by the interrupt thread */
> +static int
> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
> +			     void *arg)
> +{
> +	static const char * const event_desc[] = {
> +		[RTE_DEV_EVENT_ADD] = "add",
> +		[RTE_DEV_EVENT_REMOVE] = "remove",
> +	};
> +	char *dev_name = malloc(strlen(device_name) + 1);
> +
> +	strcpy(dev_name, device_name);
Why not use strdup as above?

> +	if (type >= RTE_DEV_EVENT_MAX) {
> +		fprintf(stderr, "%s called upon invalid event %d\n",
> +			__func__, type);
> +		fflush(stderr);
> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
> +		printf("%s event\n",
> +			event_desc[type]);
> +		fflush(stdout);
> +	}
> +
> +	switch (type) {
> +	case RTE_DEV_EVENT_REMOVE:
> +		if (rte_eal_alarm_set(100000,
> +			rmv_dev_event_callback, arg))
> +			fprintf(stderr,
> +				"Could not set up deferred device removal\n");
> +		break;
> +	case RTE_DEV_EVENT_ADD:
> +		if (rte_eal_alarm_set(100000,
> +			add_dev_event_callback, dev_name))
> +			fprintf(stderr,
> +				"Could not set up deferred device add\n");
> +		break;
> +	default:
> +		break;
> +	}
> +	return 0;
Always 0, even alarm set fails?


Thanks
Jingjing


* Re: [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-02  5:49                                                                 ` Wu, Jingjing
@ 2018-04-02 11:31                                                                   ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-02 11:31 UTC (permalink / raw)
  To: Wu, Jingjing, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, thomas, motih, Van Haaren, Harry, Tan,
	Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jingjing,


On 4/2/2018 1:49 PM, Wu, Jingjing wrote:
>> +static int
>> +eth_dev_event_callback_register(portid_t port_id)
>> +{
>> +	int diag;
>> +	char *device_name;
>> +
>> +	/* if port id equal -1, unregister all device event callbacks */
>> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
>> +		device_name = NULL;
>> +	} else {
>> +		device_name = strdup(rte_eth_devices[port_id].device->name);
>> +		if (!device_name) {
>> +			free(device_name);
> If device_name is NULL, no memory allocated, why free?
you are right.
>> +			return -1;
>> +		}
>> +	}
>> +	/* register the dev_event callback */
>> +	diag = rte_dev_event_callback_register(device_name,
>> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
>> +	if (diag) {
>> +		printf("Failed to setup dev_event callback\n");
>> +		return -1;
>> +	}
>> +
>> +	free(device_name);
>> +	return 0;
>> +}
>> +
>> +
>> +static int
>> +eth_dev_event_callback_unregister(portid_t port_id)
>> +{
>> +	int diag;
>> +	char *device_name;
>> +
>> +	/* if port id equal -1, unregister all device event callbacks */
>> +	if (port_id == (portid_t)HOT_PLUG_FOR_ALL_DEVICE) {
>> +		device_name = NULL;
>> +	} else {
>> +		device_name = strdup(rte_eth_devices[port_id].device->name);
>> +		if (!device_name) {
>> +			free(device_name);
> Same as above.
>
>> +			return -1;
>> +		}
>> +	}
>> +
>> +	/* unregister the dev_event callback */
>> +	diag = rte_dev_event_callback_unregister(device_name,
>> +		eth_dev_event_callback, (void *)(intptr_t)port_id);
>> +	if (diag) {
>> +		printf("Failed to setup dev_event callback\n");
>> +		return -1;
>> +	}
>> +
>> +	free(device_name);
>> +	return 0;
>> +}
>> +
>>   void
>>   attach_port(char *identifier)
>>   {
>> @@ -1869,6 +1941,8 @@ attach_port(char *identifier)
>>   	if (rte_eth_dev_attach(identifier, &pi))
>>   		return;
>>
>> +	eth_dev_event_callback_register(pi);
>> +
>>   	socket_id = (unsigned)rte_eth_dev_socket_id(pi);
>>   	/* if socket_id is invalid, set to 0 */
>>   	if (check_socket_id(socket_id) < 0)
>> @@ -1880,6 +1954,12 @@ attach_port(char *identifier)
>>
>>   	ports[pi].port_status = RTE_PORT_STOPPED;
>>
>> +	if (hot_plug) {
>> +		hotplug_list_add(rte_eth_devices[pi].device,
>> +				 rte_eth_devices[pi].data->kdrv);
>> +		eth_dev_event_callback_register(pi);
>> +	}
>> +
>>   	printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
>>   	printf("Done\n");
>>   }
>> @@ -1906,6 +1986,12 @@ detach_port(portid_t port_id)
>>
>>   	nb_ports = rte_eth_dev_count();
>>
>> +	if (hot_plug) {
>> +		hotplug_list_add(rte_eth_devices[port_id].device,
>> +				 rte_eth_devices[port_id].data->kdrv);
>> +		eth_dev_event_callback_register(port_id);
>> +	}
>> +
>>   	printf("Port '%s' is detached. Now total ports is %d\n",
>>   			name, nb_ports);
>>   	printf("Done\n");
>> @@ -1916,6 +2002,7 @@ void
>>   pmd_test_exit(void)
>>   {
>>   	portid_t pt_id;
>> +	int ret;
>>
>>   	if (test_done == 0)
>>   		stop_packet_forwarding();
>> @@ -1929,6 +2016,19 @@ pmd_test_exit(void)
>>   			close_port(pt_id);
>>   		}
>>   	}
>> +
> Need to check if hot_plug is enabled?
sure.
>> +	ret = rte_dev_event_monitor_stop();
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "fail to stop device event monitor.");
>> +		return;
>> +	}
>> +
>> +	ret = eth_dev_event_callback_unregister(HOT_PLUG_FOR_ALL_DEVICE);
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "fail to unregister all event callbacks.");
>> +		return;
>> +	}
>
> <...>
>
>> +static void
>> +add_dev_event_callback(void *arg)
>> +{
>> +	char *dev_name = (char *)arg;
>> +
>> +	rte_eal_alarm_cancel(add_dev_event_callback, arg);
>> +
>> +	if (!in_hotplug_list(dev_name))
> Remove "!" in the check
The hotplug list covers both hot plug-in and hot plug-out devices and is
managed by the app. When a device is removed it is added to the hotplug
list so that it can be added back later.
>> +		return;
>> +
>> +	RTE_LOG(ERR, EAL, "add device: %s\n", dev_name);
> It is not ERR, please make the log aligned with remove device.
yes.
>> +	attach_port(dev_name);
>> +}
>> +
> <...>
>> +
>> +/* This function is used by the interrupt thread */
>> +static int
>> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>> +			     void *arg)
>> +{
>> +	static const char * const event_desc[] = {
>> +		[RTE_DEV_EVENT_ADD] = "add",
>> +		[RTE_DEV_EVENT_REMOVE] = "remove",
>> +	};
>> +	char *dev_name = malloc(strlen(device_name) + 1);
>> +
>> +	strcpy(dev_name, device_name);
> Why not use strdup as above?
ok.
>> +	if (type >= RTE_DEV_EVENT_MAX) {
>> +		fprintf(stderr, "%s called upon invalid event %d\n",
>> +			__func__, type);
>> +		fflush(stderr);
>> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
>> +		printf("%s event\n",
>> +			event_desc[type]);
>> +		fflush(stdout);
>> +	}
>> +
>> +	switch (type) {
>> +	case RTE_DEV_EVENT_REMOVE:
>> +		if (rte_eal_alarm_set(100000,
>> +			rmv_dev_event_callback, arg))
>> +			fprintf(stderr,
>> +				"Could not set up deferred device removal\n");
>> +		break;
>> +	case RTE_DEV_EVENT_ADD:
>> +		if (rte_eal_alarm_set(100000,
>> +			add_dev_event_callback, dev_name))
>> +			fprintf(stderr,
>> +				"Could not set up deferred device add\n");
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +	return 0;
> Always 0, even alarm set fails?
>
Right, the alarm failure should be checked.
> Thanks
> Jingjing


* [PATCH V18 0/4] add device event monitor framework
  2018-03-29 16:00                                                               ` [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  2018-03-29 17:00                                                                 ` Stephen Hemminger
  2018-04-02  5:49                                                                 ` Wu, Jingjing
@ 2018-04-03 10:33                                                                 ` Jeff Guo
  2018-04-03 10:33                                                                   ` [PATCH V18 1/4] eal: add device event handle in interrupt thread Jeff Guo
                                                                                     ` (3 more replies)
  2 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-03 10:33 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

For hot plug in DPDK, we already have a proactive way to add/remove devices
through APIs (rte_eal_hotplug_add/remove), and a fail-safe driver that
offloads the fail-safe work from the application. But there is still no
general mechanism to monitor hotplug events for all drivers; today the
hotplug interrupt event differs between devices and drivers, such as
mlx4, pci drivers and others.

Taking the hot removal event as an example, not all pci drivers expose a
remove interrupt, so to make the hot plug feature easy to use with pci
drivers, something must be done to detect the remove event at the kernel
level and offer a new line of interrupt to user land.

The kernel's kobject uevent mechanism can be used to monitor the hot plug
status of devices, not only uio/vfio devices on the pci bus but also
others, such as cpu/usb/pci-express bus devices.

The idea is as below.

a.The uevent message from FD monitoring looks like below.
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b.add device event monitor framework:
add several general APIs to enable uevent monitoring.

c.show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd to show device event monitor mechanism usage.
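
As an illustration of item a, a minimal sketch of pulling ACTION, SUBSYSTEM and DEVPATH out of a raw uevent buffer. On the netlink socket the fields arrive as NUL-separated strings rather than one per line. The struct and function names (uevent_info, take_field, parse_uevent) are hypothetical and this is not the parser from the patch.

```c
#include <stdio.h>
#include <string.h>

struct uevent_info {
	char action[16];
	char subsystem[32];
	char devpath[256];
};

/* If the n-byte field s starts with key, copy the rest into dst. */
static void
take_field(const char *s, size_t n, const char *key,
	   char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (n >= klen && memcmp(s, key, klen) == 0)
		snprintf(dst, dst_len, "%.*s", (int)(n - klen), s + klen);
}

/* Walk the NUL-separated fields of buf[0..len) and fill out.
 * Returns 0 if an ACTION= field was found, -1 otherwise. */
static int
parse_uevent(const char *buf, size_t len, struct uevent_info *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *s = buf + i;
		const char *end = memchr(s, '\0', len - i);
		size_t n = end ? (size_t)(end - s) : len - i;

		take_field(s, n, "ACTION=", out->action,
			   sizeof(out->action));
		take_field(s, n, "SUBSYSTEM=", out->subsystem,
			   sizeof(out->subsystem));
		take_field(s, n, "DEVPATH=", out->devpath,
			   sizeof(out->devpath));
		i += n + 1;	/* skip the field and its terminator */
	}
	return out->action[0] != '\0' ? 0 : -1;
}
```

The leading "remove@/devices/..." summary field matches none of the keys and is simply skipped.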

TODO: failure handler mechanism for hot plug and driver auto bind for hot insertion.
These will be covered by the next hot plug patch set.

patchset history:
v18->v17:
1.add feature announcement in release document, fix bsp compile issue.
2.refine socket configuration.
3.remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.

v17->v16:
1.add related part of the interrupt handle type adding.
2.add new API into map, fix typo issue, add (void*)-1 value for unregister all callback
3.add new file into meson.build, modify coding style and add print info, delete unused part.
4.unregister all user's callback when stop event monitor

v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll to replace the rte service usage for the monitor thread,
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
let it check and management in user's callback.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:modify some typo and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix define issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, the default handling prepares hot plug work
for all pci devices, then the app decides which devices need to hot plug.
2.manage the event callback in each device.
3.fix some system hang issues on igb_uio release.
4.move the pci part to bus-pci based on the bus rework.
5.add hot plug policy in app: show an example that uses a hotplug list to
decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Replace the individual fd bound to a single device with a common fd that polls all devices.
5.Add registration of hot insertion monitoring and processing, add function to auto bind driver before user adds device
6.Refine some coding style and typos issue
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some error return values
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variable hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix dual device fd issue.
2.fix some typos.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 112 +++++++++++-
 app/test-pmd/testpmd.h                             |   2 +
 doc/guides/rel_notes/release_18_05.rst             |   9 +
 doc/guides/testpmd_app_ug/run_app.rst              |   4 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  21 +++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 168 ++++++++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 ++++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 196 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |   8 +
 test/test/test_interrupts.c                        |  39 +++-
 18 files changed, 684 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4

* [PATCH V18 1/4] eal: add device event handle in interrupt thread
  2018-04-03 10:33                                                                 ` [PATCH V18 0/4] add device event monitor framework Jeff Guo
@ 2018-04-03 10:33                                                                   ` Jeff Guo
  2018-04-04  1:47                                                                     ` Tan, Jianfeng
  2018-04-03 10:33                                                                   ` [PATCH V18 2/4] eal: add device event monitor framework Jeff Guo
                                                                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-03 10:33 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v18->v17:
no change.
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
 test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..58e9328 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index 31a70a0..7f4f1b4 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
 	TEST_INTERRUPT_HANDLE_VALID,
 	TEST_INTERRUPT_HANDLE_VALID_UIO,
 	TEST_INTERRUPT_HANDLE_VALID_ALARM,
+	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
 	TEST_INTERRUPT_HANDLE_CASE1,
 	TEST_INTERRUPT_HANDLE_MAX
 };
@@ -80,6 +81,10 @@ test_interrupt_init(void)
 	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
 					RTE_INTR_HANDLE_ALARM;
 
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd = pfds.readfd;
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
+					RTE_INTR_HANDLE_DEV_EVENT;
+
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -250,6 +255,14 @@ test_interrupt_enable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_enable(&test_intr_handle) == 0) {
+		printf("unexpectedly enable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -306,6 +319,14 @@ test_interrupt_disable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_disable(&test_intr_handle) == 0) {
+		printf("unexpectedly disable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -393,9 +414,17 @@ test_interrupt(void)
 		goto out;
 	}
 
+	printf("Check valid device event interrupt full path\n");
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+		printf("failure occurred during checking valid device event "
+						"interrupt full path\n");
+		goto out;
+	}
+
 	printf("Check valid alarm interrupt full path\n");
-	if (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
-									< 0) {
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_ALARM) < 0) {
 		printf("failure occurred during checking valid alarm "
 						"interrupt full path\n");
 		goto out;
@@ -513,6 +542,12 @@ test_interrupt(void)
 	rte_intr_callback_unregister(&test_intr_handle,
 			test_interrupt_callback_1, (void *)-1);
 
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback, (void *)-1);
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback_1, (void *)-1);
+
 	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
 	/* deinit */
 	test_interrupt_deinit();
-- 
2.7.4

* [PATCH V18 2/4] eal: add device event monitor framework
  2018-04-03 10:33                                                                 ` [PATCH V18 0/4] add device event monitor framework Jeff Guo
  2018-04-03 10:33                                                                   ` [PATCH V18 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-04-03 10:33                                                                   ` Jeff Guo
  2018-04-04  2:53                                                                     ` Tan, Jianfeng
  2018-04-03 10:33                                                                   ` [PATCH V18 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-04-03 10:33                                                                   ` [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-03 10:33 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor framework at the EAL
device layer, for device hotplug awareness and the actions taken
accordingly. It could also be extended to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to enable or
disable the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shall register or unregister callbacks through the newly added
APIs. Callbacks can be device specific, or for all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Taking hotplug as an example, when a device is hot inserted or hot
removed, we get notified by the kernel and then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device from
the bus. This could further benefit fail-safe or live migration.
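
The dispatch rule described above (a callback registered with a device
name fires only for that device, while one registered with NULL fires for
every device) can be modelled in plain C. This is only a minimal
standalone sketch, not the EAL implementation: the demo_* names, the
fixed-size table and the absence of locking are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's types; not the DPDK definitions. */
enum demo_event_type { DEMO_EVENT_ADD, DEMO_EVENT_REMOVE };

typedef int (*demo_event_cb_fn)(const char *device_name,
				enum demo_event_type event, void *cb_arg);

struct demo_callback {
	demo_event_cb_fn cb_fn;	/* callback address */
	void *cb_arg;		/* callback parameter */
	const char *dev_name;	/* NULL means "all devices" */
};

#define DEMO_MAX_CALLBACKS 8
static struct demo_callback demo_cbs[DEMO_MAX_CALLBACKS];
static size_t demo_cb_count;

/* Register a callback; returns 0 on success, -1 if the table is full. */
static int
demo_callback_register(const char *dev_name, demo_event_cb_fn cb_fn,
		       void *cb_arg)
{
	assert(cb_fn != NULL);
	if (demo_cb_count == DEMO_MAX_CALLBACKS)
		return -1;
	demo_cbs[demo_cb_count].cb_fn = cb_fn;
	demo_cbs[demo_cb_count].cb_arg = cb_arg;
	demo_cbs[demo_cb_count].dev_name = dev_name;
	demo_cb_count++;
	return 0;
}

/* Call every matching callback; returns how many were invoked. */
static int
demo_callback_process(const char *device_name, enum demo_event_type event)
{
	int called = 0;
	size_t i;

	for (i = 0; i < demo_cb_count; i++) {
		const struct demo_callback *cb = &demo_cbs[i];

		/* A named callback is skipped for other devices. */
		if (cb->dev_name != NULL &&
		    strcmp(cb->dev_name, device_name) != 0)
			continue;
		cb->cb_fn(device_name, event, cb->cb_arg);
		called++;
	}
	return called;
}

/* Example callback: bumps the int counter passed through cb_arg. */
static int
demo_count_cb(const char *device_name, enum demo_event_type event,
	      void *cb_arg)
{
	(void)device_name;
	(void)event;
	(*(int *)cb_arg)++;
	return 0;
}
```

With one callback registered for NULL and one for "0000:01:00.0", an
event on that device reaches both callbacks, while an event on any other
device reaches only the NULL one.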

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v18->v17:
add feature announcement in release document, fix bsd compile issue.
---
 doc/guides/rel_notes/release_18_05.rst  |   9 ++
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 ++++
 lib/librte_eal/bsdapp/eal/meson.build   |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 168 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  15 +++
 lib/librte_eal/common/include/rte_dev.h |  94 ++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
 lib/librte_eal/linuxapp/eal/meson.build |   1 +
 lib/librte_eal/rte_eal_version.map      |   8 ++
 11 files changed, 341 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 3923dc2..37e00c4 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -41,6 +41,15 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added device event monitor framework.**
+
+  Added a general device event monitor framework at EAL for dynamic device management,
+  such as device hotplug awareness and the corresponding actions. The list of new APIs:
+
+  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` enable
+    and disable the event monitor.
+  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
+    register and unregister the user's callbacks.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..c0921dd 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..1c6c51b
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
index e83fc91..6dfc533 100644
--- a/lib/librte_eal/bsdapp/eal/meson.build
+++ b/lib/librte_eal/bsdapp/eal/meson.build
@@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
 		'eal_timer.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c'
 )
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..e09e86f 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all device */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,146 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb;
+	int ret;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&dev_event_cbs))
+		TAILQ_INIT(&dev_event_cbs);
+
+	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->active = 0;
+			if (!device_name) {
+				event_cb->dev_name = NULL;
+			} else {
+				event_cb->dev_name = strdup(device_name);
+				if (event_cb->dev_name == NULL) {
+					ret = -ENOMEM;
+					goto error;
+				}
+			}
+			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device "
+				"event callback.\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		RTE_LOG(ERR, EAL,
+			"The callback already exists, no need "
+			"to register again.\n");
+		rte_spinlock_unlock(&dev_event_lock);
+		return -EEXIST;
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return 0;
+error:
+	free(event_cb);
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg)
+{
+	int ret = 0;
+	struct dev_event_callback *event_cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	/* walk through the callbacks and remove all that match. */
+	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
+	     event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (device_name != NULL && event_cb->dev_name != NULL) {
+			if (!strcmp(event_cb->dev_name, device_name)) {
+				if (event_cb->cb_fn != cb_fn ||
+				    (cb_arg != (void *)-1 &&
+				    event_cb->cb_arg != cb_arg))
+					continue;
+			}
+		} else if (device_name != NULL) {
+			continue;
+		}
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
+			free(event_cb);
+			ret++;
+		} else {
+			ret = -EAGAIN;
+		}
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event)
+{
+	struct dev_event_callback *cb_lst;
+	int rc;
+
+	if (device_name == NULL)
+		return;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (cb_lst->dev_name) {
+			if (strcmp(cb_lst->dev_name, device_name))
+				continue;
+		}
+		cb_lst->active = 1;
+		rte_spinlock_unlock(&dev_event_lock);
+		rc = cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		if (rc) {
+			RTE_LOG(ERR, EAL,
+				"Failed to process callback function.\n");
+		}
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..88e5a59 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * Internal: executes all the callbacks registered by the user application
+ * for the specific device. It is for DPDK internal use only. User
+ * application should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ *
+ */
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..4c78938 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,25 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef int (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +286,79 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered callbacks which have the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..8578796 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9c8d1a0
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
index 03974ff..b222571 100644
--- a/lib/librte_eal/linuxapp/eal/meson.build
+++ b/lib/librte_eal/linuxapp/eal/meson.build
@@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
 		'eal_vfio_mp_sync.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c',
 )
 
 if has_libnuma == 1
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d123602..d23f491 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -256,3 +256,11 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 
 } DPDK_18.02;
+
+EXPERIMENTAL {
+        global:
+
+        rte_dev_event_callback_register;
+        rte_dev_event_callback_unregister;
+
+} DPDK_18.05;
-- 
2.7.4

* [PATCH V18 3/4] eal/linux: uevent parse and process
  2018-04-03 10:33                                                                 ` [PATCH V18 0/4] add device event monitor framework Jeff Guo
  2018-04-03 10:33                                                                   ` [PATCH V18 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-04-03 10:33                                                                   ` [PATCH V18 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-03 10:33                                                                   ` Jeff Guo
  2018-04-04  3:15                                                                     ` Tan, Jianfeng
  2018-04-03 10:33                                                                   ` [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-03 10:33 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

In order to handle the uevents detected from the kernel side, add uevent
parse and process functions to translate each uevent into a device
event, which the user has subscribed to monitor.
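
A kernel uevent message is a sequence of NUL-separated "KEY=value"
fields, and the parse step picks out the ACTION, SUBSYSTEM and
PCI_SLOT_NAME keys. The standalone sketch below illustrates that idea
only; it is not the code of this patch. The demo_* names, the struct
layout and the copy-then-match approach are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ELEM_LEN 128

/* Hypothetical container for the fields of interest. */
struct demo_uevent {
	char action[DEMO_ELEM_LEN];
	char subsystem[DEMO_ELEM_LEN];
	char pci_slot_name[DEMO_ELEM_LEN];
};

/* If field starts with key, copy its value (truncated) and return 1. */
static int
demo_take(const char *field, const char *key, char *out, size_t out_len)
{
	size_t key_len = strlen(key);
	size_t n;

	if (strncmp(field, key, key_len) != 0)
		return 0;
	n = strlen(field + key_len);
	if (n >= out_len)
		n = out_len - 1;
	memcpy(out, field + key_len, n);
	out[n] = '\0';
	return 1;
}

/* Walk the NUL-separated fields in buf[0..length) and fill in ev. */
static void
demo_uevent_parse(const char *buf, size_t length, struct demo_uevent *ev)
{
	size_t i = 0;

	memset(ev, 0, sizeof(*ev));
	while (i < length) {
		char field[2 * DEMO_ELEM_LEN];
		size_t n = 0;

		/* Copy one field, never reading past the input buffer. */
		while (i < length && buf[i] != '\0') {
			if (n < sizeof(field) - 1)
				field[n++] = buf[i];
			i++;
		}
		field[n] = '\0';
		i++; /* step over the NUL separator */

		if (n == 0)
			continue;
		if (demo_take(field, "ACTION=", ev->action,
			      sizeof(ev->action)))
			continue;
		if (demo_take(field, "SUBSYSTEM=", ev->subsystem,
			      sizeof(ev->subsystem)))
			continue;
		demo_take(field, "PCI_SLOT_NAME=", ev->pci_slot_name,
			  sizeof(ev->pci_slot_name));
	}
}
```

Copying each field into a bounded local buffer before matching keeps the
sketch from reading beyond the received length, even when the last field
is not NUL-terminated.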

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v18->v17:
refine socket configuration.
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 178 +++++++++++++++++++++++++++++++++-
 1 file changed, 176 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9c8d1a0..9f2ee40 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,21 +2,195 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+
 #include <rte_log.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
+#include <rte_malloc.h>
+#include <rte_interrupts.h>
+
+#include "eal_private.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_started;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+/* identify the system layer from which the event is exposed */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_socket_fd_create(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+
+	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket for netlink fd.\n");
+		goto err;
+	}
+
+	return 0;
+err:
+	close(intr_handle.fd);
+	return ret;
+}
+
+static void
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = strdup(pci_slot_name);
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	else if (!strncmp(subsystem, "vfio", 4))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+}
+
+static int
+dev_uev_receive(int fd, struct rte_dev_event *uevent)
+{
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+			"Socket read error(%d): %s.\n",
+			errno, strerror(errno));
+		return -1;
+	} else if (ret == 0)
+		/* connection closed */
+		return -1;
+
+	dev_uev_parse(buf, uevent, EAL_UEV_MSG_LEN);
+
+	return 0;
+}
+
+static void
+dev_uev_process(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+
+	if (dev_uev_receive(intr_handle.fd, &uevent))
+		return;
 
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (monitor_started)
+		return 0;
+
+	ret = dev_uev_socket_fd_create();
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event fd.\n");
+		return -1;
+	}
+
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_process, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
+		return -1;
+	}
+
+	monitor_started = true;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (!monitor_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_process,
+					   (void *)-1);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_started = false;
 	return 0;
 }
-- 
2.7.4

* [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-03 10:33                                                                 ` [PATCH V18 0/4] add device event monitor framework Jeff Guo
                                                                                     ` (2 preceding siblings ...)
  2018-04-03 10:33                                                                   ` [PATCH V18 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-03 10:33                                                                   ` Jeff Guo
  2018-04-04  3:22                                                                     ` Tan, Jianfeng
                                                                                       ` (2 more replies)
  3 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-03 10:33 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application uses the device
event APIs to monitor hotplug events, including both hot removal and
hot insertion events.

First, start testpmd with hotplug monitoring enabled, e.g.:

./build/app/testpmd -c 0x3 -n 4 -- -i --hot-plug

testpmd then starts the device event monitor by calling the new API
rte_dev_event_monitor_start() and registers the user's callback by
calling rte_dev_event_callback_register(). When a device is hot-inserted
or hot-removed, the device event monitor detects the event and calls the
user's callbacks, and the user can process the event in the callback
accordingly.

This patch only shows the event monitoring; device attach/detach is not
involved here and will be added by another hotplug patch set.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v18->v17:
remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.
---
 app/test-pmd/parameters.c             |   5 +-
 app/test-pmd/testpmd.c                | 112 +++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |   2 +
 doc/guides/testpmd_app_ug/run_app.rst |   4 ++
 4 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b8..558cd40 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..2faeb90 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -391,6 +394,12 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static int eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(void);
+static int eth_dev_event_callback_unregister(void);
+
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(void)
+{
+	int diag;
+
+	/* register the device event callback */
+	diag = rte_dev_event_callback_register(NULL,
+		eth_dev_event_callback, NULL);
+	if (diag) {
+		printf("Failed to setup dev_event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static int
+eth_dev_event_callback_unregister(void)
+{
+	int diag;
+
+	/* unregister the device event callback */
+	diag = rte_dev_event_callback_unregister(NULL,
+		eth_dev_event_callback, NULL);
+	if (diag < 0) {
+		printf("Failed to unregister dev_event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1916,6 +1958,7 @@ void
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	int ret;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
@@ -1929,6 +1972,18 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	if (hot_plug) {
+		ret = rte_dev_event_monitor_stop();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to stop device event monitor.\n");
+
+		ret = eth_dev_event_callback_unregister();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to unregister all event callbacks.\n");
+	}
 	printf("\nBye...\n");
 }
 
@@ -2059,6 +2114,48 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static int
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     __rte_unused void *arg)
+{
+	int ret = 0;
+	static const char * const event_desc[] = {
+		[RTE_DEV_EVENT_ADD] = "add",
+		[RTE_DEV_EVENT_REMOVE] = "remove",
+	};
+
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	} else if (event_print_mask & (UINT32_C(1) << type)) {
+		printf("%s event\n",
+			event_desc[type]);
+		fflush(stdout);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
+			device_name);
+		/* TODO: After finish failure handle, begin to stop
+		 * packet forward, stop port, close port, detach port.
+		 */
+		break;
+	case RTE_DEV_EVENT_ADD:
+		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
+			device_name);
+		/* TODO: After finish kernel driver binding,
+		 * begin to attach port.
+		 */
+		break;
+	default:
+		break;
+	}
+	return ret;
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2571,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2641,18 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		eth_dev_event_callback_register();
+
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..8fde68d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1fd5395..d0ced36 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -479,3 +479,7 @@ The commandline options are:
 
     Set the hexadecimal bitmask of TX queue offloads.
     The default value is 0.
+
+*   ``--hot-plug``
+
+    Enable device event monitor mechanism for hotplug.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V18 1/4] eal: add device event handle in interrupt thread
  2018-04-03 10:33                                                                   ` [PATCH V18 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-04-04  1:47                                                                     ` Tan, Jianfeng
  2018-04-04  4:00                                                                       ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-04  1:47 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Tuesday, April 3, 2018 6:34 PM
> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> Jianfeng
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia; Zhang, Helin
> Subject: [PATCH V18 1/4] eal: add device event handle in interrupt thread
> 
> Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
> device event interrupt monitor.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

After fixing a typo below, you can carry my:

Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>


> ---
> v18->v17:
> no change.
> ---
>  lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
>  test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
>  3 files changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h
> b/lib/librte_eal/common/include/rte_eal_interrupts.h
> index 3f792a9..6eb4932 100644
> --- a/lib/librte_eal/common/include/rte_eal_interrupts.h
> +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
> @@ -34,6 +34,7 @@ enum rte_intr_handle_type {
>  	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
>  	RTE_INTR_HANDLE_EXT,          /**< external handler */
>  	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
> +	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
>  	RTE_INTR_HANDLE_MAX           /**< count of elements */
>  };
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index f86f22f..58e9328 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle
> *intr_handle)
>  			return -1;
>  		break;
>  #endif
> +	/* not used at this moment */
> +	case RTE_INTR_HANDLE_DEV_EVENT:
> +		return -1;
>  	/* unknown handle type */
>  	default:
>  		RTE_LOG(ERR, EAL,
> @@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle
> *intr_handle)
>  			return -1;
>  		break;
>  #endif
> +	/* not used at this moment */
> +	case RTE_INTR_HANDLE_DEV_EVENT:
> +		return -1;
>  	/* unknown handle type */
>  	default:
>  		RTE_LOG(ERR, EAL,
> @@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event
> *events, int nfds)
>  			bytes_read = 0;
>  			call = true;
>  			break;
> -
> +		case RTE_INTR_HANDLE_DEV_EVENT:
> +			bytes_read = 0;
> +			call = true;
> +			break;
>  		default:
>  			bytes_read = 1;
>  			break;
> diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
> index 31a70a0..7f4f1b4 100644
> --- a/test/test/test_interrupts.c
> +++ b/test/test/test_interrupts.c
> @@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
>  	TEST_INTERRUPT_HANDLE_VALID,
>  	TEST_INTERRUPT_HANDLE_VALID_UIO,
>  	TEST_INTERRUPT_HANDLE_VALID_ALARM,
> +	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
>  	TEST_INTERRUPT_HANDLE_CASE1,
>  	TEST_INTERRUPT_HANDLE_MAX
>  };
> @@ -80,6 +81,10 @@ test_interrupt_init(void)
>  	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
>  					RTE_INTR_HANDLE_ALARM;
> 
> +	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd =
> pfds.readfd;
> +	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
> +					RTE_INTR_HANDLE_DEV_EVENT;
> +
>  	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
>  	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type =
> RTE_INTR_HANDLE_UIO;
> 
> @@ -250,6 +255,14 @@ test_interrupt_enable(void)
>  		return -1;
>  	}
> 
> +	/* check with specific valid intr_handle */
> +	test_intr_handle =
> intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
> +	if (rte_intr_enable(&test_intr_handle) == 0) {
> +		printf("unexpectedly enable a specific intr_handle "
> +			"successfully\n");
> +		return -1;
> +	}
> +
>  	/* check with valid handler and its type */
>  	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
>  	if (rte_intr_enable(&test_intr_handle) < 0) {
> @@ -306,6 +319,14 @@ test_interrupt_disable(void)
>  		return -1;
>  	}
> 
> +	/* check with specific valid intr_handle */
> +	test_intr_handle =
> intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
> +	if (rte_intr_disable(&test_intr_handle) == 0) {
> +		printf("unexpectedly disable a specific intr_handle "
> +			"successfully\n");
> +		return -1;
> +	}
> +
>  	/* check with valid handler and its type */
>  	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
>  	if (rte_intr_disable(&test_intr_handle) < 0) {
> @@ -393,9 +414,17 @@ test_interrupt(void)
>  		goto out;
>  	}
> 
> +	printf("Check valid device event interrupt full path\n");
> +	if (test_interrupt_full_path_check(
> +		TEST_INTERRUPT_HANDLE_VALID_DEVICE_EVENT) < 0) {

There is a typo here, which causes the compile error below:
	dpdk/test/test/test_interrupts.c:419:3: error: 'TEST_INTERRUPT_HANDLE_VALID_DEVICE_EVENT' undeclared (first use in this function)
	   TEST_INTERRUPT_HANDLE_VALID_DEVICE_EVENT) < 0) {

Thanks,
Jianfeng

> +		printf("failure occurred during checking valid device event "
> +						"interrupt full path\n");
> +		goto out;
> +	}
> +
>  	printf("Check valid alarm interrupt full path\n");
> -	if
> (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
> -									< 0) {
> +	if (test_interrupt_full_path_check(
> +		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
>  		printf("failure occurred during checking valid alarm "
>  						"interrupt full path\n");
>  		goto out;
> @@ -513,6 +542,12 @@ test_interrupt(void)
>  	rte_intr_callback_unregister(&test_intr_handle,
>  			test_interrupt_callback_1, (void *)-1);
> 
> +	test_intr_handle =
> intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
> +	rte_intr_callback_unregister(&test_intr_handle,
> +			test_interrupt_callback, (void *)-1);
> +	rte_intr_callback_unregister(&test_intr_handle,
> +			test_interrupt_callback_1, (void *)-1);
> +
>  	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
>  	/* deinit */
>  	test_interrupt_deinit();
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V18 2/4] eal: add device event monitor framework
  2018-04-03 10:33                                                                   ` [PATCH V18 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-04  2:53                                                                     ` Tan, Jianfeng
  2018-04-05  3:44                                                                       ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-04  2:53 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Tuesday, April 3, 2018 6:34 PM
> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> Jianfeng
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia; Zhang, Helin
> Subject: [PATCH V18 2/4] eal: add device event monitor framework
> 
> This patch aims to add a general device event monitor framework at
> EAL device layer, for device hotplug awareness and actions adopted
> accordingly. It could also expand for all other types of device event
> monitor, but not in this scope at the stage.
> 
> To get started, users firstly call below new added APIs to enable/disable
> the device event monitor mechanism:
>   - rte_dev_event_monitor_start
>   - rte_dev_event_monitor_stop
> 
> Then users shell register or unregister callbacks through the new added
> APIs. Callbacks can be some device specific, or for all devices.
>   -rte_dev_event_callback_register
>   -rte_dev_event_callback_unregister
> 
> Use hotplug case for example, when device hotplug insertion or hotplug
> removal, we will get notified from kernel, then call user's callbacks
> accordingly to handle it, such as detach or attach the device from the
> bus, and could benefit further fail-safe or live-migration.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v18->v17:
> add feature announcement in release document, fix bsp compile issue.
> ---
>  doc/guides/rel_notes/release_18_05.rst  |   9 ++
>  lib/librte_eal/bsdapp/eal/Makefile      |   1 +
>  lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 ++++
>  lib/librte_eal/bsdapp/eal/meson.build   |   1 +
>  lib/librte_eal/common/eal_common_dev.c  | 168
> ++++++++++++++++++++++++++++++++
>  lib/librte_eal/common/eal_private.h     |  15 +++
>  lib/librte_eal/common/include/rte_dev.h |  94 ++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/Makefile    |   1 +
>  lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
>  lib/librte_eal/linuxapp/eal/meson.build |   1 +
>  lib/librte_eal/rte_eal_version.map      |   8 ++
>  11 files changed, 341 insertions(+)
>  create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst
> b/doc/guides/rel_notes/release_18_05.rst
> index 3923dc2..37e00c4 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -41,6 +41,15 @@ New Features
>       Also, make sure to start the actual text at the margin.
> 
> =========================================================
> 
> +* **Added device event monitor framework.**
> +
> +  Added a general device event monitor framework at EAL, for device
> dynamic management.
> +  Such as device hotplug awareness and actions adopted accordingly. The list
> of new APIs:
> +
> +  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop``
> are for
> +    the event monitor enable and disable.
> +  * ``rte_dev_event_callback_register`` and
> ``rte_dev_event_callback_unregister``
> +    are for the user's callbacks register and unregister.
> 
>  API Changes
>  -----------
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile
> b/lib/librte_eal/bsdapp/eal/Makefile
> index dd455e6..c0921dd 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) +=
> eal_lcore.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
> 
>  # from common dir
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c
> b/lib/librte_eal/bsdapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..1c6c51b
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <rte_log.h>
> +#include <rte_compat.h>
> +#include <rte_dev.h>
> +
> +int __rte_experimental
> +rte_dev_event_monitor_start(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> +
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> diff --git a/lib/librte_eal/bsdapp/eal/meson.build
> b/lib/librte_eal/bsdapp/eal/meson.build
> index e83fc91..6dfc533 100644
> --- a/lib/librte_eal/bsdapp/eal/meson.build
> +++ b/lib/librte_eal/bsdapp/eal/meson.build
> @@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
>  		'eal_timer.c',
>  		'eal.c',
>  		'eal_memory.c',
> +		'eal_dev.c'
>  )
> diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> index cd07144..e09e86f 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -14,9 +14,34 @@
>  #include <rte_devargs.h>
>  #include <rte_debug.h>
>  #include <rte_log.h>
> +#include <rte_spinlock.h>
> +#include <rte_malloc.h>
> 
>  #include "eal_private.h"
> 
> +/* spinlock for device callbacks */

It's for protecting the callback list.

> +static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;

Put this spinlock next to the list it protects.

> +
> +/**
> + * The device event callback description.
> + *
> + * It contains callback address to be registered by user application,
> + * the pointer to the parameters for callback, and the device name.
> + */
> +struct dev_event_callback {
> +	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
> +	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
> +	void *cb_arg;                           /**< Callback parameter */
> +	char *dev_name;	 /**< Callback device name, NULL is for all
> device */
> +	uint32_t active;                        /**< Callback is executing */
> +};
> +
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
> +
> +/* The device event callback list for all registered callbacks. */
> +static struct dev_event_cb_list dev_event_cbs;
> +
>  static int cmp_detached_dev_name(const struct rte_device *dev,
>  	const void *_name)
>  {
> @@ -207,3 +232,146 @@ rte_eal_hotplug_remove(const char *busname,
> const char *devname)
>  	rte_eal_devargs_remove(busname, devname);
>  	return ret;
>  }
> +
> +int __rte_experimental
> +rte_dev_event_callback_register(const char *device_name,
> +				rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg)
> +{
> +	struct dev_event_callback *event_cb;
> +	int ret;
> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	if (TAILQ_EMPTY(&dev_event_cbs))
> +		TAILQ_INIT(&dev_event_cbs);
> +
> +	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
> +		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg ==
> cb_arg) {
> +			if (device_name == NULL && event_cb->dev_name
> == NULL)
> +				break;
> +			if (device_name == NULL || event_cb->dev_name
> == NULL)
> +				continue;
> +			if (!strcmp(event_cb->dev_name, device_name))
> +				break;
> +		}
> +	}
> +
> +	/* create a new callback. */
> +	if (event_cb == NULL) {
> +		event_cb = malloc(sizeof(struct dev_event_callback));
> +		if (event_cb != NULL) {
> +			event_cb->cb_fn = cb_fn;
> +			event_cb->cb_arg = cb_arg;
> +			event_cb->active = 0;
> +			if (!device_name) {
> +				event_cb->dev_name = NULL;
> +			} else {
> +				event_cb->dev_name =
> strdup(device_name);
> +				if (event_cb->dev_name == NULL) {
> +					ret = -ENOMEM;
> +					goto error;
> +				}
> +			}
> +			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb,
> next);
> +		} else {
> +			RTE_LOG(ERR, EAL,
> +				"Failed to allocate memory for device "
> +				"event callback.");
> +			ret = -ENOMEM;
> +			goto error;
> +		}
> +	} else {
> +		RTE_LOG(ERR, EAL,
> +			"The callback is already exist, no need "
> +			"to register again.\n");
> +		ret = -EEXIST;
> +		goto error;

There is a bug here: jumping to the error label frees an existing callback entry that is still on the list.

> +	}
> +
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return 0;
> +error:
> +	free(event_cb);
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return ret;
> +}
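
For illustration, a minimal sketch of a register path that frees only what it allocated itself, so the "already exists" case never touches the entry on the list. All names here (cb_register, struct cb_entry, cb_fn_t, count_cb) are hypothetical stand-ins rather than the patch's API, and locking is left out to keep the sketch short:

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for rte_dev_event_cb_fn. */
typedef void (*cb_fn_t)(const char *dev_name, int event, void *arg);

struct cb_entry {
	TAILQ_ENTRY(cb_entry) next;
	cb_fn_t fn;
	void *arg;
	char *dev_name;		/* NULL means "all devices" */
};

TAILQ_HEAD(cb_list, cb_entry);
static struct cb_list cbs = TAILQ_HEAD_INITIALIZER(cbs);

/* Two names are equal when both are NULL or both compare equal. */
static int
name_equal(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

int
cb_register(const char *dev_name, cb_fn_t fn, void *arg)
{
	struct cb_entry *e;

	if (fn == NULL)
		return -EINVAL;

	TAILQ_FOREACH(e, &cbs, next) {
		if (e->fn == fn && e->arg == arg &&
		    name_equal(e->dev_name, dev_name))
			return -EEXIST;	/* existing entry is left alone */
	}

	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return -ENOMEM;
	if (dev_name != NULL) {
		e->dev_name = strdup(dev_name);
		if (e->dev_name == NULL) {
			free(e);	/* only the entry allocated above */
			return -ENOMEM;
		}
	}
	e->fn = fn;
	e->arg = arg;
	TAILQ_INSERT_TAIL(&cbs, e, next);
	return 0;
}

/* Example callback, used only to exercise the sketch. */
void
count_cb(const char *dev_name, int event, void *arg)
{
	(void)dev_name;
	(void)event;
	assert(arg != NULL);
	++*(int *)arg;
}
```

Returning early on the duplicate keeps the error path from ever seeing a pointer it does not own, which is the point of the comment above.
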
> +
> +int __rte_experimental
> +rte_dev_event_callback_unregister(const char *device_name,
> +				  rte_dev_event_cb_fn cb_fn,
> +				  void *cb_arg)
> +{

Let's clearly define the behavior and return value of this API. If I understand it correctly:

    If cb_arg != -1, we use (dev_name, cb_fn, cb_arg) as the key to look up the registered callback.
    If cb_arg == -1, we use (cb_fn) alone as the key to look up the registered callbacks.

For the return value, we want to return the number of callbacks removed. It could be:
  >= 0, the number of callbacks removed. (When we encounter an active callback, we either skip it or just return -EAGAIN; neither sounds good to me, actually.)
  < 0, an error was encountered.

If you agree with the above, the implementation below has a number of issues.

> +	int ret = 0;
> +	struct dev_event_callback *event_cb, *next;
> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	/*walk through the callbacks and remove all that match. */
> +	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
> +	     event_cb = next) {
> +
> +		next = TAILQ_NEXT(event_cb, next);

First of all, if cb_fn != event_cb->cb_fn, we should continue.

> +
> +		if (device_name != NULL && event_cb->dev_name != NULL) {
> +			if (!strcmp(event_cb->dev_name, device_name)) {
> +				if (event_cb->cb_fn != cb_fn ||
> +				    (cb_arg != (void *)-1 &&
> +				    event_cb->cb_arg != cb_arg))
> +					continue;
> +			}
> +		} else if (device_name != NULL) {
> +			continue;
> +		}

What about device_name == NULL && event_cb->dev_name != NULL && cb_arg == -1?

What about device_name == NULL && event_cb->dev_name == NULL &&  cb_arg != -1 && cb_arg != event_cb->cb_arg?


> +
> +		/*
> +		 * if this callback is not executing right now,
> +		 * then remove it.
> +		 */
> +		if (event_cb->active == 0) {
> +			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
> +			free(event_cb);
> +			ret++;
> +		} else {
> +			ret = -EAGAIN;

If you don't break here, the next time you find another matching callback you will do ret++ on a (-EAGAIN) value.

> +		}
> +	}
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return ret;
> +}

BTW, I don't know why DPDK has the tradition of using cb_arg == -1 to stand for multiple callbacks; it doesn't look like good API design to me. I would like others' opinions: shall we continue this?
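
The unregister semantics proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names (cb_unregister, cb_matches, cb_add, CB_ARG_ANY, noop_cb) are hypothetical, locking is omitted, and the choice to return -EAGAIN before removing anything when a matching callback is active is just one of the two options mentioned, not a settled decision. The sketch also frees dev_name together with the entry.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for rte_dev_event_cb_fn. */
typedef void (*cb_fn_t)(const char *dev_name, int event, void *arg);

struct cb_entry {
	TAILQ_ENTRY(cb_entry) next;
	cb_fn_t fn;
	void *arg;
	char *dev_name;		/* NULL means "all devices" */
	uint32_t active;	/* non-zero while the callback runs */
};
TAILQ_HEAD(cb_list, cb_entry);

#define CB_ARG_ANY ((void *)-1)	/* match on fn alone */

static int
cb_matches(const struct cb_entry *e, const char *dev_name,
	   cb_fn_t fn, void *arg)
{
	if (e->fn != fn)
		return 0;
	if (arg == CB_ARG_ANY)
		return 1;
	if (e->arg != arg)
		return 0;
	if (e->dev_name == NULL || dev_name == NULL)
		return e->dev_name == dev_name;
	return strcmp(e->dev_name, dev_name) == 0;
}

/* Returns the number of entries removed, or a negative errno value. */
int
cb_unregister(struct cb_list *list, const char *dev_name,
	      cb_fn_t fn, void *arg)
{
	struct cb_entry *e, *tmp;
	int removed = 0;

	assert(list != NULL);
	if (fn == NULL)
		return -EINVAL;

	/* Pass 1: refuse if any matching callback is executing. */
	TAILQ_FOREACH(e, list, next) {
		if (cb_matches(e, dev_name, fn, arg) && e->active)
			return -EAGAIN;
	}

	/* Pass 2: remove every match; the count never mixes with errors. */
	for (e = TAILQ_FIRST(list); e != NULL; e = tmp) {
		tmp = TAILQ_NEXT(e, next);
		if (!cb_matches(e, dev_name, fn, arg))
			continue;
		TAILQ_REMOVE(list, e, next);
		free(e->dev_name);
		free(e);
		removed++;
	}
	return removed;
}

/* Helper that populates the list, only for trying the sketch out. */
int
cb_add(struct cb_list *list, const char *dev_name, cb_fn_t fn, void *arg)
{
	struct cb_entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return -ENOMEM;
	if (dev_name != NULL) {
		e->dev_name = strdup(dev_name);
		if (e->dev_name == NULL) {
			free(e);
			return -ENOMEM;
		}
	}
	e->fn = fn;
	e->arg = arg;
	TAILQ_INSERT_TAIL(list, e, next);
	return 0;
}

/* Example callback that does nothing. */
void
noop_cb(const char *dev_name, int event, void *arg)
{
	(void)dev_name;
	(void)event;
	(void)arg;
}
```

Checking for active callbacks in a separate pass makes the call all-or-nothing, so the return value is either a pure count or a pure error code.
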

> +
> +void
> +dev_callback_process(char *device_name, enum rte_dev_event_type
> event)
> +{
> +	struct dev_event_callback *cb_lst;
> +	int rc;
> +
> +	if (device_name == NULL)
> +		return;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
> +		if (cb_lst->dev_name) {
> +			if (strcmp(cb_lst->dev_name, device_name))
> +				continue;
> +		}
> +		cb_lst->active = 1;
> +		rte_spinlock_unlock(&dev_event_lock);
> +		rc = cb_lst->cb_fn(device_name, event,
> +				cb_lst->cb_arg);
> +		if (rc) {
> +			RTE_LOG(ERR, EAL,
> +				"Failed to process callback function.");
> +		}

I don't see why we need a return value from the callbacks. Probably define the callback type as returning void.

> +		rte_spinlock_lock(&dev_event_lock);
> +		cb_lst->active = 0;
> +	}
> +	rte_spinlock_unlock(&dev_event_lock);
> +}
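
As a rough sketch of what the invoke path might look like with a void callback type: the names (cb_invoke, struct cb_entry, cb_fn_t, count_cb) are hypothetical, the return value (number of callbacks called) is an addition for illustration only, and the locking around the list walk is omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical callback type with no return value. */
typedef void (*cb_fn_t)(const char *dev_name, int event, void *arg);

struct cb_entry {
	TAILQ_ENTRY(cb_entry) next;
	cb_fn_t fn;
	void *arg;
	const char *dev_name;	/* NULL means "all devices" */
	uint32_t active;	/* non-zero while the callback runs */
};
TAILQ_HEAD(cb_list, cb_entry);

/* Calls every callback registered for dev_name or for all devices. */
int
cb_invoke(struct cb_list *list, const char *dev_name, int event)
{
	struct cb_entry *e;
	int called = 0;

	if (dev_name == NULL)
		return 0;

	TAILQ_FOREACH(e, list, next) {
		if (e->dev_name != NULL &&
		    strcmp(e->dev_name, dev_name) != 0)
			continue;
		assert(e->fn != NULL);
		e->active = 1;
		e->fn(dev_name, event, e->arg);	/* nothing to check */
		e->active = 0;
		called++;
	}
	return called;
}

/* Example callback that counts how often it was called. */
void
count_cb(const char *dev_name, int event, void *arg)
{
	(void)dev_name;
	(void)event;
	++*(int *)arg;
}
```

With a void callback there is no per-callback error to log, which removes the rc handling from the loop entirely.
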
> diff --git a/lib/librte_eal/common/eal_private.h
> b/lib/librte_eal/common/eal_private.h
> index 0b28770..88e5a59 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -9,6 +9,8 @@
>  #include <stdint.h>
>  #include <stdio.h>
> 
> +#include <rte_dev.h>
> +
>  /**
>   * Initialize the memzone subsystem (private to eal).
>   *
> @@ -205,4 +207,17 @@ struct rte_bus
> *rte_bus_find_by_device_name(const char *str);
> 
>  int rte_mp_channel_init(void);
> 
> +/**
> + * Internal Executes all the user application registered callbacks for
> + * the specific device. It is for DPDK internal user only. User
> + * application should not call it directly.
> + *
> + * @param device_name
> + *  The device name.
> + * @param event
> + *  the device event type.
> + *
> + */
> +void
> +dev_callback_process(char *device_name, enum rte_dev_event_type event);

There are too many *_process functions in this patch. Let's avoid such ambiguous words.

For example, you can rename this function to dev_event_callback_invoke().

>  #endif /* _EAL_PRIVATE_H_ */
> diff --git a/lib/librte_eal/common/include/rte_dev.h
> b/lib/librte_eal/common/include/rte_dev.h
> index b688f1e..4c78938 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -24,6 +24,25 @@ extern "C" {
>  #include <rte_compat.h>
>  #include <rte_log.h>
> 
> +/**
> + * The device event type.
> + */
> +enum rte_dev_event_type {
> +	RTE_DEV_EVENT_ADD,	/**< device being added */
> +	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
> +};
> +
> +struct rte_dev_event {
> +	enum rte_dev_event_type type;	/**< device event type */
> +	int subsystem;			/**< subsystem id */
> +	char *devname;			/**< device name */
> +};
> +
> +typedef int (*rte_dev_event_cb_fn)(char *device_name,
> +					enum rte_dev_event_type event,
> +					void *cb_arg);
> +
>  __attribute__((format(printf, 2, 0)))
>  static inline void
>  rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
> @@ -267,4 +286,79 @@ __attribute__((used)) = str
>  }
>  #endif
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It registers the callback for the specific device.
> + * Multiple callbacks cal be registered at the same time.
> + *
> + * @param device_name
> + *  The device name, that is the param name of the struct rte_device,
> + *  null value means for all devices.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_callback_register(const char *device_name,
> +				rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It unregisters the callback according to the specified device.
> + *
> + * @param device_name
> + *  The device name, that is the param name of the struct rte_device,
> + *  null value means for all devices.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback, (void *)-1 means to remove all
> + *  registered which has the same callback address.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_callback_unregister(const char *device_name,
> +				  rte_dev_event_cb_fn cb_fn,
> +				  void *cb_arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Start the device event monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_monitor_start(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Stop the device event monitoring .
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile
> b/lib/librte_eal/linuxapp/eal/Makefile
> index 7e5bbe8..8578796 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) +=
> eal_lcore.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
> 
>  # from common dir
>  SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..9c8d1a0
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <rte_log.h>
> +#include <rte_compat.h>
> +#include <rte_dev.h>
> +
> +
> +int __rte_experimental
> +rte_dev_event_monitor_start(void)
> +{
> +	/* TODO: start uevent monitor for linux */
> +	return 0;
> +}
> +
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void)
> +{
> +	/* TODO: stop uevent monitor for linux */
> +	return 0;
> +}
> diff --git a/lib/librte_eal/linuxapp/eal/meson.build
> b/lib/librte_eal/linuxapp/eal/meson.build
> index 03974ff..b222571 100644
> --- a/lib/librte_eal/linuxapp/eal/meson.build
> +++ b/lib/librte_eal/linuxapp/eal/meson.build
> @@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
>  		'eal_vfio_mp_sync.c',
>  		'eal.c',
>  		'eal_memory.c',
> +		'eal_dev.c',
>  )
> 
>  if has_libnuma == 1
> diff --git a/lib/librte_eal/rte_eal_version.map
> b/lib/librte_eal/rte_eal_version.map
> index d123602..d23f491 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -256,3 +256,11 @@ EXPERIMENTAL {
>  	rte_service_start_with_defaults;
> 
>  } DPDK_18.02;
> +
> +EXPERIMENTAL {
> +        global:
> +
> +        rte_dev_event_callback_register;
> +        rte_dev_event_callback_unregister;
> +
> +} DPDK_18.05;
> --
> 2.7.4


* Re: [PATCH V18 3/4] eal/linux: uevent parse and process
  2018-04-03 10:33                                                                   ` [PATCH V18 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-04  3:15                                                                     ` Tan, Jianfeng
  2018-04-05  6:09                                                                       ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-04  3:15 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

Looks much better now, but there are still some issues to address.

> -----Original Message-----
> From: Guo, Jia
> Sent: Tuesday, April 3, 2018 6:34 PM
> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> Jianfeng
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia; Zhang, Helin
> Subject: [PATCH V18 3/4] eal/linux: uevent parse and process
> 
> In order to handle the uevent which has been detected from the kernel
> side, add uevent parse and process function to translate the uevent into
> device event, which user has subscribed to monitor.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v18->v17:
> refine socket configuration.
> ---
>  lib/librte_eal/linuxapp/eal/eal_dev.c | 178
> +++++++++++++++++++++++++++++++++-
>  1 file changed, 176 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9c8d1a0..9f2ee40 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -2,21 +2,195 @@
>   * Copyright(c) 2018 Intel Corporation
>   */
> 
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/socket.h>
> +#include <linux/netlink.h>
> +
>  #include <rte_log.h>
>  #include <rte_compat.h>
>  #include <rte_dev.h>
> +#include <rte_malloc.h>
> +#include <rte_interrupts.h>
> +
> +#include "eal_private.h"
> +
> +static struct rte_intr_handle intr_handle = {.fd = -1 };
> +static bool monitor_started;
> +
> +#define EAL_UEV_MSG_LEN 4096
> +#define EAL_UEV_MSG_ELEM_LEN 128
> +
> +/* identify the system layer which event exposure from */

Reword it a little bit: 
    /* identify the system layer which reports this event */

> +enum eal_dev_event_subsystem {
> +	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_MAX
> +};
> +
> +static int
> +dev_uev_socket_fd_create(void)
> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +
> +	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
> +			SOCK_NONBLOCK,
> +			NETLINK_KOBJECT_UEVENT);
> +	if (intr_handle.fd < 0) {
> +		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
> +		return -1;
> +	}
> +
> +	memset(&addr, 0, sizeof(addr));
> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;
> +
> +	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL, "Failed to bind socket for netlink fd.\n");

Please reword this so that it is clear the log is related to hotplug:
    Failed to bind uevent socket

> +		goto err;
> +	}
> +
> +	return 0;
> +err:
> +	close(intr_handle.fd);

Should we also reset the fd after closing it here, i.e. intr_handle.fd = -1?

> +	return ret;
> +}
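
To illustrate the error path discussed above, here is a minimal sketch of creating the uevent socket and resetting the fd to -1 whenever setup fails. It is plain Linux C without DPDK; uev_fd and uev_socket_create() are hypothetical names standing in for intr_handle.fd and dev_uev_socket_fd_create(), and this is not the code from the patch.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical stand-in for intr_handle.fd; -1 means "no socket open". */
static int uev_fd = -1;

/* Open and bind a kernel uevent netlink socket.
 * Returns 0 on success, -1 on failure. On failure uev_fd is left at -1,
 * so a later stop/close path cannot close a stale descriptor.
 */
static int
uev_socket_create(void)
{
	struct sockaddr_nl addr;

	uev_fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
			NETLINK_KOBJECT_UEVENT);
	if (uev_fd < 0) {
		uev_fd = -1;
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;
	addr.nl_groups = 0xffffffff;

	if (bind(uev_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		/* close and invalidate the fd together */
		close(uev_fd);
		uev_fd = -1;
		return -1;
	}

	return 0;
}
```

With this shape the caller only needs to check the return value, and the fd is valid if and only if the call succeeded.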
> +
> +static void
> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
> +{
> +	char action[EAL_UEV_MSG_ELEM_LEN];
> +	char subsystem[EAL_UEV_MSG_ELEM_LEN];
> +	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
> +
> +	while (i < length) {
> +		for (; i < length; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
> +			event->devname = strdup(pci_slot_name);
> +		}
> +		for (; i < length; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	if (!strncmp(subsystem, "uio", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
> +	else if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
> +	else if (!strncmp(subsystem, "vfio", 4))

How can we indicate that the event comes from a subsystem that we will not handle?

> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_DEV_EVENT_ADD;
> +	if (!strncmp(action, "remove", 6))
> +		event->type = RTE_DEV_EVENT_REMOVE;

How can we indicate that the event has a type that we will not handle?

My suggestion is to have this function return a status for that:
- EVENT_VALID is returned for an event that we will handle later.
- EVENT_INVALID is returned for any unknown event.

> +}
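
A rough sketch of the EVENT_VALID / EVENT_INVALID idea, written as standalone C so that it compiles without DPDK. Only the two status names come from the suggestion above; struct uev_info, uev_value() and uev_parse() are hypothetical names, and the exact-match policy for action and subsystem is an assumption, not what the patch does.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128

enum uev_status { EVENT_INVALID = 0, EVENT_VALID = 1 };
enum uev_type { UEV_ADD, UEV_REMOVE };
enum uev_subsystem { UEV_SUBSYS_PCI, UEV_SUBSYS_UIO, UEV_SUBSYS_VFIO };

/* Hypothetical result structure, filled only when EVENT_VALID is returned. */
struct uev_info {
	enum uev_type type;
	enum uev_subsystem subsystem;
	char devname[UEV_ELEM_LEN];
};

/* Return the value part if the NUL-terminated field starts with key. */
static const char *
uev_value(const char *field, const char *key)
{
	size_t klen = strlen(key);

	return strncmp(field, key, klen) == 0 ? field + klen : NULL;
}

/* Parse a uevent message made of NUL-separated "KEY=value" fields. */
static enum uev_status
uev_parse(const char *buf, size_t length, struct uev_info *info)
{
	const char *action = NULL, *subsystem = NULL, *slot = NULL;
	const char *val;
	size_t i = 0;

	while (i < length) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', length - i);

		if (end == NULL)
			break; /* unterminated trailing field: ignore it */
		if ((val = uev_value(field, "ACTION=")) != NULL)
			action = val;
		else if ((val = uev_value(field, "SUBSYSTEM=")) != NULL)
			subsystem = val;
		else if ((val = uev_value(field, "PCI_SLOT_NAME=")) != NULL)
			slot = val;
		i += (size_t)(end - field) + 1;
	}

	/* Anything we cannot fully identify is reported as invalid. */
	if (action == NULL || subsystem == NULL || slot == NULL)
		return EVENT_INVALID;

	if (strcmp(action, "add") == 0)
		info->type = UEV_ADD;
	else if (strcmp(action, "remove") == 0)
		info->type = UEV_REMOVE;
	else
		return EVENT_INVALID;

	if (strcmp(subsystem, "pci") == 0)
		info->subsystem = UEV_SUBSYS_PCI;
	else if (strcmp(subsystem, "uio") == 0)
		info->subsystem = UEV_SUBSYS_UIO;
	else if (strcmp(subsystem, "vfio") == 0)
		info->subsystem = UEV_SUBSYS_VFIO;
	else
		return EVENT_INVALID;

	snprintf(info->devname, sizeof(info->devname), "%s", slot);
	return EVENT_VALID;
}
```

The caller would pass the number of bytes actually returned by recv() as length and invoke the callbacks only when EVENT_VALID comes back, which also covers the missing device name case.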
> +
> +static int
> +dev_uev_receive(int fd, struct rte_dev_event *uevent)
> +{
> +	int ret;
> +	char buf[EAL_UEV_MSG_LEN];
> +
> +	memset(uevent, 0, sizeof(struct rte_dev_event));
> +	memset(buf, 0, EAL_UEV_MSG_LEN);
> +
> +	ret = recv(fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL,
> +			"Socket read error(%d): %s.\n",
> +			errno, strerror(errno));
> +		return -1;
> +	} else if (ret == 0)
> +		/* connection closed */
> +		return -1;
> +
> +	dev_uev_parse(buf, uevent, EAL_UEV_MSG_LEN);
> +
> +	return 0;
> +}
> +
> +static void
> +dev_uev_process(__rte_unused void *param)
> +{
> +	struct rte_dev_event uevent;
> +
> +	if (dev_uev_receive(intr_handle.fd, &uevent))

If an error happens here, we should start an alarm task to remove this callback from the interrupt thread.

> +		return;

You may want to add a debug log here, showing which event arrived for which device.

> 
> +	if (uevent.devname)

You can also filter out this kind of event using the return value I suggested above.

> +		dev_callback_process(uevent.devname, uevent.type);
> +}
> 
>  int __rte_experimental
>  rte_dev_event_monitor_start(void)
>  {
> -	/* TODO: start uevent monitor for linux */
> +	int ret;
> +
> +	if (monitor_started)
> +		return 0;
> +
> +	ret = dev_uev_socket_fd_create();
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "error create device event fd.\n");
> +		return -1;
> +	}
> +
> +	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
> +	ret = rte_intr_callback_register(&intr_handle, dev_uev_process,
> NULL);
> +
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
> +		return -1;
> +	}
> +
> +	monitor_started = true;
> +
>  	return 0;
>  }
> 
>  int __rte_experimental
>  rte_dev_event_monitor_stop(void)
>  {
> -	/* TODO: stop uevent monitor for linux */
> +	int ret;
> +
> +	if (!monitor_started)
> +		return 0;
> +
> +	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_process,
> +					   (void *)-1);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
> +		return ret;
> +	}
> +
> +	close(intr_handle.fd);
> +	intr_handle.fd = -1;
> +	monitor_started = false;
>  	return 0;
>  }
> --
> 2.7.4


* Re: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-03 10:33                                                                   ` [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
@ 2018-04-04  3:22                                                                     ` Tan, Jianfeng
  2018-04-04 16:31                                                                       ` Matan Azrad
  2018-04-05  8:32                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
  2018-04-05  9:02                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
  2 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-04  3:22 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Tuesday, April 3, 2018 6:34 PM
> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> Jianfeng
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia; Zhang, Helin
> Subject: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
> 
> Use testpmd for example, to show how an application use device event

s/use/uses

> APIs to monitor the hotplug events, including both hot removal event
> and hot insertion event.
> 
> The process is that, testpmd first enable hotplug by below commands,
> 
> E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug
> 
> then testpmd start the device event monitor by call the new API

s/start/starts
s/call/calling

> (rte_dev_event_monitor_start) and register the user's callback by call
> the API (rte_dev_event_callback_register), when device being hotplug
> insertion or hotplug removal, the device event monitor detects the event
> and call user's callbacks, user could process the event in the callback
> accordingly.
> 
> This patch only shows the event monitoring, device attach/detach would
> not be involved here, will add from other hotplug patch set.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Some typos and a trivial suggestion. Feel free to carry my

Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>

in the next version.

> ---
> v18->v17:
> remove hotplug policy and detach/attach process from testpmd, let it
> focus on the device event monitoring which the patch set introduced.
> ---
>  app/test-pmd/parameters.c             |   5 +-
>  app/test-pmd/testpmd.c                | 112
> +++++++++++++++++++++++++++++++++-
>  app/test-pmd/testpmd.h                |   2 +
>  doc/guides/testpmd_app_ug/run_app.rst |   4 ++
>  4 files changed, 121 insertions(+), 2 deletions(-)
> 
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 97d22b8..558cd40 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -186,6 +186,7 @@ usage(char* progname)
>  	printf("  --flow-isolate-all: "
>  	       "requests flow API isolated mode on all ports at initialization
> time.\n");
>  	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX
> queue offloads\n");
> +	printf("  --hot-plug: enable hot plug for device.\n");
>  }
> 
>  #ifdef RTE_LIBRTE_CMDLINE
> @@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
>  		{ "print-event",		1, 0, 0 },
>  		{ "mask-event",			1, 0, 0 },
>  		{ "tx-offloads",		1, 0, 0 },
> +		{ "hot-plug",			0, 0, 0 },
>  		{ 0, 0, 0, 0 },
>  	};
> 
> @@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
>  					rte_exit(EXIT_FAILURE,
>  						 "invalid mask-event
> argument\n");
>  				}
> -
> +			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
> +				hot_plug = 1;
>  			break;
>  		case 'h':
>  			usage(argv[0]);
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 4c0e258..2faeb90 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -12,6 +12,7 @@
>  #include <sys/mman.h>
>  #include <sys/types.h>
>  #include <errno.h>
> +#include <stdbool.h>
> 
>  #include <sys/queue.h>
>  #include <sys/stat.h>
> @@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
>   */
>  uint8_t rmv_interrupt = 1; /* enabled by default */
> 
> +uint8_t hot_plug = 0; /**< hotplug disabled by default. */
> +
>  /*
>   * Display or mask ether events
>   * Default to all events except VF_MBOX
> @@ -391,6 +394,12 @@ static void check_all_ports_link_status(uint32_t
> port_mask);
>  static int eth_event_callback(portid_t port_id,
>  			      enum rte_eth_event_type type,
>  			      void *param, void *ret_param);
> +static int eth_dev_event_callback(char *device_name,
> +				enum rte_dev_event_type type,
> +				void *param);
> +static int eth_dev_event_callback_register(void);
> +static int eth_dev_event_callback_unregister(void);
> +
> 
>  /*
>   * Check if all the ports are started.
> @@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
>  	printf("Done\n");
>  }
> 
> +static int
> +eth_dev_event_callback_register(void)
> +{
> +	int diag;
> +
> +	/* register the device event callback */
> +	diag = rte_dev_event_callback_register(NULL,
> +		eth_dev_event_callback, NULL);
> +	if (diag) {
> +		printf("Failed to setup dev_event callback\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +
> +static int
> +eth_dev_event_callback_unregister(void)
> +{
> +	int diag;
> +
> +	/* unregister the device event callback */
> +	diag = rte_dev_event_callback_unregister(NULL,
> +		eth_dev_event_callback, NULL);
> +	if (diag) {
> +		printf("Failed to setup dev_event callback\n");
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>  void
>  attach_port(char *identifier)
>  {
> @@ -1916,6 +1958,7 @@ void
>  pmd_test_exit(void)
>  {
>  	portid_t pt_id;
> +	int ret;
> 
>  	if (test_done == 0)
>  		stop_packet_forwarding();
> @@ -1929,6 +1972,18 @@ pmd_test_exit(void)
>  			close_port(pt_id);
>  		}
>  	}
> +
> +	if (hot_plug) {
> +		ret = rte_dev_event_monitor_stop();
> +		if (ret)
> +			RTE_LOG(ERR, EAL,
> +				"fail to stop device event monitor.");
> +
> +		ret = eth_dev_event_callback_unregister();
> +		if (ret)
> +			RTE_LOG(ERR, EAL,
> +				"fail to unregister all event callbacks.");
> +	}
>  	printf("\nBye...\n");
>  }
> 
> @@ -2059,6 +2114,48 @@ eth_event_callback(portid_t port_id, enum
> rte_eth_event_type type, void *param,
>  	return 0;
>  }
> 
> +/* This function is used by the interrupt thread */
> +static int
> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type
> type,
> +			     __rte_unused void *arg)
> +{
> +	int ret = 0;

From here

> +	static const char * const event_desc[] = {
> +		[RTE_DEV_EVENT_ADD] = "add",
> +		[RTE_DEV_EVENT_REMOVE] = "remove",
> +	};
> +
> +	if (type >= RTE_DEV_EVENT_MAX) {
> +		fprintf(stderr, "%s called upon invalid event %d\n",
> +			__func__, type);
> +		fflush(stderr);
> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
> +		printf("%s event\n",
> +			event_desc[type]);
> +		fflush(stdout);
> +	}

to here, these checks are not necessary.

> +
> +	switch (type) {
> +	case RTE_DEV_EVENT_REMOVE:
> +		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> +			device_name);
> +		/* TODO: After finish failure handle, begin to stop
> +		 * packet forward, stop port, close port, detach port.
> +		 */
> +		break;
> +	case RTE_DEV_EVENT_ADD:
> +		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> +			device_name);
> +		/* TODO: After finish kernel driver binding,
> +		 * begin to attach port.
> +		 */
> +		break;
> +	default:
> +		break;
> +	}
> +	return ret;
> +}
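
For reference, a sketch of how the callback might look with the block marked above dropped, leaving only the switch. It is standalone C: enum dev_event_type is a hypothetical stand-in for enum rte_dev_event_type, and printf stands in for RTE_LOG, so this is an illustration rather than the testpmd code.

```c
#include <stdio.h>

/* Hypothetical stand-in for enum rte_dev_event_type. */
enum dev_event_type {
	DEV_EVENT_ADD,
	DEV_EVENT_REMOVE,
	DEV_EVENT_MAX
};

/* Device event callback reduced to the switch: each handled event type
 * is reported once, and any other type is silently ignored.
 */
static int
dev_event_callback(const char *device_name, enum dev_event_type type,
		   void *arg)
{
	(void)arg; /* unused, kept to mirror the callback signature */

	switch (type) {
	case DEV_EVENT_REMOVE:
		printf("The device: %s has been removed!\n", device_name);
		break;
	case DEV_EVENT_ADD:
		printf("The device: %s has been added!\n", device_name);
		break;
	default:
		break;
	}
	return 0;
}
```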
> +
>  static int
>  set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port
> *port)
>  {
> @@ -2474,8 +2571,9 @@ signal_handler(int signum)
>  int
>  main(int argc, char** argv)
>  {
> -	int  diag;
> +	int diag;
>  	portid_t port_id;
> +	int ret;
> 
>  	signal(SIGINT, signal_handler);
>  	signal(SIGTERM, signal_handler);
> @@ -2543,6 +2641,18 @@ main(int argc, char** argv)
>  		       nb_rxq, nb_txq);
> 
>  	init_config();
> +
> +	if (hot_plug) {
> +		/* enable hot plug monitoring */
> +		ret = rte_dev_event_monitor_start();
> +		if (ret) {
> +			rte_errno = EINVAL;
> +			return -1;
> +		}
> +		eth_dev_event_callback_register();
> +
> +	}
> +
>  	if (start_port(RTE_PORT_ALL) != 0)
>  		rte_exit(EXIT_FAILURE, "Start ports failed\n");
> 
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 153abea..8fde68d 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet
> forwarding when set to 1. */
>  extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter
> */
>  extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt"
> parameter */
>  extern uint32_t event_print_mask;
> +extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
> +
>  /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
> 
>  #ifdef RTE_LIBRTE_IXGBE_BYPASS
> diff --git a/doc/guides/testpmd_app_ug/run_app.rst
> b/doc/guides/testpmd_app_ug/run_app.rst
> index 1fd5395..d0ced36 100644
> --- a/doc/guides/testpmd_app_ug/run_app.rst
> +++ b/doc/guides/testpmd_app_ug/run_app.rst
> @@ -479,3 +479,7 @@ The commandline options are:
> 
>      Set the hexadecimal bitmask of TX queue offloads.
>      The default value is 0.
> +
> +*   ``--hot-plug``
> +
> +    Enable device event monitor machenism for hotplug.

s/machenism/mechanism

> --
> 2.7.4


* Re: [PATCH V18 1/4] eal: add device event handle in interrupt thread
  2018-04-04  1:47                                                                     ` Tan, Jianfeng
@ 2018-04-04  4:00                                                                       ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-04  4:00 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/4/2018 9:47 AM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Tuesday, April 3, 2018 6:34 PM
>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>> Jianfeng
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>> Jia; Zhang, Helin
>> Subject: [PATCH V18 1/4] eal: add device event handle in interrupt thread
>>
>> Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
>> device event interrupt monitor.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> After fixing a typo below, you can carry my:
>
> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
>
>
>> ---
>> v18->v17:
>> no change.
>> ---
>>   lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
>>   lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
>>   test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
>>   3 files changed, 48 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h
>> b/lib/librte_eal/common/include/rte_eal_interrupts.h
>> index 3f792a9..6eb4932 100644
>> --- a/lib/librte_eal/common/include/rte_eal_interrupts.h
>> +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
>> @@ -34,6 +34,7 @@ enum rte_intr_handle_type {
>>   	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
>>   	RTE_INTR_HANDLE_EXT,          /**< external handler */
>>   	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
>> +	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
>>   	RTE_INTR_HANDLE_MAX           /**< count of elements */
>>   };
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>> b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>> index f86f22f..58e9328 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
>> @@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle
>> *intr_handle)
>>   			return -1;
>>   		break;
>>   #endif
>> +	/* not used at this moment */
>> +	case RTE_INTR_HANDLE_DEV_EVENT:
>> +		return -1;
>>   	/* unknown handle type */
>>   	default:
>>   		RTE_LOG(ERR, EAL,
>> @@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle
>> *intr_handle)
>>   			return -1;
>>   		break;
>>   #endif
>> +	/* not used at this moment */
>> +	case RTE_INTR_HANDLE_DEV_EVENT:
>> +		return -1;
>>   	/* unknown handle type */
>>   	default:
>>   		RTE_LOG(ERR, EAL,
>> @@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event
>> *events, int nfds)
>>   			bytes_read = 0;
>>   			call = true;
>>   			break;
>> -
>> +		case RTE_INTR_HANDLE_DEV_EVENT:
>> +			bytes_read = 0;
>> +			call = true;
>> +			break;
>>   		default:
>>   			bytes_read = 1;
>>   			break;
>> diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
>> index 31a70a0..7f4f1b4 100644
>> --- a/test/test/test_interrupts.c
>> +++ b/test/test/test_interrupts.c
>> @@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
>>   	TEST_INTERRUPT_HANDLE_VALID,
>>   	TEST_INTERRUPT_HANDLE_VALID_UIO,
>>   	TEST_INTERRUPT_HANDLE_VALID_ALARM,
>> +	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
>>   	TEST_INTERRUPT_HANDLE_CASE1,
>>   	TEST_INTERRUPT_HANDLE_MAX
>>   };
>> @@ -80,6 +81,10 @@ test_interrupt_init(void)
>>   	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
>>   					RTE_INTR_HANDLE_ALARM;
>>
>> +	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd =
>> pfds.readfd;
>> +	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
>> +					RTE_INTR_HANDLE_DEV_EVENT;
>> +
>>   	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
>>   	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type =
>> RTE_INTR_HANDLE_UIO;
>>
>> @@ -250,6 +255,14 @@ test_interrupt_enable(void)
>>   		return -1;
>>   	}
>>
>> +	/* check with specific valid intr_handle */
>> +	test_intr_handle =
>> intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
>> +	if (rte_intr_enable(&test_intr_handle) == 0) {
>> +		printf("unexpectedly enable a specific intr_handle "
>> +			"successfully\n");
>> +		return -1;
>> +	}
>> +
>>   	/* check with valid handler and its type */
>>   	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
>>   	if (rte_intr_enable(&test_intr_handle) < 0) {
>> @@ -306,6 +319,14 @@ test_interrupt_disable(void)
>>   		return -1;
>>   	}
>>
>> +	/* check with specific valid intr_handle */
>> +	test_intr_handle =
>> intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
>> +	if (rte_intr_disable(&test_intr_handle) == 0) {
>> +		printf("unexpectedly disable a specific intr_handle "
>> +			"successfully\n");
>> +		return -1;
>> +	}
>> +
>>   	/* check with valid handler and its type */
>>   	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
>>   	if (rte_intr_disable(&test_intr_handle) < 0) {
>> @@ -393,9 +414,17 @@ test_interrupt(void)
>>   		goto out;
>>   	}
>>
>> +	printf("Check valid device event interrupt full path\n");
>> +	if (test_interrupt_full_path_check(
>> +		TEST_INTERRUPT_HANDLE_VALID_DEVICE_EVENT) < 0) {
> Here is a typo which brings below compile error:
> 	dpdk/test/test/test_interrupts.c:419:3: error: 'TEST_INTERRUPT_HANDLE_VALID_DEVICE_EVENT' undeclared (first use in this function)
> 	   TEST_INTERRUPT_HANDLE_VALID_DEVICE_EVENT) < 0) {
>
> Thanks,
> Jianfeng
Thanks, will correct it.
>> +		printf("failure occurred during checking valid device event "
>> +						"interrupt full path\n");
>> +		goto out;
>> +	}
>> +
>>   	printf("Check valid alarm interrupt full path\n");
>> -	if
>> (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
>> -									< 0) {
>> +	if (test_interrupt_full_path_check(
>> +		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
>>   		printf("failure occurred during checking valid alarm "
>>   						"interrupt full path\n");
>>   		goto out;
>> @@ -513,6 +542,12 @@ test_interrupt(void)
>>   	rte_intr_callback_unregister(&test_intr_handle,
>>   			test_interrupt_callback_1, (void *)-1);
>>
>> +	test_intr_handle =
>> intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
>> +	rte_intr_callback_unregister(&test_intr_handle,
>> +			test_interrupt_callback, (void *)-1);
>> +	rte_intr_callback_unregister(&test_intr_handle,
>> +			test_interrupt_callback_1, (void *)-1);
>> +
>>   	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
>>   	/* deinit */
>>   	test_interrupt_deinit();
>> --
>> 2.7.4


* Re: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-04  3:22                                                                     ` Tan, Jianfeng
@ 2018-04-04 16:31                                                                       ` Matan Azrad
  2018-04-05  8:40                                                                         ` Guo, Jia
  2018-04-05  9:03                                                                         ` Tan, Jianfeng
  0 siblings, 2 replies; 494+ messages in thread
From: Matan Azrad @ 2018-04-04 16:31 UTC (permalink / raw)
  To: Tan, Jianfeng, Guo, Jia, stephen, Richardson, Bruce, Yigit,
	Ferruh, Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing,
	Thomas Monjalon, Mordechay Haimovsky, Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi all,

What do you think about adding the "--hotplug" parameter as a new EAL command line parameter?

From: Tan, Jianfeng, Wednesday, April 4, 2018 6:23 AM
> > -----Original Message-----
> > From: Guo, Jia
> > Sent: Tuesday, April 3, 2018 6:34 PM
> > To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
> > Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
> > thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
> > Jianfeng
> > Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> > Jia; Zhang, Helin
> > Subject: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
> >
> > Use testpmd for example, to show how an application use device event
> 
> s/use/uses
> 
> > APIs to monitor the hotplug events, including both hot removal event
> > and hot insertion event.
> >
> > The process is that, testpmd first enable hotplug by below commands,
> >
> > E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug
> >
> > then testpmd start the device event monitor by call the new API
> 
> s/start/starts
> s/call/calling
> 
> > (rte_dev_event_monitor_start) and register the user's callback by call
> > the API (rte_dev_event_callback_register), when device being hotplug
> > insertion or hotplug removal, the device event monitor detects the
> > event and call user's callbacks, user could process the event in the
> > callback accordingly.
> >
> > This patch only shows the event monitoring, device attach/detach would
> > not be involved here, will add from other hotplug patch set.
> >
> > Signed-off-by: Jeff Guo <jia.guo@intel.com>
> 
> Some typos and a trivial suggestion. Feel free to carry my
> 
> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
> 
> in the next version.
> 
> > ---
> > v18->v17:
> > remove hotplug policy and detach/attach process from testpmd, let it
> > focus on the device event monitoring which the patch set introduced.
> > ---
> >  app/test-pmd/parameters.c             |   5 +-
> >  app/test-pmd/testpmd.c                | 112
> > +++++++++++++++++++++++++++++++++-
> >  app/test-pmd/testpmd.h                |   2 +
> >  doc/guides/testpmd_app_ug/run_app.rst |   4 ++
> >  4 files changed, 121 insertions(+), 2 deletions(-)
> >
> > diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> > index 97d22b8..558cd40 100644
> > --- a/app/test-pmd/parameters.c
> > +++ b/app/test-pmd/parameters.c
> > @@ -186,6 +186,7 @@ usage(char* progname)
> >  	printf("  --flow-isolate-all: "
> >  	       "requests flow API isolated mode on all ports at
> > initialization time.\n");
> >  	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX
> queue
> > offloads\n");
> > +	printf("  --hot-plug: enable hot plug for device.\n");
> >  }
> >
> >  #ifdef RTE_LIBRTE_CMDLINE
> > @@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
> >  		{ "print-event",		1, 0, 0 },
> >  		{ "mask-event",			1, 0, 0 },
> >  		{ "tx-offloads",		1, 0, 0 },
> > +		{ "hot-plug",			0, 0, 0 },
> >  		{ 0, 0, 0, 0 },
> >  	};
> >
> > @@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
> >  					rte_exit(EXIT_FAILURE,
> >  						 "invalid mask-event
> > argument\n");
> >  				}
> > -
> > +			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
> > +				hot_plug = 1;
> >  			break;
> >  		case 'h':
> >  			usage(argv[0]);
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> > 4c0e258..2faeb90 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -12,6 +12,7 @@
> >  #include <sys/mman.h>
> >  #include <sys/types.h>
> >  #include <errno.h>
> > +#include <stdbool.h>
> >
> >  #include <sys/queue.h>
> >  #include <sys/stat.h>
> > @@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
> >   */
> >  uint8_t rmv_interrupt = 1; /* enabled by default */
> >
> > +uint8_t hot_plug = 0; /**< hotplug disabled by default. */
> > +
> >  /*
> >   * Display or mask ether events
> >   * Default to all events except VF_MBOX @@ -391,6 +394,12 @@ static
> > void check_all_ports_link_status(uint32_t
> > port_mask);
> >  static int eth_event_callback(portid_t port_id,
> >  			      enum rte_eth_event_type type,
> >  			      void *param, void *ret_param);
> > +static int eth_dev_event_callback(char *device_name,
> > +				enum rte_dev_event_type type,
> > +				void *param);
> > +static int eth_dev_event_callback_register(void);
> > +static int eth_dev_event_callback_unregister(void);
> > +
> >
> >  /*
> >   * Check if all the ports are started.
> > @@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
> >  	printf("Done\n");
> >  }
> >
> > +static int
> > +eth_dev_event_callback_register(void)
> > +{
> > +	int diag;
> > +
> > +	/* register the device event callback */
> > +	diag = rte_dev_event_callback_register(NULL,
> > +		eth_dev_event_callback, NULL);
> > +	if (diag) {
> > +		printf("Failed to setup dev_event callback\n");
> > +		return -1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +
> > +static int
> > +eth_dev_event_callback_unregister(void)
> > +{
> > +	int diag;
> > +
> > +	/* unregister the device event callback */
> > +	diag = rte_dev_event_callback_unregister(NULL,
> > +		eth_dev_event_callback, NULL);
> > +	if (diag) {
> > +		printf("Failed to setup dev_event callback\n");
> > +		return -1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  void
> >  attach_port(char *identifier)
> >  {
> > @@ -1916,6 +1958,7 @@ void
> >  pmd_test_exit(void)
> >  {
> >  	portid_t pt_id;
> > +	int ret;
> >
> >  	if (test_done == 0)
> >  		stop_packet_forwarding();
> > @@ -1929,6 +1972,18 @@ pmd_test_exit(void)
> >  			close_port(pt_id);
> >  		}
> >  	}
> > +
> > +	if (hot_plug) {
> > +		ret = rte_dev_event_monitor_stop();
> > +		if (ret)
> > +			RTE_LOG(ERR, EAL,
> > +				"fail to stop device event monitor.");
> > +
> > +		ret = eth_dev_event_callback_unregister();
> > +		if (ret)
> > +			RTE_LOG(ERR, EAL,
> > +				"fail to unregister all event callbacks.");
> > +	}
> >  	printf("\nBye...\n");
> >  }
> >
> > @@ -2059,6 +2114,48 @@ eth_event_callback(portid_t port_id, enum
> > rte_eth_event_type type, void *param,
> >  	return 0;
> >  }
> >
> > +/* This function is used by the interrupt thread */ static int
> > +eth_dev_event_callback(char *device_name, enum rte_dev_event_type
> > type,
> > +			     __rte_unused void *arg)
> > +{
> > +	int ret = 0;
> 
> From here
> 
> > +	static const char * const event_desc[] = {
> > +		[RTE_DEV_EVENT_ADD] = "add",
> > +		[RTE_DEV_EVENT_REMOVE] = "remove",
> > +	};
> > +
> > +	if (type >= RTE_DEV_EVENT_MAX) {
> > +		fprintf(stderr, "%s called upon invalid event %d\n",
> > +			__func__, type);
> > +		fflush(stderr);
> > +	} else if (event_print_mask & (UINT32_C(1) << type)) {
> > +		printf("%s event\n",
> > +			event_desc[type]);
> > +		fflush(stdout);
> > +	}
> 
> to here, these check are not necessary.
> 
> > +
> > +	switch (type) {
> > +	case RTE_DEV_EVENT_REMOVE:
> > +		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> > +			device_name);
> > +		/* TODO: After finish failure handle, begin to stop
> > +		 * packet forward, stop port, close port, detach port.
> > +		 */
> > +		break;
> > +	case RTE_DEV_EVENT_ADD:
> > +		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> > +			device_name);
> > +		/* TODO: After finish kernel driver binding,
> > +		 * begin to attach port.
> > +		 */
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +	return ret;
> > +}
> > +
> >  static int
> >  set_tx_queue_stats_mapping_registers(portid_t port_id, struct
> > rte_port
> > *port)
> >  {
> > @@ -2474,8 +2571,9 @@ signal_handler(int signum)  int  main(int argc,
> > char** argv)  {
> > -	int  diag;
> > +	int diag;
> >  	portid_t port_id;
> > +	int ret;
> >
> >  	signal(SIGINT, signal_handler);
> >  	signal(SIGTERM, signal_handler);
> > @@ -2543,6 +2641,18 @@ main(int argc, char** argv)
> >  		       nb_rxq, nb_txq);
> >
> >  	init_config();
> > +
> > +	if (hot_plug) {
> > +		/* enable hot plug monitoring */
> > +		ret = rte_dev_event_monitor_start();
> > +		if (ret) {
> > +			rte_errno = EINVAL;
> > +			return -1;
> > +		}
> > +		eth_dev_event_callback_register();
> > +
> > +	}
> > +
> >  	if (start_port(RTE_PORT_ALL) != 0)
> >  		rte_exit(EXIT_FAILURE, "Start ports failed\n");
> >
> > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
> > 153abea..8fde68d 100644
> > --- a/app/test-pmd/testpmd.h
> > +++ b/app/test-pmd/testpmd.h
> > @@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet
> > forwarding when set to 1. */  extern uint8_t lsc_interrupt; /**<
> > disabled by "--no-lsc-interrupt" parameter */  extern uint8_t
> > rmv_interrupt; /**< disabled by "--no-rmv-interrupt"
> > parameter */
> >  extern uint32_t event_print_mask;
> > +extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
> > +
> >  /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
> >
> >  #ifdef RTE_LIBRTE_IXGBE_BYPASS
> > diff --git a/doc/guides/testpmd_app_ug/run_app.rst
> > b/doc/guides/testpmd_app_ug/run_app.rst
> > index 1fd5395..d0ced36 100644
> > --- a/doc/guides/testpmd_app_ug/run_app.rst
> > +++ b/doc/guides/testpmd_app_ug/run_app.rst
> > @@ -479,3 +479,7 @@ The commandline options are:
> >
> >      Set the hexadecimal bitmask of TX queue offloads.
> >      The default value is 0.
> > +
> > +*   ``--hot-plug``
> > +
> > +    Enable device event monitor machenism for hotplug.
> 
> s/machenism/mechanism
> 
> > --
> > 2.7.4
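
For reference, here is a minimal sketch of how the testpmd callback might look once the checks flagged above are dropped and the switch alone decides what to do. The enum and function names are local stand-ins (assumptions), not the rte_dev API from the patch, and returning -1 for an unknown event is an illustrative choice rather than what the patch does.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for enum rte_dev_event_type from the quoted patch. */
enum dev_event_type {
	DEV_EVENT_ADD,
	DEV_EVENT_REMOVE,
	DEV_EVENT_MAX
};

/* Map an event type to a printable name; NULL for unknown values. */
static const char *
dev_event_name(enum dev_event_type type)
{
	switch (type) {
	case DEV_EVENT_ADD:
		return "add";
	case DEV_EVENT_REMOVE:
		return "remove";
	default:
		return NULL;
	}
}

/* Callback body reduced to a single lookup plus one log line. */
static int
dev_event_cb(const char *device_name, enum dev_event_type type, void *arg)
{
	const char *name = dev_event_name(type);

	(void)arg; /* unused in this sketch */
	assert(device_name != NULL);

	if (name == NULL) {
		fprintf(stderr, "%s: invalid event %d\n", __func__, (int)type);
		return -1;
	}
	printf("device %s: %s event\n", device_name, name);
	return 0;
}
```

The attach/detach work marked TODO in the patch would go where the log line is.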


* Re: [PATCH V18 2/4] eal: add device event monitor framework
  2018-04-04  2:53                                                                     ` Tan, Jianfeng
@ 2018-04-05  3:44                                                                       ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-05  3:44 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/4/2018 10:53 AM, Tan, Jianfeng wrote:
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Tuesday, April 3, 2018 6:34 PM
>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>> Jianfeng
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>> Jia; Zhang, Helin
>> Subject: [PATCH V18 2/4] eal: add device event monitor framework
>>
>> This patch aims to add a general device event monitor framework at
>> EAL device layer, for device hotplug awareness and actions adopted
>> accordingly. It could also expand for all other types of device event
>> monitor, but not in this scope at the stage.
>>
>> To get started, users firstly call below new added APIs to enable/disable
>> the device event monitor mechanism:
>>    - rte_dev_event_monitor_start
>>    - rte_dev_event_monitor_stop
>>
>> Then users shall register or unregister callbacks through the newly added
>> APIs. Callbacks can be device-specific or apply to all devices.
>>    -rte_dev_event_callback_register
>>    -rte_dev_event_callback_unregister
>>
>> Use hotplug case for example, when device hotplug insertion or hotplug
>> removal, we will get notified from kernel, then call user's callbacks
>> accordingly to handle it, such as detach or attach the device from the
>> bus, and could benefit further fail-safe or live-migration.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v18->v17:
>> add feature announcement in release document, fix bsp compile issue.
>> ---
>>   doc/guides/rel_notes/release_18_05.rst  |   9 ++
>>   lib/librte_eal/bsdapp/eal/Makefile      |   1 +
>>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 ++++
>>   lib/librte_eal/bsdapp/eal/meson.build   |   1 +
>>   lib/librte_eal/common/eal_common_dev.c  | 168
>> ++++++++++++++++++++++++++++++++
>>   lib/librte_eal/common/eal_private.h     |  15 +++
>>   lib/librte_eal/common/include/rte_dev.h |  94 ++++++++++++++++++
>>   lib/librte_eal/linuxapp/eal/Makefile    |   1 +
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
>>   lib/librte_eal/linuxapp/eal/meson.build |   1 +
>>   lib/librte_eal/rte_eal_version.map      |   8 ++
>>   11 files changed, 341 insertions(+)
>>   create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>>   create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>>
>> diff --git a/doc/guides/rel_notes/release_18_05.rst
>> b/doc/guides/rel_notes/release_18_05.rst
>> index 3923dc2..37e00c4 100644
>> --- a/doc/guides/rel_notes/release_18_05.rst
>> +++ b/doc/guides/rel_notes/release_18_05.rst
>> @@ -41,6 +41,15 @@ New Features
>>        Also, make sure to start the actual text at the margin.
>>
>> =========================================================
>>
>> +* **Added device event monitor framework.**
>> +
>> +  Added a general device event monitor framework at EAL, for device
>> dynamic management.
>> +  Such as device hotplug awareness and actions adopted accordingly. The list
>> of new APIs:
>> +
>> +  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop``
>> are for
>> +    the event monitor enable and disable.
>> +  * ``rte_dev_event_callback_register`` and
>> ``rte_dev_event_callback_unregister``
>> +    are for the user's callbacks register and unregister.
>>
>>   API Changes
>>   -----------
>> diff --git a/lib/librte_eal/bsdapp/eal/Makefile
>> b/lib/librte_eal/bsdapp/eal/Makefile
>> index dd455e6..c0921dd 100644
>> --- a/lib/librte_eal/bsdapp/eal/Makefile
>> +++ b/lib/librte_eal/bsdapp/eal/Makefile
>> @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) +=
>> eal_lcore.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
>> +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
>>
>>   # from common dir
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c
>> b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..1c6c51b
>> --- /dev/null
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> @@ -0,0 +1,21 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2018 Intel Corporation
>> + */
>> +
>> +#include <rte_log.h>
>> +#include <rte_compat.h>
>> +#include <rte_dev.h>
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_start(void)
>> +{
>> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>> +	return -1;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_stop(void)
>> +{
>> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>> +	return -1;
>> +}
>> diff --git a/lib/librte_eal/bsdapp/eal/meson.build
>> b/lib/librte_eal/bsdapp/eal/meson.build
>> index e83fc91..6dfc533 100644
>> --- a/lib/librte_eal/bsdapp/eal/meson.build
>> +++ b/lib/librte_eal/bsdapp/eal/meson.build
>> @@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
>>   		'eal_timer.c',
>>   		'eal.c',
>>   		'eal_memory.c',
>> +		'eal_dev.c'
>>   )
>> diff --git a/lib/librte_eal/common/eal_common_dev.c
>> b/lib/librte_eal/common/eal_common_dev.c
>> index cd07144..e09e86f 100644
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -14,9 +14,34 @@
>>   #include <rte_devargs.h>
>>   #include <rte_debug.h>
>>   #include <rte_log.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_malloc.h>
>>
>>   #include "eal_private.h"
>>
>> +/* spinlock for device callbacks */
> It's for protecting the callback list.
>
>> +static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
> Put this spinlock where the list locates.
>
>> +
>> +/**
>> + * The device event callback description.
>> + *
>> + * It contains callback address to be registered by user application,
>> + * the pointer to the parameters for callback, and the device name.
>> + */
>> +struct dev_event_callback {
>> +	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
>> +	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
>> +	void *cb_arg;                           /**< Callback parameter */
>> +	char *dev_name;	 /**< Callback device name, NULL is for all
>> device */
>> +	uint32_t active;                        /**< Callback is executing */
>> +};
>> +
>> +/** @internal Structure to keep track of registered callbacks */
>> +TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
>> +
>> +/* The device event callback list for all registered callbacks. */
>> +static struct dev_event_cb_list dev_event_cbs;
>> +
>>   static int cmp_detached_dev_name(const struct rte_device *dev,
>>   	const void *_name)
>>   {
>> @@ -207,3 +232,146 @@ rte_eal_hotplug_remove(const char *busname,
>> const char *devname)
>>   	rte_eal_devargs_remove(busname, devname);
>>   	return ret;
>>   }
>> +
>> +int __rte_experimental
>> +rte_dev_event_callback_register(const char *device_name,
>> +				rte_dev_event_cb_fn cb_fn,
>> +				void *cb_arg)
>> +{
>> +	struct dev_event_callback *event_cb;
>> +	int ret;
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
>> +	rte_spinlock_lock(&dev_event_lock);
>> +
>> +	if (TAILQ_EMPTY(&dev_event_cbs))
>> +		TAILQ_INIT(&dev_event_cbs);
>> +
>> +	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
>> +		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg ==
>> cb_arg) {
>> +			if (device_name == NULL && event_cb->dev_name
>> == NULL)
>> +				break;
>> +			if (device_name == NULL || event_cb->dev_name
>> == NULL)
>> +				continue;
>> +			if (!strcmp(event_cb->dev_name, device_name))
>> +				break;
>> +		}
>> +	}
>> +
>> +	/* create a new callback. */
>> +	if (event_cb == NULL) {
>> +		event_cb = malloc(sizeof(struct dev_event_callback));
>> +		if (event_cb != NULL) {
>> +			event_cb->cb_fn = cb_fn;
>> +			event_cb->cb_arg = cb_arg;
>> +			event_cb->active = 0;
>> +			if (!device_name) {
>> +				event_cb->dev_name = NULL;
>> +			} else {
>> +				event_cb->dev_name =
>> strdup(device_name);
>> +				if (event_cb->dev_name == NULL) {
>> +					ret = -ENOMEM;
>> +					goto error;
>> +				}
>> +			}
>> +			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb,
>> next);
>> +		} else {
>> +			RTE_LOG(ERR, EAL,
>> +				"Failed to allocate memory for device "
>> +				"event callback.");
>> +			ret = -ENOMEM;
>> +			goto error;
>> +		}
>> +	} else {
>> +		RTE_LOG(ERR, EAL,
>> +			"The callback is already exist, no need "
>> +			"to register again.\n");
>> +		ret = -EEXIST;
>> +		goto error;
> Here is a bug that you will free an existing callback entry.
You are correct.
>> +	}
>> +
>> +	rte_spinlock_unlock(&dev_event_lock);
>> +	return 0;
>> +error:
>> +	free(event_cb);
>> +	rte_spinlock_unlock(&dev_event_lock);
>> +	return ret;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_event_callback_unregister(const char *device_name,
>> +				  rte_dev_event_cb_fn cb_fn,
>> +				  void *cb_arg)
>> +{
> Let's clearly define the behavior and return of this API. If I understand it correctly,
>
>      If cb_arg != -1, we use (dev_name, cb_fn, cb_arg) as the key to look up the registered API.
>      If cb_arg == -1, we use (cb_fn) as the key to look up the registered API.
>
> For return value, we want to return the number of callbacks being removed. It could be:
>    >=0, number of callbacks removed. (When we encounter an active callback, we shall either skip it or just return -EAGAIN; neither sounds good to me actually)
>   <0, error encountered.
>
> If you agree with above statement, below implementation has lots of issues.
>
>> +	int ret = 0;
>> +	struct dev_event_callback *event_cb, *next;
>> +
>> +	if (!cb_fn)
>> +		return -EINVAL;
>> +
>> +	rte_spinlock_lock(&dev_event_lock);
>> +
>> +	/*walk through the callbacks and remove all that match. */
>> +	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
>> +	     event_cb = next) {
>> +
>> +		next = TAILQ_NEXT(event_cb, next);
> First of all, if cb_fn  != event_cb->cb_fn, we shall continue.
Not necessarily: if cb_arg == -1, meaning all callbacks, we don't need to
care whether cb_fn equals event_cb->cb_fn or not.
>> +
>> +		if (device_name != NULL && event_cb->dev_name != NULL) {
>> +			if (!strcmp(event_cb->dev_name, device_name)) {
>> +				if (event_cb->cb_fn != cb_fn ||
>> +				    (cb_arg != (void *)-1 &&
>> +				    event_cb->cb_arg != cb_arg))
>> +					continue;
>> +			}
>> +		} else if (device_name != NULL) {
>> +			continue;
>> +		}
> What about device_name == NULL && event_cb->dev_name != NULL && cb_arg == -1?
If device_name == NULL, it means all devices, so any callback is processed.
>
> What about device_name == NULL && event_cb->dev_name == NULL &&  cb_arg != -1 && cb_arg != event_cb->cb_arg?
If device_name == NULL, we don't care about cb_arg and just remove the callback.
>> +
>> +		/*
>> +		 * if this callback is not executing right now,
>> +		 * then remove it.
>> +		 */
>> +		if (event_cb->active == 0) {
>> +			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
>> +			free(event_cb);
>> +			ret++;
>> +		} else {
>> +			ret = -EAGAIN;
> If you don't break here, next time you find another satisfied callback, you will ret++ on a (-EAGAIN) value.
Here, I think you are correct. But since the return value is meant to be a
count of removed callbacks, it would just continue here.
>> +		}
>> +	}
>> +	rte_spinlock_unlock(&dev_event_lock);
>> +	return ret;
>> +}
> BTW, I don't know why DPDK has the tradition of using cb_arg == -1 to stand for multiple callbacks; it's not a good API design to me. I would like to ask for others' opinions: shall we continue this?
I don't have a strong objection to the cb_arg == -1 usage; it might make
sense.
>> +
>> +void
>> +dev_callback_process(char *device_name, enum rte_dev_event_type
>> event)
>> +{
>> +	struct dev_event_callback *cb_lst;
>> +	int rc;
>> +
>> +	if (device_name == NULL)
>> +		return;
>> +
>> +	rte_spinlock_lock(&dev_event_lock);
>> +
>> +	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
>> +		if (cb_lst->dev_name) {
>> +			if (strcmp(cb_lst->dev_name, device_name))
>> +				continue;
>> +		}
>> +		cb_lst->active = 1;
>> +		rte_spinlock_unlock(&dev_event_lock);
>> +		rc = cb_lst->cb_fn(device_name, event,
>> +				cb_lst->cb_arg);
>> +		if (rc) {
>> +			RTE_LOG(ERR, EAL,
>> +				"Failed to process callback function.");
>> +		}
> I don't see a reason why we need the return value from callbacks. Probably, define it as void type.
>> +		rte_spinlock_lock(&dev_event_lock);
>> +		cb_lst->active = 0;
>> +	}
>> +	rte_spinlock_unlock(&dev_event_lock);
>> +}
>> diff --git a/lib/librte_eal/common/eal_private.h
>> b/lib/librte_eal/common/eal_private.h
>> index 0b28770..88e5a59 100644
>> --- a/lib/librte_eal/common/eal_private.h
>> +++ b/lib/librte_eal/common/eal_private.h
>> @@ -9,6 +9,8 @@
>>   #include <stdint.h>
>>   #include <stdio.h>
>>
>> +#include <rte_dev.h>
>> +
>>   /**
>>    * Initialize the memzone subsystem (private to eal).
>>    *
>> @@ -205,4 +207,17 @@ struct rte_bus
>> *rte_bus_find_by_device_name(const char *str);
>>
>>   int rte_mp_channel_init(void);
>>
>> +/**
>> + * Internal Executes all the user application registered callbacks for
>> + * the specific device. It is for DPDK internal user only. User
>> + * application should not call it directly.
>> + *
>> + * @param device_name
>> + *  The device name.
>> + * @param event
>> + *  the device event type.
>> + *
>> + */
>> +void
>> +dev_callback_process(char *device_name, enum rte_dev_event_type event);
> Too many *_process functions in this patch. Let's avoid using such ambiguous words.
>
> For example, you can rename this function to dev_event_callback_invoke().
I will keep this name for now and consider renaming the other ones.
>>   #endif /* _EAL_PRIVATE_H_ */
>> diff --git a/lib/librte_eal/common/include/rte_dev.h
>> b/lib/librte_eal/common/include/rte_dev.h
>> index b688f1e..4c78938 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -24,6 +24,25 @@ extern "C" {
>>   #include <rte_compat.h>
>>   #include <rte_log.h>
>>
>> +/**
>> + * The device event type.
>> + */
>> +enum rte_dev_event_type {
>> +	RTE_DEV_EVENT_ADD,	/**< device being added */
>> +	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
>> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
>> +};
>> +
>> +struct rte_dev_event {
>> +	enum rte_dev_event_type type;	/**< device event type */
>> +	int subsystem;			/**< subsystem id */
>> +	char *devname;			/**< device name */
>> +};
>> +
>> +typedef int (*rte_dev_event_cb_fn)(char *device_name,
>> +					enum rte_dev_event_type event,
>> +					void *cb_arg);
>> +
>>   __attribute__((format(printf, 2, 0)))
>>   static inline void
>>   rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
>> @@ -267,4 +286,79 @@ __attribute__((used)) = str
>>   }
>>   #endif
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It registers the callback for the specific device.
>> + * Multiple callbacks cal be registered at the same time.
>> + *
>> + * @param device_name
>> + *  The device name, that is the param name of the struct rte_device,
>> + *  null value means for all devices.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback.
>> + *
>> + * @return
>> + *  - On success, zero.
>> + *  - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_event_callback_register(const char *device_name,
>> +				rte_dev_event_cb_fn cb_fn,
>> +				void *cb_arg);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It unregisters the callback according to the specified device.
>> + *
>> + * @param device_name
>> + *  The device name, that is the param name of the struct rte_device,
>> + *  null value means for all devices.
>> + * @param cb_fn
>> + *  callback address.
>> + * @param cb_arg
>> + *  address of parameter for callback, (void *)-1 means to remove all
>> + *  registered which has the same callback address.
>> + *
>> + * @return
>> + *  - On success, return the number of callback entities removed.
>> + *  - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_event_callback_unregister(const char *device_name,
>> +				  rte_dev_event_cb_fn cb_fn,
>> +				  void *cb_arg);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Start the device event monitoring.
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_event_monitor_start(void);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Stop the device event monitoring .
>> + *
>> + * @param none
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_event_monitor_stop(void);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/Makefile
>> b/lib/librte_eal/linuxapp/eal/Makefile
>> index 7e5bbe8..8578796 100644
>> --- a/lib/librte_eal/linuxapp/eal/Makefile
>> +++ b/lib/librte_eal/linuxapp/eal/Makefile
>> @@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) +=
>> eal_lcore.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
>> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
>>
>>   # from common dir
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> new file mode 100644
>> index 0000000..9c8d1a0
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -0,0 +1,22 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2018 Intel Corporation
>> + */
>> +
>> +#include <rte_log.h>
>> +#include <rte_compat.h>
>> +#include <rte_dev.h>
>> +
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_start(void)
>> +{
>> +	/* TODO: start uevent monitor for linux */
>> +	return 0;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_event_monitor_stop(void)
>> +{
>> +	/* TODO: stop uevent monitor for linux */
>> +	return 0;
>> +}
>> diff --git a/lib/librte_eal/linuxapp/eal/meson.build
>> b/lib/librte_eal/linuxapp/eal/meson.build
>> index 03974ff..b222571 100644
>> --- a/lib/librte_eal/linuxapp/eal/meson.build
>> +++ b/lib/librte_eal/linuxapp/eal/meson.build
>> @@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
>>   		'eal_vfio_mp_sync.c',
>>   		'eal.c',
>>   		'eal_memory.c',
>> +		'eal_dev.c',
>>   )
>>
>>   if has_libnuma == 1
>> diff --git a/lib/librte_eal/rte_eal_version.map
>> b/lib/librte_eal/rte_eal_version.map
>> index d123602..d23f491 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -256,3 +256,11 @@ EXPERIMENTAL {
>>   	rte_service_start_with_defaults;
>>
>>   } DPDK_18.02;
>> +
>> +EXPERIMENTAL {
>> +        global:
>> +
>> +        rte_dev_event_callback_register;
>> +        rte_dev_event_callback_unregister;
>> +
>> +} DPDK_18.05;
>> --
>> 2.7.4
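
The register bug and the unregister semantics discussed above can be sketched as follows. This is a standalone illustration under stated assumptions, not the patch's code: `event_cb`, `cb_register`, `cb_unregister`, `noop_cb` and `CB_ARG_ANY` are hypothetical names, locking is left out, and the lookup keys follow Jianfeng's wording literally ((dev_name, cb_fn, cb_arg) normally, cb_fn alone when cb_arg is -1). Returning -EAGAIN before removing anything is just one possible way to avoid mixing a count with an error code.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the types in the patch, not the DPDK API. */
typedef void (*event_cb_fn)(const char *device_name, void *cb_arg);

struct event_cb {
	TAILQ_ENTRY(event_cb) link;
	event_cb_fn cb_fn;
	void *cb_arg;
	char *dev_name; /* NULL means "registered for all devices" */
	int active;     /* non-zero while the callback is executing */
};
TAILQ_HEAD(event_cb_list, event_cb);

#define CB_ARG_ANY ((void *)-1)

/* Example callback that does nothing; only its address matters here. */
static void noop_cb(const char *device_name, void *cb_arg)
{
	(void)device_name;
	(void)cb_arg;
}

/* Two names are the same if both are NULL or both compare equal. */
static int same_name(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return a == b;
	return strcmp(a, b) == 0;
}

static int cb_matches(const struct event_cb *cb, const char *dev_name,
		      event_cb_fn fn, void *arg)
{
	if (cb->cb_fn != fn)
		return 0;
	if (arg == CB_ARG_ANY)
		return 1; /* key is cb_fn alone */
	return cb->cb_arg == arg && same_name(cb->dev_name, dev_name);
}

/* Returns 0, -EINVAL, -EEXIST or -ENOMEM; never frees an existing entry. */
static int cb_register(struct event_cb_list *list, const char *dev_name,
		       event_cb_fn fn, void *arg)
{
	struct event_cb *cb;

	assert(list != NULL);
	if (fn == NULL)
		return -EINVAL;
	TAILQ_FOREACH(cb, list, link)
		if (cb->cb_fn == fn && cb->cb_arg == arg &&
		    same_name(cb->dev_name, dev_name))
			return -EEXIST;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -ENOMEM;
	if (dev_name != NULL && (cb->dev_name = strdup(dev_name)) == NULL) {
		free(cb); /* only the entry allocated above is freed */
		return -ENOMEM;
	}
	cb->cb_fn = fn;
	cb->cb_arg = arg;
	TAILQ_INSERT_TAIL(list, cb, link);
	return 0;
}

/* Returns the number of entries removed, or a negative errno value. */
static int cb_unregister(struct event_cb_list *list, const char *dev_name,
			 event_cb_fn fn, void *arg)
{
	struct event_cb *cb, *tmp;
	int removed = 0;

	assert(list != NULL);
	if (fn == NULL)
		return -EINVAL;
	/* First pass: refuse to touch the list if a match is executing. */
	TAILQ_FOREACH(cb, list, link)
		if (cb_matches(cb, dev_name, fn, arg) && cb->active)
			return -EAGAIN;
	/* Second pass: remove every match and count it. */
	for (cb = TAILQ_FIRST(list); cb != NULL; cb = tmp) {
		tmp = TAILQ_NEXT(cb, link);
		if (!cb_matches(cb, dev_name, fn, arg))
			continue;
		TAILQ_REMOVE(list, cb, link);
		free(cb->dev_name);
		free(cb);
		removed++;
	}
	return removed;
}
```

A caller would initialise the list with TAILQ_INIT() once, outside the register path, which also sidesteps the TAILQ_EMPTY()/TAILQ_INIT() pairing in the quoted code.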


* Re: [PATCH V18 3/4] eal/linux: uevent parse and process
  2018-04-04  3:15                                                                     ` Tan, Jianfeng
@ 2018-04-05  6:09                                                                       ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-05  6:09 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Thanks for the review.


On 4/4/2018 11:15 AM, Tan, Jianfeng wrote:
> Hi Jeff,
>
> Looks much better now, but still have some issues to address.
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Tuesday, April 3, 2018 6:34 PM
>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>> Jianfeng
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>> Jia; Zhang, Helin
>> Subject: [PATCH V18 3/4] eal/linux: uevent parse and process
>>
>> In order to handle the uevent which has been detected from the kernel
>> side, add uevent parse and process function to translate the uevent into
>> device event, which user has subscribed to monitor.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v18->v17:
>> refine socket configuration.
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 178
>> +++++++++++++++++++++++++++++++++-
>>   1 file changed, 176 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9c8d1a0..9f2ee40 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -2,21 +2,195 @@
>>    * Copyright(c) 2018 Intel Corporation
>>    */
>>
>> +#include <string.h>
>> +#include <unistd.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +
>>   #include <rte_log.h>
>>   #include <rte_compat.h>
>>   #include <rte_dev.h>
>> +#include <rte_malloc.h>
>> +#include <rte_interrupts.h>
>> +
>> +#include "eal_private.h"
>> +
>> +static struct rte_intr_handle intr_handle = {.fd = -1 };
>> +static bool monitor_started;
>> +
>> +#define EAL_UEV_MSG_LEN 4096
>> +#define EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +/* identify the system layer which event exposure from */
> Reword it a little bit:
>      /* identify the system layer which reports this event */
>
>> +enum eal_dev_event_subsystem {
>> +	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
>> +	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
>> +	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
>> +	EAL_DEV_EVENT_SUBSYSTEM_MAX
>> +};
>> +
>> +static int
>> +dev_uev_socket_fd_create(void)
>> +{
>> +	struct sockaddr_nl addr;
>> +	int ret;
>> +
>> +	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
>> +			SOCK_NONBLOCK,
>> +			NETLINK_KOBJECT_UEVENT);
>> +	if (intr_handle.fd < 0) {
>> +		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
>> +		return -1;
>> +	}
>> +
>> +	memset(&addr, 0, sizeof(addr));
>> +	addr.nl_family = AF_NETLINK;
>> +	addr.nl_pid = 0;
>> +	addr.nl_groups = 0xffffffff;
>> +
>> +	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
>> +	if (ret < 0) {
>> +		RTE_LOG(ERR, EAL, "Failed to bind socket for netlink fd.\n");
> Reword it a little bit so that we can understand it's a log related to hotplug:
>      Failed to bind uevent socket
>
>> +		goto err;
>> +	}
>> +
>> +	return 0;
>> +err:
>> +	close(intr_handle.fd);
> Then: intr_handle.fd = -1?
sure.
>> +	return ret;
>> +}
>> +
>> +static void
>> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
>> +{
>> +	char action[EAL_UEV_MSG_ELEM_LEN];
>> +	char subsystem[EAL_UEV_MSG_ELEM_LEN];
>> +	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
>> +	int i = 0;
>> +
>> +	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
>> +	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
>> +	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
>> +
>> +	while (i < length) {
>> +		for (; i < length; i++) {
>> +			if (*buf)
>> +				break;
>> +			buf++;
>> +		}
>> +		if (!strncmp(buf, "ACTION=", 7)) {
>> +			buf += 7;
>> +			i += 7;
>> +			snprintf(action, sizeof(action), "%s", buf);
>> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +			buf += 10;
>> +			i += 10;
>> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +			buf += 14;
>> +			i += 14;
>> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +			event->devname = strdup(pci_slot_name);
>> +		}
>> +		for (; i < length; i++) {
>> +			if (*buf == '\0')
>> +				break;
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	if (!strncmp(subsystem, "uio", 3))
>> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
>> +	else if (!strncmp(subsystem, "pci", 3))
>> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
>> +	else if (!strncmp(subsystem, "vfio", 4))
> How can we indicate it is an event with subsystem that we will not handle?
>
>> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
>> +	if (!strncmp(action, "add", 3))
>> +		event->type = RTE_DEV_EVENT_ADD;
>> +	if (!strncmp(action, "remove", 6))
>> +		event->type = RTE_DEV_EVENT_REMOVE;
> How can we indicate it is an event with type that we will not handle?
>
> My suggestion is to define a return value for that:
> - EVENT_VALID returned for an event that we will handle later.
> - EVENT_INVALID returned for any unknown events.
Makes sense. I think returning just 0 and -1 is enough.
>> +}
>> +
>> +static int
>> +dev_uev_receive(int fd, struct rte_dev_event *uevent)
>> +{
>> +	int ret;
>> +	char buf[EAL_UEV_MSG_LEN];
>> +
>> +	memset(uevent, 0, sizeof(struct rte_dev_event));
>> +	memset(buf, 0, EAL_UEV_MSG_LEN);
>> +
>> +	ret = recv(fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
>> +	if (ret < 0) {
>> +		RTE_LOG(ERR, EAL,
>> +			"Socket read error(%d): %s.\n",
>> +			errno, strerror(errno));
>> +		return -1;
>> +	} else if (ret == 0)
>> +		/* connection closed */
>> +		return -1;
>> +
>> +	dev_uev_parse(buf, uevent, EAL_UEV_MSG_LEN);
>> +
>> +	return 0;
>> +}
>> +
>> +static void
>> +dev_uev_process(__rte_unused void *param)
>> +{
>> +	struct rte_dev_event uevent;
>> +
>> +	if (dev_uev_receive(intr_handle.fd, &uevent))
> If an error happens, we shall start an alarm task to remove the callback from the interrupt thread.
>> +		return;
> You may want to add a log here for debugging, showing what event comes for which device.
>
>> +	if (uevent.devname)
> You can also filter this kind of events using the way I suggested above.
>
>> +		dev_callback_process(uevent.devname, uevent.type);
>> +}
>>
>>   int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>> -	/* TODO: start uevent monitor for linux */
>> +	int ret;
>> +
>> +	if (monitor_started)
>> +		return 0;
>> +
>> +	ret = dev_uev_socket_fd_create();
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "error create device event fd.\n");
>> +		return -1;
>> +	}
>> +
>> +	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
>> +	ret = rte_intr_callback_register(&intr_handle, dev_uev_process,
>> NULL);
>> +
>> +	if (ret) {
>> +		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
>> +		return -1;
>> +	}
>> +
>> +	monitor_started = true;
>> +
>>   	return 0;
>>   }
>>
>>   int __rte_experimental
>>   rte_dev_event_monitor_stop(void)
>>   {
>> -	/* TODO: stop uevent monitor for linux */
>> +	int ret;
>> +
>> +	if (!monitor_started)
>> +		return 0;
>> +
>> +	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_process,
>> +					   (void *)-1);
>> +	if (ret < 0) {
>> +		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
>> +		return ret;
>> +	}
>> +
>> +	close(intr_handle.fd);
>> +	intr_handle.fd = -1;
>> +	monitor_started = false;
>>   	return 0;
>>   }
>> --
>> 2.7.4
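
The return-value idea agreed above (0 for an event we will handle, -1 for anything unknown) could look roughly like the sketch below. It is an assumption-laden illustration, not the patch: `uev_event`, `uev_parse` and the enum names are hypothetical, the device name is copied into a fixed buffer instead of being strdup()'ed, and the parser is bounded by the number of bytes actually received rather than by the buffer size.

```c
#include <assert.h>
#include <string.h>

enum uev_action { UEV_ACTION_ADD, UEV_ACTION_REMOVE };
enum uev_subsystem { UEV_SUBSYSTEM_PCI, UEV_SUBSYSTEM_UIO, UEV_SUBSYSTEM_VFIO };

struct uev_event {
	enum uev_action action;
	enum uev_subsystem subsystem;
	char devname[128];
};

/* Return the value part if field starts with key (e.g. "ACTION="), else NULL. */
static const char *
uev_value(const char *field, const char *key)
{
	size_t key_len = strlen(key);

	return strncmp(field, key, key_len) == 0 ? field + key_len : NULL;
}

/*
 * Parse a uevent message made of NUL-terminated "KEY=value" fields.
 * Returns 0 for an event we handle, -1 for anything else.
 * On failure *ev may be partially filled and must not be used.
 */
static int
uev_parse(const char *buf, size_t len, struct uev_event *ev)
{
	const char *action = NULL, *subsystem = NULL, *slot = NULL;
	const char *val;
	size_t i = 0;

	assert(buf != NULL && ev != NULL);

	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);

		if (end == NULL)
			return -1; /* unterminated field: reject the message */
		if ((val = uev_value(field, "ACTION=")) != NULL)
			action = val;
		else if ((val = uev_value(field, "SUBSYSTEM=")) != NULL)
			subsystem = val;
		else if ((val = uev_value(field, "PCI_SLOT_NAME=")) != NULL)
			slot = val;
		i += (size_t)(end - field) + 1; /* skip the field and its NUL */
	}

	if (action == NULL || subsystem == NULL || slot == NULL)
		return -1;

	if (strcmp(action, "add") == 0)
		ev->action = UEV_ACTION_ADD;
	else if (strcmp(action, "remove") == 0)
		ev->action = UEV_ACTION_REMOVE;
	else
		return -1; /* event type we do not handle */

	if (strcmp(subsystem, "pci") == 0)
		ev->subsystem = UEV_SUBSYSTEM_PCI;
	else if (strcmp(subsystem, "uio") == 0)
		ev->subsystem = UEV_SUBSYSTEM_UIO;
	else if (strcmp(subsystem, "vfio") == 0)
		ev->subsystem = UEV_SUBSYSTEM_VFIO;
	else
		return -1; /* subsystem we do not handle */

	if (strlen(slot) >= sizeof(ev->devname))
		return -1;
	strcpy(ev->devname, slot);
	return 0;
}
```

With this shape, the receive path would pass the byte count returned by recv() as `len` and invoke the user callbacks only when the parser returns 0.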


* [PATCH V19 0/4] add device event monitor framework
  2018-04-03 10:33                                                                   ` [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  2018-04-04  3:22                                                                     ` Tan, Jianfeng
@ 2018-04-05  8:32                                                                     ` Jeff Guo
  2018-04-05  8:32                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
                                                                                         ` (3 more replies)
  2018-04-05  9:02                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
  2 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  8:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Regarding hotplug in DPDK: we already have a proactive way to add/remove
devices through APIs (rte_eal_hotplug_add/remove), and we also have the
fail-safe driver to offload the fail-safe work from the application. But
there is still no general mechanism to monitor hotplug events for all
drivers; at present the hotplug interrupt event differs between devices
and drivers, such as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so to make the hotplug feature easy to use for pci
drivers, something must be done to detect the remove event at the kernel
level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor the
hotplug status of devices, not only uio/vfio devices on the pci bus but
also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a.The uevent message received from the monitored FD looks like below.
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b.add device event monitor framework:
add several general APIs to enable uevent monitoring.

c.show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd to show device event monitor mechanism usage.

TODO: failure handler mechanism for hot plug and driver auto bind for hot insertion.
These will be covered by the next hot plug patch set.
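
For illustration only, the uevent message in item a can be read as a run of
NUL-terminated "KEY=VALUE" strings following the "action@devpath" header.
The sketch below assumes that layout; uev_find_value is a hypothetical
helper name and is not part of this patch set.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch set): look up KEY in a uevent
 * buffer made of consecutive NUL-terminated "KEY=VALUE" strings and return
 * a pointer to VALUE, or NULL when the key is absent.
 */
static const char *
uev_find_value(const char *buf, size_t len, const char *key)
{
	size_t key_len = strlen(key);
	size_t pos = 0;

	while (pos < len) {
		/* Find the NUL that ends the current field. */
		const char *end = memchr(buf + pos, '\0', len - pos);
		size_t field_len;

		if (end == NULL)
			break;	/* unterminated trailing field: stop */
		field_len = (size_t)(end - (buf + pos));
		if (field_len > key_len &&
		    strncmp(buf + pos, key, key_len) == 0 &&
		    buf[pos + key_len] == '=')
			return buf + pos + key_len + 1;
		pos += field_len + 1;	/* skip the field and its NUL */
	}
	return NULL;
}
```

With the sample message above, looking up "ACTION" would yield "remove" and
"DEVNAME" would yield "uio2"; a key that is not present yields NULL.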

patchset history:
v19->v18:
fix some typo and misunderstanding part

v18->v17:
1.add feature announcement in release document, fix bsp compile issue.
2.refine socket configuration.
3.remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.

v17->v16:
1.add related part of the interrupt handle type adding.
2.add new API into map, fix typo issue, add (void*)-1 value for unregister all callback
3.add new file into meson.build, modify coding style and add print info, delete unused part.
4.unregister all user's callback when stop event monitor

v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll instead of rte service for the monitor thread.
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
leave the checking and management to the user's callback.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:modify some typo and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix definition issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, by default prepare hot plug handling for
all pci devices, then let the app decide which devices need to
hot plug.
2.modify to manage event callback in each device.
3.fix some system hung issue when igb_uioome typo error.release.
4.modify the pci part to the bus-pci based on the bus rework.
5.add hot plug policy in app, show example of using a hotplug list to
decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Replace of individual fd bind with single device, use a common fd to polling all device.
5.Add registration of hot insertion monitoring and processing, add function to auto bind driver before user adds device
6.Refine some coding style and typos issue
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variables of hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device self maintain it fd,
to fix dual device fd issue.
2.refine some typo error.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 101 +++++++++-
 app/test-pmd/testpmd.h                             |   2 +
 doc/guides/rel_notes/release_18_05.rst             |   9 +
 doc/guides/testpmd_app_ug/run_app.rst              |   4 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  21 ++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 161 ++++++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 +++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 214 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |  10 +
 test/test/test_interrupts.c                        |  39 +++-
 18 files changed, 686 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V19 1/4] eal: add device event handle in interrupt thread
  2018-04-05  8:32                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
@ 2018-04-05  8:32                                                                       ` Jeff Guo
  2018-04-05  8:32                                                                       ` [PATCH V19 2/4] eal: add device event monitor framework Jeff Guo
                                                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  8:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->v18:
fix some typo
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
 test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..58e9328 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index 31a70a0..dc19175 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
 	TEST_INTERRUPT_HANDLE_VALID,
 	TEST_INTERRUPT_HANDLE_VALID_UIO,
 	TEST_INTERRUPT_HANDLE_VALID_ALARM,
+	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
 	TEST_INTERRUPT_HANDLE_CASE1,
 	TEST_INTERRUPT_HANDLE_MAX
 };
@@ -80,6 +81,10 @@ test_interrupt_init(void)
 	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
 					RTE_INTR_HANDLE_ALARM;
 
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd = pfds.readfd;
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
+					RTE_INTR_HANDLE_DEV_EVENT;
+
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -250,6 +255,14 @@ test_interrupt_enable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_enable(&test_intr_handle) == 0) {
+		printf("unexpectedly enable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -306,6 +319,14 @@ test_interrupt_disable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_disable(&test_intr_handle) == 0) {
+		printf("unexpectedly disable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -393,9 +414,17 @@ test_interrupt(void)
 		goto out;
 	}
 
+	printf("Check valid device event interrupt full path\n");
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+		printf("failure occurred during checking valid device event "
+						"interrupt full path\n");
+		goto out;
+	}
+
 	printf("Check valid alarm interrupt full path\n");
-	if (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
-									< 0) {
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_ALARM) < 0) {
 		printf("failure occurred during checking valid alarm "
 						"interrupt full path\n");
 		goto out;
@@ -513,6 +542,12 @@ test_interrupt(void)
 	rte_intr_callback_unregister(&test_intr_handle,
 			test_interrupt_callback_1, (void *)-1);
 
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback, (void *)-1);
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback_1, (void *)-1);
+
 	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
 	/* deinit */
 	test_interrupt_deinit();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V19 2/4] eal: add device event monitor framework
  2018-04-05  8:32                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
  2018-04-05  8:32                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-04-05  8:32                                                                       ` Jeff Guo
  2018-04-05 10:15                                                                         ` Tan, Jianfeng
  2018-04-05  8:32                                                                       ` [PATCH V19 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-04-05  8:32                                                                       ` [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  8:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor framework at the EAL
device layer, for device hotplug awareness and the actions taken in
response. It could also be extended to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to
enable/disable the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shall register or unregister callbacks through the newly added
APIs. Callbacks can be device specific, or for all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Take hotplug as an example: when a device is hot-inserted or hot-removed,
we get notified by the kernel and then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device from
the bus. This could further benefit fail-safe or live-migration.
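
As a rough stand-alone illustration of the matching rule described above (a
callback registered with a NULL device name receives events for all devices,
otherwise only for the named device), consider the sketch below. All demo_*
names are hypothetical simplifications; this is not the DPDK implementation
and it omits locking and list management.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the types the patch introduces. */
enum demo_event_type { DEMO_EVENT_ADD, DEMO_EVENT_REMOVE };

typedef void (*demo_event_cb_fn)(const char *device_name,
				 enum demo_event_type event, void *cb_arg);

struct demo_callback {
	const char *dev_name;	/* NULL subscribes to every device */
	demo_event_cb_fn cb_fn;
	void *cb_arg;
};

/*
 * Invoke every callback whose dev_name is NULL or equal to device_name.
 * Returns the number of callbacks invoked.
 */
static int
demo_dispatch(const struct demo_callback *cbs, size_t n,
	      const char *device_name, enum demo_event_type event)
{
	int called = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cbs[i].dev_name != NULL &&
		    strcmp(cbs[i].dev_name, device_name) != 0)
			continue;
		cbs[i].cb_fn(device_name, event, cbs[i].cb_arg);
		called++;
	}
	return called;
}

/* Example callback: counts removal events in the int pointed to by cb_arg. */
static void
demo_count_removals(const char *device_name, enum demo_event_type event,
		    void *cb_arg)
{
	(void)device_name;
	if (event == DEMO_EVENT_REMOVE)
		(*(int *)cb_arg)++;
}
```

A real application would instead pass its callback to
rte_dev_event_callback_register and let the EAL perform the dispatch.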

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->v18:
clear the coding style and fix typo
---
 doc/guides/rel_notes/release_18_05.rst  |   9 ++
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 +++++
 lib/librte_eal/bsdapp/eal/meson.build   |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 161 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  15 +++
 lib/librte_eal/common/include/rte_dev.h |  94 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
 lib/librte_eal/linuxapp/eal/meson.build |   1 +
 lib/librte_eal/rte_eal_version.map      |  10 ++
 11 files changed, 336 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index e5fac1c..d3c86bd 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -58,6 +58,15 @@ New Features
   * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
   * Added support for DROP action in flow API.
 
+* **Added device event monitor framework.**
+
+  Added a general device event monitor framework at EAL for dynamic device management,
+  such as device hotplug awareness and the actions taken in response. The list of new APIs:
+
+  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` enable and
+    disable the event monitor.
+  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
+    register and unregister the user's callbacks.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index ed1d17b..90b88eb 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..1c6c51b
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
index e83fc91..6dfc533 100644
--- a/lib/librte_eal/bsdapp/eal/meson.build
+++ b/lib/librte_eal/bsdapp/eal/meson.build
@@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
 		'eal_timer.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c'
 )
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..e202cf2 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all device */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,139 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb;
+	int ret;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&dev_event_cbs))
+		TAILQ_INIT(&dev_event_cbs);
+
+	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->active = 0;
+			if (!device_name) {
+				event_cb->dev_name = NULL;
+			} else {
+				event_cb->dev_name = strdup(device_name);
+				if (event_cb->dev_name == NULL) {
+					ret = -ENOMEM;
+					goto error;
+				}
+			}
+			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device "
+				"event callback.\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		RTE_LOG(ERR, EAL,
+			"The callback already exists, no need "
+			"to register again.\n");
+		rte_spinlock_unlock(&dev_event_lock);
+		return -EEXIST;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return 0;
+error:
+	free(event_cb);
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg)
+{
+	int ret = 0;
+	struct dev_event_callback *event_cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+	/*walk through the callbacks and remove all that match. */
+	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
+	     event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (device_name != NULL && event_cb->dev_name != NULL) {
+			if (!strcmp(event_cb->dev_name, device_name)) {
+				if (event_cb->cb_fn != cb_fn ||
+				    (cb_arg != (void *)-1 &&
+				    event_cb->cb_arg != cb_arg))
+					continue;
+			}
+		} else if (device_name != NULL) {
+			continue;
+		}
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
+			free(event_cb);
+			ret++;
+		} else {
+			continue;
+		}
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event)
+{
+	struct dev_event_callback *cb_lst;
+
+	if (device_name == NULL)
+		return;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (cb_lst->dev_name) {
+			if (strcmp(cb_lst->dev_name, device_name))
+				continue;
+		}
+		cb_lst->active = 1;
+		rte_spinlock_unlock(&dev_event_lock);
+		cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..88e5a59 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * @internal Execute all the callbacks registered by the user application for
+ * the specific device. It is for DPDK internal use only. User
+ * application should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ *
+ */
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..2ed240e 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,25 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef void (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +286,79 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b9c7727..db67001 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9c8d1a0
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
index 03974ff..b222571 100644
--- a/lib/librte_eal/linuxapp/eal/meson.build
+++ b/lib/librte_eal/linuxapp/eal/meson.build
@@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
 		'eal_vfio_mp_sync.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c',
 )
 
 if has_libnuma == 1
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index dd38783..3022df1 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -260,3 +260,13 @@ EXPERIMENTAL {
 	rte_socket_id_by_idx;
 
 } DPDK_18.02;
+
+EXPERIMENTAL {
+	global:
+
+	rte_dev_event_monitor_start;
+	rte_dev_event_monitor_stop;
+	rte_dev_event_callback_register;
+	rte_dev_event_callback_unregister;
+
+} DPDK_18.05;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V19 3/4] eal/linux: uevent parse and process
  2018-04-05  8:32                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
  2018-04-05  8:32                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-04-05  8:32                                                                       ` [PATCH V19 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-05  8:32                                                                       ` Jeff Guo
  2018-04-05  8:32                                                                       ` [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  8:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

In order to handle the uevent detected from the kernel side, add uevent
parse and process functions to translate the uevent into a device event
that the user has subscribed to monitor.
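
As a hedged illustration of the translation step, the sketch below maps the
ACTION string of a uevent to an add/remove event type. The sketch_* names
are hypothetical, and the UNKNOWN value is an assumption of this sketch for
actions it does not translate; the patch's own enum has no such member.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the add/remove event types in this patch set. */
enum sketch_event_type {
	SKETCH_EVENT_ADD,
	SKETCH_EVENT_REMOVE,
	SKETCH_EVENT_UNKNOWN	/* action this sketch does not translate */
};

/* Translate the value of a uevent "ACTION=" field into an event type. */
static enum sketch_event_type
sketch_action_to_event(const char *action)
{
	if (action == NULL)
		return SKETCH_EVENT_UNKNOWN;
	if (strcmp(action, "add") == 0)
		return SKETCH_EVENT_ADD;
	if (strcmp(action, "remove") == 0)
		return SKETCH_EVENT_REMOVE;
	return SKETCH_EVENT_UNKNOWN;
}
```

A caller would typically ignore any uevent whose action is not translated.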

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->18:
fix some misunderstanding part
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 196 +++++++++++++++++++++++++++++++++-
 1 file changed, 194 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9c8d1a0..4686c41 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,21 +2,213 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+
 #include <rte_log.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
+#include <rte_malloc.h>
+#include <rte_interrupts.h>
+
+#include "eal_private.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_started;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+/* identify the system layer which reports this event. */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_socket_fd_create(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+
+	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
+		goto err;
+	}
+
+	return 0;
+err:
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	return ret;
+}
+
+static int
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0, ret = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return -1;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = strdup(pci_slot_name);
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	/* parse the subsystem layer */
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	else if (!strncmp(subsystem, "vfio", 4))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
+	else
+		ret = -1;
 
+	/* parse the action type */
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	else if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	else
+		ret = -1;
+	return ret;
+}
+
+static void
+dev_uev_handler(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(&uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret == 0 || (ret < 0 && errno != EAGAIN)) {
+		/* connection is closed or broken, cannot come up again. */
+		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
+		return;
+	} else if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+			"uevent socket read error(%d): %s.\n",
+			errno, strerror(errno));
+		return;
+	}
+
+	ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "It is not a valid event "
+			"that needs to be handled.\n");
+		return;
+	}
+
+	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
+		uevent.devname, uevent.type, uevent.subsystem);
+
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (monitor_started)
+		return 0;
+
+	ret = dev_uev_socket_fd_create();
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event fd.\n");
+		return -1;
+	}
+
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
+		return -1;
+	}
+
+	monitor_started = true;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (!monitor_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
+					   (void *)-1);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_started = false;
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-05  8:32                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
                                                                                         ` (2 preceding siblings ...)
  2018-04-05  8:32                                                                       ` [PATCH V19 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-05  8:32                                                                       ` Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  8:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application uses the device event
APIs to monitor hotplug events, including both hot removal and hot
insertion events.

The process is as follows. testpmd first enables hotplug with the command
below,

E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug

then testpmd starts the device event monitor by calling the new API
(rte_dev_event_monitor_start) and registers the user's callback by calling
the API (rte_dev_event_callback_register). When a device is hot-inserted
or hot-removed, the device event monitor detects the event and calls the
user's callbacks, in which the user can process the event accordingly.
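
To make the register-then-dispatch flow concrete, here is a small
standalone model of a callback table. It is only a sketch under
assumptions: it does not use the DPDK API, and cb_register(), cb_dispatch()
and count_removals() are hypothetical names. It assumes that a callback
registered with a NULL device name is called for every device, as
testpmd's registration with NULL in this patch suggests.

```c
#include <stddef.h>
#include <string.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef void (*dev_event_cb)(const char *dev_name,
			     enum dev_event_type type, void *arg);

#define MAX_CBS 8

struct cb_entry {
	const char *dev_name; /* NULL: interested in every device */
	dev_event_cb fn;
	void *arg;
};

static struct cb_entry cb_table[MAX_CBS];
static size_t cb_count;

/* Add a callback; returns 0 on success, -1 if invalid or table full. */
static int
cb_register(const char *dev_name, dev_event_cb fn, void *arg)
{
	if (fn == NULL || cb_count == MAX_CBS)
		return -1;
	cb_table[cb_count++] = (struct cb_entry){ dev_name, fn, arg };
	return 0;
}

/* Call every matching callback; returns how many were called. */
static int
cb_dispatch(const char *dev_name, enum dev_event_type type)
{
	int called = 0;

	for (size_t i = 0; i < cb_count; i++) {
		const struct cb_entry *e = &cb_table[i];

		if (e->dev_name != NULL && strcmp(e->dev_name, dev_name) != 0)
			continue;
		e->fn(dev_name, type, e->arg);
		called++;
	}
	return called;
}

/* Example callback: count removal events in the int that arg points to. */
static void
count_removals(const char *dev_name, enum dev_event_type type, void *arg)
{
	(void)dev_name;
	if (type == DEV_EVENT_REMOVE)
		(*(int *)arg)++;
}
```

Keeping the policy (what to do on add or remove) inside the callback
matches the direction of this series, where the monitor only reports
events.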

This patch only shows the event monitoring; device attach/detach is not
involved here and will be added by another hotplug patch set.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->v18:
fix some typo
---
 app/test-pmd/parameters.c             |   5 +-
 app/test-pmd/testpmd.c                | 101 +++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |   2 +
 doc/guides/testpmd_app_ug/run_app.rst |   4 ++
 4 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 2192bdc..1a05284 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1101,7 +1103,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..d2c122a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -391,6 +394,12 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static void eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(void);
+static int eth_dev_event_callback_unregister(void);
+
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(void)
+{
+	int ret;
+
+	/* register the device event callback */
+	ret = rte_dev_event_callback_register(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret) {
+		printf("Failed to register device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static int
+eth_dev_event_callback_unregister(void)
+{
+	int ret;
+
+	/* unregister the device event callback */
+	ret = rte_dev_event_callback_unregister(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret < 0) {
+		printf("Failed to unregister device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1916,6 +1958,7 @@ void
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	int ret;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
@@ -1929,6 +1972,18 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	if (hot_plug) {
+		ret = rte_dev_event_monitor_stop();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to stop device event monitor.\n");
+
+		ret = eth_dev_event_callback_unregister();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to unregister all event callbacks.\n");
+	}
 	printf("\nBye...\n");
 }
 
@@ -2059,6 +2114,37 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static void
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     __rte_unused void *arg)
+{
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
+			device_name);
+		/* TODO: After finish failure handle, begin to stop
+		 * packet forward, stop port, close port, detach port.
+		 */
+		break;
+	case RTE_DEV_EVENT_ADD:
+		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
+			device_name);
+		/* TODO: After finish kernel driver binding,
+		 * begin to attach port.
+		 */
+		break;
+	default:
+		break;
+	}
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2560,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2630,18 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		eth_dev_event_callback_register();
+
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..8fde68d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1fd5395..d0ced36 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -479,3 +479,7 @@ The commandline options are:
 
     Set the hexadecimal bitmask of TX queue offloads.
     The default value is 0.
+
+*   ``--hot-plug``
+
+    Enable device event monitor mechanism for hotplug.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-04 16:31                                                                       ` Matan Azrad
@ 2018-04-05  8:40                                                                         ` Guo, Jia
  2018-04-05  9:03                                                                         ` Tan, Jianfeng
  1 sibling, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-05  8:40 UTC (permalink / raw)
  To: Matan Azrad, Tan, Jianfeng, stephen, Richardson, Bruce, Yigit,
	Ferruh, Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing,
	Thomas Monjalon, Mordechay Haimovsky, Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/5/2018 12:31 AM, Matan Azrad wrote:
> Hi all
>
> What do you think about adding the "--hotplug" parameter as a new EAL command line parameter?
At this stage testpmd is only used as an example. If the overall solution
is accepted by everyone and we reach agreement on it, I think it could be
moved into an EAL command line option in a coming patch set.
Good suggestion, Azrad.
> From: Tan, Jianfeng, Wednesday, April 4, 2018 6:23 AM
>>> -----Original Message-----
>>> From: Guo, Jia
>>> Sent: Tuesday, April 3, 2018 6:34 PM
>>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>>> Jianfeng
>>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>>> Jia; Zhang, Helin
>>> Subject: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
>>>
>>> Use testpmd for example, to show how an application use device event
>> s/use/uses
>>
>>> APIs to monitor the hotplug events, including both hot removal event
>>> and hot insertion event.
>>>
>>> The process is that, testpmd first enable hotplug by below commands,
>>>
>>> E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug
>>>
>>> then testpmd start the device event monitor by call the new API
>> s/start/starts
>> s/call/calling
>>
>>> (rte_dev_event_monitor_start) and register the user's callback by call
>>> the API (rte_dev_event_callback_register), when device being hotplug
>>> insertion or hotplug removal, the device event monitor detects the
>>> event and call user's callbacks, user could process the event in the
>>> callback accordingly.
>>>
>>> This patch only shows the event monitoring, device attach/detach would
>>> not be involved here, will add from other hotplug patch set.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Some typos and a trivial suggestion. Feel free to carry my
>>
>> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>
>> in the next version.
>>
>>> ---
>>> v18->v17:
>>> remove hotplug policy and detach/attach process from testpmd, let it
>>> focus on the device event monitoring which the patch set introduced.
>>> ---
>>>   app/test-pmd/parameters.c             |   5 +-
>>>   app/test-pmd/testpmd.c                | 112
>>> +++++++++++++++++++++++++++++++++-
>>>   app/test-pmd/testpmd.h                |   2 +
>>>   doc/guides/testpmd_app_ug/run_app.rst |   4 ++
>>>   4 files changed, 121 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
>>> index 97d22b8..558cd40 100644
>>> --- a/app/test-pmd/parameters.c
>>> +++ b/app/test-pmd/parameters.c
>>> @@ -186,6 +186,7 @@ usage(char* progname)
>>>   	printf("  --flow-isolate-all: "
>>>   	       "requests flow API isolated mode on all ports at
>>> initialization time.\n");
>>>   	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX
>> queue
>>> offloads\n");
>>> +	printf("  --hot-plug: enable hot plug for device.\n");
>>>   }
>>>
>>>   #ifdef RTE_LIBRTE_CMDLINE
>>> @@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
>>>   		{ "print-event",		1, 0, 0 },
>>>   		{ "mask-event",			1, 0, 0 },
>>>   		{ "tx-offloads",		1, 0, 0 },
>>> +		{ "hot-plug",			0, 0, 0 },
>>>   		{ 0, 0, 0, 0 },
>>>   	};
>>>
>>> @@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
>>>   					rte_exit(EXIT_FAILURE,
>>>   						 "invalid mask-event
>>> argument\n");
>>>   				}
>>> -
>>> +			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
>>> +				hot_plug = 1;
>>>   			break;
>>>   		case 'h':
>>>   			usage(argv[0]);
>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>> 4c0e258..2faeb90 100644
>>> --- a/app/test-pmd/testpmd.c
>>> +++ b/app/test-pmd/testpmd.c
>>> @@ -12,6 +12,7 @@
>>>   #include <sys/mman.h>
>>>   #include <sys/types.h>
>>>   #include <errno.h>
>>> +#include <stdbool.h>
>>>
>>>   #include <sys/queue.h>
>>>   #include <sys/stat.h>
>>> @@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
>>>    */
>>>   uint8_t rmv_interrupt = 1; /* enabled by default */
>>>
>>> +uint8_t hot_plug = 0; /**< hotplug disabled by default. */
>>> +
>>>   /*
>>>    * Display or mask ether events
>>>    * Default to all events except VF_MBOX @@ -391,6 +394,12 @@ static
>>> void check_all_ports_link_status(uint32_t
>>> port_mask);
>>>   static int eth_event_callback(portid_t port_id,
>>>   			      enum rte_eth_event_type type,
>>>   			      void *param, void *ret_param);
>>> +static int eth_dev_event_callback(char *device_name,
>>> +				enum rte_dev_event_type type,
>>> +				void *param);
>>> +static int eth_dev_event_callback_register(void);
>>> +static int eth_dev_event_callback_unregister(void);
>>> +
>>>
>>>   /*
>>>    * Check if all the ports are started.
>>> @@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
>>>   	printf("Done\n");
>>>   }
>>>
>>> +static int
>>> +eth_dev_event_callback_register(void)
>>> +{
>>> +	int diag;
>>> +
>>> +	/* register the device event callback */
>>> +	diag = rte_dev_event_callback_register(NULL,
>>> +		eth_dev_event_callback, NULL);
>>> +	if (diag) {
>>> +		printf("Failed to setup dev_event callback\n");
>>> +		return -1;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +
>>> +static int
>>> +eth_dev_event_callback_unregister(void)
>>> +{
>>> +	int diag;
>>> +
>>> +	/* unregister the device event callback */
>>> +	diag = rte_dev_event_callback_unregister(NULL,
>>> +		eth_dev_event_callback, NULL);
>>> +	if (diag) {
>>> +		printf("Failed to setup dev_event callback\n");
>>> +		return -1;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   void
>>>   attach_port(char *identifier)
>>>   {
>>> @@ -1916,6 +1958,7 @@ void
>>>   pmd_test_exit(void)
>>>   {
>>>   	portid_t pt_id;
>>> +	int ret;
>>>
>>>   	if (test_done == 0)
>>>   		stop_packet_forwarding();
>>> @@ -1929,6 +1972,18 @@ pmd_test_exit(void)
>>>   			close_port(pt_id);
>>>   		}
>>>   	}
>>> +
>>> +	if (hot_plug) {
>>> +		ret = rte_dev_event_monitor_stop();
>>> +		if (ret)
>>> +			RTE_LOG(ERR, EAL,
>>> +				"fail to stop device event monitor.");
>>> +
>>> +		ret = eth_dev_event_callback_unregister();
>>> +		if (ret)
>>> +			RTE_LOG(ERR, EAL,
>>> +				"fail to unregister all event callbacks.");
>>> +	}
>>>   	printf("\nBye...\n");
>>>   }
>>>
>>> @@ -2059,6 +2114,48 @@ eth_event_callback(portid_t port_id, enum
>>> rte_eth_event_type type, void *param,
>>>   	return 0;
>>>   }
>>>
>>> +/* This function is used by the interrupt thread */ static int
>>> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type
>>> type,
>>> +			     __rte_unused void *arg)
>>> +{
>>> +	int ret = 0;
>>  From here
>>
>>> +	static const char * const event_desc[] = {
>>> +		[RTE_DEV_EVENT_ADD] = "add",
>>> +		[RTE_DEV_EVENT_REMOVE] = "remove",
>>> +	};
>>> +
>>> +	if (type >= RTE_DEV_EVENT_MAX) {
>>> +		fprintf(stderr, "%s called upon invalid event %d\n",
>>> +			__func__, type);
>>> +		fflush(stderr);
>>> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
>>> +		printf("%s event\n",
>>> +			event_desc[type]);
>>> +		fflush(stdout);
>>> +	}
>> to here, these check are not necessary.
>>
>>> +
>>> +	switch (type) {
>>> +	case RTE_DEV_EVENT_REMOVE:
>>> +		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>> +			device_name);
>>> +		/* TODO: After finish failure handle, begin to stop
>>> +		 * packet forward, stop port, close port, detach port.
>>> +		 */
>>> +		break;
>>> +	case RTE_DEV_EVENT_ADD:
>>> +		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>>> +			device_name);
>>> +		/* TODO: After finish kernel driver binding,
>>> +		 * begin to attach port.
>>> +		 */
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>>   static int
>>>   set_tx_queue_stats_mapping_registers(portid_t port_id, struct
>>> rte_port
>>> *port)
>>>   {
>>> @@ -2474,8 +2571,9 @@ signal_handler(int signum)  int  main(int argc,
>>> char** argv)  {
>>> -	int  diag;
>>> +	int diag;
>>>   	portid_t port_id;
>>> +	int ret;
>>>
>>>   	signal(SIGINT, signal_handler);
>>>   	signal(SIGTERM, signal_handler);
>>> @@ -2543,6 +2641,18 @@ main(int argc, char** argv)
>>>   		       nb_rxq, nb_txq);
>>>
>>>   	init_config();
>>> +
>>> +	if (hot_plug) {
>>> +		/* enable hot plug monitoring */
>>> +		ret = rte_dev_event_monitor_start();
>>> +		if (ret) {
>>> +			rte_errno = EINVAL;
>>> +			return -1;
>>> +		}
>>> +		eth_dev_event_callback_register();
>>> +
>>> +	}
>>> +
>>>   	if (start_port(RTE_PORT_ALL) != 0)
>>>   		rte_exit(EXIT_FAILURE, "Start ports failed\n");
>>>
>>> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
>>> 153abea..8fde68d 100644
>>> --- a/app/test-pmd/testpmd.h
>>> +++ b/app/test-pmd/testpmd.h
>>> @@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet
>>> forwarding when set to 1. */  extern uint8_t lsc_interrupt; /**<
>>> disabled by "--no-lsc-interrupt" parameter */  extern uint8_t
>>> rmv_interrupt; /**< disabled by "--no-rmv-interrupt"
>>> parameter */
>>>   extern uint32_t event_print_mask;
>>> +extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
>>> +
>>>   /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
>>>
>>>   #ifdef RTE_LIBRTE_IXGBE_BYPASS
>>> diff --git a/doc/guides/testpmd_app_ug/run_app.rst
>>> b/doc/guides/testpmd_app_ug/run_app.rst
>>> index 1fd5395..d0ced36 100644
>>> --- a/doc/guides/testpmd_app_ug/run_app.rst
>>> +++ b/doc/guides/testpmd_app_ug/run_app.rst
>>> @@ -479,3 +479,7 @@ The commandline options are:
>>>
>>>       Set the hexadecimal bitmask of TX queue offloads.
>>>       The default value is 0.
>>> +
>>> +*   ``--hot-plug``
>>> +
>>> +    Enable device event monitor machenism for hotplug.
>> s/machenism/mechanism
>>
>>> --
>>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V19 0/4] add device event monitor framework
  2018-04-03 10:33                                                                   ` [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  2018-04-04  3:22                                                                     ` Tan, Jianfeng
  2018-04-05  8:32                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
@ 2018-04-05  9:02                                                                     ` Jeff Guo
  2018-04-05  9:02                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
                                                                                         ` (3 more replies)
  2 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  9:02 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

About hot plug in DPDK: we already have a proactive way to add/remove devices
through APIs (rte_eal_hotplug_add/remove), and we also have the fail-safe
driver to offload the fail-safe work from the app user. But there is still
no general mechanism to monitor hotplug events for all drivers; today the
hotplug interrupt event differs between devices and drivers, such
as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so to make the hot plug feature easy to use for pci
drivers, something must be done to detect the remove event
at the kernel level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor the
hot plug status of devices, covering not only uio/vfio devices on the pci
bus but also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a. The uevent message from the monitored FD looks like below.
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. add device event monitor framework:
add several general APIs to enable uevent monitoring.

c. show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd to show the device event monitor mechanism usage.
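
For readers unfamiliar with the uevent channel, the sketch below shows one
way to open the netlink socket that carries such messages and to poll it
without blocking. It is a standalone illustration, not the code of this
series: uev_socket_open() and uev_socket_read() are hypothetical names, and
subscribing to multicast group 1 only (kernel-originated uevents) is an
assumption of this sketch, whereas patch 3/4 binds with all groups set and
skips the messages that start with "libudev".

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Open a non-blocking uevent socket; returns the fd, or -1 on failure. */
static int
uev_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;    /* let the kernel pick the port id */
	addr.nl_groups = 1; /* assumption: kernel uevents only */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Try to read one message.
 * Returns the number of bytes read, 0 if nothing is pending,
 * or -1 on a real error.
 */
static ssize_t
uev_socket_read(int fd, char *buf, size_t len)
{
	ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return n;
}
```

Treating EAGAIN as "nothing pending" rather than as an error keeps a
spurious wake-up from being logged as a broken connection.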

TODO: failure handler mechanism for hot plug and driver auto bind for hot insertion.
These will be covered by the next hot plug patch set.

patchset history:
v19->v18:
fix some typos and misunderstood parts

v18->v17:
1.add feature announcement in release document, fix bsp compile issue.
2.refine socket configuration.
3.remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.

v17->v16:
1.add related part of the interrupt handle type adding.
2.add new API into map, fix typo issue, add (void*)-1 value for unregistering all callbacks
3.add new file into meson.build, modify coding style and add print info, delete unused part.
4.unregister all user's callback when stop event monitor

v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll to replace the rte service usage for the monitor thread,
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
let it check and management in user's callback.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:modify some typo and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix define issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang issue after sending packets.

v6->v5:
1.add hot plug policy: in eal, by default prepare the hot plug work for
all pci devices, then let the app decide which devices need to
hot plug.
2.modify to manage event callback in each device.
3.fix some system hung issue when igb_uio release, and some typo errors.
4.modify the pci part to the bus-pci base on the bus rework.
5.add hot plug policy in app, show example of using a hotplug list to
decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Replace the individual fd bound to each single device with a common fd that polls all devices.
5.Add to register hot insertion monitoring and process, add function to auto bind driver before user add device
6.Refine some coding style and typo issues
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variables of hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix dual device fd issue.
2.refine some typo error.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 101 +++++++++-
 app/test-pmd/testpmd.h                             |   2 +
 doc/guides/rel_notes/release_18_05.rst             |   9 +
 doc/guides/testpmd_app_ug/run_app.rst              |   4 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  21 ++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 161 ++++++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 +++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 214 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |  10 +
 test/test/test_interrupts.c                        |  39 +++-
 18 files changed, 686 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V19 1/4] eal: add device event handle in interrupt thread
  2018-04-05  9:02                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
@ 2018-04-05  9:02                                                                       ` Jeff Guo
  2018-04-05  9:02                                                                       ` [PATCH V19 2/4] eal: add device event monitor framework Jeff Guo
                                                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  9:02 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add a new interrupt handle type, RTE_INTR_HANDLE_DEV_EVENT, for
monitoring device event interrupts.
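
The reason a separate handle type helps can be sketched with a plain epoll
dispatcher: for the new type the interrupt thread does not consume any
bytes from the fd itself and just invokes the callback, which then reads
the payload. The code below is a standalone model of that idea, not the
EAL code; enum handle_type, struct handle, handle_watch(),
handle_dispatch_once() and read_payload_cb() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

enum handle_type {
	HANDLE_DRAIN_FIRST, /* dispatcher consumes the wake-up bytes */
	HANDLE_DEV_EVENT    /* callback reads the payload from the fd */
};

typedef void (*handle_cb)(int fd, void *arg);

struct handle {
	int fd;
	enum handle_type type;
	handle_cb cb;
	void *arg;
};

/* Register the handle's fd with the epoll instance. */
static int
handle_watch(int epfd, struct handle *h)
{
	struct epoll_event ev;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.ptr = h;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* Wait for one event; returns 1 if a callback ran, 0 on timeout, -1 on error. */
static int
handle_dispatch_once(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	struct handle *h;
	int n;

	n = epoll_wait(epfd, &ev, 1, timeout_ms);
	if (n <= 0)
		return n;

	h = ev.data.ptr;
	if (h->type == HANDLE_DRAIN_FIRST) {
		char scratch[8];

		if (read(h->fd, scratch, sizeof(scratch)) < 0)
			return -1;
	}
	/* For HANDLE_DEV_EVENT the data is left untouched for the callback. */
	h->cb(h->fd, h->arg);
	return 1;
}

/* Example callback: read up to 63 bytes into the 64-byte buffer in arg. */
static void
read_payload_cb(int fd, void *arg)
{
	char *out = arg;
	ssize_t n = read(fd, out, 63);

	out[n > 0 ? n : 0] = '\0';
}
```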

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v19->v18:
fix some typos
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
 test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..58e9328 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index 31a70a0..dc19175 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
 	TEST_INTERRUPT_HANDLE_VALID,
 	TEST_INTERRUPT_HANDLE_VALID_UIO,
 	TEST_INTERRUPT_HANDLE_VALID_ALARM,
+	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
 	TEST_INTERRUPT_HANDLE_CASE1,
 	TEST_INTERRUPT_HANDLE_MAX
 };
@@ -80,6 +81,10 @@ test_interrupt_init(void)
 	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
 					RTE_INTR_HANDLE_ALARM;
 
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd = pfds.readfd;
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
+					RTE_INTR_HANDLE_DEV_EVENT;
+
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -250,6 +255,14 @@ test_interrupt_enable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_enable(&test_intr_handle) == 0) {
+		printf("unexpectedly enable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -306,6 +319,14 @@ test_interrupt_disable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_disable(&test_intr_handle) == 0) {
+		printf("unexpectedly disable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -393,9 +414,17 @@ test_interrupt(void)
 		goto out;
 	}
 
+	printf("Check valid device event interrupt full path\n");
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+		printf("failure occurred during checking valid device event "
+						"interrupt full path\n");
+		goto out;
+	}
+
 	printf("Check valid alarm interrupt full path\n");
-	if (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
-									< 0) {
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_ALARM) < 0) {
 		printf("failure occurred during checking valid alarm "
 						"interrupt full path\n");
 		goto out;
@@ -513,6 +542,12 @@ test_interrupt(void)
 	rte_intr_callback_unregister(&test_intr_handle,
 			test_interrupt_callback_1, (void *)-1);
 
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback, (void *)-1);
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback_1, (void *)-1);
+
 	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
 	/* deinit */
 	test_interrupt_deinit();
-- 
2.7.4


* [PATCH V19 2/4] eal: add device event monitor framework
  2018-04-05  9:02                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
  2018-04-05  9:02                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-04-05  9:02                                                                       ` Jeff Guo
  2018-04-05  9:02                                                                       ` [PATCH V19 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-04-05  9:02                                                                       ` [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  9:02 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor framework at the EAL
device layer, so that applications can become aware of device hotplug
and act accordingly. The framework could later be extended to other
types of device event, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to
enable/disable the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shall register or unregister callbacks through the newly
added APIs below. A callback can be for a specific device or for all
devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Taking hotplug as an example: when a device is hot-inserted or
hot-removed, the kernel notifies the monitor, which then calls the
user's callbacks to handle the event, for example by attaching the
device to or detaching it from the bus. This could in turn benefit
fail-safe or live-migration use cases.
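
To illustrate the register/dispatch semantics described above, here is
a minimal stand-alone sketch. It is not the code from this patch: every
name in it (ev_register, ev_unregister, ev_dispatch, count_cb) is
hypothetical, locking is left out, and it only assumes the TAILQ macros
from <sys/queue.h>. A callback registered with a NULL device name is
treated as a subscription to all devices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical names throughout; this is not the DPDK implementation. */
enum ev_type { EV_ADD, EV_REMOVE };

typedef void (*ev_cb_fn)(const char *dev_name, enum ev_type type, void *arg);

struct ev_cb {
	TAILQ_ENTRY(ev_cb) next;
	ev_cb_fn fn;
	void *arg;
	char *dev_name; /* NULL means "all devices" */
};

TAILQ_HEAD(ev_cb_list, ev_cb);
static struct ev_cb_list cbs = TAILQ_HEAD_INITIALIZER(cbs);

/* Portable string copy, so the sketch does not rely on strdup(). */
static char *
dup_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p != NULL)
		memcpy(p, s, n);
	return p;
}

/* Register a callback; dev_name == NULL subscribes to every device. */
int
ev_register(const char *dev_name, ev_cb_fn fn, void *arg)
{
	struct ev_cb *cb;

	if (fn == NULL)
		return -1;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	if (dev_name != NULL) {
		cb->dev_name = dup_str(dev_name);
		if (cb->dev_name == NULL) {
			free(cb);
			return -1;
		}
	}
	cb->fn = fn;
	cb->arg = arg;
	TAILQ_INSERT_TAIL(&cbs, cb, next);
	return 0;
}

/* Remove every entry matching (dev_name, fn); returns the number removed. */
int
ev_unregister(const char *dev_name, ev_cb_fn fn)
{
	struct ev_cb *cb, *tmp;
	int removed = 0;

	for (cb = TAILQ_FIRST(&cbs); cb != NULL; cb = tmp) {
		tmp = TAILQ_NEXT(cb, next);
		if (cb->fn != fn)
			continue;
		if ((dev_name == NULL) != (cb->dev_name == NULL))
			continue;
		if (dev_name != NULL && strcmp(dev_name, cb->dev_name) != 0)
			continue;
		TAILQ_REMOVE(&cbs, cb, next);
		free(cb->dev_name);
		free(cb);
		removed++;
	}
	return removed;
}

/* Deliver an event to matching callbacks; returns how many were called. */
int
ev_dispatch(const char *dev_name, enum ev_type type)
{
	struct ev_cb *cb;
	int called = 0;

	assert(dev_name != NULL);
	TAILQ_FOREACH(cb, &cbs, next) {
		if (cb->dev_name != NULL && strcmp(cb->dev_name, dev_name) != 0)
			continue;
		cb->fn(dev_name, type, cb->arg);
		called++;
	}
	return called;
}

/* Example callback: counts invocations through its argument. */
void
count_cb(const char *dev_name, enum ev_type type, void *arg)
{
	(void)dev_name;
	(void)type;
	(*(int *)arg)++;
}
```

With one NULL-name registration and one registration for a given PCI
address, an event for that address reaches both callbacks, while an
event for any other device reaches only the NULL-name one.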

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->v18:
clean up the coding style and fix typos
---
 doc/guides/rel_notes/release_18_05.rst  |   9 ++
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 +++++
 lib/librte_eal/bsdapp/eal/meson.build   |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 161 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  15 +++
 lib/librte_eal/common/include/rte_dev.h |  94 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
 lib/librte_eal/linuxapp/eal/meson.build |   1 +
 lib/librte_eal/rte_eal_version.map      |  10 ++
 11 files changed, 336 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index e5fac1c..d3c86bd 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -58,6 +58,15 @@ New Features
   * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
   * Added support for DROP action in flow API.
 
+* **Added device event monitor framework.**
+
+  Added a general device event monitor framework to EAL for dynamic device management,
+  such as device hotplug awareness and the actions taken in response. The new APIs are:
+
+  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` enable and
+    disable the event monitor.
+  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
+    register and unregister the user's callbacks.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index ed1d17b..90b88eb 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..1c6c51b
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
index e83fc91..6dfc533 100644
--- a/lib/librte_eal/bsdapp/eal/meson.build
+++ b/lib/librte_eal/bsdapp/eal/meson.build
@@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
 		'eal_timer.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c'
 )
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..e202cf2 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all device */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,139 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb;
+	int ret;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&dev_event_cbs))
+		TAILQ_INIT(&dev_event_cbs);
+
+	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->active = 0;
+			if (!device_name) {
+				event_cb->dev_name = NULL;
+			} else {
+				event_cb->dev_name = strdup(device_name);
+				if (event_cb->dev_name == NULL) {
+					ret = -ENOMEM;
+					goto error;
+				}
+			}
+			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device "
+				"event callback.\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		RTE_LOG(ERR, EAL,
+			"The callback already exists, no need "
+			"to register again.\n");
+		rte_spinlock_unlock(&dev_event_lock);
+		return -EEXIST;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return 0;
+error:
+	free(event_cb);
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg)
+{
+	int ret = 0;
+	struct dev_event_callback *event_cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+	/*walk through the callbacks and remove all that match. */
+	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
+	     event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (device_name != NULL && event_cb->dev_name != NULL) {
+			if (!strcmp(event_cb->dev_name, device_name)) {
+				if (event_cb->cb_fn != cb_fn ||
+				    (cb_arg != (void *)-1 &&
+				    event_cb->cb_arg != cb_arg))
+					continue;
+			}
+		} else if (device_name != NULL) {
+			continue;
+		}
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
+			free(event_cb);
+			ret++;
+		} else {
+			continue;
+		}
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event)
+{
+	struct dev_event_callback *cb_lst;
+
+	if (device_name == NULL)
+		return;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (cb_lst->dev_name) {
+			if (strcmp(cb_lst->dev_name, device_name))
+				continue;
+		}
+		cb_lst->active = 1;
+		rte_spinlock_unlock(&dev_event_lock);
+		cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..88e5a59 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * @internal Execute all the callbacks registered by the user application
+ * for the specified device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ *
+ */
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..2ed240e 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,25 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef void (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +286,79 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered which has the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b9c7727..db67001 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9c8d1a0
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
index 03974ff..b222571 100644
--- a/lib/librte_eal/linuxapp/eal/meson.build
+++ b/lib/librte_eal/linuxapp/eal/meson.build
@@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
 		'eal_vfio_mp_sync.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c',
 )
 
 if has_libnuma == 1
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index dd38783..3022df1 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -260,3 +260,13 @@ EXPERIMENTAL {
 	rte_socket_id_by_idx;
 
 } DPDK_18.02;
+
+EXPERIMENTAL {
+	global:
+
+	rte_dev_event_monitor_start;
+	rte_dev_event_monitor_stop;
+	rte_dev_event_callback_register;
+	rte_dev_event_callback_unregister;
+
+} DPDK_18.05;
-- 
2.7.4


* [PATCH V19 3/4] eal/linux: uevent parse and process
  2018-04-05  9:02                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
  2018-04-05  9:02                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-04-05  9:02                                                                       ` [PATCH V19 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-05  9:02                                                                       ` Jeff Guo
  2018-04-05 11:05                                                                         ` Tan, Jianfeng
  2018-04-05  9:02                                                                       ` [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  9:02 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

In order to handle the uevents detected from the kernel side, add
uevent parse and process functions that translate each uevent into a
device event, which the user has subscribed to monitor.
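
As a rough illustration of the parsing step, the sketch below walks a
uevent payload, which the kernel delivers as a run of NUL-terminated
"KEY=value" strings, and picks out the ACTION, SUBSYSTEM and
PCI_SLOT_NAME fields. It is a simplified stand-alone version, not the
code in this patch: struct uev_fields, uev_parse and the helper names
are hypothetical, and it is told the number of bytes actually received
so it never reads past the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ELEM_LEN 128

/* Hypothetical result type; field names are illustrative only. */
struct uev_fields {
	char action[ELEM_LEN];
	char subsystem[ELEM_LEN];
	char pci_slot_name[ELEM_LEN];
};

/* Copy at most ELEM_LEN - 1 bytes of a value and NUL-terminate it. */
static void
copy_value(char *dst, const char *src, size_t n)
{
	if (n >= ELEM_LEN)
		n = ELEM_LEN - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}

/* Return non-zero if the n-byte string s starts with key. */
static int
has_key(const char *s, size_t n, const char *key)
{
	size_t k = strlen(key);

	return n >= k && memcmp(s, key, k) == 0;
}

/*
 * Walk the strings packed into buf[0..len). Returns 0 when both ACTION
 * and SUBSYSTEM were found, -1 otherwise.
 */
int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (i < len) {
		const char *s = buf + i;
		/* Find the end of this string, bounded by the buffer. */
		const char *nul = memchr(s, '\0', len - i);
		size_t n = nul != NULL ? (size_t)(nul - s) : len - i;

		if (has_key(s, n, "ACTION="))
			copy_value(out->action, s + 7, n - 7);
		else if (has_key(s, n, "SUBSYSTEM="))
			copy_value(out->subsystem, s + 10, n - 10);
		else if (has_key(s, n, "PCI_SLOT_NAME="))
			copy_value(out->pci_slot_name, s + 14, n - 14);

		i += n + 1; /* skip the string and its terminator */
	}

	if (out->action[0] == '\0' || out->subsystem[0] == '\0')
		return -1;
	return 0;
}
```

A caller would then map the action string ("add" or "remove") and the
subsystem string ("pci", "uio" or "vfio") to the corresponding enum
values, as this patch does after its own parsing loop.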

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v19->18:
fix some misunderstood parts
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 196 +++++++++++++++++++++++++++++++++-
 1 file changed, 194 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9c8d1a0..4686c41 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,21 +2,213 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+
 #include <rte_log.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
+#include <rte_malloc.h>
+#include <rte_interrupts.h>
+
+#include "eal_private.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_started;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+/* identify the system layer which reports this event. */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_socket_fd_create(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+
+	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
+		goto err;
+	}
+
+	return 0;
+err:
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	return ret;
+}
+
+static int
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0, ret = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return -1;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = strdup(pci_slot_name);
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	/* parse the subsystem layer */
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	else if (!strncmp(subsystem, "vfio", 4))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
+	else
+		ret = -1;
 
+	/* parse the action type */
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	else if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	else
+		ret = -1;
+	return ret;
+}
+
+static void
+dev_uev_handler(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(&uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret == 0 || (ret < 0 && errno != EAGAIN)) {
+		/* connection is closed or broken, can not up again. */
+		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
+		return;
+	} else if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+			"uevent socket read error(%d): %s.\n",
+			errno, strerror(errno));
+		return;
+	}
+
+	ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "It is not a valid event "
+			"that needs to be handled.\n");
+		return;
+	}
+
+	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
+		uevent.devname, uevent.type, uevent.subsystem);
+
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (monitor_started)
+		return 0;
+
+	ret = dev_uev_socket_fd_create();
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event fd.\n");
+		return -1;
+	}
+
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
+		return -1;
+	}
+
+	monitor_started = true;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (!monitor_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
+					   (void *)-1);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_started = false;
 	return 0;
 }
-- 
2.7.4


* [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-05  9:02                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
                                                                                         ` (2 preceding siblings ...)
  2018-04-05  9:02                                                                       ` [PATCH V19 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-05  9:02                                                                       ` Jeff Guo
  2018-04-05 16:10                                                                         ` [PATCH V20 0/4] add device event monitor framework Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-05  9:02 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application uses the device
event APIs to monitor hotplug events, including both hot removal and
hot insertion events.

The process is as follows: testpmd first enables hotplug with the
command below,

E.g. ./build/app/testpmd -c 0x3 -n 4 -- -i --hot-plug

then testpmd starts the device event monitor by calling the new API
(rte_dev_event_monitor_start) and registers the user's callback by
calling the API (rte_dev_event_callback_register). When a device is
hot-inserted or hot-removed, the device event monitor detects the event
and calls the user's callbacks, and the user can process the event in
the callback accordingly.

This patch only shows the event monitoring; device attach/detach is
not involved here and will be added by another hotplug patch set.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v19->v18:
fix some typos
---
 app/test-pmd/parameters.c             |   5 +-
 app/test-pmd/testpmd.c                | 101 +++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |   2 +
 doc/guides/testpmd_app_ug/run_app.rst |   4 ++
 4 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 2192bdc..1a05284 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1101,7 +1103,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..d2c122a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -391,6 +394,12 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static void eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(void);
+static int eth_dev_event_callback_unregister(void);
+
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(void)
+{
+	int ret;
+
+	/* register the device event callback */
+	ret = rte_dev_event_callback_register(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret) {
+		printf("Failed to register device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static int
+eth_dev_event_callback_unregister(void)
+{
+	int ret;
+
+	/* unregister the device event callback */
+	ret = rte_dev_event_callback_unregister(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret < 0) {
+		printf("Failed to unregister device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1916,6 +1958,7 @@ void
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	int ret;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
@@ -1929,6 +1972,18 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	if (hot_plug) {
+		ret = rte_dev_event_monitor_stop();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to stop device event monitor.");
+
+		ret = eth_dev_event_callback_unregister();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to unregister all event callbacks.");
+	}
 	printf("\nBye...\n");
 }
 
@@ -2059,6 +2114,37 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static void
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     __rte_unused void *arg)
+{
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
+			device_name);
+		/* TODO: After finish failure handle, begin to stop
+		 * packet forward, stop port, close port, detach port.
+		 */
+		break;
+	case RTE_DEV_EVENT_ADD:
+		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
+			device_name);
+		/* TODO: After finish kernel driver binding,
+		 * begin to attach port.
+		 */
+		break;
+	default:
+		break;
+	}
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2560,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2630,18 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		eth_dev_event_callback_register();
+
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..8fde68d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1fd5395..d0ced36 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -479,3 +479,7 @@ The commandline options are:
 
     Set the hexadecimal bitmask of TX queue offloads.
     The default value is 0.
+
+*   ``--hot-plug``
+
+    Enable the device event monitor mechanism for hotplug.
-- 
2.7.4
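
For readers unfamiliar with how a flag-style long option such as
"--hot-plug" is wired up, the following is a minimal standalone sketch, not
the testpmd code itself. It assumes only the libc getopt_long() interface;
parse_args() and the local hot_plug variable are hypothetical stand-ins for
launch_args_parse() and the testpmd global.

```c
#include <getopt.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for testpmd's global flag; disabled by default. */
static uint8_t hot_plug;

/*
 * Parse argv and set hot_plug when "--hot-plug" is present.
 * Returns 0 on success, -1 on an unknown option.
 */
static int
parse_args(int argc, char **argv)
{
	static const struct option lgopts[] = {
		{ "hot-plug", no_argument, NULL, 0 },
		{ NULL, 0, NULL, 0 },
	};
	int opt;
	int opt_idx = 0;

	optind = 1; /* restart scanning so the function can be called again */
	while ((opt = getopt_long(argc, argv, "", lgopts, &opt_idx)) != -1) {
		switch (opt) {
		case 0: /* long option with flag == NULL and val == 0 */
			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
				hot_plug = 1;
			break;
		default: /* unknown option */
			return -1;
		}
	}
	return 0;
}
```

Calling parse_args() with an argv that contains "--hot-plug" leaves hot_plug
set to 1; without the option it keeps its default of 0.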


* Re: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-04 16:31                                                                       ` Matan Azrad
  2018-04-05  8:40                                                                         ` Guo, Jia
@ 2018-04-05  9:03                                                                         ` Tan, Jianfeng
  1 sibling, 0 replies; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-05  9:03 UTC (permalink / raw)
  To: Matan Azrad, Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, Thomas Monjalon,
	Mordechay Haimovsky, Van Haaren, Harry
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/5/2018 12:31 AM, Matan Azrad wrote:
> Hi all
>
> What do you think about adding the "--hotplug" parameter as a new EAL command line parameter?

+1

Thanks,
Jianfeng

>
> From: Tan, Jianfeng, Wednesday, April 4, 2018 6:23 AM
>>> -----Original Message-----
>>> From: Guo, Jia
>>> Sent: Tuesday, April 3, 2018 6:34 PM
>>> To: stephen@networkplumber.org; Richardson, Bruce; Yigit, Ferruh;
>>> Ananyev, Konstantin; gaetan.rivet@6wind.com; Wu, Jingjing;
>>> thomas@monjalon.net; motih@mellanox.com; Van Haaren, Harry; Tan,
>>> Jianfeng
>>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>>> Jia; Zhang, Helin
>>> Subject: [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring
>>>
>>> Use testpmd for example, to show how an application use device event
>> s/use/uses
>>
>>> APIs to monitor the hotplug events, including both hot removal event
>>> and hot insertion event.
>>>
>>> The process is that, testpmd first enable hotplug by below commands,
>>>
>>> E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug
>>>
>>> then testpmd start the device event monitor by call the new API
>> s/start/starts
>> s/call/calling
>>
>>> (rte_dev_event_monitor_start) and register the user's callback by call
>>> the API (rte_dev_event_callback_register), when device being hotplug
>>> insertion or hotplug removal, the device event monitor detects the
>>> event and call user's callbacks, user could process the event in the
>>> callback accordingly.
>>>
>>> This patch only shows the event monitoring, device attach/detach would
>>> not be involved here, will add from other hotplug patch set.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Some typos and a trivial suggestion. Feel free to carry my
>>
>> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>
>> in the next version.
>>
>>> ---
>>> v18->v17:
>>> remove hotplug policy and detach/attach process from testpmd, let it
>>> focus on the device event monitoring which the patch set introduced.
>>> ---
>>>   app/test-pmd/parameters.c             |   5 +-
>>>   app/test-pmd/testpmd.c                | 112
>>> +++++++++++++++++++++++++++++++++-
>>>   app/test-pmd/testpmd.h                |   2 +
>>>   doc/guides/testpmd_app_ug/run_app.rst |   4 ++
>>>   4 files changed, 121 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
>>> index 97d22b8..558cd40 100644
>>> --- a/app/test-pmd/parameters.c
>>> +++ b/app/test-pmd/parameters.c
>>> @@ -186,6 +186,7 @@ usage(char* progname)
>>>   	printf("  --flow-isolate-all: "
>>>   	       "requests flow API isolated mode on all ports at
>>> initialization time.\n");
>>>   	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX
>> queue
>>> offloads\n");
>>> +	printf("  --hot-plug: enable hot plug for device.\n");
>>>   }
>>>
>>>   #ifdef RTE_LIBRTE_CMDLINE
>>> @@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
>>>   		{ "print-event",		1, 0, 0 },
>>>   		{ "mask-event",			1, 0, 0 },
>>>   		{ "tx-offloads",		1, 0, 0 },
>>> +		{ "hot-plug",			0, 0, 0 },
>>>   		{ 0, 0, 0, 0 },
>>>   	};
>>>
>>> @@ -1102,7 +1104,8 @@ launch_args_parse(int argc, char** argv)
>>>   					rte_exit(EXIT_FAILURE,
>>>   						 "invalid mask-event
>>> argument\n");
>>>   				}
>>> -
>>> +			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
>>> +				hot_plug = 1;
>>>   			break;
>>>   		case 'h':
>>>   			usage(argv[0]);
>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>> 4c0e258..2faeb90 100644
>>> --- a/app/test-pmd/testpmd.c
>>> +++ b/app/test-pmd/testpmd.c
>>> @@ -12,6 +12,7 @@
>>>   #include <sys/mman.h>
>>>   #include <sys/types.h>
>>>   #include <errno.h>
>>> +#include <stdbool.h>
>>>
>>>   #include <sys/queue.h>
>>>   #include <sys/stat.h>
>>> @@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
>>>    */
>>>   uint8_t rmv_interrupt = 1; /* enabled by default */
>>>
>>> +uint8_t hot_plug = 0; /**< hotplug disabled by default. */
>>> +
>>>   /*
>>>    * Display or mask ether events
>>>    * Default to all events except VF_MBOX @@ -391,6 +394,12 @@ static
>>> void check_all_ports_link_status(uint32_t
>>> port_mask);
>>>   static int eth_event_callback(portid_t port_id,
>>>   			      enum rte_eth_event_type type,
>>>   			      void *param, void *ret_param);
>>> +static int eth_dev_event_callback(char *device_name,
>>> +				enum rte_dev_event_type type,
>>> +				void *param);
>>> +static int eth_dev_event_callback_register(void);
>>> +static int eth_dev_event_callback_unregister(void);
>>> +
>>>
>>>   /*
>>>    * Check if all the ports are started.
>>> @@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
>>>   	printf("Done\n");
>>>   }
>>>
>>> +static int
>>> +eth_dev_event_callback_register(void)
>>> +{
>>> +	int diag;
>>> +
>>> +	/* register the device event callback */
>>> +	diag = rte_dev_event_callback_register(NULL,
>>> +		eth_dev_event_callback, NULL);
>>> +	if (diag) {
>>> +		printf("Failed to setup dev_event callback\n");
>>> +		return -1;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +
>>> +static int
>>> +eth_dev_event_callback_unregister(void)
>>> +{
>>> +	int diag;
>>> +
>>> +	/* unregister the device event callback */
>>> +	diag = rte_dev_event_callback_unregister(NULL,
>>> +		eth_dev_event_callback, NULL);
>>> +	if (diag) {
>>> +		printf("Failed to setup dev_event callback\n");
>>> +		return -1;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   void
>>>   attach_port(char *identifier)
>>>   {
>>> @@ -1916,6 +1958,7 @@ void
>>>   pmd_test_exit(void)
>>>   {
>>>   	portid_t pt_id;
>>> +	int ret;
>>>
>>>   	if (test_done == 0)
>>>   		stop_packet_forwarding();
>>> @@ -1929,6 +1972,18 @@ pmd_test_exit(void)
>>>   			close_port(pt_id);
>>>   		}
>>>   	}
>>> +
>>> +	if (hot_plug) {
>>> +		ret = rte_dev_event_monitor_stop();
>>> +		if (ret)
>>> +			RTE_LOG(ERR, EAL,
>>> +				"fail to stop device event monitor.");
>>> +
>>> +		ret = eth_dev_event_callback_unregister();
>>> +		if (ret)
>>> +			RTE_LOG(ERR, EAL,
>>> +				"fail to unregister all event callbacks.");
>>> +	}
>>>   	printf("\nBye...\n");
>>>   }
>>>
>>> @@ -2059,6 +2114,48 @@ eth_event_callback(portid_t port_id, enum
>>> rte_eth_event_type type, void *param,
>>>   	return 0;
>>>   }
>>>
>>> +/* This function is used by the interrupt thread */ static int
>>> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type
>>> type,
>>> +			     __rte_unused void *arg)
>>> +{
>>> +	int ret = 0;
>>  From here
>>
>>> +	static const char * const event_desc[] = {
>>> +		[RTE_DEV_EVENT_ADD] = "add",
>>> +		[RTE_DEV_EVENT_REMOVE] = "remove",
>>> +	};
>>> +
>>> +	if (type >= RTE_DEV_EVENT_MAX) {
>>> +		fprintf(stderr, "%s called upon invalid event %d\n",
>>> +			__func__, type);
>>> +		fflush(stderr);
>>> +	} else if (event_print_mask & (UINT32_C(1) << type)) {
>>> +		printf("%s event\n",
>>> +			event_desc[type]);
>>> +		fflush(stdout);
>>> +	}
>> to here, these checks are not necessary.
>>
>>> +
>>> +	switch (type) {
>>> +	case RTE_DEV_EVENT_REMOVE:
>>> +		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>> +			device_name);
>>> +		/* TODO: After finish failure handle, begin to stop
>>> +		 * packet forward, stop port, close port, detach port.
>>> +		 */
>>> +		break;
>>> +	case RTE_DEV_EVENT_ADD:
>>> +		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>>> +			device_name);
>>> +		/* TODO: After finish kernel driver binding,
>>> +		 * begin to attach port.
>>> +		 */
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>>   static int
>>>   set_tx_queue_stats_mapping_registers(portid_t port_id, struct
>>> rte_port
>>> *port)
>>>   {
>>> @@ -2474,8 +2571,9 @@ signal_handler(int signum)  int  main(int argc,
>>> char** argv)  {
>>> -	int  diag;
>>> +	int diag;
>>>   	portid_t port_id;
>>> +	int ret;
>>>
>>>   	signal(SIGINT, signal_handler);
>>>   	signal(SIGTERM, signal_handler);
>>> @@ -2543,6 +2641,18 @@ main(int argc, char** argv)
>>>   		       nb_rxq, nb_txq);
>>>
>>>   	init_config();
>>> +
>>> +	if (hot_plug) {
>>> +		/* enable hot plug monitoring */
>>> +		ret = rte_dev_event_monitor_start();
>>> +		if (ret) {
>>> +			rte_errno = EINVAL;
>>> +			return -1;
>>> +		}
>>> +		eth_dev_event_callback_register();
>>> +
>>> +	}
>>> +
>>>   	if (start_port(RTE_PORT_ALL) != 0)
>>>   		rte_exit(EXIT_FAILURE, "Start ports failed\n");
>>>
>>> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index
>>> 153abea..8fde68d 100644
>>> --- a/app/test-pmd/testpmd.h
>>> +++ b/app/test-pmd/testpmd.h
>>> @@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet
>>> forwarding when set to 1. */  extern uint8_t lsc_interrupt; /**<
>>> disabled by "--no-lsc-interrupt" parameter */  extern uint8_t
>>> rmv_interrupt; /**< disabled by "--no-rmv-interrupt"
>>> parameter */
>>>   extern uint32_t event_print_mask;
>>> +extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
>>> +
>>>   /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
>>>
>>>   #ifdef RTE_LIBRTE_IXGBE_BYPASS
>>> diff --git a/doc/guides/testpmd_app_ug/run_app.rst
>>> b/doc/guides/testpmd_app_ug/run_app.rst
>>> index 1fd5395..d0ced36 100644
>>> --- a/doc/guides/testpmd_app_ug/run_app.rst
>>> +++ b/doc/guides/testpmd_app_ug/run_app.rst
>>> @@ -479,3 +479,7 @@ The commandline options are:
>>>
>>>       Set the hexadecimal bitmask of TX queue offloads.
>>>       The default value is 0.
>>> +
>>> +*   ``--hot-plug``
>>> +
>>> +    Enable device event monitor machenism for hotplug.
>> s/machenism/mechanism
>>
>>> --
>>> 2.7.4
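
As a hedged illustration of the callback fragment discussed above: if the
event_desc[] table is kept, the range check has to come before the lookup,
since indexing the table with an out-of-range type would read past its end.
The enum is copied locally so the sketch builds without DPDK, and
dev_event_name() is a hypothetical helper that is not part of the patch. The
callback returns void here, which is an assumption taken from the
rte_dev_event_cb_fn typedef in the V19 2/4 patch.

```c
#include <stdio.h>

/* Local copy of the enum from this series, so the sketch builds alone. */
enum rte_dev_event_type {
	RTE_DEV_EVENT_ADD,	/* device being added */
	RTE_DEV_EVENT_REMOVE,	/* device being removed */
	RTE_DEV_EVENT_MAX	/* max value of this enum */
};

/* Hypothetical helper: map an event type to a printable name. */
static const char *
dev_event_name(enum rte_dev_event_type type)
{
	static const char * const event_desc[] = {
		[RTE_DEV_EVENT_ADD] = "add",
		[RTE_DEV_EVENT_REMOVE] = "remove",
	};

	/* Check the range first; only then is the table lookup safe. */
	if ((unsigned int)type >= RTE_DEV_EVENT_MAX)
		return "unknown";
	return event_desc[type];
}

/* Monitoring-only callback: it reports the event and does nothing else. */
static void
eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
		       void *arg)
{
	(void)arg; /* unused in this sketch */
	printf("device %s: %s event\n", device_name, dev_event_name(type));
}
```

Attach and detach handling is deliberately left out, in line with the scope
of this patch.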


* Re: [PATCH V19 2/4] eal: add device event monitor framework
  2018-04-05  8:32                                                                       ` [PATCH V19 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-05 10:15                                                                         ` Tan, Jianfeng
  0 siblings, 0 replies; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-05 10:15 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 4/5/2018 4:32 PM, Jeff Guo wrote:
> This patch adds a general device event monitor framework at the EAL
> device layer, for device hotplug awareness and the actions taken in
> response. It could also be extended to other types of device event
> monitoring, but that is out of scope at this stage.
>
> To get started, users first call the newly added APIs below to
> enable/disable the device event monitor mechanism:
>    - rte_dev_event_monitor_start
>    - rte_dev_event_monitor_stop
>
> Then users shall register or unregister callbacks through the newly added
> APIs. Callbacks can be device-specific or apply to all devices.
>    - rte_dev_event_callback_register
>    - rte_dev_event_callback_unregister
>
> Taking hotplug as an example: when a device is hot-inserted or
> hot-removed, we get notified by the kernel and then call the user's
> callbacks accordingly to handle it, such as detaching or attaching the
> device from the bus. This could further benefit fail-safe or live migration.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Except for some trivial things, I'm OK with this patch, so

Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>

> ---
> v19->v18:
> clear the coding style and fix typo
> ---
>   doc/guides/rel_notes/release_18_05.rst  |   9 ++
>   lib/librte_eal/bsdapp/eal/Makefile      |   1 +
>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 +++++
>   lib/librte_eal/bsdapp/eal/meson.build   |   1 +
>   lib/librte_eal/common/eal_common_dev.c  | 161 ++++++++++++++++++++++++++++++++
>   lib/librte_eal/common/eal_private.h     |  15 +++
>   lib/librte_eal/common/include/rte_dev.h |  94 +++++++++++++++++++
>   lib/librte_eal/linuxapp/eal/Makefile    |   1 +
>   lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
>   lib/librte_eal/linuxapp/eal/meson.build |   1 +
>   lib/librte_eal/rte_eal_version.map      |  10 ++
>   11 files changed, 336 insertions(+)
>   create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
>   create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c
>
> diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
> index e5fac1c..d3c86bd 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -58,6 +58,15 @@ New Features
>     * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
>     * Added support for DROP action in flow API.
>   
> +* **Added device event monitor framework.**
> +
> +  Added a general device event monitor framework at EAL, for device dynamic management.
> +  Such as device hotplug awareness and actions adopted accordingly. The list of new APIs:
> +
> +  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` are for
> +    the event monitor enable and disable.
> +  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
> +    are for the user's callbacks register and unregister.
>   
>   API Changes
>   -----------
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
> index ed1d17b..90b88eb 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
>   
>   # from common dir
>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..1c6c51b
> --- /dev/null
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <rte_log.h>
> +#include <rte_compat.h>
> +#include <rte_dev.h>
> +
> +int __rte_experimental
> +rte_dev_event_monitor_start(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> +
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
> index e83fc91..6dfc533 100644
> --- a/lib/librte_eal/bsdapp/eal/meson.build
> +++ b/lib/librte_eal/bsdapp/eal/meson.build
> @@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
>   		'eal_timer.c',
>   		'eal.c',
>   		'eal_memory.c',
> +		'eal_dev.c'
>   )
> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> index cd07144..e202cf2 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -14,9 +14,34 @@
>   #include <rte_devargs.h>
>   #include <rte_debug.h>
>   #include <rte_log.h>
> +#include <rte_spinlock.h>
> +#include <rte_malloc.h>
>   
>   #include "eal_private.h"
>   
> +/**
> + * The device event callback description.
> + *
> + * It contains callback address to be registered by user application,
> + * the pointer to the parameters for callback, and the device name.
> + */
> +struct dev_event_callback {
> +	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
> +	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
> +	void *cb_arg;                           /**< Callback parameter */
> +	char *dev_name;	 /**< Callback device name, NULL is for all device */
> +	uint32_t active;                        /**< Callback is executing */
> +};
> +
> +/* spinlock for device callbacks */
> +static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;

It's still not in the right position; it should be placed after the
dev_event_callback list.

> +
> +/* The device event callback list for all registered callbacks. */
> +static struct dev_event_cb_list dev_event_cbs;
> +
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
> +
>   static int cmp_detached_dev_name(const struct rte_device *dev,
>   	const void *_name)
>   {
> @@ -207,3 +232,139 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
>   	rte_eal_devargs_remove(busname, devname);
>   	return ret;
>   }
> +
> +int __rte_experimental
> +rte_dev_event_callback_register(const char *device_name,
> +				rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg)
> +{
> +	struct dev_event_callback *event_cb;
> +	int ret;
> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	if (TAILQ_EMPTY(&dev_event_cbs))
> +		TAILQ_INIT(&dev_event_cbs);
> +
> +	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
> +		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
> +			if (device_name == NULL && event_cb->dev_name == NULL)
> +				break;
> +			if (device_name == NULL || event_cb->dev_name == NULL)
> +				continue;
> +			if (!strcmp(event_cb->dev_name, device_name))
> +				break;
> +		}
> +	}
> +
> +	/* create a new callback. */
> +	if (event_cb == NULL) {
> +		event_cb = malloc(sizeof(struct dev_event_callback));
> +		if (event_cb != NULL) {
> +			event_cb->cb_fn = cb_fn;
> +			event_cb->cb_arg = cb_arg;
> +			event_cb->active = 0;
> +			if (!device_name) {
> +				event_cb->dev_name = NULL;
> +			} else {
> +				event_cb->dev_name = strdup(device_name);
> +				if (event_cb->dev_name == NULL) {
> +					ret = -ENOMEM;
> +					goto error;
> +				}
> +			}
> +			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
> +		} else {
> +			RTE_LOG(ERR, EAL,
> +				"Failed to allocate memory for device "
> +				"event callback.");
> +			ret = -ENOMEM;
> +			goto error;
> +		}
> +	} else {
> +		RTE_LOG(ERR, EAL,
> +			"The callback is already exist, no need "
> +			"to register again.\n");
> +		ret = -EEXIST;
> +	}
> +
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return 0;
> +error:
> +	free(event_cb);
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_dev_event_callback_unregister(const char *device_name,
> +				  rte_dev_event_cb_fn cb_fn,
> +				  void *cb_arg)
> +{
> +	int ret = 0;
> +	struct dev_event_callback *event_cb, *next;
> +
> +	if (!cb_fn)
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +	/*walk through the callbacks and remove all that match. */
> +	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
> +	     event_cb = next) {
> +
> +		next = TAILQ_NEXT(event_cb, next);
> +
> +		if (device_name != NULL && event_cb->dev_name != NULL) {
> +			if (!strcmp(event_cb->dev_name, device_name)) {
> +				if (event_cb->cb_fn != cb_fn ||
> +				    (cb_arg != (void *)-1 &&
> +				    event_cb->cb_arg != cb_arg))
> +					continue;
> +			}
> +		} else if (device_name != NULL) {
> +			continue;
> +		}
> +
> +		/*
> +		 * if this callback is not executing right now,
> +		 * then remove it.
> +		 */
> +		if (event_cb->active == 0) {
> +			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
> +			free(event_cb);
> +			ret++;
> +		} else {
> +			continue;
> +		}
> +	}
> +	rte_spinlock_unlock(&dev_event_lock);
> +	return ret;
> +}
> +
> +void
> +dev_callback_process(char *device_name, enum rte_dev_event_type event)
> +{
> +	struct dev_event_callback *cb_lst;
> +
> +	if (device_name == NULL)
> +		return;
> +
> +	rte_spinlock_lock(&dev_event_lock);
> +
> +	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
> +		if (cb_lst->dev_name) {
> +			if (strcmp(cb_lst->dev_name, device_name))
> +				continue;
> +		}
> +		cb_lst->active = 1;
> +		rte_spinlock_unlock(&dev_event_lock);
> +		cb_lst->cb_fn(device_name, event,
> +				cb_lst->cb_arg);
> +		rte_spinlock_lock(&dev_event_lock);
> +		cb_lst->active = 0;
> +	}
> +	rte_spinlock_unlock(&dev_event_lock);
> +}
> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index 0b28770..88e5a59 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -9,6 +9,8 @@
>   #include <stdint.h>
>   #include <stdio.h>
>   
> +#include <rte_dev.h>
> +
>   /**
>    * Initialize the memzone subsystem (private to eal).
>    *
> @@ -205,4 +207,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
>   
>   int rte_mp_channel_init(void);
>   
> +/**
> + * Internal Executes all the user application registered callbacks for
> + * the specific device. It is for DPDK internal user only. User
> + * application should not call it directly.
> + *
> + * @param device_name
> + *  The device name.
> + * @param event
> + *  the device event type.
> + *
> + */
> +void
> +dev_callback_process(char *device_name, enum rte_dev_event_type event);
>   #endif /* _EAL_PRIVATE_H_ */
> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
> index b688f1e..2ed240e 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -24,6 +24,25 @@ extern "C" {
>   #include <rte_compat.h>
>   #include <rte_log.h>
>   
> +/**
> + * The device event type.
> + */
> +enum rte_dev_event_type {
> +	RTE_DEV_EVENT_ADD,	/**< device being added */
> +	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
> +	RTE_DEV_EVENT_MAX	/**< max value of this enum */
> +};
> +
> +struct rte_dev_event {
> +	enum rte_dev_event_type type;	/**< device event type */
> +	int subsystem;			/**< subsystem id */
> +	char *devname;			/**< device name */
> +};
> +
> +typedef void (*rte_dev_event_cb_fn)(char *device_name,
> +					enum rte_dev_event_type event,
> +					void *cb_arg);
> +
>   __attribute__((format(printf, 2, 0)))
>   static inline void
>   rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
> @@ -267,4 +286,79 @@ __attribute__((used)) = str
>   }
>   #endif
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It registers the callback for the specific device.
> + * Multiple callbacks can be registered at the same time.
> + *
> + * @param device_name
> + *  The device name, that is the param name of the struct rte_device,
> + *  null value means for all devices.
> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback.
> + *
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_callback_register(const char *device_name,
> +				rte_dev_event_cb_fn cb_fn,
> +				void *cb_arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It unregisters the callback according to the specified device.
> + *
> + * @param device_name
> + *  The device name, that is the param name of the struct rte_device,
> + *  null value means for all devices.

Do you mean all callbacks?

> + * @param cb_fn
> + *  callback address.
> + * @param cb_arg
> + *  address of parameter for callback, (void *)-1 means to remove all
> + *  registered which has the same callback address.
> + *
> + * @return
> + *  - On success, return the number of callback entities removed.
> + *  - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_callback_unregister(const char *device_name,
> +				  rte_dev_event_cb_fn cb_fn,
> +				  void *cb_arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Start the device event monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_monitor_start(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Stop the device event monitoring.
> + *
> + * @param none
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void);
>   #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> index b9c7727..db67001 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
>   
>   # from common dir
>   SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> new file mode 100644
> index 0000000..9c8d1a0
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <rte_log.h>
> +#include <rte_compat.h>
> +#include <rte_dev.h>
> +
> +
> +int __rte_experimental
> +rte_dev_event_monitor_start(void)
> +{
> +	/* TODO: start uevent monitor for linux */
> +	return 0;
> +}
> +
> +int __rte_experimental
> +rte_dev_event_monitor_stop(void)
> +{
> +	/* TODO: stop uevent monitor for linux */
> +	return 0;
> +}
> diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
> index 03974ff..b222571 100644
> --- a/lib/librte_eal/linuxapp/eal/meson.build
> +++ b/lib/librte_eal/linuxapp/eal/meson.build
> @@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
>   		'eal_vfio_mp_sync.c',
>   		'eal.c',
>   		'eal_memory.c',
> +		'eal_dev.c',
>   )
>   
>   if has_libnuma == 1
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index dd38783..3022df1 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -260,3 +260,13 @@ EXPERIMENTAL {
>   	rte_socket_id_by_idx;
>   
>   } DPDK_18.02;
> +
> +EXPERIMENTAL {
> +        global:
> +
> +	rte_dev_event_monitor_start;
> +	rte_dev_event_monitor_stop;
> +        rte_dev_event_callback_register;
> +        rte_dev_event_callback_unregister;

Use tabs instead of spaces.

> +
> +} DPDK_18.05;


* Re: [PATCH V19 3/4] eal/linux: uevent parse and process
  2018-04-05  9:02                                                                       ` [PATCH V19 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-05 11:05                                                                         ` Tan, Jianfeng
  2018-04-11 11:40                                                                           ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-05 11:05 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 4/5/2018 5:02 PM, Jeff Guo wrote:
> In order to handle the uevent which has been detected from the kernel
> side, add uevent parse and process function to translate the uevent into
> device event, which user has subscribed to monitor.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v19->18:
> fix some misunderstanding part
> ---
>   lib/librte_eal/linuxapp/eal/eal_dev.c | 196 +++++++++++++++++++++++++++++++++-
>   1 file changed, 194 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9c8d1a0..4686c41 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -2,21 +2,213 @@
>    * Copyright(c) 2018 Intel Corporation
>    */
>   
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/socket.h>
> +#include <linux/netlink.h>
> +
>   #include <rte_log.h>
>   #include <rte_compat.h>
>   #include <rte_dev.h>
> +#include <rte_malloc.h>
> +#include <rte_interrupts.h>
> +
> +#include "eal_private.h"
> +
> +static struct rte_intr_handle intr_handle = {.fd = -1 };
> +static bool monitor_started;
> +
> +#define EAL_UEV_MSG_LEN 4096
> +#define EAL_UEV_MSG_ELEM_LEN 128
> +
> +/* identify the system layer which reports this event. */
> +enum eal_dev_event_subsystem {
> +	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_MAX
> +};
> +
> +static int
> +dev_uev_socket_fd_create(void)
> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +
> +	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
> +			SOCK_NONBLOCK,
> +			NETLINK_KOBJECT_UEVENT);
> +	if (intr_handle.fd < 0) {
> +		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
> +		return -1;
> +	}
> +
> +	memset(&addr, 0, sizeof(addr));
> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;
> +
> +	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
> +		goto err;
> +	}
> +
> +	return 0;
> +err:
> +	close(intr_handle.fd);
> +	intr_handle.fd = -1;
> +	return ret;
> +}
> +
> +static int
> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
> +{
> +	char action[EAL_UEV_MSG_ELEM_LEN];
> +	char subsystem[EAL_UEV_MSG_ELEM_LEN];
> +	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
> +	int i = 0, ret = 0;
> +
> +	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
> +
> +	while (i < length) {
> +		for (; i < length; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		/**
> +		 * check device uevent from kernel side, no need to check
> +		 * uevent from udev.
> +		 */
> +		if (!strncmp(buf, "libudev", 7)) {
> +			buf += 7;
> +			i += 7;
> +			return -1;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
> +			event->devname = strdup(pci_slot_name);
> +		}
> +		for (; i < length; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	/* parse the subsystem layer */
> +	if (!strncmp(subsystem, "uio", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
> +	else if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
> +	else if (!strncmp(subsystem, "vfio", 4))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
> +	else
> +		ret = -1;

We can just return -1 here.

>   
> +	/* parse the action type */
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_DEV_EVENT_ADD;
> +	else if (!strncmp(action, "remove", 6))
> +		event->type = RTE_DEV_EVENT_REMOVE;
> +	else
> +		ret = -1;

We can just return -1 here.

> +	return ret;

return 0 here.

> +}
> +
> +static void
> +dev_uev_handler(__rte_unused void *param)
> +{
> +	struct rte_dev_event uevent;
> +	int ret;
> +	char buf[EAL_UEV_MSG_LEN];
> +
> +	memset(&uevent, 0, sizeof(struct rte_dev_event));
> +	memset(buf, 0, EAL_UEV_MSG_LEN);
> +
> +	ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
> +	if (ret == 0 || (ret < 0 && errno != EAGAIN)) {
> +		/* connection is closed or broken, can not up again. */
> +		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");

Again, we need an alarm to unregister the callback from intr thread.

> +		return;
> +	} else if (ret < 0) {
> +		RTE_LOG(ERR, EAL,
> +			"uevent socket read error(%d): %s.\n",
> +			errno, strerror(errno));
> +		return;
> +	}

I think the above code can be adjusted as:

     if (ret == 0 || (ret < 0 && errno == EAGAIN))
             return;
     else if (ret < 0) {
            RTE_LOG(ERR, EAL, "...");
            set_alarm_to_unregister();
     }
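
The branch structure suggested above can be sketched as a small pure function. This is only an illustration under assumptions: `uev_classify_recv` and the `UEV_RECV_*` names are hypothetical and are not part of the patch or of DPDK, and treating a zero-length read as "nothing to do" simply follows the suggestion as written.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome of one recv() call on the uevent socket. */
enum uev_recv_action {
	UEV_RECV_PARSE,    /* data arrived, hand it to the parser */
	UEV_RECV_IGNORE,   /* nothing to read right now, just return */
	UEV_RECV_TEARDOWN  /* hard error: log and schedule the unregister */
};

/*
 * Classify the return value of recv() together with the errno captured
 * right after the call. Only the decision is modelled here; logging and
 * the alarm that unregisters the callback are left to the caller.
 */
static enum uev_recv_action
uev_classify_recv(ssize_t ret, int err)
{
	/* recv() returns -1 on error, never a smaller value. */
	assert(ret >= -1);

	if (ret > 0)
		return UEV_RECV_PARSE;
	if (ret == 0 || err == EAGAIN || err == EWOULDBLOCK)
		return UEV_RECV_IGNORE;
	return UEV_RECV_TEARDOWN;
}
```

Keeping the decision separate from the side effects makes it easy to check each branch without a real netlink socket.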


> +
> +	ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL, "It is not an valid event "

s/ERR/DEBUG, or there are too many logs.

> +			"that need to be handle.\n");
> +		return;
> +	}
> +
> +	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
> +		uevent.devname, uevent.type, uevent.subsystem);
> +
> +	if (uevent.devname)
> +		dev_callback_process(uevent.devname, uevent.type);
> +}
>   
>   int __rte_experimental
>   rte_dev_event_monitor_start(void)
>   {
> -	/* TODO: start uevent monitor for linux */
> +	int ret;
> +
> +	if (monitor_started)
> +		return 0;
> +
> +	ret = dev_uev_socket_fd_create();
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "error create device event fd.\n");
> +		return -1;
> +	}
> +
> +	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
> +	ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
> +
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
> +		return -1;
> +	}
> +
> +	monitor_started = true;
> +
>   	return 0;
>   }
>   
>   int __rte_experimental
>   rte_dev_event_monitor_stop(void)
>   {
> -	/* TODO: stop uevent monitor for linux */
> +	int ret;
> +
> +	if (!monitor_started)
> +		return 0;
> +
> +	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
> +					   (void *)-1);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
> +		return ret;
> +	}
> +
> +	close(intr_handle.fd);
> +	intr_handle.fd = -1;
> +	monitor_started = false;
>   	return 0;
>   }

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V20 0/4] add device event monitor framework
  2018-04-05  9:02                                                                       ` [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
@ 2018-04-05 16:10                                                                         ` Jeff Guo
  2018-04-05 16:10                                                                           ` [PATCH V20 1/4] eal: add device event handle in interrupt thread Jeff Guo
                                                                                             ` (3 more replies)
  0 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05 16:10 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

For hot plug in DPDK, we already have a proactive way to add/remove devices
through APIs (rte_eal_hotplug_add/remove), and also the fail-safe driver
to offload the fail-safe work from the application. But there is still no
general mechanism to monitor hotplug events for all drivers; today the
hotplug interrupt event differs between devices and drivers, such
as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so in order to make the hot plug feature easy to use
for pci drivers, something must be done to detect the remove event
at the kernel level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor
the hot plug status of devices, not only uio/vfio devices on the pci bus
but also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a.The uevent message read from the monitoring FD looks like below.
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b.add device event monitor framework:
add several general APIs to enable uevent monitoring.

c.show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd to show the device event monitor mechanism usage.
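
As a rough illustration of item (a), the sketch below walks a message of that shape: a buffer holding an "action@devpath" header followed by KEY=VALUE strings, each terminated by '\0'. It is standalone C and not the code from this patch set; `struct uev_fields`, `uev_parse` and the helpers are hypothetical names, and only ACTION, SUBSYSTEM and DEVNAME are extracted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UEV_FIELD_LEN 128

/* Hypothetical container for the fields of interest; not a DPDK type. */
struct uev_fields {
	char action[UEV_FIELD_LEN];
	char subsystem[UEV_FIELD_LEN];
	char devname[UEV_FIELD_LEN];
};

/* Copy at most dst_len - 1 bytes of the value and terminate it. */
static void
uev_copy_value(char *dst, size_t dst_len, const char *val, size_t val_len)
{
	if (val_len >= dst_len)
		val_len = dst_len - 1;
	memcpy(dst, val, val_len);
	dst[val_len] = '\0';
}

/* If the field starts with key, store its value in dst and return 1. */
static int
uev_take(const char *field, size_t flen, const char *key, char *dst)
{
	size_t klen = strlen(key);

	if (flen < klen || memcmp(field, key, klen) != 0)
		return 0;
	uev_copy_value(dst, UEV_FIELD_LEN, field + klen, flen - klen);
	return 1;
}

/*
 * Walk the buffer one '\0'-terminated field at a time.
 * Returns 0 when both ACTION and SUBSYSTEM were found, -1 otherwise.
 */
static int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t pos = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (pos < len) {
		const char *field = buf + pos;
		/* Bounded search, in case the last field lacks its '\0'. */
		const char *end = memchr(field, '\0', len - pos);
		size_t flen = end ? (size_t)(end - field) : len - pos;

		if (!uev_take(field, flen, "ACTION=", out->action) &&
		    !uev_take(field, flen, "SUBSYSTEM=", out->subsystem))
			uev_take(field, flen, "DEVNAME=", out->devname);

		pos += flen + 1; /* step over the field and its terminator */
	}

	return (out->action[0] != '\0' && out->subsystem[0] != '\0') ? 0 : -1;
}
```

Advancing by the measured field length keeps the read position and the remaining length in step, so the walk never reads past the received bytes.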

TODO: failure handler mechanism for hot plug and driver auto bind for hot insertion.
These will be covered by the next hot plug patch set.

patchset history:
v20->v19:
add more detail note and socket error handler.

v19->18:
fix some typo and misunderstanding part

v18->v17:
1.add feature announcement in release document, fix bsp compile issue.
2.refine socket configuration.
3.remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.

v17->v16:
1.add related part of the interrupt handle type adding.
2.add new API into map, fix typo issue, add (void*)-1 value for unregister all callback
3.add new file into meson.build, modify coding style and add print info, delete unused part.
4.unregister all user's callback when stop event monitor

v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll to replace the rte service usage for the monitor thread.
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
leave the checking and management to the user's callback.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:modify some typo and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unuse part.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix definition issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, by default prepare hot plug work for
all pci devices, then let the app decide which devices need to
hot plug.
2.modify to manage event callback in each device.
3.fix some system hang issue on igb_uio release.
4.modify the pci part to the bus-pci base on the bus rework.
5.add hot plug policy in app, show an example that uses a hotplug list to
decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Replace of individual fd bind with single device, use a common fd to polling all device.
5.Add hot insertion monitoring registration and processing, add function to auto bind driver before user adds device
6.Refine some coding style and typos issue
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variables of hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device self maintain it fd,
to fix dual device fd issue.
2.refine some typo error.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 101 +++++++++-
 app/test-pmd/testpmd.h                             |   2 +
 doc/guides/rel_notes/release_18_05.rst             |   9 +
 doc/guides/testpmd_app_ug/run_app.rst              |   4 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  21 ++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 161 +++++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 +++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 224 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |  10 +
 test/test/test_interrupts.c                        |  39 +++-
 18 files changed, 696 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V20 1/4] eal: add device event handle in interrupt thread
  2018-04-05 16:10                                                                         ` [PATCH V20 0/4] add device event monitor framework Jeff Guo
@ 2018-04-05 16:10                                                                           ` Jeff Guo
  2018-04-05 16:10                                                                           ` [PATCH V20 2/4] eal: add device event monitor framework Jeff Guo
                                                                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05 16:10 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add a new interrupt handle type, RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitoring.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v20->19:
no change
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
 test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..58e9328 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index 31a70a0..dc19175 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
 	TEST_INTERRUPT_HANDLE_VALID,
 	TEST_INTERRUPT_HANDLE_VALID_UIO,
 	TEST_INTERRUPT_HANDLE_VALID_ALARM,
+	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
 	TEST_INTERRUPT_HANDLE_CASE1,
 	TEST_INTERRUPT_HANDLE_MAX
 };
@@ -80,6 +81,10 @@ test_interrupt_init(void)
 	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
 					RTE_INTR_HANDLE_ALARM;
 
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd = pfds.readfd;
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
+					RTE_INTR_HANDLE_DEV_EVENT;
+
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -250,6 +255,14 @@ test_interrupt_enable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_enable(&test_intr_handle) == 0) {
+		printf("unexpectedly enable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -306,6 +319,14 @@ test_interrupt_disable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_disable(&test_intr_handle) == 0) {
+		printf("unexpectedly disable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -393,9 +414,17 @@ test_interrupt(void)
 		goto out;
 	}
 
+	printf("Check valid device event interrupt full path\n");
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+		printf("failure occurred during checking valid device event "
+						"interrupt full path\n");
+		goto out;
+	}
+
 	printf("Check valid alarm interrupt full path\n");
-	if (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
-									< 0) {
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_ALARM) < 0) {
 		printf("failure occurred during checking valid alarm "
 						"interrupt full path\n");
 		goto out;
@@ -513,6 +542,12 @@ test_interrupt(void)
 	rte_intr_callback_unregister(&test_intr_handle,
 			test_interrupt_callback_1, (void *)-1);
 
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback, (void *)-1);
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback_1, (void *)-1);
+
 	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
 	/* deinit */
 	test_interrupt_deinit();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V20 2/4] eal: add device event monitor framework
  2018-04-05 16:10                                                                         ` [PATCH V20 0/4] add device event monitor framework Jeff Guo
  2018-04-05 16:10                                                                           ` [PATCH V20 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-04-05 16:10                                                                           ` Jeff Guo
  2018-04-05 21:54                                                                             ` Thomas Monjalon
  2018-04-05 16:10                                                                           ` [PATCH V20 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-04-05 16:10                                                                           ` [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-05 16:10 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor framework at the
EAL device layer, for device hotplug awareness and the actions taken
accordingly. It could also be extended to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to enable/disable
the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users register or unregister callbacks through the newly added
APIs. Callbacks can be device specific, or for all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Taking hotplug as an example, on device hotplug insertion or hotplug
removal we get notified by the kernel, then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device from the
bus, which could further benefit fail-safe or live-migration.
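
To make the "device specific, or for all devices" rule concrete, here is a minimal standalone sketch of the matching done when an event is delivered. It is a simplified model under assumptions, not the implementation in this patch: it uses a plain array instead of a locked TAILQ, and `dev_event_cb`, `dispatch_event` and `count_removals` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the event type introduced by the patch. */
enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef void (*dev_event_cb_fn)(const char *device_name,
				enum dev_event_type event, void *cb_arg);

/* One registered callback; a NULL dev_name subscribes to every device. */
struct dev_event_cb {
	const char *dev_name;
	dev_event_cb_fn cb_fn;
	void *cb_arg;
};

/* Call every callback whose filter matches; return how many were called. */
static int
dispatch_event(const struct dev_event_cb *cbs, size_t n,
	       const char *device_name, enum dev_event_type event)
{
	int called = 0;

	assert(cbs != NULL || n == 0);
	if (device_name == NULL)
		return 0;

	for (size_t i = 0; i < n; i++) {
		/* A named callback only fires for its own device. */
		if (cbs[i].dev_name != NULL &&
		    strcmp(cbs[i].dev_name, device_name) != 0)
			continue;
		cbs[i].cb_fn(device_name, event, cbs[i].cb_arg);
		called++;
	}
	return called;
}

/* Example callback: counts removal events in the int behind cb_arg. */
static void
count_removals(const char *device_name, enum dev_event_type event,
	       void *cb_arg)
{
	(void)device_name;
	if (event == DEV_EVENT_REMOVE)
		(*(int *)cb_arg)++;
}
```

With one callback registered for all devices and one for a single PCI address, a removal of that address reaches both, while a removal of any other device reaches only the first.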

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v20->v19:
add more detail note for callback unregister.
---
 doc/guides/rel_notes/release_18_05.rst  |   9 ++
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 +++++
 lib/librte_eal/bsdapp/eal/meson.build   |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 161 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  15 +++
 lib/librte_eal/common/include/rte_dev.h |  94 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
 lib/librte_eal/linuxapp/eal/meson.build |   1 +
 lib/librte_eal/rte_eal_version.map      |  10 ++
 11 files changed, 336 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index e5fac1c..d3c86bd 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -58,6 +58,15 @@ New Features
   * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
   * Added support for DROP action in flow API.
 
+* **Added device event monitor framework.**
+
+  Added a general device event monitor framework at EAL, for dynamic device management,
+  such as device hotplug awareness and actions adopted accordingly. The list of new APIs:
+
+  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` are for
+    the event monitor enable and disable.
+  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
+    are for the user's callbacks register and unregister.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index ed1d17b..90b88eb 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..1c6c51b
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
index e83fc91..6dfc533 100644
--- a/lib/librte_eal/bsdapp/eal/meson.build
+++ b/lib/librte_eal/bsdapp/eal/meson.build
@@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
 		'eal_timer.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c'
 )
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..e156c66 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all device */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,139 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb;
+	int ret;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&dev_event_cbs))
+		TAILQ_INIT(&dev_event_cbs);
+
+	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->active = 0;
+			if (!device_name) {
+				event_cb->dev_name = NULL;
+			} else {
+				event_cb->dev_name = strdup(device_name);
+				if (event_cb->dev_name == NULL) {
+					ret = -ENOMEM;
+					goto error;
+				}
+			}
+			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device "
+				"event callback.");
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		RTE_LOG(ERR, EAL,
+			"The callback is already exist, no need "
+			"to register again.\n");
+		ret = -EEXIST;
+	}
+
+	rte_spinlock_unlock(&dev_event_lock);
+	return 0;
+error:
+	free(event_cb);
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg)
+{
+	int ret = 0;
+	struct dev_event_callback *event_cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+	/*walk through the callbacks and remove all that match. */
+	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
+	     event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (device_name != NULL && event_cb->dev_name != NULL) {
+			if (!strcmp(event_cb->dev_name, device_name)) {
+				if (event_cb->cb_fn != cb_fn ||
+				    (cb_arg != (void *)-1 &&
+				    event_cb->cb_arg != cb_arg))
+					continue;
+			}
+		} else if (device_name != NULL) {
+			continue;
+		}
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
+			free(event_cb);
+			ret++;
+		} else {
+			continue;
+		}
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event)
+{
+	struct dev_event_callback *cb_lst;
+
+	if (device_name == NULL)
+		return;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (cb_lst->dev_name) {
+			if (strcmp(cb_lst->dev_name, device_name))
+				continue;
+		}
+		cb_lst->active = 1;
+		rte_spinlock_unlock(&dev_event_lock);
+		cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..88e5a59 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * Internal Executes all the user application registered callbacks for
+ * the specific device. It is for DPDK internal user only. User
+ * application should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ *
+ */
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..a5203e7 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,25 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef void (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +286,79 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices and their callbacks.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered callbacks that have the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b9c7727..db67001 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9c8d1a0
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
index 03974ff..b222571 100644
--- a/lib/librte_eal/linuxapp/eal/meson.build
+++ b/lib/librte_eal/linuxapp/eal/meson.build
@@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
 		'eal_vfio_mp_sync.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c',
 )
 
 if has_libnuma == 1
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index dd38783..33ac125 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -260,3 +260,13 @@ EXPERIMENTAL {
 	rte_socket_id_by_idx;
 
 } DPDK_18.02;
+
+EXPERIMENTAL {
+	global:
+
+	rte_dev_event_monitor_start;
+	rte_dev_event_monitor_stop;
+	rte_dev_event_callback_register;
+	rte_dev_event_callback_unregister;
+
+} DPDK_18.05;
-- 
2.7.4


* [PATCH V20 3/4] eal/linux: uevent parse and process
  2018-04-05 16:10                                                                         ` [PATCH V20 0/4] add device event monitor framework Jeff Guo
  2018-04-05 16:10                                                                           ` [PATCH V20 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-04-05 16:10                                                                           ` [PATCH V20 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-05 16:10                                                                           ` Jeff Guo
  2018-04-05 16:22                                                                             ` Tan, Jianfeng
  2018-04-05 21:58                                                                             ` Thomas Monjalon
  2018-04-05 16:10                                                                           ` [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05 16:10 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

To handle uevents detected from the kernel side, add uevent parse and
process functions that translate each uevent into a device event, which
is then delivered to the callbacks the user has registered.
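
For illustration only, the parsing step can be sketched outside DPDK as
below. It assumes the usual kernel uevent payload layout, a sequence of
NUL-terminated "KEY=VALUE" records, and every name in it (uev_fields,
uev_match, uev_parse_sketch) is hypothetical rather than taken from the
patch. It is a minimal sketch, not the implementation in eal_dev.c.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type for this sketch; not the struct from the patch. */
struct uev_fields {
	char action[32];
	char subsystem[32];
	char pci_slot_name[32];
};

/* Copy the value that follows "KEY=" if the record starts with that key. */
static int
uev_match(const char *rec, const char *key, char *out, size_t out_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(out, out_len, "%s", rec + klen);
	return 1;
}

/*
 * Walk a payload made of NUL-terminated records.
 * Returns 0 if both ACTION and SUBSYSTEM were found, -1 otherwise.
 */
static int
uev_parse_sketch(const char *buf, size_t len, struct uev_fields *f)
{
	size_t i = 0;

	memset(f, 0, sizeof(*f));
	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		/* A record without a terminator is truncated: ignore it. */
		if (end == NULL)
			break;

		if (!uev_match(rec, "ACTION=", f->action, sizeof(f->action)) &&
		    !uev_match(rec, "SUBSYSTEM=", f->subsystem,
			       sizeof(f->subsystem)))
			uev_match(rec, "PCI_SLOT_NAME=", f->pci_slot_name,
				  sizeof(f->pci_slot_name));

		/* Skip the record and its terminating NUL. */
		i += (size_t)(end - rec) + 1;
	}

	if (f->action[0] == '\0' || f->subsystem[0] == '\0')
		return -1;
	return 0;
}
```

Bounding each record with memchr() keeps the walk inside the received
length even when the last record is not terminated, which is the case a
fixed-size receive buffer has to tolerate.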

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v20->v19:
add socket error handler
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 206 +++++++++++++++++++++++++++++++++-
 1 file changed, 204 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9c8d1a0..bde595c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,21 +2,223 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+
 #include <rte_log.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
+#include <rte_malloc.h>
+#include <rte_interrupts.h>
+#include <rte_alarm.h>
+
+#include "eal_private.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_started;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+static void dev_uev_handler(__rte_unused void *param);
+
+/* identify the system layer which reports this event. */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_socket_fd_create(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+
+	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
+		goto err;
+	}
 
+	return 0;
+err:
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	return ret;
+}
+
+static int
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return -1;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = strdup(pci_slot_name);
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	/* parse the subsystem layer */
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	else if (!strncmp(subsystem, "vfio", 4))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
+	else
+		return -1;
+
+	/* parse the action type */
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	else if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	else
+		return -1;
+	return 0;
+}
+
+static void
+dev_delayed_unregister(void *param)
+{
+	rte_intr_callback_unregister(&intr_handle, dev_uev_handler, param);
+}
+
+static void
+dev_uev_handler(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(&uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret == 0 || (ret < 0 && errno != EAGAIN)) {
+		/* connection is closed or broken, can not up again. */
+		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
+		rte_eal_alarm_set(1, dev_delayed_unregister, NULL);
+		return;
+	} else if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+			"uevent socket read error(%d): %s.\n",
+			errno, strerror(errno));
+		return;
+	}
+
+	ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, EAL, "It is not a valid event "
+			"that needs to be handled.\n");
+		return;
+	}
+
+	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
+		uevent.devname, uevent.type, uevent.subsystem);
+
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (monitor_started)
+		return 0;
+
+	ret = dev_uev_socket_fd_create();
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event fd.\n");
+		return -1;
+	}
+
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
+		return -1;
+	}
+
+	monitor_started = true;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (!monitor_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
+					   (void *)-1);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_started = false;
 	return 0;
 }
-- 
2.7.4


* [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-05 16:10                                                                         ` [PATCH V20 0/4] add device event monitor framework Jeff Guo
                                                                                             ` (2 preceding siblings ...)
  2018-04-05 16:10                                                                           ` [PATCH V20 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-05 16:10                                                                           ` Jeff Guo
  2018-04-05 21:48                                                                             ` Thomas Monjalon
  2018-04-06  3:54                                                                             ` [PATCH V21 0/4] add device event monitor framework Jeff Guo
  3 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-05 16:10 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application uses the device
event APIs to monitor hotplug events, including both hot removal and
hot insertion events.

The process is as follows. First, enable hotplug in testpmd with the
command below,

E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug

Then testpmd starts the device event monitor by calling the new API
(rte_dev_event_monitor_start) and registers the user's callback by
calling the API (rte_dev_event_callback_register). When a device is hot
inserted or hot removed, the device event monitor detects the event and
calls the user's callbacks, so the user can process the event in the
callback accordingly.

This patch only shows the event monitoring; device attach/detach is not
involved here and will be added by another hotplug patch set.
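
The register-then-dispatch flow described above can be sketched without
DPDK as follows. This is a simplified model under stated assumptions:
all names (sketch_register, sketch_dispatch, sketch_count_removals) are
hypothetical, the list is a fixed-size array, and the locking that the
EAL code does around its callback list is left out, so the sketch is
single-threaded only. What it keeps from the framework is the matching
rule that a NULL device name subscribes a callback to all devices.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event type standing in for enum rte_dev_event_type. */
enum sketch_event { SKETCH_EVENT_ADD, SKETCH_EVENT_REMOVE };

typedef void (*sketch_cb_fn)(const char *dev, enum sketch_event ev,
			     void *arg);

struct sketch_cb {
	const char *dev_name;	/* NULL means "all devices" */
	sketch_cb_fn fn;
	void *arg;
};

#define SKETCH_MAX_CBS 8
static struct sketch_cb sketch_cbs[SKETCH_MAX_CBS];
static size_t sketch_nb_cbs;

/* Register a callback; returns 0 on success, -1 on failure. */
static int
sketch_register(const char *dev_name, sketch_cb_fn fn, void *arg)
{
	if (fn == NULL || sketch_nb_cbs == SKETCH_MAX_CBS)
		return -1;
	sketch_cbs[sketch_nb_cbs].dev_name = dev_name;
	sketch_cbs[sketch_nb_cbs].fn = fn;
	sketch_cbs[sketch_nb_cbs].arg = arg;
	sketch_nb_cbs++;
	return 0;
}

/* Call every callback whose filter matches; returns how many were called. */
static int
sketch_dispatch(const char *dev, enum sketch_event ev)
{
	int called = 0;
	size_t i;

	for (i = 0; i < sketch_nb_cbs; i++) {
		const struct sketch_cb *cb = &sketch_cbs[i];

		if (cb->dev_name != NULL && strcmp(cb->dev_name, dev) != 0)
			continue;
		cb->fn(dev, ev, cb->arg);
		called++;
	}
	return called;
}

/* Example user callback: count removal events in the int behind arg. */
static void
sketch_count_removals(const char *dev, enum sketch_event ev, void *arg)
{
	(void)dev;
	if (ev == SKETCH_EVENT_REMOVE)
		(*(int *)arg)++;
}
```

An application would register sketch_count_removals once with a NULL
name to observe every device, or with a specific name to observe only
that one, and the monitor side would call sketch_dispatch() for each
parsed event.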

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v20->v19:
no change
---
 app/test-pmd/parameters.c             |   5 +-
 app/test-pmd/testpmd.c                | 101 +++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |   2 +
 doc/guides/testpmd_app_ug/run_app.rst |   4 ++
 4 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 2192bdc..1a05284 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1101,7 +1103,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..d2c122a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -391,6 +394,12 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static void eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(void);
+static int eth_dev_event_callback_unregister(void);
+
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(void)
+{
+	int ret;
+
+	/* register the device event callback */
+	ret = rte_dev_event_callback_register(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret) {
+		printf("Failed to register device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static int
+eth_dev_event_callback_unregister(void)
+{
+	int ret;
+
+	/* unregister the device event callback */
+	ret = rte_dev_event_callback_unregister(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret < 0) {
+		printf("Failed to unregister device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1916,6 +1958,7 @@ void
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	int ret;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
@@ -1929,6 +1972,18 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	if (hot_plug) {
+		ret = rte_dev_event_monitor_stop();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to stop device event monitor.\n");
+
+		ret = eth_dev_event_callback_unregister();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to unregister all event callbacks.\n");
+	}
 	printf("\nBye...\n");
 }
 
@@ -2059,6 +2114,37 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static void
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     __rte_unused void *arg)
+{
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
+			device_name);
+		/* TODO: After finish failure handle, begin to stop
+		 * packet forward, stop port, close port, detach port.
+		 */
+		break;
+	case RTE_DEV_EVENT_ADD:
+		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
+			device_name);
+		/* TODO: After finish kernel driver binding,
+		 * begin to attach port.
+		 */
+		break;
+	default:
+		break;
+	}
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2560,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2630,18 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		eth_dev_event_callback_register();
+
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..8fde68d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1fd5395..d0ced36 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -479,3 +479,7 @@ The commandline options are:
 
     Set the hexadecimal bitmask of TX queue offloads.
     The default value is 0.
+
+*   ``--hot-plug``
+
+    Enable device event monitor mechanism for hotplug.
-- 
2.7.4


* Re: [PATCH V20 3/4] eal/linux: uevent parse and process
  2018-04-05 16:10                                                                           ` [PATCH V20 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-05 16:22                                                                             ` Tan, Jianfeng
  2018-04-06  3:47                                                                               ` Guo, Jia
  2018-04-05 21:58                                                                             ` Thomas Monjalon
  1 sibling, 1 reply; 494+ messages in thread
From: Tan, Jianfeng @ 2018-04-05 16:22 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 4/6/2018 12:10 AM, Jeff Guo wrote:
> In order to handle the uevent which has been detected from the kernel
> side, add uevent parse and process function to translate the uevent into
> device event, which user has subscribed to monitor.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Other than two nits below,
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>

> ---
> v20->v19:
> add socket error handler
> ---
>   lib/librte_eal/linuxapp/eal/eal_dev.c | 206 +++++++++++++++++++++++++++++++++-
>   1 file changed, 204 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9c8d1a0..bde595c 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -2,21 +2,223 @@
>    * Copyright(c) 2018 Intel Corporation
>    */
>   
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/socket.h>
> +#include <linux/netlink.h>
> +
>   #include <rte_log.h>
>   #include <rte_compat.h>
>   #include <rte_dev.h>
> +#include <rte_malloc.h>
> +#include <rte_interrupts.h>
> +#include <rte_alarm.h>
> +
> +#include "eal_private.h"
> +
> +static struct rte_intr_handle intr_handle = {.fd = -1 };
> +static bool monitor_started;
> +
> +#define EAL_UEV_MSG_LEN 4096
> +#define EAL_UEV_MSG_ELEM_LEN 128
> +
> +static void dev_uev_handler(__rte_unused void *param);
> +
> +/* identify the system layer which reports this event. */
> +enum eal_dev_event_subsystem {
> +	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
> +	EAL_DEV_EVENT_SUBSYSTEM_MAX
> +};
> +
> +static int
> +dev_uev_socket_fd_create(void)
> +{
> +	struct sockaddr_nl addr;
> +	int ret;
> +
> +	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
> +			SOCK_NONBLOCK,
> +			NETLINK_KOBJECT_UEVENT);
> +	if (intr_handle.fd < 0) {
> +		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
> +		return -1;
> +	}
> +
> +	memset(&addr, 0, sizeof(addr));
> +	addr.nl_family = AF_NETLINK;
> +	addr.nl_pid = 0;
> +	addr.nl_groups = 0xffffffff;
> +
> +	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
> +		goto err;
> +	}
>   
> +	return 0;
> +err:
> +	close(intr_handle.fd);
> +	intr_handle.fd = -1;
> +	return ret;
> +}
> +
> +static int
> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
> +{
> +	char action[EAL_UEV_MSG_ELEM_LEN];
> +	char subsystem[EAL_UEV_MSG_ELEM_LEN];
> +	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
> +	int i = 0;
> +
> +	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
> +	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
> +
> +	while (i < length) {
> +		for (; i < length; i++) {
> +			if (*buf)
> +				break;
> +			buf++;
> +		}
> +		/**
> +		 * check device uevent from kernel side, no need to check
> +		 * uevent from udev.
> +		 */
> +		if (!strncmp(buf, "libudev", 7)) {
> +			buf += 7;
> +			i += 7;
> +			return -1;
> +		}
> +		if (!strncmp(buf, "ACTION=", 7)) {
> +			buf += 7;
> +			i += 7;
> +			snprintf(action, sizeof(action), "%s", buf);
> +		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
> +			buf += 10;
> +			i += 10;
> +			snprintf(subsystem, sizeof(subsystem), "%s", buf);
> +		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
> +			buf += 14;
> +			i += 14;
> +			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
> +			event->devname = strdup(pci_slot_name);
> +		}
> +		for (; i < length; i++) {
> +			if (*buf == '\0')
> +				break;
> +			buf++;
> +		}
> +	}
> +
> +	/* parse the subsystem layer */
> +	if (!strncmp(subsystem, "uio", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
> +	else if (!strncmp(subsystem, "pci", 3))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
> +	else if (!strncmp(subsystem, "vfio", 4))
> +		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
> +	else
> +		return -1;
> +
> +	/* parse the action type */
> +	if (!strncmp(action, "add", 3))
> +		event->type = RTE_DEV_EVENT_ADD;
> +	else if (!strncmp(action, "remove", 6))
> +		event->type = RTE_DEV_EVENT_REMOVE;
> +	else
> +		return -1;
> +	return 0;
> +}
> +
> +static void
> +dev_delayed_unregister(void *param)
> +{
> +	rte_intr_callback_unregister(&intr_handle, dev_uev_handler, param);

You might also want to:

+	close(intr_handle.fd);
+	intr_handle.fd = -1;


> +}
> +
> +static void
> +dev_uev_handler(__rte_unused void *param)
> +{
> +	struct rte_dev_event uevent;
> +	int ret;
> +	char buf[EAL_UEV_MSG_LEN];
> +
> +	memset(&uevent, 0, sizeof(struct rte_dev_event));
> +	memset(buf, 0, EAL_UEV_MSG_LEN);
> +
> +	ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
> +	if (ret == 0 || (ret < 0 && errno != EAGAIN)) {
> +		/* connection is closed or broken, can not up again. */
> +		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
> +		rte_eal_alarm_set(1, dev_delayed_unregister, NULL);
> +		return;
> +	} else if (ret < 0) {
> +		RTE_LOG(ERR, EAL,
> +			"uevent socket read error(%d): %s.\n",
> +			errno, strerror(errno));
> +		return;
> +	}

The error handling above can be rewritten a little:

+	if (ret < 0 && errno == EAGAIN)
+		return;
+	else if (ret <= 0) {
+		/* connection is closed or broken, can not up again. */
+		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
+		rte_eal_alarm_set(1, dev_delayed_unregister, NULL);
+		return;
+	}
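
To make the intended ordering explicit, here is a small standalone
sketch that classifies the recv() result the same way as the rewrite
above: EAGAIN is checked first and treated as "nothing to read", and any
other non-positive result is treated as a closed or broken socket. The
enum and the function name uev_recv_classify are hypothetical and exist
only for this illustration.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome of one recv() call on the uevent socket. */
enum uev_recv_status {
	UEV_RECV_DATA,		/* ret > 0: a message is in the buffer */
	UEV_RECV_RETRY,		/* nothing to read yet, keep the handler */
	UEV_RECV_BROKEN		/* closed or failed, unregister the handler */
};

/* Map the recv() return value and the saved errno to an outcome. */
static enum uev_recv_status
uev_recv_classify(ssize_t ret, int err)
{
	if (ret > 0)
		return UEV_RECV_DATA;
	if (ret < 0 && err == EAGAIN)
		return UEV_RECV_RETRY;
	/* ret == 0, or ret < 0 with any other errno. */
	return UEV_RECV_BROKEN;
}
```

The caller would pass the value of errno saved immediately after recv()
returns, so that a later library call cannot overwrite it before the
check.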


> +
> +	ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
> +	if (ret < 0) {
> +		RTE_LOG(DEBUG, EAL, "It is not a valid event "
> +			"that needs to be handled.\n");
> +		return;
> +	}
> +
> +	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
> +		uevent.devname, uevent.type, uevent.subsystem);
> +
> +	if (uevent.devname)
> +		dev_callback_process(uevent.devname, uevent.type);
> +}
>   
>   int __rte_experimental
>   rte_dev_event_monitor_start(void)
>   {
> -	/* TODO: start uevent monitor for linux */
> +	int ret;
> +
> +	if (monitor_started)
> +		return 0;
> +
> +	ret = dev_uev_socket_fd_create();
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "error create device event fd.\n");
> +		return -1;
> +	}
> +
> +	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
> +	ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
> +
> +	if (ret) {
> +		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
> +		return -1;
> +	}
> +
> +	monitor_started = true;
> +
>   	return 0;
>   }
>   
>   int __rte_experimental
>   rte_dev_event_monitor_stop(void)
>   {
> -	/* TODO: stop uevent monitor for linux */
> +	int ret;
> +
> +	if (!monitor_started)
> +		return 0;
> +
> +	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
> +					   (void *)-1);
> +	if (ret < 0) {
> +		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
> +		return ret;
> +	}
> +
> +	close(intr_handle.fd);
> +	intr_handle.fd = -1;
> +	monitor_started = false;
>   	return 0;
>   }


* Re: [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-05 16:10                                                                           ` [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
@ 2018-04-05 21:48                                                                             ` Thomas Monjalon
  2018-04-06  3:51                                                                               ` Guo, Jia
  2018-04-06  3:54                                                                             ` [PATCH V21 0/4] add device event monitor framework Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-04-05 21:48 UTC (permalink / raw)
  To: Jeff Guo, shreyansh.jain
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	dev, helin.zhang

05/04/2018 18:10, Jeff Guo:
> Use testpmd for example, to show how an application uses device event
> APIs to monitor the hotplug events, including both hot removal event
> and hot insertion event.
> 
> The process is that, testpmd first enable hotplug by below commands,
> 
> E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug

I am not convinced by the testpmd option.
Why not just a CLI command to start monitoring events?


* Re: [PATCH V20 2/4] eal: add device event monitor framework
  2018-04-05 16:10                                                                           ` [PATCH V20 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-05 21:54                                                                             ` Thomas Monjalon
  2018-04-06  3:51                                                                               ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-04-05 21:54 UTC (permalink / raw)
  To: Jeff Guo
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	shreyansh.jain, dev, helin.zhang

05/04/2018 18:10, Jeff Guo:
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -260,3 +260,13 @@ EXPERIMENTAL {
>         rte_socket_id_by_idx;
>  
>  } DPDK_18.02;
> +
> +EXPERIMENTAL {
> +       global:
> +
> +       rte_dev_event_monitor_start;
> +       rte_dev_event_monitor_stop;
> +       rte_dev_event_callback_register;
> +       rte_dev_event_callback_unregister;
> +
> +} DPDK_18.05;

These functions should go in the already existing EXPERIMENTAL
block above.


* Re: [PATCH V20 3/4] eal/linux: uevent parse and process
  2018-04-05 16:10                                                                           ` [PATCH V20 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-04-05 16:22                                                                             ` Tan, Jianfeng
@ 2018-04-05 21:58                                                                             ` Thomas Monjalon
  2018-04-06  3:52                                                                               ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-04-05 21:58 UTC (permalink / raw)
  To: Jeff Guo
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	shreyansh.jain, dev, helin.zhang

05/04/2018 18:10, Jeff Guo:
> In order to handle the uevent which has been detected from the kernel
> side, add uevent parse and process function to translate the uevent into
> device event, which user has subscribed to monitor.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal_dev.c | 206 +++++++++++++++++++++++++++++++++-
>  1 file changed, 204 insertions(+), 2 deletions(-)

Please update the release notes entry, explaining that Linux uevent
is supported as backend of the new EAL device event notification framework.

Thanks for the work, I think we are close to the merge.


* Re: [PATCH V20 3/4] eal/linux: uevent parse and process
  2018-04-05 16:22                                                                             ` Tan, Jianfeng
@ 2018-04-06  3:47                                                                               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-06  3:47 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

thanks.


On 4/6/2018 12:22 AM, Tan, Jianfeng wrote:
>
>
> On 4/6/2018 12:10 AM, Jeff Guo wrote:
>> In order to handle the uevent which has been detected from the kernel
>> side, add uevent parse and process function to translate the uevent into
>> device event, which user has subscribed to monitor.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>
> Other than two nits below,
> Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
>
>> ---
>> v20->v19:
>> add socket error handler
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 206 
>> +++++++++++++++++++++++++++++++++-
>>   1 file changed, 204 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9c8d1a0..bde595c 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -2,21 +2,223 @@
>>    * Copyright(c) 2018 Intel Corporation
>>    */
>>   +#include <string.h>
>> +#include <unistd.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +
>>   #include <rte_log.h>
>>   #include <rte_compat.h>
>>   #include <rte_dev.h>
>> +#include <rte_malloc.h>
>> +#include <rte_interrupts.h>
>> +#include <rte_alarm.h>
>> +
>> +#include "eal_private.h"
>> +
>> +static struct rte_intr_handle intr_handle = {.fd = -1 };
>> +static bool monitor_started;
>> +
>> +#define EAL_UEV_MSG_LEN 4096
>> +#define EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +static void dev_uev_handler(__rte_unused void *param);
>> +
>> +/* identify the system layer which reports this event. */
>> +enum eal_dev_event_subsystem {
>> +    EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_MAX
>> +};
>> +
>> +static int
>> +dev_uev_socket_fd_create(void)
>> +{
>> +    struct sockaddr_nl addr;
>> +    int ret;
>> +
>> +    intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
>> +            SOCK_NONBLOCK,
>> +            NETLINK_KOBJECT_UEVENT);
>> +    if (intr_handle.fd < 0) {
>> +        RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
>> +        return -1;
>> +    }
>> +
>> +    memset(&addr, 0, sizeof(addr));
>> +    addr.nl_family = AF_NETLINK;
>> +    addr.nl_pid = 0;
>> +    addr.nl_groups = 0xffffffff;
>> +
>> +    ret = bind(intr_handle.fd, (struct sockaddr *) &addr, 
>> sizeof(addr));
>> +    if (ret < 0) {
>> +        RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
>> +        goto err;
>> +    }
>>   +    return 0;
>> +err:
>> +    close(intr_handle.fd);
>> +    intr_handle.fd = -1;
>> +    return ret;
>> +}
>> +
>> +static int
>> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
>> +{
>> +    char action[EAL_UEV_MSG_ELEM_LEN];
>> +    char subsystem[EAL_UEV_MSG_ELEM_LEN];
>> +    char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
>> +    int i = 0;
>> +
>> +    memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
>> +    memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
>> +    memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
>> +
>> +    while (i < length) {
>> +        for (; i < length; i++) {
>> +            if (*buf)
>> +                break;
>> +            buf++;
>> +        }
>> +        /**
>> +         * check device uevent from kernel side, no need to check
>> +         * uevent from udev.
>> +         */
>> +        if (!strncmp(buf, "libudev", 7)) {
>> +            buf += 7;
>> +            i += 7;
>> +            return -1;
>> +        }
>> +        if (!strncmp(buf, "ACTION=", 7)) {
>> +            buf += 7;
>> +            i += 7;
>> +            snprintf(action, sizeof(action), "%s", buf);
>> +        } else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +            buf += 10;
>> +            i += 10;
>> +            snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +        } else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +            buf += 14;
>> +            i += 14;
>> +            snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +            event->devname = strdup(pci_slot_name);
>> +        }
>> +        for (; i < length; i++) {
>> +            if (*buf == '\0')
>> +                break;
>> +            buf++;
>> +        }
>> +    }
>> +
>> +    /* parse the subsystem layer */
>> +    if (!strncmp(subsystem, "uio", 3))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
>> +    else if (!strncmp(subsystem, "pci", 3))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
>> +    else if (!strncmp(subsystem, "vfio", 4))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
>> +    else
>> +        return -1;
>> +
>> +    /* parse the action type */
>> +    if (!strncmp(action, "add", 3))
>> +        event->type = RTE_DEV_EVENT_ADD;
>> +    else if (!strncmp(action, "remove", 6))
>> +        event->type = RTE_DEV_EVENT_REMOVE;
>> +    else
>> +        return -1;
>> +    return 0;
>> +}
>> +
>> +static void
>> +dev_delayed_unregister(void *param)
>> +{
>> +    rte_intr_callback_unregister(&intr_handle, dev_uev_handler, param);
>
> You might also want to:
>
> +    close(intr_handle.fd);
> +    intr_handle.fd = -1;
>
>
>> +}
>> +
>> +static void
>> +dev_uev_handler(__rte_unused void *param)
>> +{
>> +    struct rte_dev_event uevent;
>> +    int ret;
>> +    char buf[EAL_UEV_MSG_LEN];
>> +
>> +    memset(&uevent, 0, sizeof(struct rte_dev_event));
>> +    memset(buf, 0, EAL_UEV_MSG_LEN);
>> +
>> +    ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
>> +    if (ret == 0 || (ret < 0 && errno != EAGAIN)) {
>> +        /* connection is closed or broken, can not up again. */
>> +        RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
>> +        rte_eal_alarm_set(1, dev_delayed_unregister, NULL);
>> +        return;
>> +    } else if (ret < 0) {
>> +        RTE_LOG(ERR, EAL,
>> +            "uevent socket read error(%d): %s.\n",
>> +            errno, strerror(errno));
>> +        return;
>> +    }
>
> Above error handle can be rewritten a little bit:
>
> +    if (ret < 0 && errno == EAGAIN)
> +        return;
> +    else if (ret <= 0) {
> +        /* connection is closed or broken, can not up again. */
> +        RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
> +        rte_eal_alarm_set(1, dev_delayed_unregister, NULL);
> +        return;
> +    }
>
>
>> +
>> +    ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
>> +    if (ret < 0) {
>> +        RTE_LOG(DEBUG, EAL, "It is not an valid event "
>> +            "that need to be handle.\n");
>> +        return;
>> +    }
>> +
>> +    RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, 
>> subsystem:%d)\n",
>> +        uevent.devname, uevent.type, uevent.subsystem);
>> +
>> +    if (uevent.devname)
>> +        dev_callback_process(uevent.devname, uevent.type);
>> +}
>>     int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>> -    /* TODO: start uevent monitor for linux */
>> +    int ret;
>> +
>> +    if (monitor_started)
>> +        return 0;
>> +
>> +    ret = dev_uev_socket_fd_create();
>> +    if (ret) {
>> +        RTE_LOG(ERR, EAL, "error create device event fd.\n");
>> +        return -1;
>> +    }
>> +
>> +    intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
>> +    ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, 
>> NULL);
>> +
>> +    if (ret) {
>> +        RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
>> +        return -1;
>> +    }
>> +
>> +    monitor_started = true;
>> +
>>       return 0;
>>   }
>>     int __rte_experimental
>>   rte_dev_event_monitor_stop(void)
>>   {
>> -    /* TODO: stop uevent monitor for linux */
>> +    int ret;
>> +
>> +    if (!monitor_started)
>> +        return 0;
>> +
>> +    ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
>> +                       (void *)-1);
>> +    if (ret < 0) {
>> +        RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
>> +        return ret;
>> +    }
>> +
>> +    close(intr_handle.fd);
>> +    intr_handle.fd = -1;
>> +    monitor_started = false;
>>       return 0;
>>   }
>


* Re: [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-05 21:48                                                                             ` Thomas Monjalon
@ 2018-04-06  3:51                                                                               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-06  3:51 UTC (permalink / raw)
  To: Thomas Monjalon, shreyansh.jain
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	dev, helin.zhang



On 4/6/2018 5:48 AM, Thomas Monjalon wrote:
> 05/04/2018 18:10, Jeff Guo:
>> Use testpmd for example, to show how an application uses device event
>> APIs to monitor the hotplug events, including both hot removal event
>> and hot insertion event.
>>
>> The process is that, testpmd first enable hotplug by below commands,
>>
>> E.g. ./build/app/testpmd -c 0x3 --n 4 -- -i --hot-plug
> I am not convinced by the testpmd option.
> Why not just a CLI command to start monitoring events?
Hot plug is a basic feature, similar to the memory-related ones, so I think
it makes sense to expose it as an option like the other basic options.
>


* Re: [PATCH V20 2/4] eal: add device event monitor framework
  2018-04-05 21:54                                                                             ` Thomas Monjalon
@ 2018-04-06  3:51                                                                               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-06  3:51 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	shreyansh.jain, dev, helin.zhang



On 4/6/2018 5:54 AM, Thomas Monjalon wrote:
> 05/04/2018 18:10, Jeff Guo:
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -260,3 +260,13 @@ EXPERIMENTAL {
>>          rte_socket_id_by_idx;
>>   
>>   } DPDK_18.02;
>> +
>> +EXPERIMENTAL {
>> +       global:
>> +
>> +       rte_dev_event_monitor_start;
>> +       rte_dev_event_monitor_stop;
>> +       rte_dev_event_callback_register;
>> +       rte_dev_event_callback_unregister;
>> +
>> +} DPDK_18.05;
> These functions should go in the already existing EXPERIMENTAL
> block above.
ok.
>


* Re: [PATCH V20 3/4] eal/linux: uevent parse and process
  2018-04-05 21:58                                                                             ` Thomas Monjalon
@ 2018-04-06  3:52                                                                               ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-06  3:52 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	shreyansh.jain, dev, helin.zhang



On 4/6/2018 5:58 AM, Thomas Monjalon wrote:
> 05/04/2018 18:10, Jeff Guo:
>> In order to handle the uevent which has been detected from the kernel
>> side, add uevent parse and process function to translate the uevent into
>> device event, which user has subscribed to monitor.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 206 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 204 insertions(+), 2 deletions(-)
> Please update the release notes entry, explaining that Linux uevent
> is supported as backend of the new EAL device event notification framework.
>
> Thanks for the work, I think we are close to the merge.
also thanks for your review and great suggestions.
>


* [PATCH V21 0/4] add device event monitor framework
  2018-04-05 16:10                                                                           ` [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  2018-04-05 21:48                                                                             ` Thomas Monjalon
@ 2018-04-06  3:54                                                                             ` Jeff Guo
  2018-04-06  3:54                                                                               ` [PATCH V21 1/4] eal: add device event handle in interrupt thread Jeff Guo
                                                                                                 ` (3 more replies)
  1 sibling, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-06  3:54 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

About hot plug in DPDK: we already have a proactive way to add/remove devices
through APIs (rte_eal_hotplug_add/remove), and also have the fail-safe driver
to offload the fail-safe work from the app user. But there is still no
general mechanism to monitor hotplug events for all drivers; today the
hotplug interrupt event differs between devices and drivers, such
as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose the
remove interrupt, so in order to make the hot plug feature easy to use
for pci drivers, something must be done to detect the remove event
at the kernel level and offer a new line of interrupt to user land.

Based on the kobject uevent mechanism in the kernel, we can monitor
the hot plug status of devices, covering not only uio/vfio devices on
the pci bus but also others, such as cpu/usb/pci-express bus devices.

The idea is as follows.

a. The uevent message read from the monitored FD looks like this:
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. Add device event monitor framework:
add several general APIs to enable uevent monitoring.

c. Show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd to show the device event monitor mechanism usage.

TODO: failure handler mechanism for hot plug and driver auto bind for hot insertion.
The next hot plug patch set will cover these.
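
For illustration only: the uevent payload shown in item a. reaches user space
as a sequence of NUL-terminated "KEY=VALUE" records. The sketch below shows
one way to pull ACTION, SUBSYSTEM and DEVNAME out of such a buffer. It is not
the parser from this patch set; the names uev_fields, uev_copy_if_key and
uev_parse are hypothetical, and using DEVNAME as the device key is an
assumption taken from the sample message above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128

/* Hypothetical result type for this sketch; not a DPDK structure. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char devname[UEV_ELEM_LEN];
};

/* Copy the VALUE of a "KEY=VALUE" record into dst if it starts with key. */
static void
uev_copy_if_key(const char *rec, size_t rec_len, const char *key,
		char *dst, size_t dst_len)
{
	size_t key_len = strlen(key);

	if (rec_len < key_len || memcmp(rec, key, key_len) != 0)
		return;
	/* Precision bounds the read; snprintf bounds the write. */
	snprintf(dst, dst_len, "%.*s", (int)(rec_len - key_len),
		 rec + key_len);
}

/*
 * Walk the NUL-separated records in buf[0..len) and extract the keys of
 * interest. Returns 0 if both ACTION and SUBSYSTEM were found, else -1.
 */
static int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *rec = buf + i;
		/* Stay inside the buffer even if the last NUL is missing. */
		const char *end = memchr(rec, '\0', len - i);
		size_t rec_len = end ? (size_t)(end - rec) : len - i;

		uev_copy_if_key(rec, rec_len, "ACTION=",
				out->action, sizeof(out->action));
		uev_copy_if_key(rec, rec_len, "SUBSYSTEM=",
				out->subsystem, sizeof(out->subsystem));
		uev_copy_if_key(rec, rec_len, "DEVNAME=",
				out->devname, sizeof(out->devname));
		i += rec_len + 1; /* skip the record and its terminator */
	}
	return (out->action[0] != '\0' && out->subsystem[0] != '\0') ? 0 : -1;
}
```

Records that match none of the keys, such as the leading "remove@/devices/..."
summary line, are simply skipped. Mapping the extracted strings to an event
type is left to the caller in this sketch.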

patchset history:
v21->v20:
refine release note and some code cleaning.

v20->v19:
add more detail note and socket error handler.

v19->v18:
fix some typos and misunderstood parts

v18->v17:
1.add feature announcement in release document, fix bsp compile issue.
2.refine socket configuration.
3.remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.

v17->v16:
1.add the related part for the new interrupt handle type.
2.add new API into map, fix typo issue, add (void*)-1 value to unregister all callbacks
3.add new file into meson.build, modify coding style and add print info, delete unused part.
4.unregister all user's callbacks when stopping the event monitor

v16->v15:
1.move some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll to replace the rte service usage for the monitor thread,
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
let them be checked and managed in the user's callback.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:fix some typos and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unuse part.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix define issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy: in eal, by default prepare hot plug work for
all pci devices, then let the app decide which devices need to
hot plug.
2.modify to manage event callback in each device.
3.fix some system hung issue when igb_uioome typo error.release.
4.modify the pci part to the bus-pci based on the bus rework.
5.add hot plug policy in app, show example of using a hotplug list to
decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API as common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer. Add function to find device by name.
4.Instead of binding an individual fd to each device, use a common fd to poll all devices.
5.Add registration of hot insertion monitoring and processing, add function to auto bind driver before user adds device
6.Refine some coding style and typo issues
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variable hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix dual device fd issue.
2.fix some typos.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 101 +++++++++-
 app/test-pmd/testpmd.h                             |   2 +
 doc/guides/rel_notes/release_18_05.rst             |  11 +
 doc/guides/testpmd_app_ug/run_app.rst              |   4 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  21 ++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 161 +++++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 +++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 223 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |   4 +
 test/test/test_interrupts.c                        |  39 +++-
 18 files changed, 691 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4


* [PATCH V21 1/4] eal: add device event handle in interrupt thread
  2018-04-06  3:54                                                                             ` [PATCH V21 0/4] add device event monitor framework Jeff Guo
@ 2018-04-06  3:54                                                                               ` Jeff Guo
  2018-04-06  3:55                                                                               ` [PATCH V21 2/4] eal: add device event monitor framework Jeff Guo
                                                                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-06  3:54 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add a new interrupt handle type, RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitoring.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v21->v20:
no change
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
 test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..58e9328 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index 31a70a0..dc19175 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
 	TEST_INTERRUPT_HANDLE_VALID,
 	TEST_INTERRUPT_HANDLE_VALID_UIO,
 	TEST_INTERRUPT_HANDLE_VALID_ALARM,
+	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
 	TEST_INTERRUPT_HANDLE_CASE1,
 	TEST_INTERRUPT_HANDLE_MAX
 };
@@ -80,6 +81,10 @@ test_interrupt_init(void)
 	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
 					RTE_INTR_HANDLE_ALARM;
 
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd = pfds.readfd;
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
+					RTE_INTR_HANDLE_DEV_EVENT;
+
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -250,6 +255,14 @@ test_interrupt_enable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_enable(&test_intr_handle) == 0) {
+		printf("unexpectedly enable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -306,6 +319,14 @@ test_interrupt_disable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_disable(&test_intr_handle) == 0) {
+		printf("unexpectedly disable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -393,9 +414,17 @@ test_interrupt(void)
 		goto out;
 	}
 
+	printf("Check valid device event interrupt full path\n");
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+		printf("failure occurred during checking valid device event "
+						"interrupt full path\n");
+		goto out;
+	}
+
 	printf("Check valid alarm interrupt full path\n");
-	if (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
-									< 0) {
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_ALARM) < 0) {
 		printf("failure occurred during checking valid alarm "
 						"interrupt full path\n");
 		goto out;
@@ -513,6 +542,12 @@ test_interrupt(void)
 	rte_intr_callback_unregister(&test_intr_handle,
 			test_interrupt_callback_1, (void *)-1);
 
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback, (void *)-1);
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback_1, (void *)-1);
+
 	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
 	/* deinit */
 	test_interrupt_deinit();
-- 
2.7.4


* [PATCH V21 2/4] eal: add device event monitor framework
  2018-04-06  3:54                                                                             ` [PATCH V21 0/4] add device event monitor framework Jeff Guo
  2018-04-06  3:54                                                                               ` [PATCH V21 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-04-06  3:55                                                                               ` Jeff Guo
  2018-04-12  8:36                                                                                 ` Thomas Monjalon
  2018-04-06  3:55                                                                               ` [PATCH V21 3/4] eal/linux: uevent parse and process Jeff Guo
  2018-04-06  3:55                                                                               ` [PATCH V21 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-06  3:55 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor framework at the
EAL device layer, for device hotplug awareness and the actions taken
accordingly. It could also be extended to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to enable/disable
the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shall register or unregister callbacks through the newly added
APIs. Callbacks can be device specific, or for all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Taking hotplug as an example: when a device is hot-inserted or
hot-removed, we get notified by the kernel and then call the user's callbacks
accordingly to handle it, such as detaching or attaching the device from the
bus; this could further benefit fail-safe or live-migration.
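
For illustration only: the "device specific, or for all devices" callback
matching described above can be sketched as a standalone program fragment.
This is not the code from this patch; it is a minimal model that assumes a
NULL device name means "all devices" (as the dev_name field comment in the
patch states) and omits the spinlock the real framework takes around the
list. All names here (ev_type, ev_cb_fn, ev_callback_register,
ev_callback_process, count_cb) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical event type and callback signature for this sketch. */
enum ev_type { EV_ADD, EV_REMOVE };
typedef void (*ev_cb_fn)(const char *dev_name, enum ev_type type, void *arg);

struct ev_callback {
	TAILQ_ENTRY(ev_callback) next;
	ev_cb_fn cb_fn;
	void *cb_arg;
	char *dev_name; /* NULL means the callback applies to all devices */
};

TAILQ_HEAD(ev_cb_list, ev_callback);
static struct ev_cb_list ev_cbs = TAILQ_HEAD_INITIALIZER(ev_cbs);

/* Register cb_fn for one device, or for every device if dev_name is NULL. */
static int
ev_callback_register(const char *dev_name, ev_cb_fn cb_fn, void *cb_arg)
{
	struct ev_callback *cb;

	if (cb_fn == NULL)
		return -1;
	cb = calloc(1, sizeof(*cb));
	if (cb == NULL)
		return -1;
	if (dev_name != NULL) {
		size_t n = strlen(dev_name) + 1;

		cb->dev_name = malloc(n);
		if (cb->dev_name == NULL) {
			free(cb);
			return -1;
		}
		memcpy(cb->dev_name, dev_name, n);
	}
	cb->cb_fn = cb_fn;
	cb->cb_arg = cb_arg;
	TAILQ_INSERT_TAIL(&ev_cbs, cb, next);
	return 0;
}

/* Invoke every matching callback; returns how many were called. */
static int
ev_callback_process(const char *dev_name, enum ev_type type)
{
	struct ev_callback *cb;
	int called = 0;

	TAILQ_FOREACH(cb, &ev_cbs, next) {
		if (cb->dev_name != NULL &&
		    strcmp(cb->dev_name, dev_name) != 0)
			continue;
		cb->cb_fn(dev_name, type, cb->cb_arg);
		called++;
	}
	return called;
}

/* Example callback: counts its invocations through cb_arg. */
static void
count_cb(const char *dev_name, enum ev_type type, void *arg)
{
	(void)dev_name;
	(void)type;
	(*(int *)arg)++;
}
```

With one callback registered under NULL and one under a PCI address, an event
for that address reaches both, while an event for any other device reaches
only the NULL one.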

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v21->v20:
refine map document
---
 doc/guides/rel_notes/release_18_05.rst  |   9 ++
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 +++++
 lib/librte_eal/bsdapp/eal/meson.build   |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 161 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  15 +++
 lib/librte_eal/common/include/rte_dev.h |  94 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
 lib/librte_eal/linuxapp/eal/meson.build |   1 +
 lib/librte_eal/rte_eal_version.map      |   4 +
 11 files changed, 330 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index e5fac1c..d3c86bd 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -58,6 +58,15 @@ New Features
   * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
   * Added support for DROP action in flow API.
 
+* **Added device event monitor framework.**
+
+  Added a general device event monitor framework at EAL, for dynamic device management,
+  such as device hotplug awareness and actions adopted accordingly. The list of new APIs:
+
+  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` are for
+    the event monitor enable and disable.
+  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
+    are for the user's callbacks register and unregister.
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index ed1d17b..90b88eb 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..1c6c51b
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
index e83fc91..6dfc533 100644
--- a/lib/librte_eal/bsdapp/eal/meson.build
+++ b/lib/librte_eal/bsdapp/eal/meson.build
@@ -12,4 +12,5 @@ env_sources = files('eal_alarm.c',
 		'eal_timer.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c'
 )
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..e156c66 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL means all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,139 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb;
+	int ret;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&dev_event_cbs))
+		TAILQ_INIT(&dev_event_cbs);
+
+	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->active = 0;
+			if (!device_name) {
+				event_cb->dev_name = NULL;
+			} else {
+				event_cb->dev_name = strdup(device_name);
+				if (event_cb->dev_name == NULL) {
+					ret = -ENOMEM;
+					goto error;
+				}
+			}
+			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device "
+				"event callback.\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		RTE_LOG(ERR, EAL,
+			"The callback already exists, no need "
+			"to register again.\n");
+		rte_spinlock_unlock(&dev_event_lock);
+		return -EEXIST;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return 0;
+error:
+	free(event_cb);
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg)
+{
+	int ret = 0;
+	struct dev_event_callback *event_cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+	/* Walk through the callbacks and remove all that match. */
+	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
+	     event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (device_name != NULL && event_cb->dev_name != NULL) {
+			/* skip entries for other devices or callbacks */
+			if (strcmp(event_cb->dev_name, device_name) ||
+			    event_cb->cb_fn != cb_fn ||
+			    (cb_arg != (void *)-1 &&
+			    event_cb->cb_arg != cb_arg))
+				continue;
+		} else if (device_name != NULL) {
+			continue;
+		}
+
+		/*
+		 * If this callback is not executing right now, remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
+			free(event_cb->dev_name);
+			free(event_cb);
+			ret++;
+		} else {
+			continue;
+		}
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event)
+{
+	struct dev_event_callback *cb_lst;
+
+	if (device_name == NULL)
+		return;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (cb_lst->dev_name) {
+			if (strcmp(cb_lst->dev_name, device_name))
+				continue;
+		}
+		cb_lst->active = 1;
+		rte_spinlock_unlock(&dev_event_lock);
+		cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 0b28770..88e5a59 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -205,4 +207,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * @internal Execute all the callbacks registered by the user application
+ * for the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ *
+ */
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..a5203e7 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,25 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef void (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +286,79 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices and their callbacks.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered callbacks which have the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b9c7727..db67001 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9c8d1a0
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
index 03974ff..b222571 100644
--- a/lib/librte_eal/linuxapp/eal/meson.build
+++ b/lib/librte_eal/linuxapp/eal/meson.build
@@ -18,6 +18,7 @@ env_sources = files('eal_alarm.c',
 		'eal_vfio_mp_sync.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c',
 )
 
 if has_libnuma == 1
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index dd38783..fc5c62a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -258,5 +258,9 @@ EXPERIMENTAL {
 	rte_service_start_with_defaults;
 	rte_socket_count;
 	rte_socket_id_by_idx;
+	rte_dev_event_monitor_start;
+	rte_dev_event_monitor_stop;
+	rte_dev_event_callback_register;
+	rte_dev_event_callback_unregister;
 
 } DPDK_18.02;
-- 
2.7.4


* [PATCH V21 3/4] eal/linux: uevent parse and process
  2018-04-06  3:54                                                                             ` [PATCH V21 0/4] add device event monitor framework Jeff Guo
  2018-04-06  3:54                                                                               ` [PATCH V21 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-04-06  3:55                                                                               ` [PATCH V21 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-06  3:55                                                                               ` Jeff Guo
  2018-04-06  3:55                                                                               ` [PATCH V21 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-06  3:55 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

In order to handle the uevents detected from the kernel side, add uevent
parse and process functions that translate a uevent into a device event
which the user has subscribed to monitor.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v21->v20:
refine release note and some code cleaning
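
For readers following the parsing logic, here is a minimal standalone
sketch of how a kernel uevent payload can be split into its fields. It
is not the code from this patch: the names uev_fields, uev_match and
uev_parse_fields are made up for illustration, and it assumes the
payload is a sequence of NUL-terminated KEY=VALUE records, as read from
a NETLINK_KOBJECT_UEVENT socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128

/* Hypothetical container for the three fields of interest. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If rec starts with key, copy the value into dst and return 1. */
static int
uev_match(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", rec + klen);
	return 1;
}

/*
 * Walk the NUL-terminated records in buf[0..len).
 * Returns 0 on success, -1 for a udev message or a truncated record.
 */
static int
uev_parse_fields(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			return -1; /* last record is not terminated */
		if (strncmp(rec, "libudev", 7) == 0)
			return -1; /* relayed by udev, not the kernel */

		if (!uev_match(rec, "ACTION=", out->action,
			       sizeof(out->action)) &&
		    !uev_match(rec, "SUBSYSTEM=", out->subsystem,
			       sizeof(out->subsystem)))
			(void)uev_match(rec, "PCI_SLOT_NAME=",
					out->pci_slot_name,
					sizeof(out->pci_slot_name));

		/* jump past this record and its terminating NUL */
		i += (size_t)(end - rec) + 1;
	}
	return 0;
}
```

Advancing by whole records, rather than incrementing a byte index and a
pointer separately, keeps the offset and the read position in sync by
construction. Each destination buffer is also sized with its own sizeof.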
---
 doc/guides/rel_notes/release_18_05.rst |   2 +
 lib/librte_eal/linuxapp/eal/eal_dev.c  | 205 ++++++++++++++++++++++++++++++++-
 2 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index d3c86bd..cb9e050 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -68,6 +68,8 @@ New Features
   * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
     are for the user's callbacks register and unregister.
 
+  Linux uevent is supported as the backend of this device event notification framework.
+
 API Changes
 -----------
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9c8d1a0..9478a39 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,21 +2,222 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+
 #include <rte_log.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
+#include <rte_malloc.h>
+#include <rte_interrupts.h>
+#include <rte_alarm.h>
+
+#include "eal_private.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_started;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+static void dev_uev_handler(__rte_unused void *param);
+
+/* identify the system layer which reports this event. */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_socket_fd_create(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+
+	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
+		goto err;
+	}
 
+	return 0;
+err:
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	return ret;
+}
+
+static int
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return -1;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = strdup(pci_slot_name);
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	/* parse the subsystem layer */
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	else if (!strncmp(subsystem, "vfio", 4))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
+	else
+		return -1;
+
+	/* parse the action type */
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	else if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	else
+		return -1;
+	return 0;
+}
+
+static void
+dev_delayed_unregister(void *param)
+{
+	rte_intr_callback_unregister(&intr_handle, dev_uev_handler, param);
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+}
+
+static void
+dev_uev_handler(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(&uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret < 0 && errno == EAGAIN)
+		return;
+	else if (ret <= 0) {
+		/* connection is closed or broken, cannot recover. */
+		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
+		rte_eal_alarm_set(1, dev_delayed_unregister, NULL);
+		return;
+	}
+
+	ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, EAL, "It is not a valid event "
+			"that needs to be handled.\n");
+		return;
+	}
+
+	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
+		uevent.devname, uevent.type, uevent.subsystem);
+
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (monitor_started)
+		return 0;
+
+	ret = dev_uev_socket_fd_create();
+	if (ret) {
+		RTE_LOG(ERR, EAL, "failed to create device event fd.\n");
+		return -1;
+	}
+
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
+		return -1;
+	}
+
+	monitor_started = true;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (!monitor_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
+					   (void *)-1);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_started = false;
 	return 0;
 }
-- 
2.7.4


* [PATCH V21 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-06  3:54                                                                             ` [PATCH V21 0/4] add device event monitor framework Jeff Guo
                                                                                                 ` (2 preceding siblings ...)
  2018-04-06  3:55                                                                               ` [PATCH V21 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-06  3:55                                                                               ` Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-06  3:55 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application uses the device
event APIs to monitor hotplug events, including both hot removal and
hot insertion events.

First, enable hotplug monitoring in testpmd with the option below,

E.g. ./build/app/testpmd -c 0x3 -n 4 -- -i --hot-plug

testpmd then starts the device event monitor by calling the new API
(rte_dev_event_monitor_start) and registers the user's callback by
calling the API (rte_dev_event_callback_register). When a device is
hot-inserted or hot-removed, the device event monitor detects the event
and calls the user's callbacks, in which the user can process the event
accordingly.

This patch only shows the event monitoring; device attach/detach is not
involved here and will be added by another hotplug patch set.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v21->v20:
no change.
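
To make the register-then-dispatch flow described above concrete, here
is a small self-contained toy model. It is not DPDK code and does not
call the DPDK APIs: cb_register, cb_dispatch and count_removals are
hypothetical names, the fixed-size table stands in for the real callback
list, and locking is left out. It only mirrors the matching rule that a
callback registered with a NULL device name is invoked for every device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef void (*dev_event_cb_fn)(const char *device_name,
				enum dev_event_type event, void *cb_arg);

struct dev_event_cb {
	dev_event_cb_fn fn;
	void *arg;
	const char *dev_name; /* NULL means "all devices" */
};

#define MAX_CBS 8
static struct dev_event_cb cbs[MAX_CBS];
static size_t nb_cbs;

/* Store a callback; returns 0 on success, -1 on bad input or full table. */
static int
cb_register(const char *dev_name, dev_event_cb_fn fn, void *arg)
{
	if (fn == NULL || nb_cbs == MAX_CBS)
		return -1;
	cbs[nb_cbs].fn = fn;
	cbs[nb_cbs].arg = arg;
	cbs[nb_cbs].dev_name = dev_name;
	nb_cbs++;
	return 0;
}

/* Invoke every callback that matches dev_name; returns how many ran. */
static int
cb_dispatch(const char *dev_name, enum dev_event_type event)
{
	int called = 0;
	size_t i;

	assert(dev_name != NULL);
	for (i = 0; i < nb_cbs; i++) {
		if (cbs[i].dev_name != NULL &&
		    strcmp(cbs[i].dev_name, dev_name) != 0)
			continue; /* registered for another device */
		cbs[i].fn(dev_name, event, cbs[i].arg);
		called++;
	}
	return called;
}

/* Example callback: count removal events in the int that arg points to. */
static void
count_removals(const char *device_name, enum dev_event_type event, void *arg)
{
	(void)device_name;
	if (event == DEV_EVENT_REMOVE)
		(*(int *)arg)++;
}
```

In testpmd the callback is registered with a NULL device name, so in
this model it would correspond to the catch-all entry that runs for
every reported device.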
---
 app/test-pmd/parameters.c             |   5 +-
 app/test-pmd/testpmd.c                | 101 +++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |   2 +
 doc/guides/testpmd_app_ug/run_app.rst |   4 ++
 4 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 2192bdc..1a05284 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1101,7 +1103,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..d2c122a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -391,6 +394,12 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static void eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(void);
+static int eth_dev_event_callback_unregister(void);
+
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(void)
+{
+	int ret;
+
+	/* register the device event callback */
+	ret = rte_dev_event_callback_register(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret) {
+		printf("Failed to register device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static int
+eth_dev_event_callback_unregister(void)
+{
+	int ret;
+
+	/* unregister the device event callback */
+	ret = rte_dev_event_callback_unregister(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret < 0) {
+		printf("Failed to unregister device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1916,6 +1958,7 @@ void
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	int ret;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
@@ -1929,6 +1972,18 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	if (hot_plug) {
+		ret = rte_dev_event_monitor_stop();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"failed to stop device event monitor.\n");
+
+		ret = eth_dev_event_callback_unregister();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"failed to unregister all event callbacks.\n");
+	}
 	printf("\nBye...\n");
 }
 
@@ -2059,6 +2114,37 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static void
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     __rte_unused void *arg)
+{
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
+			device_name);
+		/* TODO: once failure handling is finished, stop packet
+		 * forwarding, then stop, close and detach the port.
+		 */
+		break;
+	case RTE_DEV_EVENT_ADD:
+		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
+			device_name);
+		/* TODO: once kernel driver binding is finished,
+		 * attach the port.
+		 */
+		break;
+	default:
+		break;
+	}
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2560,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2630,18 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		eth_dev_event_callback_register();
+
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..8fde68d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
+
+extern uint8_t hot_plug; /**< enabled by "--hot-plug" parameter */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1fd5395..d0ced36 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -479,3 +479,7 @@ The commandline options are:
 
     Set the hexadecimal bitmask of TX queue offloads.
     The default value is 0.
+
+*   ``--hot-plug``
+
+    Enable device event monitor mechanism for hotplug.
-- 
2.7.4


* Re: [PATCH V19 3/4] eal/linux: uevent parse and process
  2018-04-05 11:05                                                                         ` Tan, Jianfeng
@ 2018-04-11 11:40                                                                           ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-04-11 11:40 UTC (permalink / raw)
  To: Tan, Jianfeng, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	harry.van.haaren
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 4/5/2018 7:05 PM, Tan, Jianfeng wrote:
>
>
> On 4/5/2018 5:02 PM, Jeff Guo wrote:
>> In order to handle the uevent which has been detected from the kernel
>> side, add uevent parse and process function to translate the uevent into
>> device event, which user has subscribed to monitor.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v19->18:
>> fix some misunderstanding part
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 196 
>> +++++++++++++++++++++++++++++++++-
>>   1 file changed, 194 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9c8d1a0..4686c41 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -2,21 +2,213 @@
>>    * Copyright(c) 2018 Intel Corporation
>>    */
>>   +#include <string.h>
>> +#include <unistd.h>
>> +#include <sys/socket.h>
>> +#include <linux/netlink.h>
>> +
>>   #include <rte_log.h>
>>   #include <rte_compat.h>
>>   #include <rte_dev.h>
>> +#include <rte_malloc.h>
>> +#include <rte_interrupts.h>
>> +
>> +#include "eal_private.h"
>> +
>> +static struct rte_intr_handle intr_handle = {.fd = -1 };
>> +static bool monitor_started;
>> +
>> +#define EAL_UEV_MSG_LEN 4096
>> +#define EAL_UEV_MSG_ELEM_LEN 128
>> +
>> +/* identify the system layer which reports this event. */
>> +enum eal_dev_event_subsystem {
>> +    EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
>> +    EAL_DEV_EVENT_SUBSYSTEM_MAX
>> +};
>> +
>> +static int
>> +dev_uev_socket_fd_create(void)
>> +{
>> +    struct sockaddr_nl addr;
>> +    int ret;
>> +
>> +    intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
>> +            SOCK_NONBLOCK,
>> +            NETLINK_KOBJECT_UEVENT);
>> +    if (intr_handle.fd < 0) {
>> +        RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
>> +        return -1;
>> +    }
>> +
>> +    memset(&addr, 0, sizeof(addr));
>> +    addr.nl_family = AF_NETLINK;
>> +    addr.nl_pid = 0;
>> +    addr.nl_groups = 0xffffffff;
>> +
>> +    ret = bind(intr_handle.fd, (struct sockaddr *) &addr, 
>> sizeof(addr));
>> +    if (ret < 0) {
>> +        RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
>> +        goto err;
>> +    }
>> +
>> +    return 0;
>> +err:
>> +    close(intr_handle.fd);
>> +    intr_handle.fd = -1;
>> +    return ret;
>> +}
>> +
>> +static int
>> +dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
>> +{
>> +    char action[EAL_UEV_MSG_ELEM_LEN];
>> +    char subsystem[EAL_UEV_MSG_ELEM_LEN];
>> +    char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
>> +    int i = 0, ret = 0;
>> +
>> +    memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
>> +    memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
>> +    memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
>> +
>> +    while (i < length) {
>> +        for (; i < length; i++) {
>> +            if (*buf)
>> +                break;
>> +            buf++;
>> +        }
>> +        /**
>> +         * check device uevent from kernel side, no need to check
>> +         * uevent from udev.
>> +         */
>> +        if (!strncmp(buf, "libudev", 7)) {
>> +            buf += 7;
>> +            i += 7;
>> +            return -1;
>> +        }
>> +        if (!strncmp(buf, "ACTION=", 7)) {
>> +            buf += 7;
>> +            i += 7;
>> +            snprintf(action, sizeof(action), "%s", buf);
>> +        } else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
>> +            buf += 10;
>> +            i += 10;
>> +            snprintf(subsystem, sizeof(subsystem), "%s", buf);
>> +        } else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
>> +            buf += 14;
>> +            i += 14;
>> +            snprintf(pci_slot_name, sizeof(subsystem), "%s", buf);
>> +            event->devname = strdup(pci_slot_name);
>> +        }
>> +        for (; i < length; i++) {
>> +            if (*buf == '\0')
>> +                break;
>> +            buf++;
>> +        }
>> +    }
>> +
>> +    /* parse the subsystem layer */
>> +    if (!strncmp(subsystem, "uio", 3))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
>> +    else if (!strncmp(subsystem, "pci", 3))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
>> +    else if (!strncmp(subsystem, "vfio", 4))
>> +        event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
>> +    else
>> +        ret = -1;
>
> We can just return -1 here.
>
>> +    /* parse the action type */
>> +    if (!strncmp(action, "add", 3))
>> +        event->type = RTE_DEV_EVENT_ADD;
>> +    else if (!strncmp(action, "remove", 6))
>> +        event->type = RTE_DEV_EVENT_REMOVE;
>> +    else
>> +        ret = -1;
>
> We can just return -1 here.
>
>> +    return ret;
>
> return 0 here.
>
>> +}
>> +
>> +static void
>> +dev_uev_handler(__rte_unused void *param)
>> +{
>> +    struct rte_dev_event uevent;
>> +    int ret;
>> +    char buf[EAL_UEV_MSG_LEN];
>> +
>> +    memset(&uevent, 0, sizeof(struct rte_dev_event));
>> +    memset(buf, 0, EAL_UEV_MSG_LEN);
>> +
>> +    ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
>> +    if (ret == 0 || (ret < 0 && errno != EAGAIN)) {
>> +        /* connection is closed or broken, can not up again. */
>> +        RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
>
> Again, we need an alarm to unregister the callback from intr thread.
>
ok.
>> +        return;
>> +    } else if (ret < 0) {
>> +        RTE_LOG(ERR, EAL,
>> +            "uevent socket read error(%d): %s.\n",
>> +            errno, strerror(errno));
>> +        return;
>> +    }
>
> I think the above code can be adjusted as:
>
>     if (ret == 0 || (ret < 0 && errno == EAGAIN))
>             return;
>     else if (ret < 0) {
>            RTE_LOG(ERR, EAL, "...");
>            set_alarm_to_unregister();
>     }
>
>
Will check the best coding style.
>> +
>> +    ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
>> +    if (ret < 0) {
>> +        RTE_LOG(ERR, EAL, "It is not an valid event "
>
> s/ERR/DEBUG, or there are too many logs.
>
Makes sense.
>> +            "that need to be handle.\n");
>> +        return;
>> +    }
>> +
>> +    RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>> +        uevent.devname, uevent.type, uevent.subsystem);
>> +
>> +    if (uevent.devname)
>> +        dev_callback_process(uevent.devname, uevent.type);
>> +}
>>     int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>> -    /* TODO: start uevent monitor for linux */
>> +    int ret;
>> +
>> +    if (monitor_started)
>> +        return 0;
>> +
>> +    ret = dev_uev_socket_fd_create();
>> +    if (ret) {
>> +        RTE_LOG(ERR, EAL, "error create device event fd.\n");
>> +        return -1;
>> +    }
>> +
>> +    intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
>> +    ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
>> +
>> +    if (ret) {
>> +        RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
>> +        return -1;
>> +    }
>> +
>> +    monitor_started = true;
>> +
>>       return 0;
>>   }
>>     int __rte_experimental
>>   rte_dev_event_monitor_stop(void)
>>   {
>> -    /* TODO: stop uevent monitor for linux */
>> +    int ret;
>> +
>> +    if (!monitor_started)
>> +        return 0;
>> +
>> +    ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
>> +                       (void *)-1);
>> +    if (ret < 0) {
>> +        RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
>> +        return ret;
>> +    }
>> +
>> +    close(intr_handle.fd);
>> +    intr_handle.fd = -1;
>> +    monitor_started = false;
>>       return 0;
>>   }
>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V21 2/4] eal: add device event monitor framework
  2018-04-06  3:55                                                                               ` [PATCH V21 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-12  8:36                                                                                 ` Thomas Monjalon
  0 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-04-12  8:36 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	shreyansh.jain, helin.zhang

06/04/2018 05:55, Jeff Guo:
> v21->v20:

This is a very high number of revisions.
I cannot see them in my mail client because they are too deeply nested
and indented in the thread view.
Tip: when sending a new revision, it is better to thread it under the
first revision, so that the nesting does not keep growing.


> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> +* **Added device event monitor framework.**
> +
> +  Added a general device event monitor framework at EAL, for device dynamic management.
> +  Such as device hotplug awareness and actions adopted accordingly. The list of new APIs:
> +
> +  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` are for
> +    the event monitor enable and disable.
> +  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
> +    are for the user's callbacks register and unregister.
>  
>  API Changes

Please keep 2 blank lines before the title.


> +/* The device event callback list for all registered callbacks. */
> +static struct dev_event_cb_list dev_event_cbs;
> +
> +/** @internal Structure to keep track of registered callbacks */
> +TAILQ_HEAD(dev_event_cb_list, dev_event_callback);

There is a compilation error with clang:

lib/librte_eal/common/eal_common_dev.c:37:33: fatal error:
	tentative definition of variable with internal linkage
	has incomplete non-array type
		'struct dev_event_cb_list' [-Wtentative-definition-incomplete-type]
static struct dev_event_cb_list dev_event_cbs;
                                ^
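
For reference, the error comes from declaration order: the static variable
is defined before TAILQ_HEAD() has declared struct dev_event_cb_list, so the
type is still incomplete at that point. A minimal sketch of the working
order, using plain <sys/queue.h> and hypothetical demo_* names (this is an
illustration, not the DPDK code):

```c
#include <stddef.h>
#include <sys/queue.h>

/* Element type; the list head macro below only needs its tag name. */
struct demo_callback {
	TAILQ_ENTRY(demo_callback) next;
	int id;
};

/* 1. Declare the head type first, so that the struct is complete... */
TAILQ_HEAD(demo_cb_list, demo_callback);

/* 2. ...and only then define the static variable of that type. */
static struct demo_cb_list demo_cbs = TAILQ_HEAD_INITIALIZER(demo_cbs);

/* Append one element to the tail of the list. */
static void
demo_cb_add(struct demo_callback *cb)
{
	TAILQ_INSERT_TAIL(&demo_cbs, cb, next);
}

/* Count the elements currently in the list. */
static size_t
demo_cb_count(void)
{
	struct demo_callback *cb;
	size_t n = 0;

	TAILQ_FOREACH(cb, &demo_cbs, next)
		n++;
	return n;
}
```

Swapping the two declarations in the patch should be enough; the static
initializer shown here is optional.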


> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -258,5 +258,9 @@ EXPERIMENTAL {
>  	rte_service_start_with_defaults;
>  	rte_socket_count;
>  	rte_socket_id_by_idx;
> +	rte_dev_event_monitor_start;
> +	rte_dev_event_monitor_stop;
> +	rte_dev_event_callback_register;
> +	rte_dev_event_callback_unregister;
>  
>  } DPDK_18.02;

Please keep the alphabetical order.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V22 0/4] add device event monitor framework
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
  2017-06-29  4:37       ` [PATCH v3 1/2] eal: " Jeff Guo
  2017-06-29  4:37       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
@ 2018-04-13  8:30       ` Jeff Guo
  2018-04-13  8:30         ` [PATCH V22 1/4] eal: add device event handle in interrupt thread Jeff Guo
                           ` (4 more replies)
  2018-04-18 13:38       ` [PATCH V20 0/4] add hot plug recovery mechanism Jeff Guo
                         ` (20 subsequent siblings)
  23 siblings, 5 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-13  8:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

About hot plug in DPDK: we already have a proactive way to add and remove
devices through APIs (rte_eal_hotplug_add/remove), and a fail-safe driver
that offloads the fail-safe work from the application. What is still
missing is a general mechanism to monitor hotplug events for all drivers.
Today the hotplug interrupt event differs between devices and drivers,
such as mlx4, pci drivers and others.

Take the hot removal event as an example: not all pci drivers expose a
remove interrupt. To make the hot plug feature easy to use with pci
drivers, the remove event must be detected at the kernel level and
offered to user land as a new interrupt source.

The kernel's kobject uevent mechanism can be used for this. It lets us
monitor the hot plug status not only of uio/vfio devices on the pci bus,
but also of other devices, such as cpu/usb/pci-express bus devices.
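
As an illustration of the kernel side of this, a minimal sketch of opening a
uevent monitoring socket on Linux is shown below. The function name
uevent_socket_open() is hypothetical, and subscribing to multicast group 1
(kernel-originated uevents only) is an assumption of this sketch; the socket
setup in the patch itself may differ.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Open a non-blocking netlink socket subscribed to kernel uevents.
 * Returns the fd on success, or a negative errno value on failure.
 */
static int
uevent_socket_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK,
		    NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -errno;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;	/* let the kernel assign the port id */
	addr.nl_groups = 1;	/* assumption: group 1, kernel uevents */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}
```

The returned fd can then be handed to any poll/epoll based loop; each
readable event carries one uevent message.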

The idea is as below.

a. The uevent message received from FD monitoring looks like below.
remove@/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
ACTION=remove
DEVPATH=/devices/pci0000:80/0000:80:02.2/0000:82:00.0/0000:83:03.0/0000:84:00.2/uio/uio2
SUBSYSTEM=uio
MAJOR=243
MINOR=2
DEVNAME=uio2
SEQNUM=11366

b. add device event monitor framework:
add several general APIs to enable uevent monitoring.

c. show an example of how to use the uevent monitor:
enable uevent monitoring in testpmd to show the device event monitor mechanism usage.
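
To make the message layout in (a) concrete: the kernel delivers the lines
shown there as NUL-terminated "KEY=value" records in one buffer. A minimal
parsing sketch follows, assuming that layout; the names uev_fields,
uev_take() and uev_parse() and the field size are hypothetical and are not
the ones used in the patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128	/* hypothetical field size for this sketch */

struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char devname[UEV_ELEM_LEN];
};

/* Copy the value if the record starts with key; return 1 on match. */
static int
uev_take(const char *rec, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(rec, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", rec + klen);
	return 1;
}

/*
 * Walk a uevent buffer made of NUL-terminated "KEY=value" records.
 * Returns 0 if both ACTION and SUBSYSTEM were found, -1 otherwise.
 */
static int
uev_parse(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	memset(out, 0, sizeof(*out));
	while (i < len) {
		const char *rec = buf + i;
		const char *end = memchr(rec, '\0', len - i);

		if (end == NULL)
			break;	/* no terminator inside the buffer: stop */

		if (!uev_take(rec, "ACTION=", out->action,
			      sizeof(out->action)) &&
		    !uev_take(rec, "SUBSYSTEM=", out->subsystem,
			      sizeof(out->subsystem)))
			uev_take(rec, "DEVNAME=", out->devname,
				 sizeof(out->devname));

		i += (size_t)(end - rec) + 1;	/* skip record and its NUL */
	}
	return (out->action[0] && out->subsystem[0]) ? 0 : -1;
}
```

Bounding every record with memchr() keeps the walk inside the received
length even if the last record is not terminated.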

TODO: failure handler mechanism for hot plug and driver auto bind for hot insertion.
The next hot plug patch set will cover these.

patchset history:
v22->v21:
fix clang compile issue and doc style

v21->v20:
refine release note and some code cleaning.

v20->v19:
add more detail note and socket error handler.

v19->v18:
fix some typos and misunderstood parts

v18->v17:
1.add feature announcement in release document, fix bsp compile issue.
2.refine socket configuration.
3.remove hotplug policy and detach/attach process from testpmd, let it
focus on the device event monitoring which the patch set introduced.

v17->v16:
1.add related part of the interrupt handle type adding.
2.add new API into map, fix typo issue, add (void*)-1 value for unregister all callback
3.add new file into meson.build, modify coding style and add print info, delete unused part.
4.unregister all user's callback when stop event monitor

v16->v15:
1.remove some linux related code out of eal common layer
2.fix some readability issues.

v15->v14:
1.use the existing eal interrupt epoll to replace the rte service usage for the monitor thread.
2.add new device event handle type in eal interrupt.
3.remove the uevent type check and any policy from eal,
let the user's callback do the checking and management.
4.add "--hot-plug" configure parameter in testpmd to switch the hotplug feature.

v14->v13:
1.add __rte_experimental on function definitions and fix bsd build issue

v13->v12:
1.fix some logic issue and null check issue
2.fix monitor stop func issue

v12->v11:
1.identify null param in callback for monitor all devices uevent

v11->v10:
1:modify some typo and add experimental tag in new file.
2:modify callback register calling.

v10->v9:
1.fix prefix issue.
2.use a common callback lists for all device and all type to replace
add callback parameter into device struct.
3.delete some unused parts.

v9->v8:
split the patch set into small and explicit patch

v8->v7:
1.use rte_service to replace pthread management.
2.fix define issue and copyright issue
3.fix some lock issue

v7->v6:
1.modify vdev part according to the vdev rework
2.re-define and split the func into common and bus specific code
3.fix some incorrect issue.
4.fix the system hang after sending packets.

v6->v5:
1.add hot plug policy in eal: by default, prepare hot plug work for
all pci devices, then let the app decide which devices need to
hot plug.
2.modify to manage event callback in each device.
3.fix some system hang issue when igb_uio release.
4.modify the pci part to the bus-pci base on the bus rework.
5.add hot plug policy in app, show example to use hotplug list to
decide which devices need to hot plug.

v5->v4:
1.Move uevent monitor epolling from eal interrupt to eal device layer.
2.Redefine the eal device API for common, and distinguish between linux and bsd
3.Add failure handler helper api in bus layer.Add function of find device by name.
4.Replace of individual fd bind with single device, use a common fd to polling all device.
5.Add to register hot insertion monitoring and process, add function to auto bind driver before user add device
6.Refine some coding style and typos issue
7.add new callback to process hot insertion

v4->v3:
1.move uevent monitor api from eal interrupt to eal device layer.
2.create uevent type and struct in eal device.
3.move uevent handler for each driver to eal layer.
4.add uevent failure handler to process signal fault issue.
5.add example for request and use uevent monitoring in testpmd.

v3->v2:
1.refine some return error
2.refine the string searching logic to avoid memory issue

v2->v1:
1.remove global variables of hotplug_fd, add uevent_fd
in rte_intr_handle to let each pci device maintain its own fd,
to fix dual device fd issue.
2.refine some typo error.

Jeff Guo (4):
  eal: add device event handle in interrupt thread
  eal: add device event monitor framework
  eal/linux: uevent parse and process
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c                          |   5 +-
 app/test-pmd/testpmd.c                             | 101 +++++++++-
 app/test-pmd/testpmd.h                             |   2 +
 doc/guides/rel_notes/release_18_05.rst             |  12 ++
 doc/guides/testpmd_app_ug/run_app.rst              |   4 +
 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c                |  21 ++
 lib/librte_eal/bsdapp/eal/meson.build              |   1 +
 lib/librte_eal/common/eal_common_dev.c             | 161 +++++++++++++++
 lib/librte_eal/common/eal_private.h                |  15 ++
 lib/librte_eal/common/include/rte_dev.h            |  94 +++++++++
 lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c              | 223 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  11 +-
 lib/librte_eal/linuxapp/eal/meson.build            |   1 +
 lib/librte_eal/rte_eal_version.map                 |   4 +
 test/test/test_interrupts.c                        |  39 +++-
 18 files changed, 692 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V22 1/4] eal: add device event handle in interrupt thread
  2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
@ 2018-04-13  8:30         ` Jeff Guo
  2018-04-13  8:30         ` [PATCH V22 2/4] eal: add device event monitor framework Jeff Guo
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-13  8:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Add new interrupt handle type of RTE_INTR_HANDLE_DEV_EVENT, for
device event interrupt monitor.
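
A rough sketch of the dispatch behaviour this handle type is meant to have:
for a device event handle, the interrupt thread does not read the fd itself
and leaves the message for the callback. The sketch uses plain epoll and a
pipe-like fd; all demo_* names are hypothetical and the real logic lives in
eal_intr_process_interrupts().

```c
#include <stdbool.h>
#include <unistd.h>
#include <sys/epoll.h>

enum demo_handle_type { DEMO_HANDLE_COUNTER, DEMO_HANDLE_DEV_EVENT };

typedef void (*demo_cb_fn)(int fd, void *arg);

struct demo_handle {
	int fd;
	enum demo_handle_type type;
	demo_cb_fn cb;
	void *arg;
};

/* Register the handle's fd for read readiness. */
static int
demo_watch(int epfd, struct demo_handle *h)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.ptr = h };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, h->fd, &ev);
}

/*
 * Wait up to timeout_ms for one ready handle and dispatch its callback.
 * Returns 1 if the callback ran, 0 if nothing was dispatched, -1 on error.
 */
static int
demo_dispatch_once(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	struct demo_handle *h;
	char scratch[8];
	bool call = false;
	int n;

	n = epoll_wait(epfd, &ev, 1, timeout_ms);
	if (n <= 0)
		return n;

	h = ev.data.ptr;
	switch (h->type) {
	case DEMO_HANDLE_DEV_EVENT:
		/* Do not touch the fd: the callback reads the message. */
		call = true;
		break;
	default:
		/* Other types: the dispatcher drains the fd itself. */
		call = read(h->fd, scratch, sizeof(scratch)) > 0;
		break;
	}
	if (call)
		h->cb(h->fd, h->arg);
	return call ? 1 : 0;
}

/* Example callback: read the pending bytes, store the count in *arg. */
static void
demo_read_cb(int fd, void *arg)
{
	char buf[64];

	*(ssize_t *)arg = read(fd, buf, sizeof(buf));
}
```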

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v22->v21:
no change
---
 lib/librte_eal/common/include/rte_eal_interrupts.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 11 +++++-
 test/test/test_interrupts.c                        | 39 ++++++++++++++++++++--
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 3f792a9..6eb4932 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -34,6 +34,7 @@ enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_ALARM,        /**< alarm handle */
 	RTE_INTR_HANDLE_EXT,          /**< external handler */
 	RTE_INTR_HANDLE_VDEV,         /**< virtual device */
+	RTE_INTR_HANDLE_DEV_EVENT,    /**< device event handle */
 	RTE_INTR_HANDLE_MAX           /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index f86f22f..58e9328 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -559,6 +559,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -606,6 +609,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
 			return -1;
 		break;
 #endif
+	/* not used at this moment */
+	case RTE_INTR_HANDLE_DEV_EVENT:
+		return -1;
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -674,7 +680,10 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 			bytes_read = 0;
 			call = true;
 			break;
-
+		case RTE_INTR_HANDLE_DEV_EVENT:
+			bytes_read = 0;
+			call = true;
+			break;
 		default:
 			bytes_read = 1;
 			break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index 31a70a0..dc19175 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -20,6 +20,7 @@ enum test_interrupt_handle_type {
 	TEST_INTERRUPT_HANDLE_VALID,
 	TEST_INTERRUPT_HANDLE_VALID_UIO,
 	TEST_INTERRUPT_HANDLE_VALID_ALARM,
+	TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
 	TEST_INTERRUPT_HANDLE_CASE1,
 	TEST_INTERRUPT_HANDLE_MAX
 };
@@ -80,6 +81,10 @@ test_interrupt_init(void)
 	intr_handles[TEST_INTERRUPT_HANDLE_VALID_ALARM].type =
 					RTE_INTR_HANDLE_ALARM;
 
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].fd = pfds.readfd;
+	intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
+					RTE_INTR_HANDLE_DEV_EVENT;
+
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
 	intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -250,6 +255,14 @@ test_interrupt_enable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_enable(&test_intr_handle) == 0) {
+		printf("unexpectedly enable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -306,6 +319,14 @@ test_interrupt_disable(void)
 		return -1;
 	}
 
+	/* check with specific valid intr_handle */
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	if (rte_intr_disable(&test_intr_handle) == 0) {
+		printf("unexpectedly disable a specific intr_handle "
+			"successfully\n");
+		return -1;
+	}
+
 	/* check with valid handler and its type */
 	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
 	if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -393,9 +414,17 @@ test_interrupt(void)
 		goto out;
 	}
 
+	printf("Check valid device event interrupt full path\n");
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+		printf("failure occurred during checking valid device event "
+						"interrupt full path\n");
+		goto out;
+	}
+
 	printf("Check valid alarm interrupt full path\n");
-	if (test_interrupt_full_path_check(TEST_INTERRUPT_HANDLE_VALID_ALARM)
-									< 0) {
+	if (test_interrupt_full_path_check(
+		TEST_INTERRUPT_HANDLE_VALID_ALARM) < 0) {
 		printf("failure occurred during checking valid alarm "
 						"interrupt full path\n");
 		goto out;
@@ -513,6 +542,12 @@ test_interrupt(void)
 	rte_intr_callback_unregister(&test_intr_handle,
 			test_interrupt_callback_1, (void *)-1);
 
+	test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT];
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback, (void *)-1);
+	rte_intr_callback_unregister(&test_intr_handle,
+			test_interrupt_callback_1, (void *)-1);
+
 	rte_delay_ms(2 * TEST_INTERRUPT_CHECK_INTERVAL);
 	/* deinit */
 	test_interrupt_deinit();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V22 2/4] eal: add device event monitor framework
  2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
  2018-04-13  8:30         ` [PATCH V22 1/4] eal: add device event handle in interrupt thread Jeff Guo
@ 2018-04-13  8:30         ` Jeff Guo
  2018-04-13  8:30         ` [PATCH V22 3/4] eal/linux: uevent parse and process Jeff Guo
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-13  8:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a general device event monitor framework at the EAL
device layer, for device hotplug awareness and the actions taken in
response. It could later be extended to other types of device event
monitoring, but that is out of scope at this stage.

To get started, users first call the newly added APIs below to enable or
disable the device event monitor mechanism:
  - rte_dev_event_monitor_start
  - rte_dev_event_monitor_stop

Then users shall register or unregister callbacks through the newly added
APIs below. Callbacks can be device specific, or apply to all devices.
  - rte_dev_event_callback_register
  - rte_dev_event_callback_unregister

Taking hotplug as an example: on device hot insertion or hot removal,
the kernel notifies us, and the user's callbacks are then called to
handle the event, for example by detaching or attaching the device
from the bus. This could further benefit fail-safe or live migration.
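
The matching rule between an event and the registered callbacks can be
sketched as below: a callback registered with a NULL device name receives
events for every device, while a named one only receives events for that
device. This is a simplified stand-alone illustration with hypothetical
demo_* names (a flat array instead of the locked TAILQ), not the code added
by this patch.

```c
#include <stddef.h>
#include <string.h>

enum demo_event_type { DEMO_EVENT_ADD, DEMO_EVENT_REMOVE };

typedef void (*demo_event_cb_fn)(const char *dev_name,
				 enum demo_event_type event, void *cb_arg);

struct demo_event_cb {
	const char *dev_name;	/* NULL means "all devices" */
	demo_event_cb_fn cb_fn;
	void *cb_arg;
};

/* Invoke every callback whose filter matches; return how many ran. */
static int
demo_event_dispatch(const struct demo_event_cb *cbs, size_t n,
		    const char *dev_name, enum demo_event_type event)
{
	size_t i;
	int called = 0;

	for (i = 0; i < n; i++) {
		if (cbs[i].dev_name != NULL &&
		    strcmp(cbs[i].dev_name, dev_name) != 0)
			continue;
		cbs[i].cb_fn(dev_name, event, cbs[i].cb_arg);
		called++;
	}
	return called;
}

/* Example callback: count removal events in *cb_arg. */
static void
demo_count_removals(const char *dev_name, enum demo_event_type event,
		    void *cb_arg)
{
	(void)dev_name;
	if (event == DEMO_EVENT_REMOVE)
		(*(int *)cb_arg)++;
}
```

An application would typically register one NULL-name callback for logging
and per-device callbacks for the ports it wants to detach or re-attach.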

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v22->v21:
fix clang compile issue
---
 doc/guides/rel_notes/release_18_05.rst  |  10 ++
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  21 +++++
 lib/librte_eal/bsdapp/eal/meson.build   |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 161 ++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h     |  15 +++
 lib/librte_eal/common/include/rte_dev.h |  94 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal_dev.c   |  22 +++++
 lib/librte_eal/linuxapp/eal/meson.build |   1 +
 lib/librte_eal/rte_eal_version.map      |   4 +
 11 files changed, 331 insertions(+)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_dev.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_dev.c

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 563c2f3..071ec91 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -58,6 +58,16 @@ New Features
   * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
   * Added support for DROP action in flow API.
 
+* **Added device event monitor framework.**
+
+  Added a general device event monitor framework at EAL, for device dynamic management,
+  such as device hotplug awareness and actions adopted accordingly. The list of new APIs:
+
+  * ``rte_dev_event_monitor_start`` and ``rte_dev_event_monitor_stop`` are for
+    the event monitor enable and disable.
+  * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
+    are for the user's callbacks register and unregister.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 250d5c1..200285e 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -34,6 +34,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
new file mode 100644
index 0000000..1c6c51b
--- /dev/null
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/bsdapp/eal/meson.build b/lib/librte_eal/bsdapp/eal/meson.build
index 4b40223..4c56118 100644
--- a/lib/librte_eal/bsdapp/eal/meson.build
+++ b/lib/librte_eal/bsdapp/eal/meson.build
@@ -13,4 +13,5 @@ env_sources = files('eal_alarm.c',
 		'eal_timer.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c'
 )
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index cd07144..0628b62 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -14,9 +14,34 @@
 #include <rte_devargs.h>
 #include <rte_debug.h>
 #include <rte_log.h>
+#include <rte_spinlock.h>
+#include <rte_malloc.h>
 
 #include "eal_private.h"
 
+/**
+ * The device event callback description.
+ *
+ * It contains callback address to be registered by user application,
+ * the pointer to the parameters for callback, and the device name.
+ */
+struct dev_event_callback {
+	TAILQ_ENTRY(dev_event_callback) next; /**< Callbacks list */
+	rte_dev_event_cb_fn cb_fn;                /**< Callback address */
+	void *cb_arg;                           /**< Callback parameter */
+	char *dev_name;	 /**< Callback device name, NULL is for all devices */
+	uint32_t active;                        /**< Callback is executing */
+};
+
+/** @internal Structure to keep track of registered callbacks */
+TAILQ_HEAD(dev_event_cb_list, dev_event_callback);
+
+/* The device event callback list for all registered callbacks. */
+static struct dev_event_cb_list dev_event_cbs;
+
+/* spinlock for device callbacks */
+static rte_spinlock_t dev_event_lock = RTE_SPINLOCK_INITIALIZER;
+
 static int cmp_detached_dev_name(const struct rte_device *dev,
 	const void *_name)
 {
@@ -207,3 +232,139 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	rte_eal_devargs_remove(busname, devname);
 	return ret;
 }
+
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg)
+{
+	struct dev_event_callback *event_cb;
+	int ret;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	if (TAILQ_EMPTY(&dev_event_cbs))
+		TAILQ_INIT(&dev_event_cbs);
+
+	TAILQ_FOREACH(event_cb, &dev_event_cbs, next) {
+		if (event_cb->cb_fn == cb_fn && event_cb->cb_arg == cb_arg) {
+			if (device_name == NULL && event_cb->dev_name == NULL)
+				break;
+			if (device_name == NULL || event_cb->dev_name == NULL)
+				continue;
+			if (!strcmp(event_cb->dev_name, device_name))
+				break;
+		}
+	}
+
+	/* create a new callback. */
+	if (event_cb == NULL) {
+		event_cb = malloc(sizeof(struct dev_event_callback));
+		if (event_cb != NULL) {
+			event_cb->cb_fn = cb_fn;
+			event_cb->cb_arg = cb_arg;
+			event_cb->active = 0;
+			if (!device_name) {
+				event_cb->dev_name = NULL;
+			} else {
+				event_cb->dev_name = strdup(device_name);
+				if (event_cb->dev_name == NULL) {
+					ret = -ENOMEM;
+					goto error;
+				}
+			}
+			TAILQ_INSERT_TAIL(&dev_event_cbs, event_cb, next);
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Failed to allocate memory for device "
+				"event callback.\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		RTE_LOG(ERR, EAL,
+			"The callback already exists, no need "
+			"to register again.\n");
+		rte_spinlock_unlock(&dev_event_lock);
+		return -EEXIST;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return 0;
+error:
+	free(event_cb);
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg)
+{
+	int ret = 0;
+	struct dev_event_callback *event_cb, *next;
+
+	if (!cb_fn)
+		return -EINVAL;
+
+	rte_spinlock_lock(&dev_event_lock);
+	/* walk through the callbacks and remove all that match. */
+	for (event_cb = TAILQ_FIRST(&dev_event_cbs); event_cb != NULL;
+	     event_cb = next) {
+
+		next = TAILQ_NEXT(event_cb, next);
+
+		if (device_name != NULL && event_cb->dev_name != NULL) {
+			if (!strcmp(event_cb->dev_name, device_name)) {
+				if (event_cb->cb_fn != cb_fn ||
+				    (cb_arg != (void *)-1 &&
+				    event_cb->cb_arg != cb_arg))
+					continue;
+			}
+		} else if (device_name != NULL) {
+			continue;
+		}
+
+		/*
+		 * if this callback is not executing right now,
+		 * then remove it.
+		 */
+		if (event_cb->active == 0) {
+			TAILQ_REMOVE(&dev_event_cbs, event_cb, next);
+			free(event_cb);
+			ret++;
+		} else {
+			continue;
+		}
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+	return ret;
+}
+
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event)
+{
+	struct dev_event_callback *cb_lst;
+
+	if (device_name == NULL)
+		return;
+
+	rte_spinlock_lock(&dev_event_lock);
+
+	TAILQ_FOREACH(cb_lst, &dev_event_cbs, next) {
+		if (cb_lst->dev_name) {
+			if (strcmp(cb_lst->dev_name, device_name))
+				continue;
+		}
+		cb_lst->active = 1;
+		rte_spinlock_unlock(&dev_event_lock);
+		cb_lst->cb_fn(device_name, event,
+				cb_lst->cb_arg);
+		rte_spinlock_lock(&dev_event_lock);
+		cb_lst->active = 0;
+	}
+	rte_spinlock_unlock(&dev_event_lock);
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 3fed436..c359589 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -9,6 +9,8 @@
 #include <stdint.h>
 #include <stdio.h>
 
+#include <rte_dev.h>
+
 /**
  * Initialize the memzone subsystem (private to eal).
  *
@@ -238,4 +240,17 @@ struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
 int rte_mp_channel_init(void);
 
+/**
+ * @internal Executes all the callbacks registered by the user application
+ * for the specific device. It is for DPDK internal use only. User
+ * applications should not call it directly.
+ *
+ * @param device_name
+ *  The device name.
+ * @param event
+ *  the device event type.
+ *
+ */
+void
+dev_callback_process(char *device_name, enum rte_dev_event_type event);
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b688f1e..a5203e7 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -24,6 +24,25 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_log.h>
 
+/**
+ * The device event type.
+ */
+enum rte_dev_event_type {
+	RTE_DEV_EVENT_ADD,	/**< device being added */
+	RTE_DEV_EVENT_REMOVE,	/**< device being removed */
+	RTE_DEV_EVENT_MAX	/**< max value of this enum */
+};
+
+struct rte_dev_event {
+	enum rte_dev_event_type type;	/**< device event type */
+	int subsystem;			/**< subsystem id */
+	char *devname;			/**< device name */
+};
+
+typedef void (*rte_dev_event_cb_fn)(char *device_name,
+					enum rte_dev_event_type event,
+					void *cb_arg);
+
 __attribute__((format(printf, 2, 0)))
 static inline void
 rte_pmd_debug_trace(const char *func_name, const char *fmt, ...)
@@ -267,4 +286,79 @@ __attribute__((used)) = str
 }
 #endif
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It registers the callback for the specific device.
+ * Multiple callbacks can be registered at the same time.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_register(const char *device_name,
+				rte_dev_event_cb_fn cb_fn,
+				void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * It unregisters the callback according to the specified device.
+ *
+ * @param device_name
+ *  The device name, that is the param name of the struct rte_device,
+ *  null value means for all devices and their callbacks.
+ * @param cb_fn
+ *  callback address.
+ * @param cb_arg
+ *  address of parameter for callback, (void *)-1 means to remove all
+ *  registered callbacks which have the same callback address.
+ *
+ * @return
+ *  - On success, return the number of callback entities removed.
+ *  - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_callback_unregister(const char *device_name,
+				  rte_dev_event_cb_fn cb_fn,
+				  void *cb_arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Start the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_start(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Stop the device event monitoring.
+ *
+ * @param none
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_event_monitor_stop(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 542bf7e..45517a2 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -42,6 +42,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_interrupts.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_alarm.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_dev.c
 
 # from common dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
new file mode 100644
index 0000000..9c8d1a0
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <rte_log.h>
+#include <rte_compat.h>
+#include <rte_dev.h>
+
+
+int __rte_experimental
+rte_dev_event_monitor_start(void)
+{
+	/* TODO: start uevent monitor for linux */
+	return 0;
+}
+
+int __rte_experimental
+rte_dev_event_monitor_stop(void)
+{
+	/* TODO: stop uevent monitor for linux */
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/meson.build b/lib/librte_eal/linuxapp/eal/meson.build
index 5254c6c..9c01931 100644
--- a/lib/librte_eal/linuxapp/eal/meson.build
+++ b/lib/librte_eal/linuxapp/eal/meson.build
@@ -19,6 +19,7 @@ env_sources = files('eal_alarm.c',
 		'eal_vfio_mp_sync.c',
 		'eal.c',
 		'eal_memory.c',
+		'eal_dev.c',
 )
 
 if has_libnuma == 1
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 603c744..d02d80b 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -213,6 +213,10 @@ DPDK_18.02 {
 EXPERIMENTAL {
 	global:
 
+	rte_dev_event_callback_register;
+	rte_dev_event_callback_unregister;
+	rte_dev_event_monitor_start;
+	rte_dev_event_monitor_stop;
 	rte_eal_cleanup;
 	rte_eal_devargs_insert;
 	rte_eal_devargs_parse;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V22 3/4] eal/linux: uevent parse and process
  2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
  2018-04-13  8:30         ` [PATCH V22 1/4] eal: add device event handle in interrupt thread Jeff Guo
  2018-04-13  8:30         ` [PATCH V22 2/4] eal: add device event monitor framework Jeff Guo
@ 2018-04-13  8:30         ` Jeff Guo
  2018-04-13  8:30         ` [PATCH V22 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
  2018-04-13 10:03         ` [PATCH V22 0/4] add device event monitor framework Thomas Monjalon
  4 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-13  8:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

To handle the uevents detected on the kernel side, add uevent parse and
process functions that translate each uevent into a device event that the
user has subscribed to monitor.
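
For illustration only, a minimal standalone sketch of the parsing step,
assuming the kernel message is a buffer of NUL-separated "KEY=value"
strings. The names uev_fields, uev_match and uev_parse_fields are
hypothetical and are not the functions added by this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UEV_ELEM_LEN 128

/* Hypothetical result type for this sketch; not part of DPDK. */
struct uev_fields {
	char action[UEV_ELEM_LEN];
	char subsystem[UEV_ELEM_LEN];
	char pci_slot_name[UEV_ELEM_LEN];
};

/* If the field starts with key, copy the value that follows it. */
static int
uev_match(const char *field, const char *key, char *dst, size_t dst_len)
{
	size_t klen = strlen(key);

	if (strncmp(field, key, klen) != 0)
		return 0;
	snprintf(dst, dst_len, "%s", field + klen);
	return 1;
}

/*
 * Walk a buffer of NUL-separated "KEY=value" strings.
 * Returns 0 on success, -1 if ACTION or SUBSYSTEM is missing.
 */
static int
uev_parse_fields(const char *buf, size_t len, struct uev_fields *out)
{
	size_t i = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (i < len) {
		const char *field = buf + i;
		const char *end = memchr(field, '\0', len - i);

		if (end == NULL)
			break;	/* truncated trailing field: ignore it */
		if (!uev_match(field, "ACTION=",
			       out->action, sizeof(out->action)) &&
		    !uev_match(field, "SUBSYSTEM=",
			       out->subsystem, sizeof(out->subsystem)))
			uev_match(field, "PCI_SLOT_NAME=",
				  out->pci_slot_name,
				  sizeof(out->pci_slot_name));
		/* skip the field and its terminating NUL */
		i += (size_t)(end - field) + 1;
	}

	return (out->action[0] != '\0' && out->subsystem[0] != '\0') ? 0 : -1;
}
```

Each destination buffer is sized with its own sizeof, and a trailing field
without a terminating NUL is dropped rather than read past the end of the
buffer.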

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v22->v21:
correct some doc style
---
 doc/guides/rel_notes/release_18_05.rst |   2 +
 lib/librte_eal/linuxapp/eal/eal_dev.c  | 205 ++++++++++++++++++++++++++++++++-
 2 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 071ec91..a018ef5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -68,6 +68,8 @@ New Features
   * ``rte_dev_event_callback_register`` and ``rte_dev_event_callback_unregister``
     are for the user's callbacks register and unregister.
 
+  Linux uevent is supported as backend of this device event notification framework.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9c8d1a0..9478a39 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -2,21 +2,222 @@
  * Copyright(c) 2018 Intel Corporation
  */
 
+#include <string.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <linux/netlink.h>
+
 #include <rte_log.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
+#include <rte_malloc.h>
+#include <rte_interrupts.h>
+#include <rte_alarm.h>
+
+#include "eal_private.h"
+
+static struct rte_intr_handle intr_handle = {.fd = -1 };
+static bool monitor_started;
+
+#define EAL_UEV_MSG_LEN 4096
+#define EAL_UEV_MSG_ELEM_LEN 128
+
+static void dev_uev_handler(__rte_unused void *param);
+
+/* identify the system layer which reports this event. */
+enum eal_dev_event_subsystem {
+	EAL_DEV_EVENT_SUBSYSTEM_PCI, /* PCI bus device event */
+	EAL_DEV_EVENT_SUBSYSTEM_UIO, /* UIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_VFIO, /* VFIO driver device event */
+	EAL_DEV_EVENT_SUBSYSTEM_MAX
+};
+
+static int
+dev_uev_socket_fd_create(void)
+{
+	struct sockaddr_nl addr;
+	int ret;
+
+	intr_handle.fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC |
+			SOCK_NONBLOCK,
+			NETLINK_KOBJECT_UEVENT);
+	if (intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "create uevent fd failed.\n");
+		return -1;
+	}
+
+	memset(&addr, 0, sizeof(addr));
+	addr.nl_family = AF_NETLINK;
+	addr.nl_pid = 0;
+	addr.nl_groups = 0xffffffff;
+
+	ret = bind(intr_handle.fd, (struct sockaddr *) &addr, sizeof(addr));
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "Failed to bind uevent socket.\n");
+		goto err;
+	}
 
+	return 0;
+err:
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	return ret;
+}
+
+static int
+dev_uev_parse(const char *buf, struct rte_dev_event *event, int length)
+{
+	char action[EAL_UEV_MSG_ELEM_LEN];
+	char subsystem[EAL_UEV_MSG_ELEM_LEN];
+	char pci_slot_name[EAL_UEV_MSG_ELEM_LEN];
+	int i = 0;
+
+	memset(action, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(subsystem, 0, EAL_UEV_MSG_ELEM_LEN);
+	memset(pci_slot_name, 0, EAL_UEV_MSG_ELEM_LEN);
+
+	while (i < length) {
+		for (; i < length; i++) {
+			if (*buf)
+				break;
+			buf++;
+		}
+		/**
+		 * check device uevent from kernel side, no need to check
+		 * uevent from udev.
+		 */
+		if (!strncmp(buf, "libudev", 7)) {
+			buf += 7;
+			i += 7;
+			return -1;
+		}
+		if (!strncmp(buf, "ACTION=", 7)) {
+			buf += 7;
+			i += 7;
+			snprintf(action, sizeof(action), "%s", buf);
+		} else if (!strncmp(buf, "SUBSYSTEM=", 10)) {
+			buf += 10;
+			i += 10;
+			snprintf(subsystem, sizeof(subsystem), "%s", buf);
+		} else if (!strncmp(buf, "PCI_SLOT_NAME=", 14)) {
+			buf += 14;
+			i += 14;
+			snprintf(pci_slot_name, sizeof(pci_slot_name), "%s", buf);
+			event->devname = strdup(pci_slot_name);
+		}
+		for (; i < length; i++) {
+			if (*buf == '\0')
+				break;
+			buf++;
+		}
+	}
+
+	/* parse the subsystem layer */
+	if (!strncmp(subsystem, "uio", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_UIO;
+	else if (!strncmp(subsystem, "pci", 3))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_PCI;
+	else if (!strncmp(subsystem, "vfio", 4))
+		event->subsystem = EAL_DEV_EVENT_SUBSYSTEM_VFIO;
+	else
+		return -1;
+
+	/* parse the action type */
+	if (!strncmp(action, "add", 3))
+		event->type = RTE_DEV_EVENT_ADD;
+	else if (!strncmp(action, "remove", 6))
+		event->type = RTE_DEV_EVENT_REMOVE;
+	else
+		return -1;
+	return 0;
+}
+
+static void
+dev_delayed_unregister(void *param)
+{
+	rte_intr_callback_unregister(&intr_handle, dev_uev_handler, param);
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+}
+
+static void
+dev_uev_handler(__rte_unused void *param)
+{
+	struct rte_dev_event uevent;
+	int ret;
+	char buf[EAL_UEV_MSG_LEN];
+
+	memset(&uevent, 0, sizeof(struct rte_dev_event));
+	memset(buf, 0, EAL_UEV_MSG_LEN);
+
+	ret = recv(intr_handle.fd, buf, EAL_UEV_MSG_LEN, MSG_DONTWAIT);
+	if (ret < 0 && errno == EAGAIN)
+		return;
+	else if (ret <= 0) {
+		/* connection is closed or broken and cannot be re-established. */
+		RTE_LOG(ERR, EAL, "uevent socket connection is broken.\n");
+		rte_eal_alarm_set(1, dev_delayed_unregister, NULL);
+		return;
+	}
+
+	ret = dev_uev_parse(buf, &uevent, EAL_UEV_MSG_LEN);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, EAL, "It is not a valid event "
+			"that needs to be handled.\n");
+		return;
+	}
+
+	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
+		uevent.devname, uevent.type, uevent.subsystem);
+
+	if (uevent.devname)
+		dev_callback_process(uevent.devname, uevent.type);
+}
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
-	/* TODO: start uevent monitor for linux */
+	int ret;
+
+	if (monitor_started)
+		return 0;
+
+	ret = dev_uev_socket_fd_create();
+	if (ret) {
+		RTE_LOG(ERR, EAL, "error create device event fd.\n");
+		return -1;
+	}
+
+	intr_handle.type = RTE_INTR_HANDLE_DEV_EVENT;
+	ret = rte_intr_callback_register(&intr_handle, dev_uev_handler, NULL);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "fail to register uevent callback.\n");
+		return -1;
+	}
+
+	monitor_started = true;
+
 	return 0;
 }
 
 int __rte_experimental
 rte_dev_event_monitor_stop(void)
 {
-	/* TODO: stop uevent monitor for linux */
+	int ret;
+
+	if (!monitor_started)
+		return 0;
+
+	ret = rte_intr_callback_unregister(&intr_handle, dev_uev_handler,
+					   (void *)-1);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "fail to unregister uevent callback.\n");
+		return ret;
+	}
+
+	close(intr_handle.fd);
+	intr_handle.fd = -1;
+	monitor_started = false;
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V22 4/4] app/testpmd: enable device hotplug monitoring
  2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
                           ` (2 preceding siblings ...)
  2018-04-13  8:30         ` [PATCH V22 3/4] eal/linux: uevent parse and process Jeff Guo
@ 2018-04-13  8:30         ` Jeff Guo
  2018-04-13 10:03         ` [PATCH V22 0/4] add device event monitor framework Thomas Monjalon
  4 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-13  8:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, harry.van.haaren,
	jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application uses the device event
APIs to monitor hotplug events, including both hot removal and hot
insertion events.

testpmd first enables hotplug with the command below,

E.g. ./build/app/testpmd -c 0x3 -n 4 -- -i --hot-plug

then starts the device event monitor by calling the new API
(rte_dev_event_monitor_start) and registers the user's callback by calling
the API (rte_dev_event_callback_register). When a device is hot-inserted
or hot-removed, the device event monitor detects the event and calls the
user's callbacks, and the user can process the event in the callback
accordingly.

This patch only shows the event monitoring; device attach/detach is not
involved here and will be added by another hotplug patch set.
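
As a rough illustration of the register-then-dispatch pattern described
above, here is a self-contained mock. Every mock_ name is hypothetical and
only mirrors the shape of rte_dev_event_cb_fn and the register/process
calls; it is not the EAL implementation:

```c
#include <stddef.h>
#include <string.h>

/* Mirrors enum rte_dev_event_type from the series; mock names only. */
enum mock_dev_event_type {
	MOCK_DEV_EVENT_ADD,
	MOCK_DEV_EVENT_REMOVE,
	MOCK_DEV_EVENT_MAX
};

typedef void (*mock_dev_event_cb_fn)(const char *device_name,
				     enum mock_dev_event_type event,
				     void *cb_arg);

#define MOCK_MAX_CALLBACKS 8

struct mock_cb_entry {
	const char *device_name;	/* NULL subscribes to every device */
	mock_dev_event_cb_fn cb_fn;
	void *cb_arg;
};

static struct mock_cb_entry mock_cbs[MOCK_MAX_CALLBACKS];
static size_t mock_nb_cbs;

/* Store one callback; returns 0 on success, -1 if invalid or full. */
static int
mock_callback_register(const char *device_name,
		       mock_dev_event_cb_fn cb_fn, void *cb_arg)
{
	if (cb_fn == NULL || mock_nb_cbs == MOCK_MAX_CALLBACKS)
		return -1;
	mock_cbs[mock_nb_cbs].device_name = device_name;
	mock_cbs[mock_nb_cbs].cb_fn = cb_fn;
	mock_cbs[mock_nb_cbs].cb_arg = cb_arg;
	mock_nb_cbs++;
	return 0;
}

/* Invoke every matching callback; returns how many were invoked. */
static int
mock_callback_process(const char *device_name,
		      enum mock_dev_event_type event)
{
	size_t i;
	int called = 0;

	for (i = 0; i < mock_nb_cbs; i++) {
		const struct mock_cb_entry *e = &mock_cbs[i];

		if (e->device_name != NULL &&
		    strcmp(e->device_name, device_name) != 0)
			continue;
		e->cb_fn(device_name, event, e->cb_arg);
		called++;
	}
	return called;
}

/* Example callback: count removals into the int passed as cb_arg. */
static void
count_removals(const char *device_name, enum mock_dev_event_type event,
	       void *cb_arg)
{
	(void)device_name;
	if (event == MOCK_DEV_EVENT_REMOVE)
		(*(int *)cb_arg)++;
}
```

Registering with a NULL device name subscribes the callback to events from
all devices, which is how testpmd registers its callback in this patch.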

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Reviewed-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v22->v21:
no change
---
 app/test-pmd/parameters.c             |   5 +-
 app/test-pmd/testpmd.c                | 101 +++++++++++++++++++++++++++++++++-
 app/test-pmd/testpmd.h                |   2 +
 doc/guides/testpmd_app_ug/run_app.rst |   4 ++
 4 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 2192bdc..1a05284 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,6 +186,7 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
+	printf("  --hot-plug: enable hot plug for device.\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -621,6 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
+		{ "hot-plug",			0, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -1101,7 +1103,8 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-
+			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
+				hot_plug = 1;
 			break;
 		case 'h':
 			usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..d2c122a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -12,6 +12,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <errno.h>
+#include <stdbool.h>
 
 #include <sys/queue.h>
 #include <sys/stat.h>
@@ -284,6 +285,8 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
+uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+
 /*
  * Display or mask ether events
  * Default to all events except VF_MBOX
@@ -391,6 +394,12 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
 			      enum rte_eth_event_type type,
 			      void *param, void *ret_param);
+static void eth_dev_event_callback(char *device_name,
+				enum rte_dev_event_type type,
+				void *param);
+static int eth_dev_event_callback_register(void);
+static int eth_dev_event_callback_unregister(void);
+
 
 /*
  * Check if all the ports are started.
@@ -1853,6 +1862,39 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int
+eth_dev_event_callback_register(void)
+{
+	int ret;
+
+	/* register the device event callback */
+	ret = rte_dev_event_callback_register(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret) {
+		printf("Failed to register device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static int
+eth_dev_event_callback_unregister(void)
+{
+	int ret;
+
+	/* unregister the device event callback */
+	ret = rte_dev_event_callback_unregister(NULL,
+		eth_dev_event_callback, NULL);
+	if (ret < 0) {
+		printf("Failed to unregister device event callback\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 void
 attach_port(char *identifier)
 {
@@ -1916,6 +1958,7 @@ void
 pmd_test_exit(void)
 {
 	portid_t pt_id;
+	int ret;
 
 	if (test_done == 0)
 		stop_packet_forwarding();
@@ -1929,6 +1972,18 @@ pmd_test_exit(void)
 			close_port(pt_id);
 		}
 	}
+
+	if (hot_plug) {
+		ret = rte_dev_event_monitor_stop();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to stop device event monitor.\n");
+
+		ret = eth_dev_event_callback_unregister();
+		if (ret)
+			RTE_LOG(ERR, EAL,
+				"fail to unregister all event callbacks.\n");
+	}
 	printf("\nBye...\n");
 }
 
@@ -2059,6 +2114,37 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 	return 0;
 }
 
+/* This function is used by the interrupt thread */
+static void
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+			     __rte_unused void *arg)
+{
+	if (type >= RTE_DEV_EVENT_MAX) {
+		fprintf(stderr, "%s called upon invalid event %d\n",
+			__func__, type);
+		fflush(stderr);
+	}
+
+	switch (type) {
+	case RTE_DEV_EVENT_REMOVE:
+		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
+			device_name);
+		/* TODO: After finish failure handle, begin to stop
+		 * packet forward, stop port, close port, detach port.
+		 */
+		break;
+	case RTE_DEV_EVENT_ADD:
+		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
+			device_name);
+		/* TODO: After finish kernel driver binding,
+		 * begin to attach port.
+		 */
+		break;
+	default:
+		break;
+	}
+}
+
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2474,8 +2560,9 @@ signal_handler(int signum)
 int
 main(int argc, char** argv)
 {
-	int  diag;
+	int diag;
 	portid_t port_id;
+	int ret;
 
 	signal(SIGINT, signal_handler);
 	signal(SIGTERM, signal_handler);
@@ -2543,6 +2630,18 @@ main(int argc, char** argv)
 		       nb_rxq, nb_txq);
 
 	init_config();
+
+	if (hot_plug) {
+		/* enable hot plug monitoring */
+		ret = rte_dev_event_monitor_start();
+		if (ret) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+		eth_dev_event_callback_register();
+
+	}
+
 	if (start_port(RTE_PORT_ALL) != 0)
 		rte_exit(EXIT_FAILURE, "Start ports failed\n");
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea..8fde68d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -319,6 +319,8 @@ extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
+extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
+
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index 1fd5395..d0ced36 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -479,3 +479,7 @@ The commandline options are:
 
     Set the hexadecimal bitmask of TX queue offloads.
     The default value is 0.
+
+*   ``--hot-plug``
+
+    Enable device event monitor mechanism for hotplug.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V22 0/4] add device event monitor framework
  2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
                           ` (3 preceding siblings ...)
  2018-04-13  8:30         ` [PATCH V22 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
@ 2018-04-13 10:03         ` Thomas Monjalon
  4 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-04-13 10:03 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, harry.van.haaren, jianfeng.tan,
	shreyansh.jain, helin.zhang

13/04/2018 10:30, Jeff Guo:
> About hot plug in dpdk, We already have proactive way to add/remove devices
> through APIs (rte_eal_hotplug_add/remove), and also have fail-safe driver
> to offload the fail-safe work from the app user. But there are still lack
> of a general mechanism to monitor hotplug event for all driver, now the
> hotplug interrupt event is diversity between each device and driver, such
> as mlx4, pci driver and others.
> 
> Use the hot removal event for example, pci drivers not all exposure the
> remove interrupt, so in order to make user to easy use the hot plug
> feature for pci driver, something must be done to detect the remove event
> at the kernel level and offer a new line of interrupt to the user land.
> 
> Base on the uevent of kobject mechanism in kernel, we could use it to
> benefit for monitoring the hot plug status of the device which not only
> uio/vfio of pci bus devices, but also other, such as cpu/usb/pci-express bus devices.
[...]
> Jeff Guo (4):
>   eal: add device event handle in interrupt thread
>   eal: add device event monitor framework
>   eal/linux: uevent parse and process
>   app/testpmd: enable device hotplug monitoring

Applied, thanks

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V20 0/4] add hot plug recovery mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (2 preceding siblings ...)
  2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
@ 2018-04-18 13:38       ` Jeff Guo
  2018-04-18 13:38         ` [PATCH V20 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
                           ` (3 more replies)
  2018-05-03  8:57       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
                         ` (19 subsequent siblings)
  23 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-18 13:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The device event monitor framework was introduced previously, and its
typical use is device hot plug. If we want the application not to break
down when a device is hot plugged in or out, we still need some recovery
measures to prepare for device detach, so that no memory fault is
encountered after the device is hot unplugged and the application can
keep working.

This patch set introduces an API that implements the recovery mechanism to
handle hot plug, and also uses testpmd as an example of how to use the API
to process hot plug events, so that the process runs smoothly as below:

plug out->failure handle->stop forward->stop port->close port->detach port

With this mechanism, users such as the fail-safe driver or testpmd are able
to develop their own hot plug applications.

patchset history:
v20->v19:
clean the code
refine the remap logic for multiple devices.
remove the auto binding

v19->18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set
"add device event monitor framework"

Jeff Guo (4):
  bus/pci: introduce device hot unplug handle
  eal: add failure handler mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handler hot unplug

 app/test-pmd/testpmd.c                  |  29 ++++++--
 doc/guides/rel_notes/release_18_05.rst  |   6 ++
 drivers/bus/pci/pci_common.c            |  67 +++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  32 +++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |   4 ++
 lib/librte_eal/common/include/rte_bus.h |  16 +++++
 lib/librte_eal/common/include/rte_dev.h |  11 +++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 124 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   1 +
 10 files changed, 297 insertions(+), 5 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V20 1/4] bus/pci: introduce device hot unplug handle
  2018-04-18 13:38       ` [PATCH V20 0/4] add hot plug recovery mechanism Jeff Guo
@ 2018-04-18 13:38         ` Jeff Guo
  2018-04-20 10:32           ` Ananyev, Konstantin
  2018-04-18 13:38         ` [PATCH V20 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-18 13:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

For device hot unplug, we need some preparatory measures so that we do
not encounter a memory fault after the device is plugged out of the system,
and so that the running data path can be recovered rather than broken.
This patch allows the buses to handle the device hot unplug event.
The patch only enables the ops in the pci bus: when handling a device hot
unplug event, it remaps dummy memory to avoid bus read/write errors.
Other buses can implement this ops themselves accordingly.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v20->19:
clean the code
---
 drivers/bus/pci/pci_common.c            | 67 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 32 ++++++++++++++++
 drivers/bus/pci/private.h               | 12 ++++++
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
 4 files changed, 127 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 2a00f36..709eaf3 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -474,6 +474,72 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_handle_hot_unplug(struct rte_device *dev, void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0, i, isfound = 0;
+
+	if (failure_addr != NULL) {
+		FOREACH_DEVICE_ON_PCIBUS(pdev) {
+			for (i = 0; i != sizeof(pdev->mem_resource) /
+				sizeof(pdev->mem_resource[0]); i++) {
+				if ((uint64_t)failure_addr >=
+				    (uint64_t)pdev->mem_resource[i].addr &&
+				    (uint64_t)failure_addr <
+				    (uint64_t)pdev->mem_resource[i].addr +
+				    pdev->mem_resource[i].len) {
+					RTE_LOG(ERR, EAL, "Failure address "
+						"%16.16"PRIx64" belongs to the "
+						"resource of device %s!\n",
+						(uint64_t)failure_addr,
+						pdev->device.name);
+					isfound = 1;
+					break;
+				}
+			}
+			if (isfound)
+				break;
+		}
+	} else if (dev != NULL) {
+		pdev = RTE_DEV_TO_PCI(dev);
+	} else {
+		return -EINVAL;
+	}
+
+	if (!pdev)
+		return -1;
+
+	/* remap resources for devices */
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* TODO */
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		if (rte_eal_using_phys_addrs()) {
+			/* map resources for devices that use uio */
+			ret = pci_uio_remap_resource(pdev);
+		}
+		break;
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"  Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s\n",
+			pdev->name);
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -503,6 +569,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.handle_hot_unplug = pci_handle_hot_unplug,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..ba2c458 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,38 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		pci_unmap_resource(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len);
+		map_address = pci_map_resource(
+				dev->mem_resource[i].addr, -1, 0,
+				(size_t)dev->mem_resource[i].len,
+				MAP_ANONYMOUS | MAP_FIXED);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..cc1668c 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI UIO resource.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6fb0834..d2c5778 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation specific hot unplug handler function which is responsible
+ * for handling the failure when a device is hot unplugged, guaranteeing
+ * that the system does not crash in that case.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
+						void *dev_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -209,6 +223,8 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
+							device event */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V20 2/4] eal: add failure handler mechanism for hot plug
  2018-04-18 13:38       ` [PATCH V20 0/4] add hot plug recovery mechanism Jeff Guo
  2018-04-18 13:38         ` [PATCH V20 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
@ 2018-04-18 13:38         ` Jeff Guo
  2018-04-19  1:30           ` Zhang, Qi Z
                             ` (2 more replies)
  2018-04-18 13:38         ` [PATCH V20 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
  2018-04-18 13:38         ` [PATCH V20 4/4] app/testpmd: show example to handler " Jeff Guo
  3 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-18 13:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handler mechanism to handle the device
hot unplug event. When a device is hot unplugged, its resources
become invalid; if such a resource is still unexpectedly read or written,
the system will crash. This patch lets the EAL help the application handle
this fault: when a SIGBUS error occurs, check the failure address and
remap the invalid memory of the corresponding device accordingly,
which guarantees that the application is not shut down on hot unplug.
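
To illustrate the address check, here is a small standalone sketch. In a
SIGBUS handler installed with SA_SIGINFO, the faulting address is available
as si_addr in siginfo_t; the lookup below maps such an address to the
region that contains it. struct mem_region and find_region are hypothetical
stand-ins for the per-device BAR list, not DPDK structures:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped BAR; not a DPDK structure. */
struct mem_region {
	const char *dev_name;
	uintptr_t base;
	size_t len;
};

/*
 * Return the region containing addr, or NULL if none does.
 * The range is half-open, [base, base + len), so an address one past
 * the end of a region is not attributed to it.
 */
static const struct mem_region *
find_region(const struct mem_region *regions, size_t nb_regions,
	    uintptr_t addr)
{
	size_t i;

	for (i = 0; i < nb_regions; i++) {
		const struct mem_region *r = &regions[i];

		/* subtract first so base + len cannot overflow */
		if (r->len != 0 && addr >= r->base &&
		    addr - r->base < r->len)
			return r;
	}
	return NULL;
}
```

A handler would call such a lookup with (uintptr_t)info->si_addr and remap
only the device that owns the faulting address.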

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v20->v19:
refine the logic of remapping for multiple device.
---
 doc/guides/rel_notes/release_18_05.rst  |   6 ++
 lib/librte_eal/common/include/rte_dev.h |  11 +++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 124 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   1 +
 4 files changed, 141 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index a018ef5..a4ea9af 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -70,6 +70,12 @@ New Features
 
   Linux uevent is supported as backend of this device event notification framework.
 
+* **Added hot plug failure handler.**
+
+  Added a failure handler mechanism to handle device hot unplug.
+
+  * ``rte_dev_handle_hot_unplug`` to handle device hot unplug failures.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 0955e9a..9933131 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -360,4 +360,15 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Install a handler for the bus error signal (SIGBUS). When a SIGBUS occurs,
+ * the handler checks the failure address to find the corresponding device
+ * and remaps the memory resources of that device, so that the application
+ * does not crash when the device is hot unplugged.
+ */
+void __rte_experimental
+rte_dev_handle_hot_unplug(void);
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 9478a39..33e7026 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -13,12 +15,16 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
@@ -33,6 +39,68 @@ enum eal_dev_event_subsystem {
 };
 
 static int
+dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
+{
+	struct rte_bus *bus;
+	int ret = 0;
+
+	if (!dev && !dev_addr) {
+		return -EINVAL;
+	} else if (dev) {
+		bus = rte_bus_find_by_device_name(dev->name);
+		if (bus->handle_hot_unplug) {
+			/**
+			 * call bus ops to handle hot unplug.
+			 */
+			ret = bus->handle_hot_unplug(dev, dev_addr);
+			if (ret) {
+				RTE_LOG(ERR, EAL,
+					"It cannot handle hot unplug "
+					"for device (%s) "
+					"on the bus.\n ",
+					dev->name);
+			}
+		}
+	} else {
+		TAILQ_FOREACH(bus, &rte_bus_list, next) {
+			if (bus->handle_hot_unplug) {
+				/**
+				 * call bus ops to handle hot unplug.
+				 */
+				ret = bus->handle_hot_unplug(dev, dev_addr);
+				if (ret) {
+					RTE_LOG(ERR, EAL,
+						"It cannot handle hot unplug "
+						"for the device "
+						"on the bus.\n ");
+				}
+			}
+		}
+	}
+	return ret;
+}
+
+static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(ERR, EAL, "SIGBUS error, fault address:%p\n", info->si_addr);
+	ret = dev_uev_failure_process(NULL, info->si_addr);
+	if (!ret)
+		RTE_LOG(DEBUG, EAL,
+			"SIGBUS error is because of hot unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
+static int
 dev_uev_socket_fd_create(void)
 {
 	struct sockaddr_nl addr;
@@ -146,6 +214,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -170,8 +241,41 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					uevent.devname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL,
+					"Cannot find unplugged device (%s)\n",
+					uevent.devname);
+				return;
+			}
+			ret = dev_uev_failure_process(dev, NULL);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Driver cannot remap the "
+					"device (%s)\n",
+					dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -216,8 +320,26 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	/* recover sigbus. */
+	sigaction(SIGBUS, NULL, NULL);
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+void __rte_experimental
+rte_dev_handle_hot_unplug(void)
+{
+	struct sigaction act;
+
+	/* set sigbus handler for hotplug. */
+	memset(&act, 0x00, sizeof(struct sigaction));
+	act.sa_sigaction = sigbus_handler;
+	sigemptyset(&act.sa_mask);
+	sigaddset(&act.sa_mask, SIGBUS);
+	act.sa_flags = SA_SIGINFO;
+	sigaction(SIGBUS, &act, NULL);
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d02d80b..39a0213 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -217,6 +217,7 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_handle_hot_unplug;
 	rte_eal_cleanup;
 	rte_eal_devargs_insert;
 	rte_eal_devargs_parse;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V20 3/4] igb_uio: fix uio release issue when hot unplug
  2018-04-18 13:38       ` [PATCH V20 0/4] add hot plug recovery mechanism Jeff Guo
  2018-04-18 13:38         ` [PATCH V20 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
  2018-04-18 13:38         ` [PATCH V20 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
@ 2018-04-18 13:38         ` Jeff Guo
  2018-04-18 13:38         ` [PATCH V20 4/4] app/testpmd: show example to handler " Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-04-18 13:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, releasing a uio resource that no longer
exists results in a kernel NULL pointer error. This patch checks whether
the device has been removed before running the uio release procedure,
and if so just returns.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v20->v19:
split patch independently.
---
 kernel/linux/igb_uio/igb_uio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index cbc5ab6..c296332 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if device has been remove before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
+		return -1;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V20 4/4] app/testpmd: show example to handler hot unplug
  2018-04-18 13:38       ` [PATCH V20 0/4] add hot plug recovery mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-04-18 13:38         ` [PATCH V20 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-04-18 13:38         ` Jeff Guo
  2018-05-03  7:25           ` Matan Azrad
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-04-18 13:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can smoothly handle
the failure when a device is hot unplugged. Once the application detects
the removal event, the callback is called; it first stops packet
forwarding, then stops the port, closes the port and finally detaches
the port.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v20->v19:
remove the auto binding example.
---
 app/test-pmd/testpmd.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5986ff7..3751901 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1125,6 +1125,9 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
 	tics_datum = rte_rdtsc();
 	tics_per_1sec = rte_get_timer_hz();
 #endif
+	if (hot_plug)
+		rte_dev_handle_hot_unplug();
+
 	fsm = &fwd_streams[fc->stream_idx];
 	nb_fs = fc->stream_nb;
 	do {
@@ -2069,6 +2072,26 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_dev_event_callback(char *dev_name)
+{
+	uint16_t port_id;
+	int ret;
+
+	ret = rte_eth_dev_get_port_by_name(dev_name, &port_id);
+	if (ret) {
+		printf("can not get port by device %s!\n", dev_name);
+		return;
+	}
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+	stop_packet_forwarding();
+	stop_port(port_id);
+	close_port(port_id);
+	detach_port(port_id);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2130,9 +2153,7 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		rmv_dev_event_callback(device_name);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2640,7 +2661,7 @@ main(int argc, char** argv)
 			return -1;
 		}
 		eth_dev_event_callback_register();
-
+		rte_dev_handle_hot_unplug();
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 2/4] eal: add failure handler mechanism for hot plug
  2018-04-18 13:38         ` [PATCH V20 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
@ 2018-04-19  1:30           ` Zhang, Qi Z
  2018-04-20 11:14           ` Ananyev, Konstantin
  2018-04-20 16:16           ` Ananyev, Konstantin
  2 siblings, 0 replies; 494+ messages in thread
From: Zhang, Qi Z @ 2018-04-19  1:30 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Guo, Jia, Zhang, Helin

Hi Jeff

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Wednesday, April 18, 2018 9:38 PM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
> thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van
> Haaren, Harry <harry.van.haaren@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [dpdk-dev] [PATCH V20 2/4] eal: add failure handler mechanism for
> hot plug
> 
> This patch introduces a failure handler mechanism to handle device hot
> unplug event. When device be hot plug out, the device resource become
> invalid, if this resource is still be unexpected read/write, system will crash.
> This patch let eal help application to handle this fault, when sigbus error
> occur, check the failure address and accordingly remap the invalid memory
> for the corresponding device, that could guaranty the application not to be
> shut down when hot plug.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v20->v19:
> refine the logic of remapping for multiple device.
> ---
>  doc/guides/rel_notes/release_18_05.rst  |   6 ++
>  lib/librte_eal/common/include/rte_dev.h |  11 +++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 124
> +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   1 +
>  4 files changed, 141 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst
> b/doc/guides/rel_notes/release_18_05.rst
> index a018ef5..a4ea9af 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -70,6 +70,12 @@ New Features
> 
>    Linux uevent is supported as backend of this device event notification
> framework.
> 
> +* **Added hot plug failure handler.**
> +
> +  Added a failure handler machenism to handle hot unplug device.
> +
> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
> +
> 
>  API Changes
>  -----------
> diff --git a/lib/librte_eal/common/include/rte_dev.h
> b/lib/librte_eal/common/include/rte_dev.h
> index 0955e9a..9933131 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -360,4 +360,15 @@ rte_dev_event_monitor_start(void);
>  int __rte_experimental
>  rte_dev_event_monitor_stop(void);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It can be used to handle the device signal bus error. when signal
> +bus error
> + * occur, the handler would check the failure address to find the
> +corresponding
> + * device and remap the memory resource of the device, that would
> +guaranty
> + * the system not crash when the device be hot unplug.
> + */
> +void __rte_experimental
> +rte_dev_handle_hot_unplug(void);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9478a39..33e7026 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -13,12 +15,16 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };  static bool
> monitor_started;
> 
> +extern struct rte_bus_list rte_bus_list;
> +
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> @@ -33,6 +39,68 @@ enum eal_dev_event_subsystem {  };
> 
>  static int
> +dev_uev_failure_process(struct rte_device *dev, void *dev_addr) {
> +	struct rte_bus *bus;
> +	int ret = 0;
> +
> +	if (!dev && !dev_addr) {
> +		return -EINVAL;
> +	} else if (dev) {
> +		bus = rte_bus_find_by_device_name(dev->name);
> +		if (bus->handle_hot_unplug) {
> +			/**
> +			 * call bus ops to handle hot unplug.
> +			 */
> +			ret = bus->handle_hot_unplug(dev, dev_addr);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL,
> +					"It cannot handle hot unplug "
> +					"for device (%s) "
> +					"on the bus.\n ",
> +					dev->name);
> +			}
> +		}
> +	} else {
> +		TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +			if (bus->handle_hot_unplug) {
> +				/**
> +				 * call bus ops to handle hot unplug.
> +				 */
> +				ret = bus->handle_hot_unplug(dev, dev_addr);
> +				if (ret) {
> +					RTE_LOG(ERR, EAL,
> +						"It cannot handle hot unplug "
> +						"for the device "
> +						"on the bus.\n ");
> +				}
> +			}
> +		}
> +	}
> +	return ret;
> +}
> +
> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(ERR, EAL, "SIGBUS error, fault address:%p\n", info->si_addr);
> +	ret = dev_uev_failure_process(NULL, info->si_addr);
> +	if (!ret)
> +		RTE_LOG(DEBUG, EAL,
> +			"SIGBUS error is because of hot unplug!\n"); }
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
> +static int
>  dev_uev_socket_fd_create(void)
>  {
>  	struct sockaddr_nl addr;
> @@ -146,6 +214,9 @@ dev_uev_handler(__rte_unused void *param)
>  	struct rte_dev_event uevent;
>  	int ret;
>  	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
> 
>  	memset(&uevent, 0, sizeof(struct rte_dev_event));
>  	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -170,8 +241,41 @@ dev_uev_handler(__rte_unused void *param)
>  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d,
> subsystem:%d)\n",
>  		uevent.devname, uevent.type, uevent.subsystem);
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL,
> +					"Cannot find unplugged device (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			ret = dev_uev_failure_process(dev, NULL);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> +					"device (%s)\n",
> +					dev->name);
> +				return;
> +			}
> +		}
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
>  }
> 
>  int __rte_experimental
> @@ -216,8 +320,26 @@ rte_dev_event_monitor_stop(void)
>  		return ret;
>  	}
> 
> +	/* recover sigbus. */
> +	sigaction(SIGBUS, NULL, NULL);
> +
>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
>  	return 0;
>  }
> +
> +void __rte_experimental
> +rte_dev_handle_hot_unplug(void)
> +{
> +	struct sigaction act;
> +
> +	/* set sigbus handler for hotplug. */
> +	memset(&act, 0x00, sizeof(struct sigaction));
> +	act.sa_sigaction = sigbus_handler;
> +	sigemptyset(&act.sa_mask);
> +	sigaddset(&act.sa_mask, SIGBUS);
> +	act.sa_flags = SA_SIGINFO;
> +	sigaction(SIGBUS, &act, NULL);
> +}

I am not sure it is necessary to expose this API.
It could be invoked in rte_dev_event_monitor_start,
since registering a SIGBUS handler looks like an init step once the user decides to enable hotplug.

Regards
Qi

> diff --git a/lib/librte_eal/rte_eal_version.map
> b/lib/librte_eal/rte_eal_version.map
> index d02d80b..39a0213 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -217,6 +217,7 @@ EXPERIMENTAL {
>  	rte_dev_event_callback_unregister;
>  	rte_dev_event_monitor_start;
>  	rte_dev_event_monitor_stop;
> +	rte_dev_handle_hot_unplug;
>  	rte_eal_cleanup;
>  	rte_eal_devargs_insert;
>  	rte_eal_devargs_parse;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 1/4] bus/pci: introduce device hot unplug handle
  2018-04-18 13:38         ` [PATCH V20 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
@ 2018-04-20 10:32           ` Ananyev, Konstantin
  2018-05-03  3:05             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-04-20 10:32 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

> As of device hot unplug, we need some preparatory measures so that we will
> not encounter memory fault after device be plug out of the system,
> and also let we could recover the running data path but not been break.
> This patch allows the buses to handle device hot unplug event.
> The patch only enable the ops in pci bus, when handle device hot unplug
> event, remap a dummy memory to avoid bus read/write error.
> Other buses could accordingly implement this ops specific by themselves.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v20->19:
> clean the code
> ---
>  drivers/bus/pci/pci_common.c            | 67 +++++++++++++++++++++++++++++++++
>  drivers/bus/pci/pci_common_uio.c        | 32 ++++++++++++++++
>  drivers/bus/pci/private.h               | 12 ++++++
>  lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
>  4 files changed, 127 insertions(+)
> 
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 2a00f36..709eaf3 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -474,6 +474,72 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>  }
> 
>  static int
> +pci_handle_hot_unplug(struct rte_device *dev, void *failure_addr)
> +{
> +	struct rte_pci_device *pdev = NULL;
> +	int ret = 0, i, isfound = 0;
> +
> +	if (failure_addr != NULL) {
> +		FOREACH_DEVICE_ON_PCIBUS(pdev) {
> +			for (i = 0; i != sizeof(pdev->mem_resource) /
> +				sizeof(pdev->mem_resource[0]); i++) {

You can do i != RTE_DIM(pdev->mem_resource) here.

> +				if ((uint64_t)failure_addr >=
> +				    (uint64_t)pdev->mem_resource[i].addr &&
> +				    (uint64_t)failure_addr <=
> +				    (uint64_t)pdev->mem_resource[i].addr +
> +				    pdev->mem_resource[i].len) {


I think it should be failure_addr < addr + len
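
To illustrate the boundary case: a region of len bytes starting at addr
covers [addr, addr + len), so addr + len itself is the first byte past the
region. A small sketch, with addr_in_region() as a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if addr lies in the half-open range [base, base + len), else 0. */
static int
addr_in_region(const void *addr, const void *base, size_t len)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t b = (uintptr_t)base;

	/* Written as a subtraction so that base + len cannot overflow. */
	return a >= b && a - b < len;
}
```

With `<=` instead of `<`, an address that belongs to an adjacent mapping
could be attributed to the wrong device.
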

> +					RTE_LOG(ERR, EAL, "Failure address "
> +						"%16.16"PRIx64" is belong to "
> +						"resource of device %s!\n",
> +						(uint64_t)failure_addr,
> +						pdev->device.name);
> +					isfound = 1;
> +					break;
> +				}
> +			}
> +			if (isfound)
> +				break;


It might be a good idea to move the code that searches for the address into a separate function.
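
One possible shape for such a helper, as a sketch only. The struct and
function names below are hypothetical, reduced stand-ins for the PCI device
structures, and SKETCH_DIM repeats the definition of RTE_DIM so that the
example is self-contained:

```c
#include <stddef.h>
#include <stdint.h>

/* Same expression as DPDK's RTE_DIM, repeated to stay self-contained. */
#define SKETCH_DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for one mapped memory resource (BAR). */
struct sketch_resource {
	void *addr;
	uint64_t len;
};

/* Hypothetical stand-in for a PCI device with six resource slots. */
struct sketch_device {
	const char *name;
	struct sketch_resource mem_resource[6];
};

/* Return the first device whose resources contain addr, or NULL. */
static struct sketch_device *
sketch_find_device_by_addr(struct sketch_device *devs, size_t nb_devs,
			   const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	size_t d, i;

	for (d = 0; d != nb_devs; d++) {
		for (i = 0; i != SKETCH_DIM(devs[d].mem_resource); i++) {
			const struct sketch_resource *r =
				&devs[d].mem_resource[i];
			uintptr_t base = (uintptr_t)r->addr;

			/* Skip empty slots; match [base, base + len). */
			if (r->len != 0 && a >= base && a - base < r->len)
				return &devs[d];
		}
	}
	return NULL;
}
```

The caller then only has to distinguish "no device owns this address" (NULL)
from a failure of the remap step itself.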

> +		}
> +	} else if (dev != NULL) {
> +		pdev = RTE_DEV_TO_PCI(dev);
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	if (!pdev)
> +		return -1;
> +
> +	/* remap resources for devices */
> +	switch (pdev->kdrv) {
> +	case RTE_KDRV_VFIO:
> +#ifdef VFIO_PRESENT
> +		/* TODO */
> +#endif

This should set ret = -1, since it is not implemented yet.

> +		break;
> +	case RTE_KDRV_IGB_UIO:
> +	case RTE_KDRV_UIO_GENERIC:
> +		if (rte_eal_using_phys_addrs()) {
> +			/* map resources for devices that use uio */
> +			ret = pci_uio_remap_resource(pdev);
> +		}
> +		break;
> +	case RTE_KDRV_NIC_UIO:
> +		ret = pci_uio_remap_resource(pdev);
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, EAL,
> +			"  Not managed by a supported kernel driver, skipped\n");
> +		ret = -1;
> +		break;
> +	}
> +
> +	if (ret != 0)
> +		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
> +			pdev->name);
> +	return ret;
> +}
> +
> +static int
>  pci_plug(struct rte_device *dev)
>  {
>  	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
> @@ -503,6 +569,7 @@ struct rte_pci_bus rte_pci_bus = {
>  		.unplug = pci_unplug,
>  		.parse = pci_parse,
>  		.get_iommu_class = rte_pci_get_iommu_class,
> +		.handle_hot_unplug = pci_handle_hot_unplug,
>  	},
>  	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>  	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
> index 54bc20b..ba2c458 100644
> --- a/drivers/bus/pci/pci_common_uio.c
> +++ b/drivers/bus/pci/pci_common_uio.c
> @@ -146,6 +146,38 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>  	}
>  }
> 
> +/* remap the PCI resource of a PCI device in anonymous virtual memory */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev)
> +{
> +	int i;
> +	void *map_address;
> +
> +	if (dev == NULL)
> +		return -1;
> +
> +	/* Remap all BARs */
> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
> +		/* skip empty BAR */
> +		if (dev->mem_resource[i].phys_addr == 0)
> +			continue;
> +		pci_unmap_resource(dev->mem_resource[i].addr,
> +				(size_t)dev->mem_resource[i].len);
> +		map_address = pci_map_resource(
> +				dev->mem_resource[i].addr, -1, 0,
> +				(size_t)dev->mem_resource[i].len,
> +				MAP_ANONYMOUS | MAP_FIXED);

Instead of using munmap()/mmap(), can we use mremap() here?
That might be a slightly safer approach.

> +		if (map_address == MAP_FAILED) {
> +			RTE_LOG(ERR, EAL,
> +				"Cannot remap resource for device %s\n",
> +				dev->name);
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static struct mapped_pci_resource *
>  pci_uio_find_resource(struct rte_pci_device *dev)
>  {
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 88fa587..cc1668c 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
>  		struct mapped_pci_resource *uio_res);
> 
>  /**
> + * remap the pci uio resource.
> + *
> + * @param dev
> + *   Point to the struct rte pci device.
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev);
> +
> +/**
>   * Map device memory to uio resource
>   *
>   * This function is private to EAL.
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 6fb0834..d2c5778 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>  typedef int (*rte_bus_parse_t)(const char *name, void *addr);
> 
>  /**
> + * Implementation specific hot unplug handler function which is responsible
> + * for handle the failure when hot unplug the device, guaranty the system
> + * would not crash in the case.
> + * @param dev
> + *	Pointer of the device structure.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
> +						void *dev_addr);
> +
> +/**
>   * Bus scan policies
>   */
>  enum rte_bus_scan_mode {
> @@ -209,6 +223,8 @@ struct rte_bus {
>  	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
> +							device event */
>  	struct rte_bus_conf conf;    /**< Bus configuration */
>  	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>  };
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 2/4] eal: add failure handler mechanism for hot plug
  2018-04-18 13:38         ` [PATCH V20 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
  2018-04-19  1:30           ` Zhang, Qi Z
@ 2018-04-20 11:14           ` Ananyev, Konstantin
  2018-05-03  3:13             ` Guo, Jia
  2018-04-20 16:16           ` Ananyev, Konstantin
  2 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-04-20 11:14 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> This patch introduces a failure handler mechanism to handle device
> hot unplug event. When device be hot plug out, the device resource
> become invalid, if this resource is still be unexpected read/write,
> system will crash. This patch let eal help application to handle
> this fault, when sigbus error occur, check the failure address and
> accordingly remap the invalid memory for the corresponding device,
> that could guaranty the application not to be shut down when hot plug.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v20->v19:
> refine the logic of remapping for multiple device.
> ---
>  doc/guides/rel_notes/release_18_05.rst  |   6 ++
>  lib/librte_eal/common/include/rte_dev.h |  11 +++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 124 +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   1 +
>  4 files changed, 141 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
> index a018ef5..a4ea9af 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -70,6 +70,12 @@ New Features
> 
>    Linux uevent is supported as backend of this device event notification framework.
> 
> +* **Added hot plug failure handler.**
> +
> +  Added a failure handler machenism to handle hot unplug device.
> +
> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
> +
> 
>  API Changes
>  -----------
> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
> index 0955e9a..9933131 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -360,4 +360,15 @@ rte_dev_event_monitor_start(void);
>  int __rte_experimental
>  rte_dev_event_monitor_stop(void);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * It can be used to handle the device signal bus error. when signal bus error
> + * occur, the handler would check the failure address to find the corresponding
> + * device and remap the memory resource of the device, that would guaranty
> + * the system not crash when the device be hot unplug.
> + */
> +void __rte_experimental
> +rte_dev_handle_hot_unplug(void);
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 9478a39..33e7026 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -13,12 +15,16 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };
>  static bool monitor_started;
> 
> +extern struct rte_bus_list rte_bus_list;
> +
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> @@ -33,6 +39,68 @@ enum eal_dev_event_subsystem {
>  };
> 
>  static int
> +dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
> +{
> +	struct rte_bus *bus;
> +	int ret = 0;
> +
> +	if (!dev && !dev_addr) {
> +		return -EINVAL;
> +	} else if (dev) {
> +		bus = rte_bus_find_by_device_name(dev->name);
> +		if (bus->handle_hot_unplug) {
> +			/**
> +			 * call bus ops to handle hot unplug.
> +			 */
> +			ret = bus->handle_hot_unplug(dev, dev_addr);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL,
> +					"It cannot handle hot unplug "
> +					"for device (%s) "
> +					"on the bus.\n ",
> +					dev->name);
> +			}
> +		}


You would return 0 if bus->handle_hot_unplug == NULL.
Is that intended?
I don't think it should be.

> +	} else {
> +		TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +			if (bus->handle_hot_unplug) {
> +				/**
> +				 * call bus ops to handle hot unplug.
> +				 */
> +				ret = bus->handle_hot_unplug(dev, dev_addr);
> +				if (ret) {
> +					RTE_LOG(ERR, EAL,
> +						"It cannot handle hot unplug "
> +						"for the device "
> +						"on the bus.\n ");
> +				}

So how would we know what happened here:
does that address not belong to that bus, or did the unplug handler fail?
Should we separate the search and unplug ops?
Another question - shouldn't we break out of the loop if bus->handle_hot_unplug()
returns 0?
Otherwise you can return an error value even when the unplug handler worked correctly.
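
To illustrate the two points above, here is a rough sketch only. The struct,
the handler names and the -ENODEV convention are made up for illustration;
they are not the real rte_bus API. The idea is that a handler reports
"not my address" differently from "remap failed", and the loop stops at the
first bus that either handles the fault or owns the address but fails:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a bus; this is not the real struct rte_bus. */
struct demo_bus {
	/*
	 * Assumed contract: 0 if the address was found and remapped,
	 * -ENODEV if the address is not on this bus,
	 * any other negative value if the remap itself failed.
	 */
	int (*handle_hot_unplug)(void *failure_addr);
};

/* Toy handlers, one per outcome, so the dispatch can be exercised. */
static int demo_not_mine(void *addr) { (void)addr; return -ENODEV; }
static int demo_handled(void *addr) { (void)addr; return 0; }
static int demo_remap_failed(void *addr) { (void)addr; return -EIO; }

static int
demo_failure_process(const struct demo_bus *buses, size_t nb_bus, void *addr)
{
	int ret = -ENODEV;
	size_t i;

	for (i = 0; i < nb_bus; i++) {
		if (buses[i].handle_hot_unplug == NULL)
			continue;
		ret = buses[i].handle_hot_unplug(addr);
		/* Stop on success (0) or on a real failure of the owning bus. */
		if (ret != -ENODEV)
			break;
	}
	return ret;
}
```

With that, the caller sees 0 only when some bus actually handled the fault,
and -ENODEV only when no bus claimed the address.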


> +			}
> +		}
> +	}
> +	return ret;
> +}
> +
> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(ERR, EAL, "SIGBUS error, fault address:%p\n", info->si_addr);
> +	ret = dev_uev_failure_process(NULL, info->si_addr);

As you can now try to mmap/munmap the same address from two or more different threads,
you probably need some synchronization here.
Something as simple as a spinlock seems to be enough.
We might have one per device, or even a global one might be OK here.
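
As a rough sketch of the global-lock variant: in DPDK this would presumably
be an rte_spinlock_t, but to keep the example self-contained it uses a C11
atomic_flag instead. All demo_* names are hypothetical, and the "remap" is
just a counter standing in for the real work:

```c
#include <stdatomic.h>

/* One global lock that serializes every remap attempt. */
static atomic_flag demo_remap_lock = ATOMIC_FLAG_INIT;
/* Counts how many times the (pretend) remap was really performed. */
static int demo_remap_count;

static void
demo_lock(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&demo_remap_lock,
						 memory_order_acquire))
		;
}

static void
demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_remap_lock, memory_order_release);
}

/*
 * Remap a device at most once, even if several threads fault on it.
 * Returns 1 if this call did the remap, 0 if it was already done.
 */
static int
demo_remap_once(int *already_remapped)
{
	int did_remap = 0;

	demo_lock();
	if (!*already_remapped) {
		*already_remapped = 1;
		demo_remap_count++;	/* stands in for the real remap */
		did_remap = 1;
	}
	demo_unlock();
	return did_remap;
}
```

One caveat with any lock taken from a signal handler: if the thread that
already holds the lock takes the fault itself, it will spin forever, so the
locked region should not touch the device memory.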

> +	if (!ret)
> +		RTE_LOG(DEBUG, EAL,
> +			"SIGBUS error is because of hot unplug!\n");
> +}
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}

Is it really worth a separate function?

> +
> +static int
>  dev_uev_socket_fd_create(void)
>  {
>  	struct sockaddr_nl addr;
> @@ -146,6 +214,9 @@ dev_uev_handler(__rte_unused void *param)
>  	struct rte_dev_event uevent;
>  	int ret;
>  	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
> 
>  	memset(&uevent, 0, sizeof(struct rte_dev_event));
>  	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -170,8 +241,41 @@ dev_uev_handler(__rte_unused void *param)
>  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>  		uevent.devname, uevent.type, uevent.subsystem);
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL,
> +					"Cannot find unplugged device (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			ret = dev_uev_failure_process(dev, NULL);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> +					"device (%s)\n",
> +					dev->name);
> +				return;
> +			}
> +		}
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
>  }
> 
>  int __rte_experimental
> @@ -216,8 +320,26 @@ rte_dev_event_monitor_stop(void)
>  		return ret;
>  	}
> 
> +	/* recover sigbus. */
> +	sigaction(SIGBUS, NULL, NULL);
> +

Probably better to restore previous action.
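
Note that sigaction() with a NULL act leaves the current disposition
unchanged, so the call above does not recover anything. A minimal sketch of
what I mean, with hypothetical demo_* names: save the old action when the
handler is installed and put it back on stop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Disposition that was in place before we installed our handler. */
static struct sigaction demo_old_sigbus;
static int demo_old_valid;

/* Placeholder handler; the real one would try to recover the device. */
static void
demo_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
}

/* Install the handler and remember the previous action. */
static int
demo_install(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = demo_sigbus_handler;
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_SIGINFO;
	if (sigaction(SIGBUS, &act, &demo_old_sigbus) != 0)
		return -1;
	demo_old_valid = 1;
	return 0;
}

/* Put back whatever was installed before demo_install(). */
static int
demo_restore(void)
{
	if (!demo_old_valid)
		return 0;
	demo_old_valid = 0;
	return sigaction(SIGBUS, &demo_old_sigbus, NULL);
}
```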

>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
>  	return 0;
>  }
> +
> +void __rte_experimental
> +rte_dev_handle_hot_unplug(void)
> +{
> +	struct sigaction act;
> +
> +	/* set sigbus handler for hotplug. */
> +	memset(&act, 0x00, sizeof(struct sigaction));
> +	act.sa_sigaction = sigbus_handler;
> +	sigemptyset(&act.sa_mask);
> +	sigaddset(&act.sa_mask, SIGBUS);
> +	act.sa_flags = SA_SIGINFO;
> +	sigaction(SIGBUS, &act, NULL);
> +}
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index d02d80b..39a0213 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -217,6 +217,7 @@ EXPERIMENTAL {
>  	rte_dev_event_callback_unregister;
>  	rte_dev_event_monitor_start;
>  	rte_dev_event_monitor_stop;
> +	rte_dev_handle_hot_unplug;
>  	rte_eal_cleanup;
>  	rte_eal_devargs_insert;
>  	rte_eal_devargs_parse;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 2/4] eal: add failure handler mechanism for hot plug
  2018-04-18 13:38         ` [PATCH V20 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
  2018-04-19  1:30           ` Zhang, Qi Z
  2018-04-20 11:14           ` Ananyev, Konstantin
@ 2018-04-20 16:16           ` Ananyev, Konstantin
  2018-05-03  3:17             ` Guo, Jia
  2 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-04-20 16:16 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> > +
> > +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
> > +				void *ctx __rte_unused)
> > +{
> > +	int ret;
> > +
> > +	RTE_LOG(ERR, EAL, "SIGBUS error, fault address:%p\n", info->si_addr);
> > +	ret = dev_uev_failure_process(NULL, info->si_addr);
> 
> As now you can try to mmap/munmap same address from two or more different threads
> you probably need some synchronization here.
> Something simple as spinlock seems to be enough here.
> We might have one per device or might be even a global one would be ok here.
> 
> > +	if (!ret)
> > +		RTE_LOG(DEBUG, EAL,
> > +			"SIGBUS error is because of hot unplug!\n");

Also, if the sigbus handler wasn't able to fix things - the failure addr doesn't belong to
any device, or the remapping fails - we should probably invoke the previously installed handler
or just apply the default action.
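
Roughly like the sketch below. It assumes the previous action was saved in
demo_prev when the handler was installed; all demo_* names are made up for
illustration, and the two toy handlers exist only so the fallback path can
be exercised:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Disposition that was in place before our handler was installed;
 * assumed to be filled in by sigaction(SIGBUS, &act, &demo_prev).
 */
static struct sigaction demo_prev;

/* Counters and toy handlers, only here so the fallback can be tried out. */
static int demo_info_calls, demo_plain_calls;

static void
demo_info_handler(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
	demo_info_calls++;
}

static void
demo_plain_handler(int signum)
{
	(void)signum;
	demo_plain_calls++;
}

/* Call this from the SIGBUS handler when recovery was not possible. */
static void
demo_sigbus_fallback(int signum, siginfo_t *info, void *ctx)
{
	if ((demo_prev.sa_flags & SA_SIGINFO) &&
	    demo_prev.sa_sigaction != NULL) {
		/* Previous handler was installed with SA_SIGINFO. */
		demo_prev.sa_sigaction(signum, info, ctx);
	} else if (!(demo_prev.sa_flags & SA_SIGINFO) &&
		   demo_prev.sa_handler != SIG_DFL &&
		   demo_prev.sa_handler != SIG_IGN) {
		/* Previous handler was a plain one-argument handler. */
		demo_prev.sa_handler(signum);
	} else {
		/* No usable previous handler: take the default action. */
		signal(signum, SIG_DFL);
		raise(signum);
	}
}
```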
Konstantin

> > +}
> > +

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 1/4] bus/pci: introduce device hot unplug handle
  2018-04-20 10:32           ` Ananyev, Konstantin
@ 2018-05-03  3:05             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-05-03  3:05 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/20/2018 6:32 PM, Ananyev, Konstantin wrote:
> Hi Jeff,
>
>> As of device hot unplug, we need some preparatory measures so that we will
>> not encounter memory fault after device be plug out of the system,
>> and also let we could recover the running data path but not been break.
>> This patch allows the buses to handle device hot unplug event.
>> The patch only enable the ops in pci bus, when handle device hot unplug
>> event, remap a dummy memory to avoid bus read/write error.
>> Other buses could accordingly implement this ops specific by themselves.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v20->19:
>> clean the code
>> ---
>>   drivers/bus/pci/pci_common.c            | 67 +++++++++++++++++++++++++++++++++
>>   drivers/bus/pci/pci_common_uio.c        | 32 ++++++++++++++++
>>   drivers/bus/pci/private.h               | 12 ++++++
>>   lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
>>   4 files changed, 127 insertions(+)
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 2a00f36..709eaf3 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -474,6 +474,72 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>>   }
>>
>>   static int
>> +pci_handle_hot_unplug(struct rte_device *dev, void *failure_addr)
>> +{
>> +	struct rte_pci_device *pdev = NULL;
>> +	int ret = 0, i, isfound = 0;
>> +
>> +	if (failure_addr != NULL) {
>> +		FOREACH_DEVICE_ON_PCIBUS(pdev) {
>> +			for (i = 0; i != sizeof(pdev->mem_resource) /
>> +				sizeof(pdev->mem_resource[0]); i++) {
> You can do i != RTE_DIM(pdev->mem_resource) here.
sure.
>> +				if ((uint64_t)failure_addr >=
>> +				    (uint64_t)pdev->mem_resource[i].addr &&
>> +				    (uint64_t)failure_addr <=
>> +				    (uint64_t)pdev->mem_resource[i].addr +
>> +				    pdev->mem_resource[i].len) {
>
> I think it should be failure_addr < addr + len
I think you are right.
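
For reference, a small sketch of the half-open check, [addr, addr + len).
The helper name is hypothetical; comparing the offset against len also
avoids overflow in addr + len:

```c
#include <stdint.h>

/*
 * Return 1 if fault lies in [base, base + len), 0 otherwise.
 * The address base + len itself is the first byte past the region.
 */
static int
demo_addr_in_region(const void *fault, const void *base, uint64_t len)
{
	uintptr_t f = (uintptr_t)fault;
	uintptr_t b = (uintptr_t)base;

	return f >= b && (uint64_t)(f - b) < len;
}
```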
>> +					RTE_LOG(ERR, EAL, "Failure address "
>> +						"%16.16"PRIx64" is belong to "
>> +						"resource of device %s!\n",
>> +						(uint64_t)failure_addr,
>> +						pdev->device.name);
>> +					isfound = 1;
>> +					break;
>> +				}
>> +			}
>> +			if (isfound)
>> +				break;
>
> Might be it is a good thing to put the code that searches for address into a separate function.
good idea.
>> +		}
>> +	} else if (dev != NULL) {
>> +		pdev = RTE_DEV_TO_PCI(dev);
>> +	} else {
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!pdev)
>> +		return -1;
>> +
>> +	/* remap resources for devices */
>> +	switch (pdev->kdrv) {
>> +	case RTE_KDRV_VFIO:
>> +#ifdef VFIO_PRESENT
>> +		/* TODO */
>> +#endif
> Should set ret =-1 as not implemented now.
ok.
>> +		break;
>> +	case RTE_KDRV_IGB_UIO:
>> +	case RTE_KDRV_UIO_GENERIC:
>> +		if (rte_eal_using_phys_addrs()) {
>> +			/* map resources for devices that use uio */
>> +			ret = pci_uio_remap_resource(pdev);
>> +		}
>> +		break;
>> +	case RTE_KDRV_NIC_UIO:
>> +		ret = pci_uio_remap_resource(pdev);
>> +		break;
>> +	default:
>> +		RTE_LOG(DEBUG, EAL,
>> +			"  Not managed by a supported kernel driver, skipped\n");
>> +		ret = -1;
>> +		break;
>> +	}
>> +
>> +	if (ret != 0)
>> +		RTE_LOG(ERR, EAL, "failed to handle hot unplug of %s",
>> +			pdev->name);
>> +	return ret;
>> +}
>> +
>> +static int
>>   pci_plug(struct rte_device *dev)
>>   {
>>   	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
>> @@ -503,6 +569,7 @@ struct rte_pci_bus rte_pci_bus = {
>>   		.unplug = pci_unplug,
>>   		.parse = pci_parse,
>>   		.get_iommu_class = rte_pci_get_iommu_class,
>> +		.handle_hot_unplug = pci_handle_hot_unplug,
>>   	},
>>   	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
>>   	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
>> diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
>> index 54bc20b..ba2c458 100644
>> --- a/drivers/bus/pci/pci_common_uio.c
>> +++ b/drivers/bus/pci/pci_common_uio.c
>> @@ -146,6 +146,38 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>>   	}
>>   }
>>
>> +/* remap the PCI resource of a PCI device in anonymous virtual memory */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev)
>> +{
>> +	int i;
>> +	void *map_address;
>> +
>> +	if (dev == NULL)
>> +		return -1;
>> +
>> +	/* Remap all BARs */
>> +	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
>> +		/* skip empty BAR */
>> +		if (dev->mem_resource[i].phys_addr == 0)
>> +			continue;
>> +		pci_unmap_resource(dev->mem_resource[i].addr,
>> +				(size_t)dev->mem_resource[i].len);
>> +		map_address = pci_map_resource(
>> +				dev->mem_resource[i].addr, -1, 0,
>> +				(size_t)dev->mem_resource[i].len,
>> +				MAP_ANONYMOUS | MAP_FIXED);
> Instead of using mumap/mmap() can we use mremap() here?
> Might be a bit safer approach.
mremap() cannot map new anonymous memory, so it does not fit this case.
I also checked and found that MAP_FIXED replaces whatever part of an
existing mapping it overlaps, so there is no need to unmap first before
the remap.
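
A minimal standalone sketch of that behavior on Linux. The helper name is
hypothetical and it uses MAP_PRIVATE for simplicity, which is an assumption
here and not necessarily what pci_map_resource() passes:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace the range [addr, addr + len) with fresh zero-filled anonymous
 * memory in a single call. With MAP_FIXED the kernel discards whatever
 * was mapped there before, so no munmap() is needed first.
 * Returns addr on success, MAP_FAILED on error.
 */
static void *
demo_overlay_anonymous(void *addr, size_t len)
{
	return mmap(addr, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}
```

After the call, reads and writes to the old BAR addresses hit ordinary
memory instead of the removed device.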
>> +		if (map_address == MAP_FAILED) {
>> +			RTE_LOG(ERR, EAL,
>> +				"Cannot remap resource for device %s\n",
>> +				dev->name);
>> +			return -1;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static struct mapped_pci_resource *
>>   pci_uio_find_resource(struct rte_pci_device *dev)
>>   {
>> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
>> index 88fa587..cc1668c 100644
>> --- a/drivers/bus/pci/private.h
>> +++ b/drivers/bus/pci/private.h
>> @@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
>>   		struct mapped_pci_resource *uio_res);
>>
>>   /**
>> + * remap the pci uio resource.
>> + *
>> + * @param dev
>> + *   Point to the struct rte pci device.
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev);
>> +
>> +/**
>>    * Map device memory to uio resource
>>    *
>>    * This function is private to EAL.
>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index 6fb0834..d2c5778 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>
>>   /**
>> + * Implementation specific hot unplug handler function which is responsible
>> + * for handle the failure when hot unplug the device, guaranty the system
>> + * would not crash in the case.
>> + * @param dev
>> + *	Pointer of the device structure.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
>> +						void *dev_addr);
>> +
>> +/**
>>    * Bus scan policies
>>    */
>>   enum rte_bus_scan_mode {
>> @@ -209,6 +223,8 @@ struct rte_bus {
>>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
>> +							device event */
>>   	struct rte_bus_conf conf;    /**< Bus configuration */
>>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>   };
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 2/4] eal: add failure handler mechanism for hot plug
  2018-04-20 11:14           ` Ananyev, Konstantin
@ 2018-05-03  3:13             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-05-03  3:13 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/20/2018 7:14 PM, Ananyev, Konstantin wrote:
>
>> This patch introduces a failure handler mechanism to handle device
>> hot unplug event. When device be hot plug out, the device resource
>> become invalid, if this resource is still be unexpected read/write,
>> system will crash. This patch let eal help application to handle
>> this fault, when sigbus error occur, check the failure address and
>> accordingly remap the invalid memory for the corresponding device,
>> that could guaranty the application not to be shut down when hot plug.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v20->v19:
>> refine the logic of remapping for multiple device.
>> ---
>>   doc/guides/rel_notes/release_18_05.rst  |   6 ++
>>   lib/librte_eal/common/include/rte_dev.h |  11 +++
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 124 +++++++++++++++++++++++++++++++-
>>   lib/librte_eal/rte_eal_version.map      |   1 +
>>   4 files changed, 141 insertions(+), 1 deletion(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
>> index a018ef5..a4ea9af 100644
>> --- a/doc/guides/rel_notes/release_18_05.rst
>> +++ b/doc/guides/rel_notes/release_18_05.rst
>> @@ -70,6 +70,12 @@ New Features
>>
>>     Linux uevent is supported as backend of this device event notification framework.
>>
>> +* **Added hot plug failure handler.**
>> +
>> +  Added a failure handler machenism to handle hot unplug device.
>> +
>> +  * ``rte_dev_handle_hot_unplug`` for handle hot unplug device failure.
>> +
>>
>>   API Changes
>>   -----------
>> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
>> index 0955e9a..9933131 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -360,4 +360,15 @@ rte_dev_event_monitor_start(void);
>>   int __rte_experimental
>>   rte_dev_event_monitor_stop(void);
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * It can be used to handle the device signal bus error. when signal bus error
>> + * occur, the handler would check the failure address to find the corresponding
>> + * device and remap the memory resource of the device, that would guaranty
>> + * the system not crash when the device be hot unplug.
>> + */
>> +void __rte_experimental
>> +rte_dev_handle_hot_unplug(void);
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 9478a39..33e7026 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,8 @@
>>
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <fcntl.h>
>> +#include <signal.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>
>> @@ -13,12 +15,16 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_eal.h>
>>
>>   #include "eal_private.h"
>>
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>>   static bool monitor_started;
>>
>> +extern struct rte_bus_list rte_bus_list;
>> +
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>
>> @@ -33,6 +39,68 @@ enum eal_dev_event_subsystem {
>>   };
>>
>>   static int
>> +dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
>> +{
>> +	struct rte_bus *bus;
>> +	int ret = 0;
>> +
>> +	if (!dev && !dev_addr) {
>> +		return -EINVAL;
>> +	} else if (dev) {
>> +		bus = rte_bus_find_by_device_name(dev->name);
>> +		if (bus->handle_hot_unplug) {
>> +			/**
>> +			 * call bus ops to handle hot unplug.
>> +			 */
>> +			ret = bus->handle_hot_unplug(dev, dev_addr);
>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL,
>> +					"It cannot handle hot unplug "
>> +					"for device (%s) "
>> +					"on the bus.\n ",
>> +					dev->name);
>> +			}
>> +		}
>
> You would retrun 0 if bus->handle_hot_unplug == NULL.
> Is that intended?
> Shouldn't be I think.
shouldn't be. will modify it.
>> +	} else {
>> +		TAILQ_FOREACH(bus, &rte_bus_list, next) {
>> +			if (bus->handle_hot_unplug) {
>> +				/**
>> +				 * call bus ops to handle hot unplug.
>> +				 */
>> +				ret = bus->handle_hot_unplug(dev, dev_addr);
>> +				if (ret) {
>> +					RTE_LOG(ERR, EAL,
>> +						"It cannot handle hot unplug "
>> +						"for the device "
>> +						"on the bus.\n ");
>> +				}
> So how we would know what happened here:
> That address doesn't belong to that bus or unplug_handler failed?
> Should we separate search and unplug ops?
> Another question - shouldn't we break out of loop if bus->handle_hot_unplug()
> returns 0?
> Otherwise you can return error value even when unplug handled worked correctly.
>
you are right here.
>> +			}
>> +		}
>> +	}
>> +	return ret;
>> +}
>> +
>> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
>> +				void *ctx __rte_unused)
>> +{
>> +	int ret;
>> +
>> +	RTE_LOG(ERR, EAL, "SIGBUS error, fault address:%p\n", info->si_addr);
>> +	ret = dev_uev_failure_process(NULL, info->si_addr);
> As now you can try to mmap/munmap same address from two or more different threads
> you probably need some synchronization here.
> Something simple as spinlock seems to be enough here.
> We might have one per device or might be even a global one would be ok here.
I think a global lock for synchronization would be fine.
>> +	if (!ret)
>> +		RTE_LOG(DEBUG, EAL,
>> +			"SIGBUS error is because of hot unplug!\n");
>> +}
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +	const void *_name)
>> +{
>> +	const char *name = _name;
>> +
>> +	return strcmp(dev->name, name);
>> +}
> Is it really worth a separate function?
I think it is needed here because the bus find_device op
(rte_bus_find_device_t) takes a compare callback.
>> +
>> +static int
>>   dev_uev_socket_fd_create(void)
>>   {
>>   	struct sockaddr_nl addr;
>> @@ -146,6 +214,9 @@ dev_uev_handler(__rte_unused void *param)
>>   	struct rte_dev_event uevent;
>>   	int ret;
>>   	char buf[EAL_UEV_MSG_LEN];
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	const char *busname;
>>
>>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>>   	memset(buf, 0, EAL_UEV_MSG_LEN);
>> @@ -170,8 +241,41 @@ dev_uev_handler(__rte_unused void *param)
>>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>>   		uevent.devname, uevent.type, uevent.subsystem);
>>
>> -	if (uevent.devname)
>> +	switch (uevent.subsystem) {
>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +		busname = "pci";
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (uevent.devname) {
>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>> +			bus = rte_bus_find_by_name(busname);
>> +			if (bus == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			dev = bus->find_device(NULL, cmp_dev_name,
>> +					       uevent.devname);
>> +			if (dev == NULL) {
>> +				RTE_LOG(ERR, EAL,
>> +					"Cannot find unplugged device (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			ret = dev_uev_failure_process(dev, NULL);
>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
>> +					"device (%s)\n",
>> +					dev->name);
>> +				return;
>> +			}
>> +		}
>>   		dev_callback_process(uevent.devname, uevent.type);
>> +	}
>>   }
>>
>>   int __rte_experimental
>> @@ -216,8 +320,26 @@ rte_dev_event_monitor_stop(void)
>>   		return ret;
>>   	}
>>
>> +	/* recover sigbus. */
>> +	sigaction(SIGBUS, NULL, NULL);
>> +
> Probably better to restore previous action.
Correct, I will restore the previous sigbus action so that others are not affected.
>>   	close(intr_handle.fd);
>>   	intr_handle.fd = -1;
>>   	monitor_started = false;
>> +
>>   	return 0;
>>   }
>> +
>> +void __rte_experimental
>> +rte_dev_handle_hot_unplug(void)
>> +{
>> +	struct sigaction act;
>> +
>> +	/* set sigbus handler for hotplug. */
>> +	memset(&act, 0x00, sizeof(struct sigaction));
>> +	act.sa_sigaction = sigbus_handler;
>> +	sigemptyset(&act.sa_mask);
>> +	sigaddset(&act.sa_mask, SIGBUS);
>> +	act.sa_flags = SA_SIGINFO;
>> +	sigaction(SIGBUS, &act, NULL);
>> +}
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index d02d80b..39a0213 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -217,6 +217,7 @@ EXPERIMENTAL {
>>   	rte_dev_event_callback_unregister;
>>   	rte_dev_event_monitor_start;
>>   	rte_dev_event_monitor_stop;
>> +	rte_dev_handle_hot_unplug;
>>   	rte_eal_cleanup;
>>   	rte_eal_devargs_insert;
>>   	rte_eal_devargs_parse;
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 2/4] eal: add failure handler mechanism for hot plug
  2018-04-20 16:16           ` Ananyev, Konstantin
@ 2018-05-03  3:17             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-05-03  3:17 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 4/21/2018 12:16 AM, Ananyev, Konstantin wrote:
>>> +
>>> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
>>> +				void *ctx __rte_unused)
>>> +{
>>> +	int ret;
>>> +
>>> +	RTE_LOG(ERR, EAL, "SIGBUS error, fault address:%p\n", info->si_addr);
>>> +	ret = dev_uev_failure_process(NULL, info->si_addr);
>> As now you can try to mmap/munmap same address from two or more different threads
>> you probably need some synchronization here.
>> Something simple as spinlock seems to be enough here.
>> We might have one per device or might be even a global one would be ok here.
>>
>>> +	if (!ret)
>>> +		RTE_LOG(DEBUG, EAL,
>>> +			"SIGBUS error is because of hot unplug!\n");
> Also if sigbus handler wasn't able to fix things - failure addr doesn't belong to
> any devices, or remaping fails - we probably should invoke previously installed handler
> or just apply default action.
> Konstantin
I think it should be OK to just treat this as an exception: exit so that the
default action applies, and log that it is a normal sigbus error.
>>> +}
>>> +

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 4/4] app/testpmd: show example to handler hot unplug
  2018-04-18 13:38         ` [PATCH V20 4/4] app/testpmd: show example to handler " Jeff Guo
@ 2018-05-03  7:25           ` Matan Azrad
  2018-05-03  9:35             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Matan Azrad @ 2018-05-03  7:25 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Jeff

> From: Jeff Guo, Wednesday, April 18, 2018 4:38 PM
> Use testpmd for example, to show how an application smoothly handle
> failure when device being hot unplug. Once app detect the removal event,
> the callback would be called, it first stop the packet forwarding, then stop the
> port, close the port and finally detach the port.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v20->v19:
> remove the auto binding example.
> ---
>  app/test-pmd/testpmd.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 5986ff7..3751901 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1125,6 +1125,9 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc,
> packet_fwd_t pkt_fwd)
>  	tics_datum = rte_rdtsc();
>  	tics_per_1sec = rte_get_timer_hz();
>  #endif
> +	if (hot_plug)
> +		rte_dev_handle_hot_unplug();
> +

Again, I don't understand why the application should configure it - it has already started the hot-plug.
Can't the EAL handle this automatically when the user starts the hot-plug?

>  	fsm = &fwd_streams[fc->stream_idx];
>  	nb_fs = fc->stream_nb;
>  	do {
> @@ -2069,6 +2072,26 @@ rmv_event_callback(void *arg)
>  			dev->device->name);
>  }
> 
> +static void
> +rmv_dev_event_callback(char *dev_name)
> +{
> +	uint16_t port_id;
> +	int ret;
> +
> +	ret = rte_eth_dev_get_port_by_name(dev_name, &port_id);
> +	if (ret) {
> +		printf("can not get port by device %s!\n", dev_name);
> +		return;
> +	}
> +
> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> +	printf("removing port id:%u\n", port_id);
> +	stop_packet_forwarding();
> +	stop_port(port_id);
> +	close_port(port_id);
> +	detach_port(port_id);
> +}

We also have rmv_event_callback(), which is triggered by an RMV interrupt and run by the host thread.
In which thread context does rmv_dev_event_callback() run?
Shouldn't they be synchronized? Do we need both at the same time?

> +
>  /* This function is used by the interrupt thread */  static int
> eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void
> *param, @@ -2130,9 +2153,7 @@ eth_dev_event_callback(char
> *device_name, enum rte_dev_event_type type,
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		rmv_dev_event_callback(device_name);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> @@ -2640,7 +2661,7 @@ main(int argc, char** argv)
>  			return -1;
>  		}
>  		eth_dev_event_callback_register();
> -
> +		rte_dev_handle_hot_unplug();
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V21 0/4] hot plug recovery mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (3 preceding siblings ...)
  2018-04-18 13:38       ` [PATCH V20 0/4] add hot plug recovery mechanism Jeff Guo
@ 2018-05-03  8:57       ` Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
                           ` (3 more replies)
  2018-05-03 10:48       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
                         ` (18 subsequent siblings)
  23 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-05-03  8:57 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A device event monitor framework was introduced previously; its typical
use is device hot plug. If we want an application not to break down
when a device is hot plugged in or out, we still need some measures to
help the app handle that, such as device recovery on detach, so that
the app can keep running smoothly without being disturbed by any
hotplug behavior.

This patch set introduces a recovery mechanism to handle hot unplug,
and also uses testpmd as an example of how to use this mechanism to process
hot plug events. The process is as follows:

plug out->failure handle->stop forward->stop port->close port->detach port

With this mechanism, users such as the fail-safe driver or testpmd will be
able to develop their own hot plug applications.

patchset history:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the earlier patch history, please check the patch set "add device event monitor framework".

Jeff Guo (4):
  bus/pci: handle device hot unplug
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug

 app/test-pmd/testpmd.c                  |  28 ++++--
 drivers/bus/pci/pci_common.c            |  65 ++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 kernel/linux/igb_uio/igb_uio.c          |   4 +
 lib/librte_eal/common/include/rte_bus.h |  16 ++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 154 +++++++++++++++++++++++++++++++-
 7 files changed, 306 insertions(+), 6 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V21 1/4] bus/pci: handle device hot unplug
  2018-05-03  8:57       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
@ 2018-05-03  8:57         ` Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-05-03  8:57 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

For device hot unplug, we need some preparatory measures so that when
we encounter a memory fault (such as a SIGBUS error) caused by the unplug
action, we can recover instead of crashing.

Handling device hot unplug is bus-specific behavior, so this patch
introduces a bus op that lets each kind of bus implement its own logic.
It also implements the op for the PCI bus: remap the device resources to
dummy memory to avoid bus read/write errors.
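
To illustrate the remap idea outside of DPDK, here is a minimal standalone
sketch. It is not the code from this patch: struct bar_region,
find_region_by_addr(), remap_regions_anonymous() and remap_demo() are
hypothetical names used only for illustration, and an ordinary anonymous
mapping stands in for a device BAR. It shows the two steps the PCI op relies
on: finding the mapped region that contains a failure address, and replacing
that region in place with zero-filled anonymous memory via mmap() with
MAP_FIXED.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for one mapped BAR of a device. */
struct bar_region {
	void *addr;   /* virtual address of the mapping */
	size_t len;   /* length of the mapping in bytes */
};

/* Return the index of the region containing fault_addr, or -1 if none. */
static int
find_region_by_addr(const struct bar_region *r, int n, const void *fault_addr)
{
	uintptr_t a = (uintptr_t)fault_addr;
	int i;

	for (i = 0; i < n; i++) {
		uintptr_t start = (uintptr_t)r[i].addr;

		if (r[i].addr != NULL && a >= start && a - start < r[i].len)
			return i;
	}
	return -1;
}

/* Replace every non-empty region in place with anonymous memory. */
static int
remap_regions_anonymous(const struct bar_region *r, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		void *p;

		if (r[i].addr == NULL || r[i].len == 0)
			continue;
		/* MAP_FIXED atomically replaces whatever was mapped there. */
		p = mmap(r[i].addr, r[i].len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (p == MAP_FAILED)
			return -1;
	}
	return 0;
}

/* Self-check using a plain anonymous page in place of a real BAR. */
static int
remap_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned char *base = map;
	volatile unsigned char *probe = map;
	struct bar_region r;
	int ok;

	if (map == MAP_FAILED)
		return -1;
	probe[0] = 0xab;
	r.addr = map;
	r.len = len;
	if (find_region_by_addr(&r, 1, base + len - 1) != 0)
		return -1;
	if (find_region_by_addr(&r, 1, base + len) != -1)
		return -1;
	if (remap_regions_anonymous(&r, 1) != 0)
		return -1;
	/* Same address, but now backed by fresh zero-filled pages. */
	ok = (probe[0] == 0);
	munmap(map, len);
	return ok ? 0 : -1;
}
```

Calling remap_demo() returns 0 when the lookup and the in-place remap both
behave as described. Keeping the virtual address unchanged is the point of
MAP_FIXED here: code that still holds pointers into the old mapping keeps
working, it just reads and writes dummy memory.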

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
split function in hot unplug ops
---
 drivers/bus/pci/pci_common.c            | 65 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++++++
 drivers/bus/pci/private.h               | 12 ++++++
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
 4 files changed, 126 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7215aae..74d9aa8 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_pci_device *
+pci_find_device_by_addr(void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(ERR, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+static int
+pci_handle_hot_unplug(struct rte_device *dev, void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	if (dev != NULL)
+		pdev = RTE_DEV_TO_PCI(dev);
+	else
+		pdev = pci_find_device_by_addr(failure_addr);
+
+	if (!pdev)
+		return -1;
+
+	/* remap resources for devices */
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* TODO */
+		ret = -1;
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "Failed to handle hot unplug of device %s\n",
+			pdev->name);
+	return ret;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -502,6 +566,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.handle_hot_unplug = pci_handle_hot_unplug,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..5551506 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..6a5609f 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a bus-specific hot unplug handler, which is responsible
+ * for handling the failure when a device is hot unplugged, guaranteeing
+ * that the system does not hang in that case.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
+						void *dev_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -209,6 +223,8 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
+							device event */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V21 2/4] eal: add failure handle mechanism for hot plug
  2018-05-03  8:57       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
@ 2018-05-03  8:57         ` Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-05-03  8:57 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handler mechanism to handle device
hot unplug events. When a device is hot unplugged, its resources
become invalid; if they are still read or written unexpectedly, the
application will crash. This patch lets the EAL help the application
handle this fault: when a SIGBUS error occurs, check the failure address
and remap the invalid memory for the corresponding device, so that the
application is not shut down by the hot unplug.
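
The recovery flow described above can be sketched as a small standalone
Linux program. This is an illustration under stated assumptions, not the EAL
code in this patch: watched_addr, watched_len, sigbus_remap_handler() and
sigbus_recovery_demo() are hypothetical names, a truncated temporary file
stands in for the unplugged device, and only one region is tracked. Note also
that mmap() is not on the POSIX list of async-signal-safe functions, so
calling it from a handler is a pragmatic choice rather than a guaranteed one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: the single mapped region the handler knows how to repair. */
static void *volatile watched_addr;
static volatile size_t watched_len;
static volatile sig_atomic_t recovered;

/* On SIGBUS, check the failure address and remap the region it falls in. */
static void
sigbus_remap_handler(int signum, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t start = (uintptr_t)watched_addr;
	void *p;

	(void)signum;
	(void)ctx;

	/* A fault outside the watched region cannot be repaired here. */
	if (fault < start || fault - start >= watched_len)
		_exit(EXIT_FAILURE);

	p = mmap(watched_addr, watched_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		_exit(EXIT_FAILURE);
	recovered = 1;
}

/* Provoke a SIGBUS on a file mapping and let the handler recover from it. */
static int
sigbus_recovery_demo(void)
{
	char path[] = "/tmp/sigbus_demo_XXXXXX";
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile unsigned char *probe;
	unsigned char value = 0xff;
	int ret = -1;
	void *map;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)len) != 0)
		goto out_close;
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;
	watched_addr = map;
	watched_len = len;
	recovered = 0;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = sigbus_remap_handler;
	if (sigaction(SIGBUS, &sa, &old) != 0)
		goto out_unmap;

	/* Shrink the file so the mapping loses its backing storage. */
	if (ftruncate(fd, 0) == 0) {
		probe = map;
		/* Faults with SIGBUS; after the remap the load is retried. */
		value = probe[0];
		if (recovered == 1 && value == 0)
			ret = 0;
	}
	sigaction(SIGBUS, &old, NULL);
out_unmap:
	munmap(map, len);
out_close:
	close(fd);
	return ret;
}
```

sigbus_recovery_demo() returns 0 when the faulting load was retried
successfully against the remapped memory. As in the patch, the previous
SIGBUS disposition is saved and restored, so the handler is only in effect
while recovery is wanted.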

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
sync failure handling to fix multiple process issue
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 154 +++++++++++++++++++++++++++++++++-
 1 file changed, 153 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..3067f39 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,27 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/* spinlock for device failure process */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -34,6 +48,93 @@ enum eal_dev_event_subsystem {
 };
 
 static int
+dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
+{
+	struct rte_bus *bus;
+	int ret = 0;
+
+	if (!dev && !dev_addr) {
+		return -EINVAL;
+	} else if (dev) {
+		bus = rte_bus_find_by_device_name(dev->name);
+		if (bus->handle_hot_unplug) {
+			/**
+			 * call bus ops to handle hot unplug.
+			 */
+			ret = bus->handle_hot_unplug(dev, dev_addr);
+			if (ret) {
+				RTE_LOG(ERR, EAL,
+					"Cannot handle hot unplug "
+					"for device %s "
+					"on the bus %s.\n",
+					dev->name, bus->name);
+			}
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Not support handle hot unplug for bus %s!\n",
+				bus->name);
+			ret = -ENOTSUP;
+		}
+	} else {
+		TAILQ_FOREACH(bus, &rte_bus_list, next) {
+			if (bus->handle_hot_unplug) {
+				/**
+				 * call bus ops to handle hot unplug.
+				 */
+				ret = bus->handle_hot_unplug(dev, dev_addr);
+				if (ret)
+					RTE_LOG(ERR, EAL,
+						"Cannot handle hot unplug "
+						"for the device "
+						"on the bus %s!\n", bus->name);
+				else
+					break;
+			} else {
+				RTE_LOG(ERR, EAL,
+					"Not support handle hot unplug "
+					"for bus %s!\n", bus->name);
+				ret = -ENOTSUP;
+			}
+		}
+	}
+	return ret;
+}
+
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = dev_uev_failure_process(NULL, info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (!ret)
+		RTE_LOG(DEBUG, EAL,
+			"Success to handle SIGBUS error for hot unplug!\n");
+	else
+		rte_exit(EXIT_FAILURE, "exit for SIGBUS error!");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
+static int
 dev_uev_socket_fd_create(void)
 {
 	struct sockaddr_nl addr;
@@ -147,6 +248,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +275,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL,
+					"Cannot find unplugged device (%s)\n",
+					uevent.devname);
+				return;
+			}
+			rte_spinlock_lock(&dev_failure_lock);
+			ret = dev_uev_failure_process(dev, NULL);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Driver cannot remap the "
+					"device (%s)\n",
+					dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +338,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +366,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug
  2018-05-03  8:57       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
@ 2018-05-03  8:57         ` Jeff Guo
  2018-05-03  8:57         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-05-03  8:57 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, releasing a nonexistent uio resource
results in a kernel NULL pointer error, so this patch checks whether the
device has been removed before running the uio release procedure and, if
so, returns immediately.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
no change
---
 kernel/linux/igb_uio/igb_uio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index cd9b7e7..f0b1cfe 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if the device has been removed before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
+		return -1;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V21 4/4] app/testpmd: show example to handle hot unplug
  2018-05-03  8:57       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-05-03  8:57         ` [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-05-03  8:57         ` Jeff Guo
  2018-05-16 14:30           ` Iremonger, Bernard
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-05-03  8:57 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can smoothly handle
failure when a device is hot unplugged. Once the app detects the removal
event, the callback is called; it first stops packet forwarding, then
stops the port, closes the port, and finally detaches the port.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
fix attach port issue, let it work for multiple device case.
---
 app/test-pmd/testpmd.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index db23f23..81f41e3 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1908,9 +1908,10 @@ eth_dev_event_callback_unregister(void)
 void
 attach_port(char *identifier)
 {
-	portid_t pi = 0;
 	unsigned int socket_id;
 
+	portid_t pi = rte_eth_dev_count_avail();
+
 	printf("Attaching a new port...\n");
 
 	if (identifier == NULL) {
@@ -2079,6 +2080,26 @@ rmv_event_callback(void *arg)
 			dev->device->name);
 }
 
+static void
+rmv_dev_event_callback(char *dev_name)
+{
+	uint16_t port_id;
+	int ret;
+
+	ret = rte_eth_dev_get_port_by_name(dev_name, &port_id);
+	if (ret) {
+		printf("can not get port by device %s!\n", dev_name);
+		return;
+	}
+
+	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	printf("removing port id:%u\n", port_id);
+	stop_packet_forwarding();
+	stop_port(port_id);
+	close_port(port_id);
+	detach_port(port_id);
+}
+
 /* This function is used by the interrupt thread */
 static int
 eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
@@ -2141,9 +2162,7 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		rmv_dev_event_callback(device_name);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2666,7 +2685,6 @@ main(int argc, char** argv)
 			return -1;
 		}
 		eth_dev_event_callback_register();
-
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH V20 4/4] app/testpmd: show example to handler hot unplug
  2018-05-03  7:25           ` Matan Azrad
@ 2018-05-03  9:35             ` Guo, Jia
  2018-05-03 11:27               ` Matan Azrad
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-05-03  9:35 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

hi, matan


On 5/3/2018 3:25 PM, Matan Azrad wrote:
> Hi Jeff
>
>> From: Jeff Guo, Wednesday, April 18, 2018 4:38 PM
>> Use testpmd for example, to show how an application smoothly handle
>> failure when device being hot unplug. Once app detect the removal event,
>> the callback would be called, it first stop the packet forwarding, then stop the
>> port, close the port and finally detach the port.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v20->v19:
>> remove the auto binding example.
>> ---
>>   app/test-pmd/testpmd.c | 29 +++++++++++++++++++++++++----
>>   1 file changed, 25 insertions(+), 4 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 5986ff7..3751901 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -1125,6 +1125,9 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc,
>> packet_fwd_t pkt_fwd)
>>   	tics_datum = rte_rdtsc();
>>   	tics_per_1sec = rte_get_timer_hz();
>>   #endif
>> +	if (hot_plug)
>> +		rte_dev_handle_hot_unplug();
>> +
> Again, I don't understand why the application should configure it - it already started the hot-plug,
> Can't the EAL handle this automatically when the user starts the hot-plug?
Please check v21; I agree with you and have already modified it.
>>   	fsm = &fwd_streams[fc->stream_idx];
>>   	nb_fs = fc->stream_nb;
>>   	do {
>> @@ -2069,6 +2072,26 @@ rmv_event_callback(void *arg)
>>   			dev->device->name);
>>   }
>>
>> +static void
>> +rmv_dev_event_callback(char *dev_name)
>> +{
>> +	uint16_t port_id;
>> +	int ret;
>> +
>> +	ret = rte_eth_dev_get_port_by_name(dev_name, &port_id);
>> +	if (ret) {
>> +		printf("can not get port by device %s!\n", dev_name);
>> +		return;
>> +	}
>> +
>> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
>> +	printf("removing port id:%u\n", port_id);
>> +	stop_packet_forwarding();
>> +	stop_port(port_id);
>> +	close_port(port_id);
>> +	detach_port(port_id);
>> +}
> We have also the rmv_event_callback() which is triggered by a RMV interrupt and running by the host thread.
> What is the context thread of rmv_dev_event_callback()?
> Shouldn't they be synchronized? Should we need both in the same time?
The context thread is the interrupt thread, and we may need to discuss how
to sync them. Do you have any comments if I combine these two into one callback?
>> +
>>   /* This function is used by the interrupt thread */  static int
>> eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void
>> *param, @@ -2130,9 +2153,7 @@ eth_dev_event_callback(char
>> *device_name, enum rte_dev_event_type type,
>>   	case RTE_DEV_EVENT_REMOVE:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>   			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		rmv_dev_event_callback(device_name);
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>> @@ -2640,7 +2661,7 @@ main(int argc, char** argv)
>>   			return -1;
>>   		}
>>   		eth_dev_event_callback_register();
>> -
>> +		rte_dev_handle_hot_unplug();
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V21 0/4] hot plug recovery mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (4 preceding siblings ...)
  2018-05-03  8:57       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
@ 2018-05-03 10:48       ` Jeff Guo
  2018-05-03 10:48         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
                           ` (3 more replies)
  2018-06-22 11:51       ` [PATCH v2 0/4] hot plug failure handle mechanism Jeff Guo
                         ` (17 subsequent siblings)
  23 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-05-03 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The device event monitor framework was introduced previously; its
typical use is device hot plug. If we want an application not to break
down when a device is hot plugged in or out, we still need some measures
to help the app handle that, such as device recovery on detach, so that
the app can keep running smoothly without being disturbed by any hotplug
behavior.

This patch set introduces a recovery mechanism to handle hot unplug, and
also uses testpmd as an example of how to use this mechanism to process
hot plug events. The process is as follows:

plug out->failure handle->stop forward->stop port->close port->detach port

With this mechanism, users such as the fail-safe driver or testpmd can
develop their own hot plug applications.

patchset history:
v21->v20:
split function in hot unplug ops
sync failure handling to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple devices.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add documentation, add SIGBUS handler, refine the code to be clearer.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (4):
  bus/pci: handle device hot unplug
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug

 app/test-pmd/testpmd.c                  |  27 ++++--
 drivers/bus/pci/pci_common.c            |  65 ++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 kernel/linux/igb_uio/igb_uio.c          |   4 +
 lib/librte_eal/common/include/rte_bus.h |  16 ++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 154 +++++++++++++++++++++++++++++++-
 7 files changed, 301 insertions(+), 10 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V21 1/4] bus/pci: handle device hot unplug
  2018-05-03 10:48       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
@ 2018-05-03 10:48         ` Jeff Guo
  2018-05-03 10:48         ` [PATCH V21 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-05-03 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

For device hot unplug, we need some preparatory measures so that when
we encounter a memory fault (such as a SIGBUS error) caused by the unplug
action, we can recover instead of crashing.

Handling device hot unplug is bus-specific behavior, so this patch
introduces a bus op that lets each kind of bus implement its own logic.
It also implements the op for the PCI bus: remap the device resources to
dummy memory to avoid bus read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
split function in hot unplug ops
---
 drivers/bus/pci/pci_common.c            | 65 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++++++
 drivers/bus/pci/private.h               | 12 ++++++
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
 4 files changed, 126 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7215aae..74d9aa8 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_pci_device *
+pci_find_device_by_addr(void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(ERR, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+static int
+pci_handle_hot_unplug(struct rte_device *dev, void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	if (dev != NULL)
+		pdev = RTE_DEV_TO_PCI(dev);
+	else
+		pdev = pci_find_device_by_addr(failure_addr);
+
+	if (!pdev)
+		return -1;
+
+	/* remap resources for devices */
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* TODO */
+		ret = -1;
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "Failed to handle hot unplug of device %s\n",
+			pdev->name);
+	return ret;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -502,6 +566,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.handle_hot_unplug = pci_handle_hot_unplug,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..5551506 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..6a5609f 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a bus-specific hot unplug handler, which is responsible
+ * for handling the failure when a device is hot unplugged, guaranteeing
+ * that the system does not hang in that case.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
+						void *dev_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -209,6 +223,8 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
+							device event */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V21 2/4] eal: add failure handle mechanism for hot plug
  2018-05-03 10:48       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
  2018-05-03 10:48         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
@ 2018-05-03 10:48         ` Jeff Guo
  2018-05-04 15:56           ` Ananyev, Konstantin
  2018-05-03 10:48         ` [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
  2018-05-03 10:48         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-05-03 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handler mechanism to handle device
hot unplug events. When a device is hot unplugged, its resources
become invalid; if they are still read or written unexpectedly, the
application will crash. This patch lets the EAL help the application
handle this fault: when a SIGBUS error occurs, check the failure address
and remap the invalid memory for the corresponding device, so that the
application is not shut down by the hot unplug.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
sync failure handling to fix multiple process issue
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 154 +++++++++++++++++++++++++++++++++-
 1 file changed, 153 insertions(+), 1 deletion(-)
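
Note: the remap step described in the commit message can be illustrated in
isolation. The following is only a minimal sketch, not the code in this patch.
It assumes the bus handler replaces the stale device mapping with anonymous
memory at the same virtual address; remap_as_anonymous() is a hypothetical
name used for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: replace whatever is mapped at [addr, addr + len)
 * with zero-filled anonymous memory at the same virtual address.
 * MAP_FIXED makes the kernel discard the old mapping first.
 * addr must be page-aligned. Returns 0 on success, -1 on failure.
 */
int
remap_as_anonymous(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return (p == addr) ? 0 : -1;
}
```

After such a remap, a stray read or write to the old address hits ordinary
zero-filled memory instead of raising SIGBUS again, which gives the
application time to stop and detach the port.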

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..3067f39 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,27 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/* spinlock for device failure process */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -34,6 +48,93 @@ enum eal_dev_event_subsystem {
 };
 
 static int
+dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
+{
+	struct rte_bus *bus;
+	int ret = 0;
+
+	if (!dev && !dev_addr) {
+		return -EINVAL;
+	} else if (dev) {
+		bus = rte_bus_find_by_device_name(dev->name);
+		if (bus->handle_hot_unplug) {
+			/**
+			 * call bus ops to handle hot unplug.
+			 */
+			ret = bus->handle_hot_unplug(dev, dev_addr);
+			if (ret) {
+				RTE_LOG(ERR, EAL,
+					"Cannot handle hot unplug "
+					"for device %s "
+					"on the bus %s.\n ",
+					dev->name, bus->name);
+			}
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Not support handle hot unplug for bus %s!\n",
+				bus->name);
+			ret = -ENOTSUP;
+		}
+	} else {
+		TAILQ_FOREACH(bus, &rte_bus_list, next) {
+			if (bus->handle_hot_unplug) {
+				/**
+				 * call bus ops to handle hot unplug.
+				 */
+				ret = bus->handle_hot_unplug(dev, dev_addr);
+				if (ret)
+					RTE_LOG(ERR, EAL,
+						"Cannot handle hot unplug "
+						"for the device "
+						"on the bus %s!\n", bus->name);
+				else
+					break;
+			} else {
+				RTE_LOG(ERR, EAL,
+					"Not support handle hot unplug "
+					"for bus %s!\n", bus->name);
+				ret = -ENOTSUP;
+			}
+		}
+	}
+	return ret;
+}
+
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = dev_uev_failure_process(NULL, info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (!ret)
+		RTE_LOG(DEBUG, EAL,
+			"Success to handle SIGBUS error for hot unplug!\n");
+	else
+		rte_exit(EXIT_FAILURE, "exit for SIGBUS error!");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
+static int
 dev_uev_socket_fd_create(void)
 {
 	struct sockaddr_nl addr;
@@ -147,6 +248,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +275,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					uevent.devname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL,
+					"Cannot find unplugged device (%s)\n",
+					uevent.devname);
+				return;
+			}
+			rte_spinlock_lock(&dev_failure_lock);
+			ret = dev_uev_failure_process(dev, NULL);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Driver cannot remap the "
+					"device (%s)\n",
+					dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +338,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +366,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4


* [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug
  2018-05-03 10:48       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
  2018-05-03 10:48         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
  2018-05-03 10:48         ` [PATCH V21 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
@ 2018-05-03 10:48         ` Jeff Guo
  2018-05-03 10:48         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-05-03 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, releasing a nonexistent uio resource
results in a kernel NULL pointer error. This patch checks whether the
device has already been removed before running the uio release procedure
and, if so, returns immediately.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
no change
---
 kernel/linux/igb_uio/igb_uio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index cd9b7e7..f0b1cfe 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -344,6 +344,10 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	/* check if device has been removed before release */
+	if ((&dev->dev.kobj)->state_remove_uevent_sent == 1)
+		return -1;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
-- 
2.7.4


* [PATCH V21 4/4] app/testpmd: show example to handle hot unplug
  2018-05-03 10:48       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-05-03 10:48         ` [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-05-03 10:48         ` Jeff Guo
  2018-06-14 12:59           ` Iremonger, Bernard
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-05-03 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can smoothly handle
the failure when a device is hot unplugged. Once the app detects the
removal event, the callback is called; it first stops packet forwarding,
then stops the port, closes the port, and finally detaches the port.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v21->v20:
fix attach port issue, let it work for the multiple device case.
combine the rmv callbacks into a single one.
---
 app/test-pmd/testpmd.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index db23f23..a1ff8f3 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1908,9 +1908,10 @@ eth_dev_event_callback_unregister(void)
 void
 attach_port(char *identifier)
 {
-	portid_t pi = 0;
 	unsigned int socket_id;
 
+	portid_t pi = rte_eth_dev_count_avail();
+
 	printf("Attaching a new port...\n");
 
 	if (identifier == NULL) {
@@ -2071,12 +2072,14 @@ rmv_event_callback(void *arg)
 	RTE_ETH_VALID_PORTID_OR_RET(port_id);
 	dev = &rte_eth_devices[port_id];
 
+	if (dev->state == RTE_ETH_DEV_UNUSED)
+		return;
+
+	printf("removing device %s\n", dev->device->name);
+	stop_packet_forwarding();
 	stop_port(port_id);
 	close_port(port_id);
-	printf("removing device %s\n", dev->device->name);
-	if (rte_eal_dev_detach(dev->device))
-		TESTPMD_LOG(ERR, "Failed to detach device %s\n",
-			dev->device->name);
+	detach_port(port_id);
 }
 
 /* This function is used by the interrupt thread */
@@ -2131,19 +2134,26 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
 		fflush(stderr);
 	}
 
+	ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+	if (ret) {
+		printf("can not get port by device %s!\n", device_name);
+		return;
+	}
+
 	switch (type) {
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2666,7 +2676,6 @@ main(int argc, char** argv)
 			return -1;
 		}
 		eth_dev_event_callback_register();
-
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* Re: [PATCH V20 4/4] app/testpmd: show example to handler hot unplug
  2018-05-03  9:35             ` Guo, Jia
@ 2018-05-03 11:27               ` Matan Azrad
  0 siblings, 0 replies; 494+ messages in thread
From: Matan Azrad @ 2018-05-03 11:27 UTC (permalink / raw)
  To: Guo, Jia, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, jianfeng.tan
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Guo

From: Guo, Jia, Thursday, May 3, 2018 12:36 PM
> hi, matan
> 
> 
> On 5/3/2018 3:25 PM, Matan Azrad wrote:
> > Hi Jeff
> >
> >> From: Jeff Guo, Wednesday, April 18, 2018 4:38 PM
> >> Use testpmd as an example to show how an application can smoothly
> >> handle the failure when a device is hot unplugged. Once the app detects
> >> the removal event, the callback is called; it first stops packet
> >> forwarding, then stops the port, closes the port, and finally detaches
> >> the port.
> >>
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >> ---
> >> v20->v19:
> >> remove the auto binding example.
> >> ---
> >>   app/test-pmd/testpmd.c | 29 +++++++++++++++++++++++++----
> >>   1 file changed, 25 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >> 5986ff7..3751901 100644
> >> --- a/app/test-pmd/testpmd.c
> >> +++ b/app/test-pmd/testpmd.c
> >> @@ -1125,6 +1125,9 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc,
> >> packet_fwd_t pkt_fwd)
> >>   	tics_datum = rte_rdtsc();
> >>   	tics_per_1sec = rte_get_timer_hz();
> >>   #endif
> >> +	if (hot_plug)
> >> +		rte_dev_handle_hot_unplug();
> >> +
> > Again, I don't understand why the application should configure it - it
> > already started the hot-plug, Can't the EAL handle this automatically when
> the user starts the hot-plug?
> Please check v21; I agree with you and have already modified it.

Looks good, thanks.

> >>   	fsm = &fwd_streams[fc->stream_idx];
> >>   	nb_fs = fc->stream_nb;
> >>   	do {
> >> @@ -2069,6 +2072,26 @@ rmv_event_callback(void *arg)
> >>   			dev->device->name);
> >>   }
> >>
> >> +static void
> >> +rmv_dev_event_callback(char *dev_name) {
> >> +	uint16_t port_id;
> >> +	int ret;
> >> +
> >> +	ret = rte_eth_dev_get_port_by_name(dev_name, &port_id);
> >> +	if (ret) {
> >> +		printf("can not get port by device %s!\n", dev_name);
> >> +		return;
> >> +	}
> >> +
> >> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> >> +	printf("removing port id:%u\n", port_id);
> >> +	stop_packet_forwarding();
> >> +	stop_port(port_id);
> >> +	close_port(port_id);
> >> +	detach_port(port_id);
> >> +}
> > We also have rmv_event_callback(), which is triggered by an RMV
> > interrupt and runs in the host thread.
> > What is the context thread of rmv_dev_event_callback()?
> > Shouldn't they be synchronized? Do we need both at the same time?
> The context thread is the interrupt thread, and we may need to discuss how to
> sync it. Do you have any comment if I combine these 2 into 1 callback?


Please see the patch series I sent today regarding rmv_event_callback() function:
" [PATCH 0/6] Testpmd: fix port hotplug".


Yes, I think you should call rmv_event_callback() from your function (after retrieving the port id) to reuse the code.

Regarding synchronization, let's discuss:

So, both callbacks run in the same thread, but are triggered by different fds.
Right?
So, they will be triggered sequentially by the kernel when a device is plugged out, and we cannot know the order.
Right?
The second one may get an "invalid port" error because the first one has already detached the port, which is not a major issue,
but the port id can be reused between them, and then the second one may detach an available port.
Right?
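
The port id reuse scenario above can be sketched with a toy port table. This
is not DPDK code: toy_attach(), toy_detach_by_id() and toy_detach_by_name()
are hypothetical helpers, and the first-free-slot allocation is an assumption
made for illustration. It shows why a callback that trusts a stale port id can
detach an unrelated port, while one that looks the port up again by device
name simply finds nothing to do.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN 32

/* Toy stand-in for an ethdev port table; not the DPDK structures. */
struct toy_port {
	bool used;
	char dev_name[NAME_LEN];
};

struct toy_port ports[MAX_PORTS];

/* Attach takes the first free slot, so a freed id is handed out again. */
int
toy_attach(const char *dev_name)
{
	for (int i = 0; i < MAX_PORTS; i++) {
		if (!ports[i].used) {
			ports[i].used = true;
			strncpy(ports[i].dev_name, dev_name, NAME_LEN - 1);
			ports[i].dev_name[NAME_LEN - 1] = '\0';
			return i;
		}
	}
	return -1;
}

/* Unguarded removal: trusts the port id captured when the event fired. */
void
toy_detach_by_id(int port_id)
{
	ports[port_id].used = false;
}

/*
 * Guarded removal: looks the port up again by device name and does
 * nothing if that device is already gone. Returns true if it detached.
 */
bool
toy_detach_by_name(const char *dev_name)
{
	for (int i = 0; i < MAX_PORTS; i++) {
		if (ports[i].used &&
		    strcmp(ports[i].dev_name, dev_name) == 0) {
			ports[i].used = false;
			return true;
		}
	}
	return false;
}
```

In this model, if the first callback detaches "0000:01:00.0" and another
device is attached before the second callback runs, the new device gets the
same id; only the by-name variant leaves it alone.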

So, it looks like it is better to choose only one of them.

If all the above conclusions are correct, I suggest disabling the ethdev mechanism when the EAL hotplug is enabled:


	@@ -2152,9 +2152,10 @@ struct pmd_test_command {
	 
	        switch (type) {
	        case RTE_ETH_EVENT_INTR_RMV:
	-               if (rte_eal_alarm_set(100000,
	-                               rmv_event_callback, (void *)(intptr_t)port_id))
	-                       fprintf(stderr, "Could not set up deferred device removal\n");
	+               if (!hot_plug)
	+                       if (rte_eal_alarm_set(100000, rmv_event_callback,
	+                           (void *)(intptr_t)port_id))
	+                               fprintf(stderr, "Could not set up deferred device removal\n");
	                break;
	        default:
	                break;


What do you think?

Matan.


> >>   /* This function is used by the interrupt thread */  static int
> >> eth_event_callback(portid_t port_id, enum rte_eth_event_type type,
> >> void *param, @@ -2130,9 +2153,7 @@ eth_dev_event_callback(char
> >> *device_name, enum rte_dev_event_type type,
> >>   	case RTE_DEV_EVENT_REMOVE:
> >>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> >>   			device_name);
> >> -		/* TODO: After finish failure handle, begin to stop
> >> -		 * packet forward, stop port, close port, detach port.
> >> -		 */
> >> +		rmv_dev_event_callback(device_name);
> >>   		break;
> >>   	case RTE_DEV_EVENT_ADD:
> >>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> @@ -2640,7
> >> +2661,7 @@ main(int argc, char** argv)
> >>   			return -1;
> >>   		}
> >>   		eth_dev_event_callback_register();
> >> -
> >> +		rte_dev_handle_hot_unplug();
> >>   	}
> >>
> >>   	if (start_port(RTE_PORT_ALL) != 0)
> >> --
> >> 2.7.4


* Re: [PATCH V21 2/4] eal: add failure handle mechanism for hot plug
  2018-05-03 10:48         ` [PATCH V21 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
@ 2018-05-04 15:56           ` Ananyev, Konstantin
  2018-05-08 14:57             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-05-04 15:56 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

> 
> This patch introduces a failure handler mechanism to handle the device
> hot unplug event. When a device is hot unplugged, its resources become
> invalid; if they are still read or written unexpectedly, the application
> will crash. This patch lets the EAL help the application handle this
> fault: when a SIGBUS error occurs, it checks the failure address and
> remaps the invalid memory for the corresponding device, which guarantees
> that the application is not shut down when a device is hot unplugged.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v21->v20:
> sync failure handling to fix multiple process issue
> ---
>  lib/librte_eal/linuxapp/eal/eal_dev.c | 154 +++++++++++++++++++++++++++++++++-
>  1 file changed, 153 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 1cf6aeb..3067f39 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -14,15 +16,27 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> +#include <rte_spinlock.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };
>  static bool monitor_started;
> 
> +extern struct rte_bus_list rte_bus_list;
> +
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> +/* spinlock for device failure process */
> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
> +
> +static struct sigaction sigbus_action_old;
> +
> +static int sigbus_need_recover;
> +
>  static void dev_uev_handler(__rte_unused void *param);
> 
>  /* identify the system layer which reports this event. */
> @@ -34,6 +48,93 @@ enum eal_dev_event_subsystem {
>  };
> 
>  static int
> +dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
> +{
> +	struct rte_bus *bus;
> +	int ret = 0;
> +
> +	if (!dev && !dev_addr) {
> +		return -EINVAL;
> +	} else if (dev) {
> +		bus = rte_bus_find_by_device_name(dev->name);
> +		if (bus->handle_hot_unplug) {
> +			/**
> +			 * call bus ops to handle hot unplug.
> +			 */
> +			ret = bus->handle_hot_unplug(dev, dev_addr);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL,
> +					"Cannot handle hot unplug "
> +					"for device %s "
> +					"on the bus %s.\n ",
> +					dev->name, bus->name);
> +			}
> +		} else {
> +			RTE_LOG(ERR, EAL,
> +				"Not support handle hot unplug for bus %s!\n",
> +				bus->name);
> +			ret = -ENOTSUP;
> +		}
> +	} else {
> +		TAILQ_FOREACH(bus, &rte_bus_list, next) {
> +			if (bus->handle_hot_unplug) {
> +				/**
> +				 * call bus ops to handle hot unplug.
> +				 */
> +				ret = bus->handle_hot_unplug(dev, dev_addr);
> +				if (ret)
> +					RTE_LOG(ERR, EAL,
> +						"Cannot handle hot unplug "
> +						"for the device "
> +						"on the bus %s!\n", bus->name);
> +				else
> +					break;
> +			} else {
> +				RTE_LOG(ERR, EAL,
> +					"Not support handle hot unplug "
> +					"for bus %s!\n", bus->name);
> +				ret = -ENOTSUP;
> +			}
> +		}
> +	}
> +	return ret;
> +}
> +
> +static void
> +sigbus_action_recover(void)
> +{
> +	if (sigbus_need_recover) {
> +		sigaction(SIGBUS, &sigbus_action_old, NULL);
> +		sigbus_need_recover = 0;
> +	}
> +}
> +
> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> +		(int)pthread_self(), info->si_addr);
> +	rte_spinlock_lock(&dev_failure_lock);
> +	ret = dev_uev_failure_process(NULL, info->si_addr);
> +	rte_spinlock_unlock(&dev_failure_lock);
> +	if (!ret)
> +		RTE_LOG(DEBUG, EAL,
> +			"Success to handle SIGBUS error for hot unplug!\n");
> +	else
> +		rte_exit(EXIT_FAILURE, "exit for SIGBUS error!");

I still think we have to distinguish 2 cases here:
1) the failure addr does not belong to any dpdk device
2) the failure addr does belong to a dpdk device, but we fail to remap it.

For 1) we probably need to call previous sigbus handler.
For 2) we probably can only do exit().
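
A minimal sketch of the two cases above, under stated assumptions: the region
table, remap_hook and classify_fault() are hypothetical and are not part of
this patch. Only sigaction(), siginfo_t and the saved old action correspond to
what the patch already uses.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for siginfo_t and SA_SIGINFO */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical record of one mapped device region, e.g. a PCI BAR. */
struct dev_region {
	uintptr_t start;
	size_t len;
};

enum fault_owner { FAULT_NOT_OURS, FAULT_DEVICE };

/* Assumed to be filled by the bus layer when it maps device memory. */
struct dev_region regions[8];
size_t nb_regions;

/* Disposition that was in place before ours, as saved by sigaction(). */
struct sigaction sigbus_action_old;

/* Assumed remap hook provided by the bus; returns 0 on success. */
int (*remap_hook)(void *addr);

enum fault_owner
classify_fault(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	size_t i;

	for (i = 0; i < nb_regions; i++) {
		if (a >= regions[i].start &&
		    a - regions[i].start < regions[i].len)
			return FAULT_DEVICE;
	}
	return FAULT_NOT_OURS;
}

/* Case 1: pass the signal on to whoever handled SIGBUS before us. */
void
chain_to_previous(int signum, siginfo_t *info, void *ctx)
{
	if (sigbus_action_old.sa_flags & SA_SIGINFO) {
		sigbus_action_old.sa_sigaction(signum, info, ctx);
		return;
	}
	if (sigbus_action_old.sa_handler != SIG_DFL &&
	    sigbus_action_old.sa_handler != SIG_IGN) {
		sigbus_action_old.sa_handler(signum);
		return;
	}
	/* No user handler was installed: restore the default and re-raise. */
	signal(signum, SIG_DFL);
	raise(signum);
}

void
sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	if (classify_fault(info->si_addr) == FAULT_NOT_OURS) {
		chain_to_previous(signum, info, ctx);
		return;
	}
	/* Case 2: the address is ours; exit if it cannot be remapped. */
	if (remap_hook == NULL || remap_hook(info->si_addr) != 0)
		_exit(EXIT_FAILURE);
}
```

The handler would be installed with SA_SIGINFO, with the previous action saved
into sigbus_action_old, in the same way rte_dev_event_monitor_start() does in
this patch.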

> +}
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
> +static int
>  dev_uev_socket_fd_create(void)
>  {
>  	struct sockaddr_nl addr;
> @@ -147,6 +248,9 @@ dev_uev_handler(__rte_unused void *param)
>  	struct rte_dev_event uevent;
>  	int ret;
>  	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
> 
>  	memset(&uevent, 0, sizeof(struct rte_dev_event));
>  	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -171,13 +275,50 @@ dev_uev_handler(__rte_unused void *param)
>  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>  		uevent.devname, uevent.type, uevent.subsystem);
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL,
> +					"Cannot find unplugged device (%s)\n",
> +					uevent.devname);
> +				return;
> +			}
> +			rte_spinlock_lock(&dev_failure_lock);
> +			ret = dev_uev_failure_process(dev, NULL);
> +			rte_spinlock_unlock(&dev_failure_lock);

That's the interrupt thread, right?
I wonder, could the user call device_detach() at the same moment?
Konstantin


> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> +					"device (%s)\n",
> +					dev->name);
> +				return;
> +			}
> +		}
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
>  }
> 
>  int __rte_experimental
>  rte_dev_event_monitor_start(void)
>  {
> +	sigset_t mask;
> +	struct sigaction action;
>  	int ret;
> 
>  	if (monitor_started)
> @@ -197,6 +338,14 @@ rte_dev_event_monitor_start(void)
>  		return -1;
>  	}
> 
> +	/* register sigbus handler */
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGBUS);
> +	action.sa_flags = SA_SIGINFO;
> +	action.sa_mask = mask;
> +	action.sa_sigaction = sigbus_handler;
> +	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
> +
>  	monitor_started = true;
> 
>  	return 0;
> @@ -217,8 +366,11 @@ rte_dev_event_monitor_stop(void)
>  		return ret;
>  	}
> 
> +	sigbus_action_recover();
> +
>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
>  	return 0;
>  }
> --
> 2.7.4


* Re: [PATCH V21 2/4] eal: add failure handle mechanism for hot plug
  2018-05-04 15:56           ` Ananyev, Konstantin
@ 2018-05-08 14:57             ` Guo, Jia
  2018-05-08 15:19               ` Ananyev, Konstantin
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-05-08 14:57 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 5/4/2018 11:56 PM, Ananyev, Konstantin wrote:
> Hi Jeff,
>
>> This patch introduces a failure handler mechanism to handle the device
>> hot unplug event. When a device is hot unplugged, its resources become
>> invalid; if they are still read or written unexpectedly, the application
>> will crash. This patch lets the EAL help the application handle this
>> fault: when a SIGBUS error occurs, it checks the failure address and
>> remaps the invalid memory for the corresponding device, which guarantees
>> that the application is not shut down when a device is hot unplugged.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v21->v20:
>> sync failure handling to fix multiple process issue
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 154 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 153 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 1cf6aeb..3067f39 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,8 @@
>>
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <fcntl.h>
>> +#include <signal.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>
>> @@ -14,15 +16,27 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_eal.h>
>> +#include <rte_spinlock.h>
>>
>>   #include "eal_private.h"
>>
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>>   static bool monitor_started;
>>
>> +extern struct rte_bus_list rte_bus_list;
>> +
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>
>> +/* spinlock for device failure process */
>> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
>> +
>> +static struct sigaction sigbus_action_old;
>> +
>> +static int sigbus_need_recover;
>> +
>>   static void dev_uev_handler(__rte_unused void *param);
>>
>>   /* identify the system layer which reports this event. */
>> @@ -34,6 +48,93 @@ enum eal_dev_event_subsystem {
>>   };
>>
>>   static int
>> +dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
>> +{
>> +	struct rte_bus *bus;
>> +	int ret = 0;
>> +
>> +	if (!dev && !dev_addr) {
>> +		return -EINVAL;
>> +	} else if (dev) {
>> +		bus = rte_bus_find_by_device_name(dev->name);
>> +		if (bus->handle_hot_unplug) {
>> +			/**
>> +			 * call bus ops to handle hot unplug.
>> +			 */
>> +			ret = bus->handle_hot_unplug(dev, dev_addr);
>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL,
>> +					"Cannot handle hot unplug "
>> +					"for device %s "
>> +					"on the bus %s.\n ",
>> +					dev->name, bus->name);
>> +			}
>> +		} else {
>> +			RTE_LOG(ERR, EAL,
>> +				"Not support handle hot unplug for bus %s!\n",
>> +				bus->name);
>> +			ret = -ENOTSUP;
>> +		}
>> +	} else {
>> +		TAILQ_FOREACH(bus, &rte_bus_list, next) {
>> +			if (bus->handle_hot_unplug) {
>> +				/**
>> +				 * call bus ops to handle hot unplug.
>> +				 */
>> +				ret = bus->handle_hot_unplug(dev, dev_addr);
>> +				if (ret)
>> +					RTE_LOG(ERR, EAL,
>> +						"Cannot handle hot unplug "
>> +						"for the device "
>> +						"on the bus %s!\n", bus->name);
>> +				else
>> +					break;
>> +			} else {
>> +				RTE_LOG(ERR, EAL,
>> +					"Not support handle hot unplug "
>> +					"for bus %s!\n", bus->name);
>> +				ret = -ENOTSUP;
>> +			}
>> +		}
>> +	}
>> +	return ret;
>> +}
>> +
>> +static void
>> +sigbus_action_recover(void)
>> +{
>> +	if (sigbus_need_recover) {
>> +		sigaction(SIGBUS, &sigbus_action_old, NULL);
>> +		sigbus_need_recover = 0;
>> +	}
>> +}
>> +
>> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
>> +				void *ctx __rte_unused)
>> +{
>> +	int ret;
>> +
>> +	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
>> +		(int)pthread_self(), info->si_addr);
>> +	rte_spinlock_lock(&dev_failure_lock);
>> +	ret = dev_uev_failure_process(NULL, info->si_addr);
>> +	rte_spinlock_unlock(&dev_failure_lock);
>> +	if (!ret)
>> +		RTE_LOG(DEBUG, EAL,
>> +			"Success to handle SIGBUS error for hot unplug!\n");
>> +	else
>> +		rte_exit(EXIT_FAILURE, "exit for SIGBUS error!");
> I still think we have to distinguish 2 cases here:
> 1) the failure addr does not belong to any dpdk device
> 2) the failure addr does belong to a dpdk device, but we fail to remap it.
>
> For 1) we probably need to call previous sigbus handler.
> For 2) we probably can only do exit().

I think the previous sigbus handler just treats the sigbus error as an
exception and exits the process, so I think handling 1) and 2) in one
way should be fine. Do you agree with that? Or do you see any case where
some other sigbus handler would need to be called at this position?
>> +}
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +	const void *_name)
>> +{
>> +	const char *name = _name;
>> +
>> +	return strcmp(dev->name, name);
>> +}
>> +
>> +static int
>>   dev_uev_socket_fd_create(void)
>>   {
>>   	struct sockaddr_nl addr;
>> @@ -147,6 +248,9 @@ dev_uev_handler(__rte_unused void *param)
>>   	struct rte_dev_event uevent;
>>   	int ret;
>>   	char buf[EAL_UEV_MSG_LEN];
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	const char *busname;
>>
>>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>>   	memset(buf, 0, EAL_UEV_MSG_LEN);
>> @@ -171,13 +275,50 @@ dev_uev_handler(__rte_unused void *param)
>>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>>   		uevent.devname, uevent.type, uevent.subsystem);
>>
>> -	if (uevent.devname)
>> +	switch (uevent.subsystem) {
>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +		busname = "pci";
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (uevent.devname) {
>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>> +			bus = rte_bus_find_by_name(busname);
>> +			if (bus == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			dev = bus->find_device(NULL, cmp_dev_name,
>> +					       uevent.devname);
>> +			if (dev == NULL) {
>> +				RTE_LOG(ERR, EAL,
>> +					"Cannot find unplugged device (%s)\n",
>> +					uevent.devname);
>> +				return;
>> +			}
>> +			rte_spinlock_lock(&dev_failure_lock);
>> +			ret = dev_uev_failure_process(dev, NULL);
>> +			rte_spinlock_unlock(&dev_failure_lock);
> That's the interrupt thread, right?
> I wonder, could the user call device_detach() at the same moment?
> Konstantin

It is in the interrupt thread, and the user will call device_detach after the failure process. Are you concerned about the device being detached twice or more? I don't think there is any problem here.

>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
>> +					"device (%s)\n",
>> +					dev->name);
>> +				return;
>> +			}
>> +		}
>>   		dev_callback_process(uevent.devname, uevent.type);
>> +	}
>>   }
>>
>>   int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>> +	sigset_t mask;
>> +	struct sigaction action;
>>   	int ret;
>>
>>   	if (monitor_started)
>> @@ -197,6 +338,14 @@ rte_dev_event_monitor_start(void)
>>   		return -1;
>>   	}
>>
>> +	/* register sigbus handler */
>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGBUS);
>> +	action.sa_flags = SA_SIGINFO;
>> +	action.sa_mask = mask;
>> +	action.sa_sigaction = sigbus_handler;
>> +	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
>> +
>>   	monitor_started = true;
>>
>>   	return 0;
>> @@ -217,8 +366,11 @@ rte_dev_event_monitor_stop(void)
>>   		return ret;
>>   	}
>>
>> +	sigbus_action_recover();
>> +
>>   	close(intr_handle.fd);
>>   	intr_handle.fd = -1;
>>   	monitor_started = false;
>> +
>>   	return 0;
>>   }
>> --
>> 2.7.4


* Re: [PATCH V21 2/4] eal: add failure handle mechanism for hot plug
  2018-05-08 14:57             ` Guo, Jia
@ 2018-05-08 15:19               ` Ananyev, Konstantin
  0 siblings, 0 replies; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-05-08 15:19 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Tuesday, May 8, 2018 3:57 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
> <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH V21 2/4] eal: add failure handle mechanism for hot plug
> 
> 
> 
> On 5/4/2018 11:56 PM, Ananyev, Konstantin wrote:
> > Hi Jeff,
> >
> >> This patch introduces a failure handler mechanism to handle device
> >> hot unplug event. When device be hot plug out, the device resource
> >> become invalid, if this resource is still be unexpected read/write,
> >> system will crash. This patch let eal help application to handle
> >> this fault, when sigbus error occur, check the failure address and
> >> accordingly remap the invalid memory for the corresponding device,
> >> that could guaranty the application not to be shut down when hot plug.
> >>
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >> ---
> >> v21->v20:
> >> sync failure hanlde to fix multiple process issue
> >> ---
> >>   lib/librte_eal/linuxapp/eal/eal_dev.c | 154 +++++++++++++++++++++++++++++++++-
> >>   1 file changed, 153 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> >> index 1cf6aeb..3067f39 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> >> @@ -4,6 +4,8 @@
> >>
> >>   #include <string.h>
> >>   #include <unistd.h>
> >> +#include <fcntl.h>
> >> +#include <signal.h>
> >>   #include <sys/socket.h>
> >>   #include <linux/netlink.h>
> >>
> >> @@ -14,15 +16,27 @@
> >>   #include <rte_malloc.h>
> >>   #include <rte_interrupts.h>
> >>   #include <rte_alarm.h>
> >> +#include <rte_bus.h>
> >> +#include <rte_eal.h>
> >> +#include <rte_spinlock.h>
> >>
> >>   #include "eal_private.h"
> >>
> >>   static struct rte_intr_handle intr_handle = {.fd = -1 };
> >>   static bool monitor_started;
> >>
> >> +extern struct rte_bus_list rte_bus_list;
> >> +
> >>   #define EAL_UEV_MSG_LEN 4096
> >>   #define EAL_UEV_MSG_ELEM_LEN 128
> >>
> >> +/* spinlock for device failure process */
> >> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
> >> +
> >> +static struct sigaction sigbus_action_old;
> >> +
> >> +static int sigbus_need_recover;
> >> +
> >>   static void dev_uev_handler(__rte_unused void *param);
> >>
> >>   /* identify the system layer which reports this event. */
> >> @@ -34,6 +48,93 @@ enum eal_dev_event_subsystem {
> >>   };
> >>
> >>   static int
> >> +dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
> >> +{
> >> +	struct rte_bus *bus;
> >> +	int ret = 0;
> >> +
> >> +	if (!dev && !dev_addr) {
> >> +		return -EINVAL;
> >> +	} else if (dev) {
> >> +		bus = rte_bus_find_by_device_name(dev->name);
> >> +		if (bus->handle_hot_unplug) {
> >> +			/**
> >> +			 * call bus ops to handle hot unplug.
> >> +			 */
> >> +			ret = bus->handle_hot_unplug(dev, dev_addr);
> >> +			if (ret) {
> >> +				RTE_LOG(ERR, EAL,
> >> +					"Cannot handle hot unplug "
> >> +					"for device %s "
> >> +					"on the bus %s.\n ",
> >> +					dev->name, bus->name);
> >> +			}
> >> +		} else {
> >> +			RTE_LOG(ERR, EAL,
> >> +				"Not support handle hot unplug for bus %s!\n",
> >> +				bus->name);
> >> +			ret = -ENOTSUP;
> >> +		}
> >> +	} else {
> >> +		TAILQ_FOREACH(bus, &rte_bus_list, next) {
> >> +			if (bus->handle_hot_unplug) {
> >> +				/**
> >> +				 * call bus ops to handle hot unplug.
> >> +				 */
> >> +				ret = bus->handle_hot_unplug(dev, dev_addr);
> >> +				if (ret)
> >> +					RTE_LOG(ERR, EAL,
> >> +						"Cannot handle hot unplug "
> >> +						"for the device "
> >> +						"on the bus %s!\n", bus->name);
> >> +				else
> >> +					break;
> >> +			} else {
> >> +				RTE_LOG(ERR, EAL,
> >> +					"Not support handle hot unplug "
> >> +					"for bus %s!\n", bus->name);
> >> +				ret = -ENOTSUP;
> >> +			}
> >> +		}
> >> +	}
> >> +	return ret;
> >> +}
> >> +
> >> +static void
> >> +sigbus_action_recover(void)
> >> +{
> >> +	if (sigbus_need_recover) {
> >> +		sigaction(SIGBUS, &sigbus_action_old, NULL);
> >> +		sigbus_need_recover = 0;
> >> +	}
> >> +}
> >> +
> >> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
> >> +				void *ctx __rte_unused)
> >> +{
> >> +	int ret;
> >> +
> >> +	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> >> +		(int)pthread_self(), info->si_addr);
> >> +	rte_spinlock_lock(&dev_failure_lock);
> >> +	ret = dev_uev_failure_process(NULL, info->si_addr);
> >> +	rte_spinlock_unlock(&dev_failure_lock);
> >> +	if (!ret)
> >> +		RTE_LOG(DEBUG, EAL,
> >> +			"Success to handle SIGBUS error for hot unplug!\n");
> >> +	else
> >> +		rte_exit(EXIT_FAILURE, "exit for SIGBUS error!");
> > I still think we have to distinguish here 2 cases:
> > 1) failure addr is not belong to any dpdk devices
> > 2) failure addr does belong to dpdk device, but we fail to remap it.
> >
> > For 1) we probably need to call previous sigbus handler.
> > For 2) we probably can only do exit().
> 
> I think the previous sigbus handler just treats the sigbus error as an
> exception and exits the process, so handling 1) and 2) in the same way
> should be fine. Do you agree? Or can you see any case where another
> sigbus handler would be called at this position?

I think the application can have its own SIGBUS handler installed (same as we do).

> >> +}
> >> +
> >> +static int cmp_dev_name(const struct rte_device *dev,
> >> +	const void *_name)
> >> +{
> >> +	const char *name = _name;
> >> +
> >> +	return strcmp(dev->name, name);
> >> +}
> >> +
> >> +static int
> >>   dev_uev_socket_fd_create(void)
> >>   {
> >>   	struct sockaddr_nl addr;
> >> @@ -147,6 +248,9 @@ dev_uev_handler(__rte_unused void *param)
> >>   	struct rte_dev_event uevent;
> >>   	int ret;
> >>   	char buf[EAL_UEV_MSG_LEN];
> >> +	struct rte_bus *bus;
> >> +	struct rte_device *dev;
> >> +	const char *busname;
> >>
> >>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
> >>   	memset(buf, 0, EAL_UEV_MSG_LEN);
> >> @@ -171,13 +275,50 @@ dev_uev_handler(__rte_unused void *param)
> >>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
> >>   		uevent.devname, uevent.type, uevent.subsystem);
> >>
> >> -	if (uevent.devname)
> >> +	switch (uevent.subsystem) {
> >> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> >> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> >> +		busname = "pci";
> >> +		break;
> >> +	default:
> >> +		break;
> >> +	}
> >> +
> >> +	if (uevent.devname) {
> >> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> >> +			bus = rte_bus_find_by_name(busname);
> >> +			if (bus == NULL) {
> >> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> >> +					uevent.devname);
> >> +				return;
> >> +			}
> >> +			dev = bus->find_device(NULL, cmp_dev_name,
> >> +					       uevent.devname);
> >> +			if (dev == NULL) {
> >> +				RTE_LOG(ERR, EAL,
> >> +					"Cannot find unplugged device (%s)\n",
> >> +					uevent.devname);
> >> +				return;
> >> +			}
> >> +			rte_spinlock_lock(&dev_failure_lock);
> >> +			ret = dev_uev_failure_process(dev, NULL);
> >> +			rte_spinlock_unlock(&dev_failure_lock);
> > That's interrupt thread, right?
> > I wonder could it happen that user will call device_detach() at the same moment?
> > Konstantin
> 
> It is in the interrupt thread, and the user will call device_detach after the failure process. Are you concerned about
> the device being detached twice or more? I don't think there is any problem here.

OK, but the user can call device_detach() on their own, without waiting for a failure to happen, right?


> 
> >> +			if (ret) {
> >> +				RTE_LOG(ERR, EAL, "Driver cannot remap the "
> >> +					"device (%s)\n",
> >> +					dev->name);
> >> +				return;
> >> +			}
> >> +		}
> >>   		dev_callback_process(uevent.devname, uevent.type);
> >> +	}
> >>   }
> >>
> >>   int __rte_experimental
> >>   rte_dev_event_monitor_start(void)
> >>   {
> >> +	sigset_t mask;
> >> +	struct sigaction action;
> >>   	int ret;
> >>
> >>   	if (monitor_started)
> >> @@ -197,6 +338,14 @@ rte_dev_event_monitor_start(void)
> >>   		return -1;
> >>   	}
> >>
> >> +	/* register sigbus handler */
> >> +	sigemptyset(&mask);
> >> +	sigaddset(&mask, SIGBUS);
> >> +	action.sa_flags = SA_SIGINFO;
> >> +	action.sa_mask = mask;
> >> +	action.sa_sigaction = sigbus_handler;
> >> +	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
> >> +
> >>   	monitor_started = true;
> >>
> >>   	return 0;
> >> @@ -217,8 +366,11 @@ rte_dev_event_monitor_stop(void)
> >>   		return ret;
> >>   	}
> >>
> >> +	sigbus_action_recover();
> >> +
> >>   	close(intr_handle.fd);
> >>   	intr_handle.fd = -1;
> >>   	monitor_started = false;
> >> +
> >>   	return 0;
> >>   }
> >> --
> >> 2.7.4


* Re: [PATCH V21 4/4] app/testpmd: show example to handle hot unplug
  2018-05-03  8:57         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
@ 2018-05-16 14:30           ` Iremonger, Bernard
  0 siblings, 0 replies; 494+ messages in thread
From: Iremonger, Bernard @ 2018-05-16 14:30 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Guo, Jia, Zhang, Helin

Hi Jeff

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Thursday, May 3, 2018 9:57 AM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
> thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van
> Haaren, Harry <harry.van.haaren@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [dpdk-dev] [PATCH V21 4/4] app/testpmd: show example to handle
> hot unplug
> 
> Use testpmd for example, to show how an application smoothly handle
> failure when device being hot unplug. Once app detect the removal event,
> the callback would be called, it first stop the packet forwarding, then stop the
> port, close the port and finally detach the port.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v21->v20:
> fix attach port issue, let it work for multiple device case.
> ---
>  app/test-pmd/testpmd.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index db23f23..81f41e3 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1908,9 +1908,10 @@ eth_dev_event_callback_unregister(void)
>  void
>  attach_port(char *identifier)
>  {
> -	portid_t pi = 0;
>  	unsigned int socket_id;
> 
> +	portid_t pi = rte_eth_dev_count_avail();
> +
>  	printf("Attaching a new port...\n");
> 
>  	if (identifier == NULL) {
> @@ -2079,6 +2080,26 @@ rmv_event_callback(void *arg)
>  			dev->device->name);
>  }
> 
> +static void
> +rmv_dev_event_callback(char *dev_name)
> +{
> +	uint16_t port_id;
> +	int ret;
> +
> +	ret = rte_eth_dev_get_port_by_name(dev_name, &port_id);
> +	if (ret) {
> +		printf("can not get port by device %s!\n", dev_name);
> +		return;
> +	}
> +
> +	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> +	printf("removing port id:%u\n", port_id);
> +	stop_packet_forwarding();
> +	stop_port(port_id);
> +	close_port(port_id);
> +	detach_port(port_id);
> +}
> +
>  /* This function is used by the interrupt thread */
>  static int
>  eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
> @@ -2141,9 +2162,7 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		rmv_dev_event_callback(device_name);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> @@ -2666,7 +2685,6 @@ main(int argc, char** argv)
>  			return -1;
>  		}
>  		eth_dev_event_callback_register();
> -
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4

This patch does not apply to dpdk_18_05_R4C master branch and needs to be rebased.

Regards,

Bernard.


* Re: [PATCH V21 4/4] app/testpmd: show example to handle hot unplug
  2018-05-03 10:48         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
@ 2018-06-14 12:59           ` Iremonger, Bernard
  2018-06-15  8:32             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Iremonger, Bernard @ 2018-06-14 12:59 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Guo, Jia, Zhang, Helin

Hi Jeff,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Thursday, May 3, 2018 11:49 AM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
> thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van
> Haaren, Harry <harry.van.haaren@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
> Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [dpdk-dev] [PATCH V21 4/4] app/testpmd: show example to handle
> hot unplug
> 
> Use testpmd for example, to show how an application smoothly handle
> failure when device being hot unplug. Once app detect the removal event,
> the callback would be called, it first stop the packet forwarding, then stop the
> port, close the port and finally detach the port.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v21->v20:
> fix attach port issue, let it work for multiple device case.
> combind rmv callback to only one.
> ---
>  app/test-pmd/testpmd.c | 27 ++++++++++++++++++---------
>  1 file changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index db23f23..a1ff8f3 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1908,9 +1908,10 @@ eth_dev_event_callback_unregister(void)
>  void
>  attach_port(char *identifier)
>  {
> -	portid_t pi = 0;
>  	unsigned int socket_id;
> 
> +	portid_t pi = rte_eth_dev_count_avail();
> +
>  	printf("Attaching a new port...\n");
> 
>  	if (identifier == NULL) {
> @@ -2071,12 +2072,14 @@ rmv_event_callback(void *arg)
>  	RTE_ETH_VALID_PORTID_OR_RET(port_id);
>  	dev = &rte_eth_devices[port_id];
> 
> +	if (dev->state == RTE_ETH_DEV_UNUSED)
> +		return;
> +
> +	printf("removing device %s\n", dev->device->name);
> +	stop_packet_forwarding();
>  	stop_port(port_id);
>  	close_port(port_id);
> -	printf("removing device %s\n", dev->device->name);
> -	if (rte_eal_dev_detach(dev->device))
> -		TESTPMD_LOG(ERR, "Failed to detach device %s\n",
> -			dev->device->name);
> +	detach_port(port_id);
>  }
> 
>  /* This function is used by the interrupt thread */
> @@ -2131,19 +2134,26 @@ static void
>  eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  			     __rte_unused void *arg)
>  {
> +	uint16_t port_id;
> +	int ret;
> +
>  	if (type >= RTE_DEV_EVENT_MAX) {
>  		fprintf(stderr, "%s called upon invalid event %d\n",
>  			__func__, type);
>  		fflush(stderr);
>  	}
> 
> +	ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> +	if (ret) {
> +		printf("can not get port by device %s!\n", device_name);
> +		return;
> +	}
> +
>  	switch (type) {
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		rmv_event_callback((void *)(intptr_t)port_id);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> @@ -2666,7 +2676,6 @@ main(int argc, char** argv)
>  			return -1;
>  		}
>  		eth_dev_event_callback_register();
> -
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4

This patch fails to apply to dpdk 18.08-rc0 and needs to be rebased.

Regards,

Bernard.


* Re: [PATCH V21 4/4] app/testpmd: show example to handle hot unplug
  2018-06-14 12:59           ` Iremonger, Bernard
@ 2018-06-15  8:32             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-06-15  8:32 UTC (permalink / raw)
  To: Iremonger, Bernard, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	matan, Van Haaren, Harry, Tan, Jianfeng
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



On 6/14/2018 8:59 PM, Iremonger, Bernard wrote:
> Hi Jeff,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
>> Sent: Thursday, May 3, 2018 11:49 AM
>> To: stephen@networkplumber.org; Richardson, Bruce
>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
>> Ananyev, Konstantin <konstantin.ananyev@intel.com>;
>> gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>;
>> thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van
>> Haaren, Harry <harry.van.haaren@intel.com>; Tan, Jianfeng
>> <jianfeng.tan@intel.com>
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo,
>> Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
>> Subject: [dpdk-dev] [PATCH V21 4/4] app/testpmd: show example to handle
>> hot unplug
>>
>> Use testpmd for example, to show how an application smoothly handle
>> failure when device being hot unplug. Once app detect the removal event,
>> the callback would be called, it first stop the packet forwarding, then stop the
>> port, close the port and finally detach the port.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v21->v20:
>> fix attach port issue, let it work for multiple device case.
>> combind rmv callback to only one.
>> ---
>>   app/test-pmd/testpmd.c | 27 ++++++++++++++++++---------
>>   1 file changed, 18 insertions(+), 9 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>> index db23f23..a1ff8f3 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -1908,9 +1908,10 @@ eth_dev_event_callback_unregister(void)
>>   void
>>   attach_port(char *identifier)
>>   {
>> -	portid_t pi = 0;
>>   	unsigned int socket_id;
>>
>> +	portid_t pi = rte_eth_dev_count_avail();
>> +
>>   	printf("Attaching a new port...\n");
>>
>>   	if (identifier == NULL) {
>> @@ -2071,12 +2072,14 @@ rmv_event_callback(void *arg)
>>   	RTE_ETH_VALID_PORTID_OR_RET(port_id);
>>   	dev = &rte_eth_devices[port_id];
>>
>> +	if (dev->state == RTE_ETH_DEV_UNUSED)
>> +		return;
>> +
>> +	printf("removing device %s\n", dev->device->name);
>> +	stop_packet_forwarding();
>>   	stop_port(port_id);
>>   	close_port(port_id);
>> -	printf("removing device %s\n", dev->device->name);
>> -	if (rte_eal_dev_detach(dev->device))
>> -		TESTPMD_LOG(ERR, "Failed to detach device %s\n",
>> -			dev->device->name);
>> +	detach_port(port_id);
>>   }
>>
>>   /* This function is used by the interrupt thread */
>> @@ -2131,19 +2134,26 @@ static void
>>   eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   			     __rte_unused void *arg)
>>   {
>> +	uint16_t port_id;
>> +	int ret;
>> +
>>   	if (type >= RTE_DEV_EVENT_MAX) {
>>   		fprintf(stderr, "%s called upon invalid event %d\n",
>>   			__func__, type);
>>   		fflush(stderr);
>>   	}
>>
>> +	ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
>> +	if (ret) {
>> +		printf("can not get port by device %s!\n", device_name);
>> +		return;
>> +	}
>> +
>>   	switch (type) {
>>   	case RTE_DEV_EVENT_REMOVE:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>   			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		rmv_event_callback((void *)(intptr_t)port_id);
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>> @@ -2666,7 +2676,6 @@ main(int argc, char** argv)
>>   			return -1;
>>   		}
>>   		eth_dev_event_callback_register();
>> -
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> --
>> 2.7.4
> This patch fails to apply to dpdk 18.08-rc0 and needs to be rebased.
>
> Regards,
>
> Bernard.

Thanks for the notice, Bernard. The next patch set will be updated to
fix it.


* [PATCH v2 0/4] hot plug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (5 preceding siblings ...)
  2018-05-03 10:48       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
@ 2018-06-22 11:51       ` Jeff Guo
  2018-06-22 11:51         ` [PATCH v2 1/4] bus/pci: handle device hot unplug Jeff Guo
                           ` (3 more replies)
  2018-06-26 15:36       ` [PATCH V3 1/4] bus/pci: handle device " Jeff Guo
                         ` (16 subsequent siblings)
  23 siblings, 4 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-22 11:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hot plug is an important feature, used both for fail-safe handling of
datacenter devices and for SR-IOV live migration in SDN/NFV. It brings
greater flexibility and continuity to networking services across many
industry use cases. As an important networking framework, DPDK should help
users implement a hot plug solution.

The framework already provides a general device event detection mechanism,
the failsafe driver, the bonding driver and a hot plug/unplug API; apps can
use these to develop their own hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
physically removed, or when software disables it. The app needs to call the
ethdev API to detach the device, which unplugs it at the bus level and makes
access to the device invalid. The problem is that removing the device from
the software lists is not instantaneous; if the data (fast) path still reads
from or writes to the device in the meantime, it causes an MMIO error and
the app crashes.

We already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but there is still no fail-safe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution.
So there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the
user mode side. First, we can hardly put a gatekeeper in front of every MMIO
access for multiple PMDs. Second, no third-party tool such as udev/driverctl
specifically covers this hot plug failure processing. Third, it is still
questionable whether each app can feasibly implement this for multiple user
mode PMDs. This patch set therefore proposes a general hot plug failure
handling mechanism in the DPDK framework. It aims to guarantee that, when a
hot unplug occurs, the system does not crash and the app is not interrupted,
so that user space can stop and release the relevant resources normally and
then cleanly unplug the device at the bus level.

The mechanism works as follows:

First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To
do that, the following functionality is introduced.
 - Add a new bus op “handle_hot_unplug” to handle bus read/write errors. It
   is bus-specific, so each kind of bus can implement its own logic.
 - Implement the PCI bus specific op “pci_handle_hot_unplug”. Based on the
   failure address, it remaps memory for the corresponding unplugged device.

For the data path, or other unexpected accesses from the control path, when
a hot unplug occurs:
 - Implement a new SIGBUS handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   either a control path thread or a data path thread may receive the SIGBUS
   error, but both end up in the common SIGBUS handler. Once an MMIO SIGBUS
   error is raised, it triggers the hot unplug operation above. The handler
   checks whether the SIGBUS was caused by the hot unplug: if not, it reports
   the exception as the original SIGBUS handler would; if so, it remaps the
   memory.

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources
   on the kernel side: for example, the fd sys file disappears and the irq
   is released. If the igb_uio driver then tries to release these resources
   again, it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not
   done automatically on the kernel side. If they are not handled, the
   leftover state affects the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and process it
   accordingly.
   This patch proposes adding a rte_udev_state structure to rte_uio_pci_dev
   in the igb_uio kernel driver, to record the state of the uio device, such
   as probed/opened/released/removed/unplug. When an unexpected removal
   caused by a hot unplug is detected, the driver disables the interrupt
   resources accordingly, and skips the part of the release that the kernel
   has already handled, to avoid a double free or NULL pointer kernel crash.
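
The state tracking described above can be modelled in plain user-space C.
This is only a sketch of the idea, not the igb_uio code: the enum values
mirror the states named in the text, while struct uio_dev_model, its fields
and the model_*() functions are hypothetical, and "freeing" is reduced to a
counter so that a double free would be visible.

```c
/* States named in the text; the identifiers themselves are hypothetical. */
enum uio_dev_state {
	UIO_DEV_PROBED,
	UIO_DEV_OPENED,
	UIO_DEV_RELEASED,
	UIO_DEV_REMOVED,
	UIO_DEV_UNPLUG,
};

/* Hypothetical model of the per-device bookkeeping. */
struct uio_dev_model {
	enum uio_dev_state state;
	int irq_enabled;	/* 1 while interrupts are enabled */
	int resources_held;	/* 1 while the driver still owns resources */
	int driver_free_count;	/* times the driver itself freed resources */
};

/* Device opened by user space: interrupts on, resources owned. */
static void model_open(struct uio_dev_model *d)
{
	d->irq_enabled = 1;
	d->resources_held = 1;
	d->state = UIO_DEV_OPENED;
}

/* Removal path. A removal while the device is open is a hot unplug. */
static void model_remove(struct uio_dev_model *d)
{
	if (d->state == UIO_DEV_OPENED) {
		d->irq_enabled = 0;	/* the driver must do this itself */
		d->resources_held = 0;	/* the kernel has reclaimed the rest */
		d->state = UIO_DEV_UNPLUG;
		return;
	}
	d->state = UIO_DEV_REMOVED;
}

/* Release path, reached on a normal close and also after a hot unplug. */
static void model_release(struct uio_dev_model *d)
{
	if (d->state == UIO_DEV_UNPLUG)
		return;	/* already torn down: skip to avoid a double free */

	d->irq_enabled = 0;
	if (d->resources_held) {
		d->resources_held = 0;
		d->driver_free_count++;
	}
	d->state = UIO_DEV_RELEASED;
}
```

The point of the model is that the release path consults the recorded state
before touching anything the kernel may already have reclaimed.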

The mechanism can be used by the fail-safe driver and by apps that want a
hot plug solution. At this stage, only testpmd is used as a reference to
show how to use the mechanism.
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not interrupt the running app/fail-safe driver and does
not affect other, unrelated devices. The app can then plug the device back
in and restart the data path as follows.
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code 

Since the hot plug solution has been discussed over several rounds in
public, the scope has changed and the patch set has been split many times.
Following the recent RFC and feature design, this patch set focuses only on
the hot unplug failure handler. To keep the topic clear and focused, the
previous patch sets are summarized in the history as “v1(v21)”, and the v2
here continues from there for further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (4):
  bus/pci: handle device hot unplug
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug

 app/test-pmd/testpmd.c                  |  25 ++++--
 drivers/bus/pci/pci_common.c            |  65 ++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 kernel/linux/igb_uio/igb_uio.c          |  50 +++++++++--
 lib/librte_eal/common/include/rte_bus.h |  16 ++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 154 +++++++++++++++++++++++++++++++-
 7 files changed, 344 insertions(+), 11 deletions(-)

-- 
2.7.4


* [PATCH v2 1/4] bus/pci: handle device hot unplug
  2018-06-22 11:51       ` [PATCH v2 0/4] hot plug failure handle mechanism Jeff Guo
@ 2018-06-22 11:51         ` Jeff Guo
  2018-06-22 12:59           ` Gaëtan Rivet
  2018-06-22 11:51         ` [PATCH v2 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-22 11:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot unplug occurs when a hardware device is physically removed or when
software disables it. The app needs to call the ethdev API to detach the
device, which unplugs it at the bus level and makes access to the device
invalid. The problem is that removing the device from the software lists
is not instantaneous; if the data path still reads from or writes to the
device in the meantime, it causes an MMIO error and the app crashes. So a
hot unplug handling mechanism is needed to guarantee that the app does not
crash when a device is hot unplugged.

Handling device hot unplug is bus-specific behavior, so this patch
introduces a bus op that lets each kind of bus implement its own logic.
Further, this patch implements the op for the PCI bus: it remaps dummy
memory to avoid bus read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v2->v1(v21):
refine commit log
---
 drivers/bus/pci/pci_common.c            | 65 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++++++
 drivers/bus/pci/private.h               | 12 ++++++
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
 4 files changed, 126 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7215aae..74d9aa8 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+static struct rte_pci_device *
+pci_find_device_by_addr(void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(ERR, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+static int
+pci_handle_hot_unplug(struct rte_device *dev, void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	if (dev != NULL)
+		pdev = RTE_DEV_TO_PCI(dev);
+	else
+		pdev = pci_find_device_by_addr(failure_addr);
+
+	if (!pdev)
+		return -1;
+
+	/* remap resources for devices */
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* TODO */
+		ret = -1;
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	if (ret != 0)
+		RTE_LOG(ERR, EAL, "Failed to handle hot unplug of device %s\n",
+			pdev->name);
+	return ret;
+}
+
 static int
 pci_plug(struct rte_device *dev)
 {
@@ -502,6 +566,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.handle_hot_unplug = pci_handle_hot_unplug,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..5551506 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..6a5609f 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Bus-specific hot unplug handler, responsible for handling the failure
+ * that occurs when a device is hot unplugged, to guarantee that the system
+ * does not hang in that case.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
+						void *dev_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -209,6 +223,8 @@ struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
+							device event */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 };
-- 
2.7.4


* [PATCH v2 2/4] eal: add failure handle mechanism for hot plug
  2018-06-22 11:51       ` [PATCH v2 0/4] hot plug failure handle mechanism Jeff Guo
  2018-06-22 11:51         ` [PATCH v2 1/4] bus/pci: handle device hot unplug Jeff Guo
@ 2018-06-22 11:51         ` Jeff Guo
  2018-06-22 11:51         ` [PATCH v2 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
  2018-06-22 11:51         ` [PATCH v2 4/4] app/testpmd: show example to handle " Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-22 11:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for device hot unplug
events. When a device is hot unplugged, its resources become invalid; if
they are still read or written unexpectedly, the application will crash.

This patch lets the framework help the application handle this fault. When
a SIGBUS error occurs, it checks the failure address and remaps the invalid
memory of the corresponding device, which guarantees that the application
is not shut down when a device is hot unplugged.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v2->v1(v21):
refine commit log
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 154 +++++++++++++++++++++++++++++++++-
 1 file changed, 153 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..3067f39 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,27 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/* spinlock for device failure process */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -34,6 +48,93 @@ enum eal_dev_event_subsystem {
 };
 
 static int
+dev_uev_failure_process(struct rte_device *dev, void *dev_addr)
+{
+	struct rte_bus *bus;
+	int ret = 0;
+
+	if (!dev && !dev_addr) {
+		return -EINVAL;
+	} else if (dev) {
+		bus = rte_bus_find_by_device_name(dev->name);
+		if (bus->handle_hot_unplug) {
+			/**
+			 * call bus ops to handle hot unplug.
+			 */
+			ret = bus->handle_hot_unplug(dev, dev_addr);
+			if (ret) {
+				RTE_LOG(ERR, EAL,
+					"Cannot handle hot unplug "
+					"for device %s "
+					"on the bus %s.\n ",
+					dev->name, bus->name);
+			}
+		} else {
+			RTE_LOG(ERR, EAL,
+				"Not support handle hot unplug for bus %s!\n",
+				bus->name);
+			ret = -ENOTSUP;
+		}
+	} else {
+		TAILQ_FOREACH(bus, &rte_bus_list, next) {
+			if (bus->handle_hot_unplug) {
+				/**
+				 * call bus ops to handle hot unplug.
+				 */
+				ret = bus->handle_hot_unplug(dev, dev_addr);
+				if (ret)
+					RTE_LOG(ERR, EAL,
+						"Cannot handle hot unplug "
+						"for the device "
+						"on the bus %s!\n", bus->name);
+				else
+					break;
+			} else {
+				RTE_LOG(ERR, EAL,
+					"Not support handle hot unplug "
+					"for bus %s!\n", bus->name);
+				ret = -ENOTSUP;
+			}
+		}
+	}
+	return ret;
+}
+
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = dev_uev_failure_process(NULL, info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (!ret)
+		RTE_LOG(DEBUG, EAL,
+			"Success to handle SIGBUS error for hot unplug!\n");
+	else
+		rte_exit(EXIT_FAILURE, "exit for SIGBUS error!");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
+static int
 dev_uev_socket_fd_create(void)
 {
 	struct sockaddr_nl addr;
@@ -147,6 +248,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +275,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					uevent.devname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL,
+					"Cannot find unplugged device (%s)\n",
+					uevent.devname);
+				return;
+			}
+			rte_spinlock_lock(&dev_failure_lock);
+			ret = dev_uev_failure_process(dev, NULL);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Driver cannot remap the "
+					"device (%s)\n",
+					dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +338,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +366,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4


* [PATCH v2 3/4] igb_uio: fix uio release issue when hot unplug
  2018-06-22 11:51       ` [PATCH v2 0/4] hot plug failure handle mechanism Jeff Guo
  2018-06-22 11:51         ` [PATCH v2 1/4] bus/pci: handle device hot unplug Jeff Guo
  2018-06-22 11:51         ` [PATCH v2 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
@ 2018-06-22 11:51         ` Jeff Guo
  2018-06-22 11:51         ` [PATCH v2 4/4] app/testpmd: show example to handle " Jeff Guo
  3 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-22 11:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, the kernel releases the device resources on
the kernel side: for example, the fd sys file disappears and the irq is
released. If the igb_uio driver then still tries to release these
resources, the kernel crashes. On the other hand, some steps, such as
disabling interrupts, are not performed automatically by the kernel. If
they are not handled, the leftover state interferes with other devices
using the interrupt resources. So the igb_uio driver has to check the hot
plug status and do the corresponding processing itself.

This patch proposes adding an rte_udev_state field to the rte_uio_pci_dev
structure of the igb_uio kernel driver, to record the state of the uio
device: probed/opened/released/removed/unplug. When an unexpected removal
caused by hot unplug is detected, the driver disables the interrupt
resources accordingly and skips the release steps that the kernel has
already handled, to avoid double free or NULL pointer kernel crashes.
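
The intended state handling can be modelled in plain C as below. This is a
simplified sketch for illustration only: the sketch_* names and the two
counters are hypothetical, locking and reference counting are left out, and
the real driver code is in the diff that follows.

```c
#include <stdbool.h>

enum sketch_udev_state {
	SKETCH_UDEV_PROBED,
	SKETCH_UDEV_OPENED,
	SKETCH_UDEV_RELEASED,
	SKETCH_UDEV_REMOVED,
	SKETCH_UDEV_UNPLUG
};

struct sketch_udev {
	enum sketch_udev_state state;
	int irq_disable_calls;   /* counts interrupt teardown */
	int full_teardown_calls; /* counts the regular remove path */
};

/* Close path: a no-op once remove() has already cleaned up. */
static void
sketch_release(struct sketch_udev *u)
{
	if (u->state == SKETCH_UDEV_REMOVED)
		return;
	u->irq_disable_calls++;
	u->state = SKETCH_UDEV_RELEASED;
}

/* Remove path: returns true when the removal was unexpected (hot unplug). */
static bool
sketch_remove(struct sketch_udev *u)
{
	if (u->state == SKETCH_UDEV_OPENED ||
	    u->state == SKETCH_UDEV_UNPLUG) {
		/* Only undo what the kernel does not undo by itself. */
		sketch_release(u);
		u->state = SKETCH_UDEV_REMOVED;
		return true;
	}
	u->full_teardown_calls++;
	return false;
}
```

The later close of the uio file then finds the REMOVED state and returns
without touching resources a second time.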

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v2->v1(v21):
add uio device state to check for unexpected hot plug removal.
fix igb uio kernel driver issue.
---
 kernel/linux/igb_uio/igb_uio.c | 50 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index cd9b7e7..fdd692a 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+	RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static char *intr_mode;
@@ -194,12 +204,20 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
 	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
 	struct uio_info *info = &udev->info;
+	struct pci_dev *pdev = udev->pdev;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
+	/* check the uevent of the kobj */
+	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+			   (&pdev->dev.kobj)->name);
+		udev->state = RTE_UDEV_UNPLUG;
+	}
+
 	uio_event_notify(info);
 
 	/* Message signal mode, no share IRQ and automasked */
@@ -308,7 +326,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -330,24 +347,33 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
+
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED)
+		return 0;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
-		return 0;
+		return -1;
 	}
 
 	/* disable interrupts */
@@ -355,7 +381,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -557,6 +583,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	udev->state = RTE_UDEV_PROBED;
 	return 0;
 
 fail_remove_group:
@@ -573,11 +600,24 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
+
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	int ret;
+
+	/* handler hot unplug */
+	if (udev->state == RTE_UDEV_OPENNED ||
+		udev->state == RTE_UDEV_UNPLUG) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		udev->state = RTE_UDEV_REMOVED;
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
-	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	uio_unregister_device(&udev->info);
+	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
 	pci_disable_device(dev);
 	pci_set_drvdata(dev, NULL);
-- 
2.7.4


* [PATCH v2 4/4] app/testpmd: show example to handle hot unplug
  2018-06-22 11:51       ` [PATCH v2 0/4] hot plug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-06-22 11:51         ` [PATCH v2 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-06-22 11:51         ` Jeff Guo
  2018-06-26 10:06           ` Iremonger, Bernard
  2018-06-26 11:58           ` Matan Azrad
  3 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-22 11:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can smoothly handle
the failure when a device is hot unplugged. If the app has enabled the
device event monitor and registered the hot plug event callback before
running, the callback is called once the app detects the removal event. It
first stops packet forwarding, then stops the port, closes the port, and
finally detaches the port to remove the device from the device lists.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v2->v1(v21):
rebase testpmd code
---
 app/test-pmd/testpmd.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 24c1998..286f242 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
 void
 attach_port(char *identifier)
 {
-	portid_t pi = 0;
 	unsigned int socket_id;
 
+	portid_t pi = rte_eth_dev_count_avail();
+
 	printf("Attaching a new port...\n");
 
 	if (identifier == NULL) {
@@ -2125,16 +2126,25 @@ check_all_ports_link_status(uint32_t port_mask)
 static void
 rmv_event_callback(void *arg)
 {
+	struct rte_eth_dev *dev;
+
 	int need_to_start = 0;
 	int org_no_link_check = no_link_check;
 	portid_t port_id = (intptr_t)arg;
 
 	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	dev = &rte_eth_devices[port_id];
+
+	if (dev->state == RTE_ETH_DEV_UNUSED)
+		return;
+
+	printf("removing device %s\n", dev->device->name);
 
 	if (!test_done && port_is_forwarding(port_id)) {
 		need_to_start = 1;
 		stop_packet_forwarding();
 	}
+
 	no_link_check = 1;
 	stop_port(port_id);
 	no_link_check = org_no_link_check;
@@ -2196,6 +2206,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2206,9 +2219,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2736,7 +2752,6 @@ main(int argc, char** argv)
 			return -1;
 		}
 		eth_dev_event_callback_register();
-
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* Re: [PATCH v2 1/4] bus/pci: handle device hot unplug
  2018-06-22 11:51         ` [PATCH v2 1/4] bus/pci: handle device hot unplug Jeff Guo
@ 2018-06-22 12:59           ` Gaëtan Rivet
  2018-06-26 15:30             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2018-06-22 12:59 UTC (permalink / raw)
  To: Jeff Guo
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, jblunck, shreyansh.jain, dev, helin.zhang

Hi Jeff,

Sorry, I have only followed this development from afar.
I have a remark regarding this API: I think it can be made simpler.
Details below.

On Fri, Jun 22, 2018 at 07:51:05PM +0800, Jeff Guo wrote:
> When a hardware device is removed physically or the software disables
> it, the hot unplug occur. App need to call ether dev API to detach the
> device, to unplug the device at the bus level and make access to the device
> invalid. But the problem is that, the removal of the device from the
> software lists is not going to be instantaneous, at this time if the data
> path still read/write the device, it will cause MMIO error and result of
> the app crash out. So a hot unplug handle mechanism need to guaranty app
> will not crash out when hot unplug device.
> 
> To handle device hot unplug is bus-specific behavior, this patch introduces
> a bus ops so that each kind of bus can implement its own logic. Further,
> this patch implements the ops for PCI bus: remap a dummy memory to avoid
> bus read/write error.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v2->v1(v21):
> refind commit log
> ---
>  drivers/bus/pci/pci_common.c            | 65 +++++++++++++++++++++++++++++++++
>  drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++++++
>  drivers/bus/pci/private.h               | 12 ++++++
>  lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
>  4 files changed, 126 insertions(+)
> 
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index 7215aae..74d9aa8 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>  	return NULL;
>  }
>  
> +static struct rte_pci_device *
> +pci_find_device_by_addr(void *failure_addr)
> +{
> +	struct rte_pci_device *pdev = NULL;
> +	int i;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(pdev) {
> +		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
> +			if ((uint64_t)(uintptr_t)failure_addr >=
> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
> +			    (uint64_t)(uintptr_t)failure_addr <
> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
> +			    pdev->mem_resource[i].len) {
> +				RTE_LOG(ERR, EAL, "Failure address "
> +					"%16.16"PRIx64" belongs to "
> +					"device %s!\n",
> +					(uint64_t)(uintptr_t)failure_addr,
> +					pdev->device.name);
> +				return pdev;
> +			}
> +		}
> +	}
> +	return NULL;
> +}

You define here a new bus ops that takes either an rte_device or an
arbitrary address as input.

In the uev handler that would call this ops afterward, you similarly try
to find either a bus using the device name, or then iterate over all
buses and try to find one able to handle the error.

This seems redundant and prone to ambiguity: should one check that the
device address is actually linked with the provided address? If not, is
it an improper call or a special case? This is unclear.

Note: I haven't followed the previous discussion, maybe the
      dual dev_addr + failure_addr is warranted in the API here,
      if so why not. Otherwise it just seems redundant:
      the dev addr will never be within a physical BAR mapping,
      and for all buses / drivers not using physical mappings,
      the addr is only meaningful as a cue to find an internal
      resource.

You can use the bus->find_device() to iterate over buses, and design
your bus ops such that, when provided with an addr, it does whatever it
needs internally to find a relevant resource and either handles the
error, or returns that the error was not handled.

Something like that:

/* new bus ops: */
/* If generic error codes are defined as part of the API,
   >0 should mean that the sigbus was not handled,
   <0 that an error occurred but that one should stop trying,
   0 that everything is ok.
 */
int (*handle_sigbus)(void *addr);

/* new rte_bus API: */

static int
bus_handle_sigbus(const struct rte_bus *bus,
                  const void *addr)
{
        /* If additional error codes are defined as part of the API,
           negative values should stop the iteration.
           In this case, rte_errno would need to be set as well.
         */
        return !(bus->handle_sigbus && bus->handle_sigbus(addr) <= 0);
}

int
rte_bus_sigbus_handler(void *addr)
{
        struct rte_bus *bus;
        int old_errno = rte_errno;

        rte_errno = 0;
        bus = rte_bus_find(NULL, bus_handle_sigbus, addr);
        if (bus == NULL) {
                /* ERROR: no bus could handle the error. */
                RTE_LOG(ERR, EAL, "No bus was able to handle the error");
                return -1;
        } else if (rte_errno != 0) {
                /* ERROR: a generic sigbus handling error. */
                RTE_LOG(ERR, EAL, "Say what the error is");
                return -1;
        }
        rte_errno = old_errno;
        return 0;
}

Which would afterward be implemented, for example in PCI bus:

static struct rte_pci_device *
pci_find_device_by_addr(void *addr)
{
        struct rte_pci_device *pdev;

        FOREACH_DEVICE_ON_PCIBUS(pdev)
                if (&pdev->device == addr ||
                    /* addr within mappings of pdev */)
                        return pdev;
        return NULL;
}

static int
pci_handle_sigbus(void *addr)
{
        struct rte_pci_device *pdev;

        pdev = pci_find_device_by_addr(addr);
        if (pdev == NULL)
                return -1;
        /* Here, handle uio remap if needed. */
        return 0;
}
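
For illustration, the return-code convention described above (>0 not
handled, 0 handled, <0 error, stop trying) can be exercised without EAL.
This is a self-contained sketch under assumptions: struct sketch_bus and the
sketch_* functions are hypothetical stand-ins, not rte_bus API, and each toy
bus simply owns one address range.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct rte_bus; only the fields the sketch needs. */
struct sketch_bus {
	const char *name;
	uintptr_t base; /* start of the address range this bus owns */
	size_t len;     /* length of that range; 0 means it owns nothing */
	int broken;     /* non-zero: recovery fails even for owned addresses */
};

/* >0: not handled here, 0: handled, <0: error, stop trying. */
static int
sketch_bus_handle_sigbus(const struct sketch_bus *bus, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	if (a < bus->base || a - bus->base >= bus->len)
		return 1;
	return bus->broken ? -1 : 0;
}

/*
 * Walk the buses in order. Returns 0 if one handled the fault, -1 if a bus
 * reported a hard error or no bus claimed the address. *which is set to the
 * bus that ended the walk, or NULL if none did.
 */
static int
sketch_sigbus_dispatch(const struct sketch_bus *buses, size_t n,
		       const void *addr, const struct sketch_bus **which)
{
	size_t i;

	*which = NULL;
	for (i = 0; i < n; i++) {
		int ret = sketch_bus_handle_sigbus(&buses[i], addr);

		if (ret > 0)
			continue; /* this bus does not own the address */
		*which = &buses[i];
		return ret == 0 ? 0 : -1;
	}
	return -1;
}
```

Keeping "not mine" distinct from "mine, but recovery failed" is what lets the
caller stop early on a hard error instead of asking every remaining bus.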

-------------------------

- Leave the bus iteration and check within rte_bus. Centralize the call
  in a tighter rte_bus API, and do not use your ops directly from your
  other EAL facility.

- You have left several error messages for signaling success (!), or
  even simply that a bus could not handle a specific address. This is
  bad. An error should only appear on error. Otherwise, all of this can
  easily be traced using a debugger, so I don't think it's necessary
  to leave it at a DEBUG level.

  But in any case, remove all ERR-level messages for success.

<...>

> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index eb9eded..6a5609f 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>  typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>  
>  /**
> + * Implementation a specific hot unplug handler, which is responsible
> + * for handle the failure when hot unplug the device, guaranty the system
> + * would not hung in the case.
> + * @param dev
> + *	Pointer of the device structure.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
> +						void *dev_addr);
> +

I don't like the name of the OPS.
The documentation evokes only "the failure".
So is it a handle for any and all error possibly happening to a device?
If so, where is the input to describe the error?

If it is only meant to handle SIGBUS, because it is a very specific
error state only meant to happen on certain parts of the bus (the queue
mappings, if relevant), then it makes sense to only have an arbitrary
address as context for handling.

But then, it needs to be called as such. The expected failure to be
handled should be explicit in the name of the ops, and the documentation
should be more precise about what a bus developer should do with the
input.

> +/**
>   * Bus scan policies
>   */
>  enum rte_bus_scan_mode {
> @@ -209,6 +223,8 @@ struct rte_bus {
>  	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>  	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>  	rte_bus_parse_t parse;       /**< Parse a device name */
> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
> +							device event */

The new ops should be added at the end of the structure.

Regards,
-- 
Gaëtan Rivet
6WIND


* Re: [PATCH v2 4/4] app/testpmd: show example to handle hot unplug
  2018-06-22 11:51         ` [PATCH v2 4/4] app/testpmd: show example to handle " Jeff Guo
@ 2018-06-26 10:06           ` Iremonger, Bernard
  2018-06-26 11:58           ` Matan Azrad
  1 sibling, 0 replies; 494+ messages in thread
From: Iremonger, Bernard @ 2018-06-26 10:06 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng
  Cc: jblunck, shreyansh.jain, dev, Guo, Jia, Zhang, Helin

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jeff Guo
> Sent: Friday, June 22, 2018 12:51 PM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu,
> Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He,
> Shaopeng <shaopeng.he@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [dpdk-dev] [PATCH v2 4/4] app/testpmd: show example to handle hot
> unplug
> 
> Use testpmd for example, to show how an application smoothly handle failure
> when device being hot unplug. If app have enabled the device event monitor and
> register the hot plug event’s callback before running, once app detect the
> removal event, the callback would be called. It will first stop the packet
> forwarding, then stop the port, close the port, and finally detach the port to
> remove the device out from the device lists.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>



* Re: [PATCH v2 4/4] app/testpmd: show example to handle hot unplug
  2018-06-22 11:51         ` [PATCH v2 4/4] app/testpmd: show example to handle " Jeff Guo
  2018-06-26 10:06           ` Iremonger, Bernard
@ 2018-06-26 11:58           ` Matan Azrad
  2018-06-26 15:33             ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Matan Azrad @ 2018-06-26 11:58 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Jeff

Please see comments...

From: Jeff Guo
> Sent: Friday, June 22, 2018 2:51 PM
> To: stephen@networkplumber.org; bruce.richardson@intel.com;
> ferruh.yigit@intel.com; konstantin.ananyev@intel.com;
> gaetan.rivet@6wind.com; jingjing.wu@intel.com; Thomas Monjalon
> <thomas@monjalon.net>; Mordechay Haimovsky <motih@mellanox.com>;
> Matan Azrad <matan@mellanox.com>; harry.van.haaren@intel.com;
> qi.z.zhang@intel.com; shaopeng.he@intel.com
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
> jia.guo@intel.com; helin.zhang@intel.com
> Subject: [PATCH v2 4/4] app/testpmd: show example to handle hot unplug
> 
> Use testpmd for example, to show how an application smoothly handle failure
> when device being hot unplug. If app have enabled the device event monitor
> and register the hot plug event’s callback before running, once app detect the
> removal event, the callback would be called. It will first stop the packet
> forwarding, then stop the port, close the port, and finally detach the port to
> remove the device out from the device lists.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v2->v1(v21):
> rebase testpmd code
> ---
>  app/test-pmd/testpmd.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 24c1998..286f242 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
>  void
>  attach_port(char *identifier)
>  {
> -	portid_t pi = 0;
>  	unsigned int socket_id;
> 
> +	portid_t pi = rte_eth_dev_count_avail();
> +
>  	printf("Attaching a new port...\n");
> 
>  	if (identifier == NULL) {
> @@ -2125,16 +2126,25 @@ check_all_ports_link_status(uint32_t port_mask)
>  static void
>  rmv_event_callback(void *arg)
>  {

There is a race between the ethdev RMV event and the EAL remove event; I think the application must synchronize them if both are configured.

> +	struct rte_eth_dev *dev;
> +
>  	int need_to_start = 0;
>  	int org_no_link_check = no_link_check;
>  	portid_t port_id = (intptr_t)arg;
> 
>  	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (dev->state == RTE_ETH_DEV_UNUSED)
> +		return;

Can you explain why you check the state?
Doesn't RTE_ETH_VALID_PORTID_OR_RET already do it?

> +	printf("removing device %s\n", dev->device->name);
> 
>  	if (!test_done && port_is_forwarding(port_id)) {
>  		need_to_start = 1;
>  		stop_packet_forwarding();
>  	}
> +
>  	no_link_check = 1;
>  	stop_port(port_id);
>  	no_link_check = org_no_link_check;
> @@ -2196,6 +2206,9 @@ static void
>  eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  			     __rte_unused void *arg)
>  {
> +	uint16_t port_id;
> +	int ret;
> +
>  	if (type >= RTE_DEV_EVENT_MAX) {
>  		fprintf(stderr, "%s called upon invalid event %d\n",
>  			__func__, type);
> @@ -2206,9 +2219,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> +		if (ret) {
> +			printf("can not get port by device %s!\n",
> +				device_name);
> +			return;
> +		}
> +		rmv_event_callback((void *)(intptr_t)port_id);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> @@ -2736,7 +2752,6 @@ main(int argc, char** argv)
>  			return -1;
>  		}
>  		eth_dev_event_callback_register();
> -
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 1/4] bus/pci: handle device hot unplug
  2018-06-22 12:59           ` Gaëtan Rivet
@ 2018-06-26 15:30             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-06-26 15:30 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, jblunck, shreyansh.jain, dev, helin.zhang

Hi Gaëtan,

Thanks for your review; see my comments below.


On 6/22/2018 8:59 PM, Gaëtan Rivet wrote:
> Hi Jeff,
>
> Sorry, I followed this development from afar,
> I have a remark regarding this API, I think it can be made simpler.
> Details below.
>
> On Fri, Jun 22, 2018 at 07:51:05PM +0800, Jeff Guo wrote:
>> When a hardware device is removed physically or the software disables
>> it, the hot unplug occur. App need to call ether dev API to detach the
>> device, to unplug the device at the bus level and make access to the device
>> invalid. But the problem is that, the removal of the device from the
>> software lists is not going to be instantaneous, at this time if the data
>> path still read/write the device, it will cause MMIO error and result of
>> the app crash out. So a hot unplug handle mechanism need to guaranty app
>> will not crash out when hot unplug device.
>>
>> To handle device hot unplug is bus-specific behavior, this patch introduces
>> a bus ops so that each kind of bus can implement its own logic. Further,
>> this patch implements the ops for PCI bus: remap a dummy memory to avoid
>> bus read/write error.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v2->v1(v21):
>> refind commit log
>> ---
>>   drivers/bus/pci/pci_common.c            | 65 +++++++++++++++++++++++++++++++++
>>   drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++++++
>>   drivers/bus/pci/private.h               | 12 ++++++
>>   lib/librte_eal/common/include/rte_bus.h | 16 ++++++++
>>   4 files changed, 126 insertions(+)
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index 7215aae..74d9aa8 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -472,6 +472,70 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>>   	return NULL;
>>   }
>>   
>> +static struct rte_pci_device *
>> +pci_find_device_by_addr(void *failure_addr)
>> +{
>> +	struct rte_pci_device *pdev = NULL;
>> +	int i;
>> +
>> +	FOREACH_DEVICE_ON_PCIBUS(pdev) {
>> +		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
>> +			if ((uint64_t)(uintptr_t)failure_addr >=
>> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
>> +			    (uint64_t)(uintptr_t)failure_addr <
>> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
>> +			    pdev->mem_resource[i].len) {
>> +				RTE_LOG(ERR, EAL, "Failure address "
>> +					"%16.16"PRIx64" belongs to "
>> +					"device %s!\n",
>> +					(uint64_t)(uintptr_t)failure_addr,
>> +					pdev->device.name);
>> +				return pdev;
>> +			}
>> +		}
>> +	}
>> +	return NULL;
>> +}
> You define here a new bus ops that takes either an rte_device or an
> arbitrary address as input.
>
> In the uev handler that would call this ops afterward, you similarly try
> to find either a bus using the device name, or then iterate over all
> buses and try to find one able to handle the error.
>
> This seems redundant and prone to ambiguity: should one check that the
> device address is actually linked with the provided address? If not, is
> it an improper call or a special case? This is unclear.
>
> Note: I haven't followed the previous discussion, maybe the
>        dual dev_addr + failure_addr is warranted in the API here,
>        if so why not. Otherwise it just seems redundant:
>        the dev addr will never be within a physical BAR mapping,
>        and for all buses / drivers not using physical mappings,
>        the addr is only meaningful as a cue to find an internal
>        resource.
>
> You can use the bus->find_device() to iterate over buses, and design
> your bus ops such that when provided with an addr, would do whatever it
> needs internally to find a relevant resource and either handle the
> error, or return that the error was not handled.

If bus->find_device() can make things simpler, why not? I will check
that and modify it.

> Something like that:
>
> /* new bus ops: */
> /* If generic error codes are defined as part of the API,
>     >0 should mean that the sigbus was not handled,
>     <0 that an error occured but that one should stop trying,
>     0 that everything is ok.
>   */
> int (*handle_sigbus)(void *addr);
>
> /* new rte_bus API: */
>
> static int
> bus_handle_sigbus(const struct rte_bus *bus,
>                    const void *addr)
> {
>          /* If additional error codes are defined as part of the API,
>             negative values should stop the iteration.
>             In this case, rte_errno would need to be set as well.
>           */
>          return !(bus->handle_sigbus && bus->handle_sigbus(addr) <= 0);
> }
>
> int
> rte_bus_sigbus_handler(void *addr)
> {
>          struct rte_bus *bus;
>          int old_errno = rte_errno;
>
>          rte_errno = 0;
>          bus = rte_bus_find(NULL, bus_handle_sigbus, addr);
>          if (bus == NULL) {
>                  /* ERROR: no bus could handle the error. */
>                  RTE_LOG(ERR, EAL, "No bus was able to handle the error");
>                  return -1;
>          } else if (rte_errno != 0) {
>                  /* ERROR: a generic sigbus handling error. */
>                  RTE_LOG(ERR, EAL, "Say what the error is");
>                  return -1;
>          }
>          rte_errno = old_errno;
>          return 0;
> }
>
> Which would afterward be implemented, for example in PCI bus:
>
> static struct rte_pci_device *
> pci_find_device_by_addr(void *addr)
> {
>          struct rte_pci_device *pdev;
>
>          FOREACH_DEVICE_ON_PCIBUS(pdev)
>                  if (&pdev->device == addr ||
>                      /* addr within mappings of pdev */)
>                          return pdev;
>          return NULL;
> }
>
> static int
> pci_handle_sigbus(void *addr)
> {
>          struct rte_pci_device *pdev;
>
>          pdev = pci_find_device_by_addr(addr);
>          if (pdev == NULL)
>                  return 1; /* not handled by this bus */
>          /* Here, handle uio remap if needed. */
>          return 0;
> }
>
> -------------------------
>
> - Leave the bus iteration and check within rte_bus. Centralize the call
>    in a tighter rte_bus API, do not use directly your OPS from your other
>    EAL facility.
>
> - You have left several error messages for signaling success (!), or
>    even simply that a bus could not handle a specific address. This is
>    bad. An error should only appear on error. Otherwise, all of this can
>    easily be traced using a debugger, so I don't think it's necessary
>    to leave it at a DEBUG level.
>
>    But in any case, remove all ERR-level messages for success.

Makes sense. I will check which debug messages are not needed.

> <...>
>
>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index eb9eded..6a5609f 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
>>   typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>   
>>   /**
>> + * Implementation a specific hot unplug handler, which is responsible
>> + * for handle the failure when hot unplug the device, guaranty the system
>> + * would not hung in the case.
>> + * @param dev
>> + *	Pointer of the device structure.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_handle_hot_unplug_t)(struct rte_device *dev,
>> +						void *dev_addr);
>> +
> I don't like the name of the OPS.
> The documentation evokes only "the failure".
> So is it a handle for any and all error possibly happening to a device?
> If so, where is the input to describe the error?
> If it is only meant to handle SIGBUS, because it is a very specific
> error state only meant to happen on certain parts of the bus (the queue
> mappings, if relevant), then it makes sense to only have an arbitrary
> address as context for handling.
>
> But then, it needs to be called as such. The expected failure to be
> handled should be explicit in the name of the ops, and the documentation
> should be more precise about what a bus developer should do with the
> input.

I agree with your point about making the name more explicit, but I think
we should split it into two ops here: one is hotplug_handler and the
other is sigbus_handler. There are two paths, the data path and the
control path, and both need to call the remap function when a hot remove
event is detected, even if no sigbus has happened.
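
To make the split concrete, here is a minimal sketch of the two-op idea.
All demo_* names and the address range are hypothetical stand-ins, not
DPDK types; the real ops would take a struct rte_device and remap the
BARs. The sigbus op follows the return convention suggested above
(>0 not handled, 0 handled, <0 error).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_device, for illustration only. */
struct demo_device {
	const char *name;
	uintptr_t base;   /* start of the mapped region */
	size_t len;       /* length of the mapped region */
	int remapped;     /* set once the dummy mapping is in place */
};

/* Hypothetical stand-in for the two new bus ops. */
struct demo_bus {
	/* control path: called with the device named by the removal event */
	int (*hotplug_handler)(struct demo_device *dev);
	/* data path: called with the faulting address from SIGBUS */
	int (*sigbus_handler)(uintptr_t fault_addr);
};

static struct demo_device demo_dev = { "demo0", 0x1000, 0x100, 0 };

static int
demo_hotplug_handler(struct demo_device *dev)
{
	if (dev == NULL)
		return -1;
	assert(dev->len > 0);
	dev->remapped = 1; /* a real op would remap the BARs here */
	return 0;
}

static int
demo_sigbus_handler(uintptr_t fault_addr)
{
	if (fault_addr < demo_dev.base ||
	    fault_addr - demo_dev.base >= demo_dev.len)
		return 1; /* address does not belong to this bus */
	/* same remap as the control path, reached by address instead */
	return demo_hotplug_handler(&demo_dev);
}

static const struct demo_bus demo_bus = {
	.hotplug_handler = demo_hotplug_handler,
	.sigbus_handler = demo_sigbus_handler,
};
```

Both ops end in the same remap; they differ only in how the device is
identified (by handle on the control path, by fault address on the data
path).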

>> +/**
>>    * Bus scan policies
>>    */
>>   enum rte_bus_scan_mode {
>> @@ -209,6 +223,8 @@ struct rte_bus {
>>   	rte_bus_plug_t plug;         /**< Probe single device for drivers */
>>   	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>> +	rte_bus_handle_hot_unplug_t handle_hot_unplug; /**< handle hot unplug
>> +							device event */
> The new ops should be added at the end of the structure.

ok.

> Regards,

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v2 4/4] app/testpmd: show example to handle hot unplug
  2018-06-26 11:58           ` Matan Azrad
@ 2018-06-26 15:33             ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-06-26 15:33 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Matan,

Thanks for your review; see my comments below.

On 6/26/2018 7:58 PM, Matan Azrad wrote:
> Hi Jeff
>
> Please see comments...
>
> From: Jeff Guo
>> Sent: Friday, June 22, 2018 2:51 PM
>> To: stephen@networkplumber.org; bruce.richardson@intel.com;
>> ferruh.yigit@intel.com; konstantin.ananyev@intel.com;
>> gaetan.rivet@6wind.com; jingjing.wu@intel.com; Thomas Monjalon
>> <thomas@monjalon.net>; Mordechay Haimovsky <motih@mellanox.com>;
>> Matan Azrad <matan@mellanox.com>; harry.van.haaren@intel.com;
>> qi.z.zhang@intel.com; shaopeng.he@intel.com
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
>> jia.guo@intel.com; helin.zhang@intel.com
>> Subject: [PATCH v2 4/4] app/testpmd: show example to handle hot unplug
>>
>> Use testpmd for example, to show how an application smoothly handle failure
>> when device being hot unplug. If app have enabled the device event monitor
>> and register the hot plug event’s callback before running, once app detect the
>> removal event, the callback would be called. It will first stop the packet
>> forwarding, then stop the port, close the port, and finally detach the port to
>> remove the device out from the device lists.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v2->v1(v21):
>> rebase testpmd code
>> ---
>>   app/test-pmd/testpmd.c | 25 ++++++++++++++++++++-----
>>   1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>> index 24c1998..286f242 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
>>   void
>>   attach_port(char *identifier)
>>   {
>> -	portid_t pi = 0;
>>   	unsigned int socket_id;
>>
>> +	portid_t pi = rte_eth_dev_count_avail();
>> +
>>   	printf("Attaching a new port...\n");
>>
>>   	if (identifier == NULL) {
>> @@ -2125,16 +2126,25 @@ check_all_ports_link_status(uint32_t port_mask)
>>   static void
>>   rmv_event_callback(void *arg)
>>   {
> There is a race between the ethdev RMV event and the EAL remove event; I think the application must synchronize them if both are configured.

Will this race affect device detaching? What is the side effect, and
what do you propose for synchronizing it? I am still thinking about
that.
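
One possible way to synchronize, as a sketch only: let whichever
callback sees the removal first claim the port atomically, and have the
other return early. DEMO_MAX_PORTS, claim_port_removal() and
release_port_removal() are hypothetical names, not testpmd or ethdev
API, and this assumes C11 atomics are available.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 32 /* hypothetical limit for this sketch */

/* One flag per port: 0 = nobody is tearing it down, 1 = claimed. */
static atomic_int removal_claimed[DEMO_MAX_PORTS];

/*
 * Return 1 for the first caller on a given port and 0 for every later
 * caller, so that only one of the two removal callbacks runs the
 * stop/close/detach sequence.
 */
static int
claim_port_removal(uint16_t port_id)
{
	assert(port_id < DEMO_MAX_PORTS);
	return atomic_exchange(&removal_claimed[port_id], 1) == 0;
}

/* Clear the claim, e.g. once the port has been attached again. */
static void
release_port_removal(uint16_t port_id)
{
	assert(port_id < DEMO_MAX_PORTS);
	atomic_store(&removal_claimed[port_id], 0);
}
```

Each callback would start with "if (!claim_port_removal(port_id))
return;". Whether this is enough depends on what the race actually
breaks, which is still the open question here.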

>> +	struct rte_eth_dev *dev;
>> +
>>   	int need_to_start = 0;
>>   	int org_no_link_check = no_link_check;
>>   	portid_t port_id = (intptr_t)arg;
>>
>>   	RTE_ETH_VALID_PORTID_OR_RET(port_id);
>> +	dev = &rte_eth_devices[port_id];
>> +
>> +	if (dev->state == RTE_ETH_DEV_UNUSED)
>> +		return;
> Can you explain why you check the state?
> Doesn't RTE_ETH_VALID_PORTID_OR_RET already do it?

Correct, I checked and it is not needed here. Thanks for the info.

>> +	printf("removing device %s\n", dev->device->name);
>>
>>   	if (!test_done && port_is_forwarding(port_id)) {
>>   		need_to_start = 1;
>>   		stop_packet_forwarding();
>>   	}
>> +
>>   	no_link_check = 1;
>>   	stop_port(port_id);
>>   	no_link_check = org_no_link_check;
>> @@ -2196,6 +2206,9 @@ static void
>>   eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   			     __rte_unused void *arg)
>>   {
>> +	uint16_t port_id;
>> +	int ret;
>> +
>>   	if (type >= RTE_DEV_EVENT_MAX) {
>>   		fprintf(stderr, "%s called upon invalid event %d\n",
>>   			__func__, type);
>> @@ -2206,9 +2219,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   	case RTE_DEV_EVENT_REMOVE:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>   			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
>> +		if (ret) {
>> +			printf("can not get port by device %s!\n",
>> +				device_name);
>> +			return;
>> +		}
>> +		rmv_event_callback((void *)(intptr_t)port_id);
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>> @@ -2736,7 +2752,6 @@ main(int argc, char** argv)
>>   			return -1;
>>   		}
>>   		eth_dev_event_callback_register();
>> -
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V3 1/4] bus/pci: handle device hot unplug
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (6 preceding siblings ...)
  2018-06-22 11:51       ` [PATCH v2 0/4] hot plug failure handle mechanism Jeff Guo
@ 2018-06-26 15:36       ` Jeff Guo
  2018-06-26 15:36         ` [PATCH V3 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
                           ` (2 more replies)
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
                         ` (15 subsequent siblings)
  23 siblings, 3 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-26 15:36 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a hardware device is removed physically or disabled by software, a
hot unplug occurs. The application needs to call the ethdev API to detach
the device, which unplugs it at the bus level and makes access to the
device invalid. The problem is that removing the device from the software
lists is not instantaneous; if the data path still reads from or writes
to the device in the meantime, it causes an MMIO error and the
application crashes. A hot unplug handling mechanism is therefore needed
to guarantee that the application does not crash when a device is hot
unplugged.

Handling device hot unplug is bus-specific behavior, so this patch
introduces a bus op so that each kind of bus can implement its own logic.
Further, this patch implements the op for the PCI bus: it remaps dummy
memory over the device resources to avoid bus read/write errors.
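
The remap step can be illustrated in isolation with a minimal sketch,
assuming Linux mmap() semantics. remap_to_dummy() is a hypothetical
helper name for this note, not a function added by the patch:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with anonymous,
 * zero-filled memory. MAP_FIXED discards the old mapping at that
 * range, so later loads and stores hit ordinary RAM instead of the
 * vanished device BAR.
 * Returns 0 on success, -1 on failure (errno is set by mmap).
 */
static int
remap_to_dummy(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL && len > 0);
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return (p == MAP_FAILED) ? -1 : 0;
}
```

After such a remap, reads return zeros and writes are silently dropped
from the device's point of view, which keeps the data path alive until
the port is detached.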

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v3->v2:
change bus ops name to bus_hotplug_handler.
---
 drivers/bus/pci/pci_common.c            | 34 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 33 ++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h               | 12 ++++++++++++
 lib/librte_eal/common/include/rte_bus.h | 14 ++++++++++++++
 4 files changed, 93 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7215aae..e607d08 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -473,6 +473,39 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	/* remap resources for devices */
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO:
+#ifdef VFIO_PRESENT
+		/* TODO */
+		ret = -1;
+#endif
+		break;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -502,6 +535,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_handler = pci_hotplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 88fa587..5551506 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..6507f24 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a bus-specific hot plug handler, which is responsible for
+ * handling the failure when a device is hot removed, guaranteeing that
+ * the system does not crash in that case.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +224,7 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_handler_t hotplug_handler; /**< handle hot plug on bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V3 2/4] eal: add failure handle mechanism for hot plug
  2018-06-26 15:36       ` [PATCH V3 1/4] bus/pci: handle device " Jeff Guo
@ 2018-06-26 15:36         ` Jeff Guo
  2018-06-26 15:36         ` [PATCH V3 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
  2018-06-26 15:36         ` [PATCH V3 4/4] app/testpmd: show example to handle " Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-26 15:36 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for device hot
unplug events. When a device is hot unplugged, its resources become
invalid; if such a resource is then unexpectedly read or written, the
system will crash.

This patch lets the framework help the application handle this fault.
When a sigbus error occurs, it checks the failure address and remaps
the invalid memory for the corresponding device, which guarantees that
the application is not shut down when devices are hot unplugged.
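
As a rough illustration of the flow described above (not the code in
this patch): a handler installed with SA_SIGINFO reads the fault
address from si_addr, checks whether it falls inside a tracked device
region, and if so maps anonymous memory over that region so the
faulting access can be retried on return. struct dev_region, tracked,
region_contains() and install_sigbus_handler() are hypothetical names
for this sketch, which tracks a single region for brevity. Calling
mmap() from a signal handler is an assumption that works on Linux in
practice but is not guaranteed by POSIX.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record of one mapped device region (not a DPDK type). */
struct dev_region {
	void *addr;
	size_t len;
};

static struct dev_region tracked;

/* Return 1 if fault lies inside [r->addr, r->addr + r->len), else 0. */
static int
region_contains(const struct dev_region *r, const void *fault)
{
	uintptr_t base = (uintptr_t)r->addr;
	uintptr_t f = (uintptr_t)fault;

	return r->len != 0 && f >= base && f - base < r->len;
}

static void
on_sigbus(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)ctx;

	/* A fault outside the tracked region is a generic SIGBUS. */
	if (!region_contains(&tracked, info->si_addr))
		_exit(EXIT_FAILURE);

	/* Back the dead region with anonymous memory, then return so
	 * that the faulting instruction is retried.
	 */
	if (mmap(tracked.addr, tracked.len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(EXIT_FAILURE);
}

/* Track one region and install the handler; returns 0 on success. */
static int
install_sigbus_handler(void *addr, size_t len)
{
	struct sigaction sa;

	assert(addr != NULL && len > 0);
	tracked.addr = addr;
	tracked.len = len;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```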

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v3->v2:
add new API and bus ops of bus_signal_handler
distinguish handling of a generic sigbus from a hotplug sigbus
---
 drivers/bus/pci/pci_common.c            | 53 ++++++++++++++++++++
 lib/librte_eal/common/eal_common_bus.c  | 34 ++++++++++++-
 lib/librte_eal/common/include/rte_bus.h | 19 +++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 88 ++++++++++++++++++++++++++++++++-
 4 files changed, 192 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index e607d08..4c0ac98 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -472,6 +472,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* Find which device the failure address belongs to. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_handler(struct rte_device *dev)
 {
@@ -506,6 +532,32 @@ pci_hotplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* no device owns the faulting MMIO address,
+		 * so it is a generic sigbus error.
+		 */
+		return 1;
+	}
+
+	/* handle hotplug when sigbus error is caused of hot removal */
+	ret = pci_hotplug_handler(&pdev->device);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to handle hot plug for device %s",
+			pdev->name);
+		ret = -1;
+		rte_errno = -1;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -536,6 +588,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_handler = pci_hotplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..b505b9b 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -220,7 +221,6 @@ rte_bus_find_by_device_name(const char *str)
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
 
-
 /*
  * Get iommu class of devices on the bus.
  */
@@ -242,3 +242,35 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	return !(bus->sigbus_handler && bus->sigbus_handler(failure_addr) <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+	int old_errno = rte_errno;
+	int no_handle = 0;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	if (bus == NULL) {
+		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
+		no_handle = 1;
+	} else if (rte_errno != 0) {
+		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
+		no_handle = 1;
+	}
+
+	/* if the sigbus was not handled, restore the old errno. */
+	if (no_handle)
+		rte_errno = old_errno;
+
+	return no_handle;
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6507f24..4389c42 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -181,6 +181,19 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a bus-specific sigbus handler, which is responsible for
+ * handling a sigbus error that is either a generic memory error or a
+ * memory error caused by hot unplug.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -225,6 +238,7 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_handler_t hotplug_handler; /**< handle hot plug on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
 };
 
 /**
@@ -335,6 +349,11 @@ struct rte_bus *rte_bus_find_by_name(const char *busname);
 enum rte_iova_mode rte_bus_get_iommu_class(void);
 
 /**
+ * Handle the sigbus error on corresponding bus.
+ */
+int rte_bus_sigbus_handler(const void* failure_addr);
+
+/**
  * Helper for Bus registration.
  * The constructor has higher priority than PMD constructors.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..c9dddab 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,24 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/* spinlock for device failure process */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +44,34 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (!ret)
+		RTE_LOG(INFO, EAL,
+			"SIGBUS error handled successfully for hotplug!\n");
+	else
+		rte_exit(EXIT_FAILURE,
+			 "A generic SIGBUS error, (rte_errno: %s)!\n",
+			 strerror(rte_errno));
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +186,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +213,48 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+			rte_spinlock_lock(&dev_failure_lock);
+			ret = bus->hotplug_handler(dev);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +274,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigaction(SIGBUS, &action, NULL);
+
 	monitor_started = true;
 
 	return 0;
@@ -220,5 +305,6 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4


* [PATCH V3 3/4] igb_uio: fix uio release issue when hot unplug
  2018-06-26 15:36       ` [PATCH V3 1/4] bus/pci: handle device " Jeff Guo
  2018-06-26 15:36         ` [PATCH V3 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
@ 2018-06-26 15:36         ` Jeff Guo
  2018-06-26 15:36         ` [PATCH V3 4/4] app/testpmd: show example to handle " Jeff Guo
  2 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-26 15:36 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, the kernel releases the device resources
on the kernel side: for example, the fd sys file disappears and the irq
is released. If the igb_uio driver then still tries to release these
resources, it will cause a kernel crash. On the other hand, some cleanup,
such as disabling interrupts, is not done automatically on the kernel
side. If it is not handled, the stale state will affect the interrupt
resources used by other devices. So the igb_uio driver has to check the
hot plug status and take the corresponding action.

This patch proposes adding an rte_udev_state field to rte_uio_pci_dev in
the igb_uio kernel driver to record the state of the uio device:
probed/opened/released/removed/unplug. When an unexpected removal caused
by hot unplug is detected, the driver disables the interrupt resources
accordingly, and it skips the release steps that the kernel has already
handled, to avoid a double free or NULL pointer kernel crash.
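
The state tracking described above can be sketched in plain user-space C.
This is a simplified model under stated assumptions, not the driver code:
the model_* names and the irq_enabled flag are hypothetical stand-ins for
the uio callbacks and the interrupt setup, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the device states named in the patch. */
enum model_state {
	MODEL_PROBED,
	MODEL_OPENED,
	MODEL_RELEASED,
	MODEL_REMOVED,
	MODEL_UNPLUG
};

struct model_dev {
	enum model_state state;
	int refcnt;        /* number of open file handles */
	bool irq_enabled;  /* stands in for the interrupt resource */
};

static void
model_probe(struct model_dev *d)
{
	d->state = MODEL_PROBED;
	d->refcnt = 0;
	d->irq_enabled = false;
}

static void
model_open(struct model_dev *d)
{
	if (++d->refcnt > 1)
		return; /* already set up by an earlier open */
	d->irq_enabled = true;
	d->state = MODEL_OPENED;
}

/* Returns 0 when the release is complete or can be skipped. */
static int
model_release(struct model_dev *d)
{
	if (d->state == MODEL_REMOVED)
		return 0; /* the hot unplug path already cleaned up */
	if (--d->refcnt > 0)
		return -1; /* other users still hold the device */
	d->irq_enabled = false;
	d->state = MODEL_RELEASED;
	return 0;
}

/* Returns true when the removal was an unexpected hot unplug. */
static bool
model_remove(struct model_dev *d)
{
	if (d->state == MODEL_OPENED || d->state == MODEL_UNPLUG) {
		if (model_release(d) == 0)
			d->state = MODEL_REMOVED;
		return true;
	}
	assert(d->refcnt == 0); /* orderly removal: nobody holds the device */
	return false;
}
```

In this model, a late close after the unplug hits the MODEL_REMOVED check
and returns without touching anything, which is the double-release case
the patch wants to avoid.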

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v3->v2:
no change.
---
 kernel/linux/igb_uio/igb_uio.c | 50 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index cd9b7e7..fdd692a 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+	RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static char *intr_mode;
@@ -194,12 +204,20 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
 	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
 	struct uio_info *info = &udev->info;
+	struct pci_dev *pdev = udev->pdev;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
+	/* check the uevent of the kobj */
+	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+			   (&pdev->dev.kobj)->name);
+		udev->state = RTE_UDEV_UNPLUG;
+	}
+
 	uio_event_notify(info);
 
 	/* Message signal mode, no share IRQ and automasked */
@@ -308,7 +326,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -330,24 +347,33 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
+
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED)
+		return 0;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
-		return 0;
+		return -1;
 	}
 
 	/* disable interrupts */
@@ -355,7 +381,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -557,6 +583,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	udev->state = RTE_UDEV_PROBED;
 	return 0;
 
 fail_remove_group:
@@ -573,11 +600,24 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
+
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	int ret;
+
+	/* handle hot unplug */
+	if (udev->state == RTE_UDEV_OPENNED ||
+		udev->state == RTE_UDEV_UNPLUG) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		udev->state = RTE_UDEV_REMOVED;
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
-	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	uio_unregister_device(&udev->info);
+	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
 	pci_disable_device(dev);
 	pci_set_drvdata(dev, NULL);
-- 
2.7.4


* [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
  2018-06-26 15:36       ` [PATCH V3 1/4] bus/pci: handle device " Jeff Guo
  2018-06-26 15:36         ` [PATCH V3 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
  2018-06-26 15:36         ` [PATCH V3 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-06-26 15:36         ` Jeff Guo
  2018-06-26 17:07           ` Matan Azrad
  2 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-26 15:36 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can smoothly handle
the failure when a device is hot unplugged. If the app has enabled the
device event monitor and registered the hot plug event callback before
running, the callback is called once the app detects the removal event.
It first stops packet forwarding, then stops the port, closes the port,
and finally detaches the port to remove the device from the device list.
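
The order of the teardown steps can be sketched as a small self-contained
model. Everything named demo_* below is hypothetical and only records the
steps; it does not call testpmd or ethdev, and it assumes a single
attached device for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Steps of the teardown, recorded so that the order can be checked. */
enum demo_step {
	DEMO_STOP_FORWARDING,
	DEMO_STOP_PORT,
	DEMO_CLOSE_PORT,
	DEMO_DETACH_PORT
};

#define DEMO_MAX_STEPS 8

struct demo_app {
	const char *dev_name;  /* name of the only attached device */
	int forwarding;        /* nonzero while packets are forwarded */
	enum demo_step log[DEMO_MAX_STEPS];
	size_t nb_steps;
};

static void
demo_record(struct demo_app *app, enum demo_step step)
{
	assert(app->nb_steps < DEMO_MAX_STEPS);
	app->log[app->nb_steps++] = step;
}

/*
 * Hypothetical removal callback: returns 0 when the teardown ran,
 * -1 when the event names a device this app does not own.
 */
static int
demo_on_remove(struct demo_app *app, const char *device_name)
{
	if (app->dev_name == NULL || strcmp(app->dev_name, device_name) != 0)
		return -1;

	if (app->forwarding) {
		demo_record(app, DEMO_STOP_FORWARDING);
		app->forwarding = 0;
	}
	demo_record(app, DEMO_STOP_PORT);
	demo_record(app, DEMO_CLOSE_PORT);
	demo_record(app, DEMO_DETACH_PORT);
	app->dev_name = NULL; /* the device is gone from the list */
	return 0;
}
```

The lookup by device name comes first, so an event for an unknown device
leaves the application state untouched.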

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v3->v2:
delete some unused check
---
 app/test-pmd/testpmd.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 24c1998..2ee5621 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
 void
 attach_port(char *identifier)
 {
-	portid_t pi = 0;
 	unsigned int socket_id;
 
+	portid_t pi = rte_eth_dev_count_avail();
+
 	printf("Attaching a new port...\n");
 
 	if (identifier == NULL) {
@@ -2125,16 +2126,22 @@ check_all_ports_link_status(uint32_t port_mask)
 static void
 rmv_event_callback(void *arg)
 {
+	struct rte_eth_dev *dev;
+
 	int need_to_start = 0;
 	int org_no_link_check = no_link_check;
 	portid_t port_id = (intptr_t)arg;
 
 	RTE_ETH_VALID_PORTID_OR_RET(port_id);
+	dev = &rte_eth_devices[port_id];
+
+	printf("removing device %s\n", dev->device->name);
 
 	if (!test_done && port_is_forwarding(port_id)) {
 		need_to_start = 1;
 		stop_packet_forwarding();
 	}
+
 	no_link_check = 1;
 	stop_port(port_id);
 	no_link_check = org_no_link_check;
@@ -2196,6 +2203,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2206,9 +2216,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2736,7 +2749,6 @@ main(int argc, char** argv)
 			return -1;
 		}
 		eth_dev_event_callback_register();
-
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* Re: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
  2018-06-26 15:36         ` [PATCH V3 4/4] app/testpmd: show example to handle " Jeff Guo
@ 2018-06-26 17:07           ` Matan Azrad
  2018-06-27  3:56             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Matan Azrad @ 2018-06-26 17:07 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Jeff

Continuing the discussion from the last version, plus more comments and questions.

From: Jeff Guo 
> Sent: Tuesday, June 26, 2018 6:36 PM
> To: stephen@networkplumber.org; bruce.richardson@intel.com;
> ferruh.yigit@intel.com; konstantin.ananyev@intel.com;
> gaetan.rivet@6wind.com; jingjing.wu@intel.com; Thomas Monjalon
> <thomas@monjalon.net>; Mordechay Haimovsky <motih@mellanox.com>;
> Matan Azrad <matan@mellanox.com>; harry.van.haaren@intel.com;
> qi.z.zhang@intel.com; shaopeng.he@intel.com; bernard.iremonger@intel.com
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
> jia.guo@intel.com; helin.zhang@intel.com
> Subject: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
> 
> Use testpmd for example, to show how an application smoothly handle failure
> when device being hot unplug. If app have enabled the device event monitor
> and register the hot plug event’s callback before running, once app detect the
> removal event, the callback would be called. It will first stop the packet
> forwarding, then stop the port, close the port, and finally detach the port to
> remove the device out from the device lists.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v3->v2:
> delete some unused check
> ---
>  app/test-pmd/testpmd.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 24c1998..2ee5621 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
>  void
>  attach_port(char *identifier)
>  {
> -	portid_t pi = 0;
>  	unsigned int socket_id;
> 
> +	portid_t pi = rte_eth_dev_count_avail();
> +

I don't understand this change... can you explain?

>  	printf("Attaching a new port...\n");
> 
>  	if (identifier == NULL) {
> @@ -2125,16 +2126,22 @@ check_all_ports_link_status(uint32_t port_mask)
> static void  rmv_event_callback(void *arg)  {
> +	struct rte_eth_dev *dev;
> +
>  	int need_to_start = 0;
>  	int org_no_link_check = no_link_check;
>  	portid_t port_id = (intptr_t)arg;
> 
>  	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> +	dev = &rte_eth_devices[port_id];
> +
> +	printf("removing device %s\n", dev->device->name);
> 
>  	if (!test_done && port_is_forwarding(port_id)) {
>  		need_to_start = 1;
>  		stop_packet_forwarding();
>  	}
> +

I don't think you need to change anything in this function.
You can add the print in the caller code.

>  	no_link_check = 1;
>  	stop_port(port_id);
>  	no_link_check = org_no_link_check;

Suggestion for synchronization:
Don't register for the ethdev RMV event if EAL hotplug is enabled.

> @@ -2196,6 +2203,9 @@ static void
>  eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  			     __rte_unused void *arg)
>  {
> +	uint16_t port_id;
> +	int ret;
> +
>  	if (type >= RTE_DEV_EVENT_MAX) {
>  		fprintf(stderr, "%s called upon invalid event %d\n",
>  			__func__, type);
> @@ -2206,9 +2216,12 @@ eth_dev_event_callback(char *device_name, enum
> rte_dev_event_type type,
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> +		if (ret) {
> +			printf("can not get port by device %s!\n",
> device_name);
> +			return;
> +		}
> +		rmv_event_callback((void *)(intptr_t)port_id);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
> 2736,7 +2749,6 @@ main(int argc, char** argv)
>  			return -1;
>  		}
>  		eth_dev_event_callback_register();
> -
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4



* Re: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
  2018-06-26 17:07           ` Matan Azrad
@ 2018-06-27  3:56             ` Guo, Jia
  2018-06-27  6:05               ` Matan Azrad
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-06-27  3:56 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Matan,


On 6/27/2018 1:07 AM, Matan Azrad wrote:
> Hi Jeff
>
> Continue session from last version + more comments\question.
>
> From: Jeff Guo
>> Sent: Tuesday, June 26, 2018 6:36 PM
>> To: stephen@networkplumber.org; bruce.richardson@intel.com;
>> ferruh.yigit@intel.com; konstantin.ananyev@intel.com;
>> gaetan.rivet@6wind.com; jingjing.wu@intel.com; Thomas Monjalon
>> <thomas@monjalon.net>; Mordechay Haimovsky <motih@mellanox.com>;
>> Matan Azrad <matan@mellanox.com>; harry.van.haaren@intel.com;
>> qi.z.zhang@intel.com; shaopeng.he@intel.com; bernard.iremonger@intel.com
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
>> jia.guo@intel.com; helin.zhang@intel.com
>> Subject: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
>>
>> Use testpmd for example, to show how an application smoothly handle failure
>> when device being hot unplug. If app have enabled the device event monitor
>> and register the hot plug event’s callback before running, once app detect the
>> removal event, the callback would be called. It will first stop the packet
>> forwarding, then stop the port, close the port, and finally detach the port to
>> remove the device out from the device lists.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v3->v2:
>> delete some unused check
>> ---
>>   app/test-pmd/testpmd.c | 22 +++++++++++++++++-----
>>   1 file changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 24c1998..2ee5621 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
>>   void
>>   attach_port(char *identifier)
>>   {
>> -	portid_t pi = 0;
>>   	unsigned int socket_id;
>>
>> +	portid_t pi = rte_eth_dev_count_avail();
>> +
> I don't understand this change... can you explain?

Consider the case where two or more devices have already been attached.
The new device should not always be added as port 0, right?

>>   	printf("Attaching a new port...\n");
>>
>>   	if (identifier == NULL) {
>> @@ -2125,16 +2126,22 @@ check_all_ports_link_status(uint32_t port_mask)
>> static void  rmv_event_callback(void *arg)  {
>> +	struct rte_eth_dev *dev;
>> +
>>   	int need_to_start = 0;
>>   	int org_no_link_check = no_link_check;
>>   	portid_t port_id = (intptr_t)arg;
>>
>>   	RTE_ETH_VALID_PORTID_OR_RET(port_id);
>> +	dev = &rte_eth_devices[port_id];
>> +
>> +	printf("removing device %s\n", dev->device->name);
>>
>>   	if (!test_done && port_is_forwarding(port_id)) {
>>   		need_to_start = 1;
>>   		stop_packet_forwarding();
>>   	}
>> +
> I don't think you need to change anything in this function.
> You can add the print in the caller code.

OK, I am fine with your point.

>>   	no_link_check = 1;
>>   	stop_port(port_id);
>>   	no_link_check = org_no_link_check;
> Suggestion for synchronization:
> Don't register to ethdev RMV event if EAL hotplug is enabled.

I think what you propose might be an option for now, and we might need
to think more about what is best for all cases. But do you agree it
would be better to split it into a separate fix patch, so that this
patch can focus on proposing and implementing the feature?

>> @@ -2196,6 +2203,9 @@ static void
>>   eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   			     __rte_unused void *arg)
>>   {
>> +	uint16_t port_id;
>> +	int ret;
>> +
>>   	if (type >= RTE_DEV_EVENT_MAX) {
>>   		fprintf(stderr, "%s called upon invalid event %d\n",
>>   			__func__, type);
>> @@ -2206,9 +2216,12 @@ eth_dev_event_callback(char *device_name, enum
>> rte_dev_event_type type,
>>   	case RTE_DEV_EVENT_REMOVE:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>   			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
>> +		if (ret) {
>> +			printf("can not get port by device %s!\n",
>> device_name);
>> +			return;
>> +		}
>> +		rmv_event_callback((void *)(intptr_t)port_id);
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
>> 2736,7 +2749,6 @@ main(int argc, char** argv)
>>   			return -1;
>>   		}
>>   		eth_dev_event_callback_register();
>> -
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> --
>> 2.7.4


* Re: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
  2018-06-27  3:56             ` Guo, Jia
@ 2018-06-27  6:05               ` Matan Azrad
  2018-06-29 10:26                 ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Matan Azrad @ 2018-06-27  6:05 UTC (permalink / raw)
  To: Guo, Jia, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Guo

From: Guo, Jia
> Sent: Wednesday, June 27, 2018 6:56 AM
> To: Matan Azrad <matan@mellanox.com>; stephen@networkplumber.org;
> bruce.richardson@intel.com; ferruh.yigit@intel.com;
> konstantin.ananyev@intel.com; gaetan.rivet@6wind.com;
> jingjing.wu@intel.com; Thomas Monjalon <thomas@monjalon.net>;
> Mordechay Haimovsky <motih@mellanox.com>; harry.van.haaren@intel.com;
> qi.z.zhang@intel.com; shaopeng.he@intel.com; bernard.iremonger@intel.com
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
> helin.zhang@intel.com
> Subject: Re: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
> 
> hi, mantan
> 
> 
> On 6/27/2018 1:07 AM, Matan Azrad wrote:
> > Hi Jeff
> >
> > Continue session from last version + more comments\question.
> >
> > From: Jeff Guo
> >> Sent: Tuesday, June 26, 2018 6:36 PM
> >> To: stephen@networkplumber.org; bruce.richardson@intel.com;
> >> ferruh.yigit@intel.com; konstantin.ananyev@intel.com;
> >> gaetan.rivet@6wind.com; jingjing.wu@intel.com; Thomas Monjalon
> >> <thomas@monjalon.net>; Mordechay Haimovsky <motih@mellanox.com>;
> >> Matan Azrad <matan@mellanox.com>; harry.van.haaren@intel.com;
> >> qi.z.zhang@intel.com; shaopeng.he@intel.com;
> >> bernard.iremonger@intel.com
> >> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
> >> jia.guo@intel.com; helin.zhang@intel.com
> >> Subject: [PATCH V3 4/4] app/testpmd: show example to handle hot
> >> unplug
> >>
> >> Use testpmd for example, to show how an application smoothly handle
> >> failure when device being hot unplug. If app have enabled the device
> >> event monitor and register the hot plug event’s callback before
> >> running, once app detect the removal event, the callback would be
> >> called. It will first stop the packet forwarding, then stop the port,
> >> close the port, and finally detach the port to remove the device out from the
> device lists.
> >>
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >> ---
> >> v3->v2:
> >> delete some unused check
> >> ---
> >>   app/test-pmd/testpmd.c | 22 +++++++++++++++++-----
> >>   1 file changed, 17 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >> 24c1998..2ee5621 100644
> >> --- a/app/test-pmd/testpmd.c
> >> +++ b/app/test-pmd/testpmd.c
> >> @@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
> >>   void
> >>   attach_port(char *identifier)
> >>   {
> >> -	portid_t pi = 0;
> >>   	unsigned int socket_id;
> >>
> >> +	portid_t pi = rte_eth_dev_count_avail();
> >> +
> > I don't understand this change... can you explain?
> 
> think about if there are 2 or more device have been attached? The new device
> should not always add into port 0, right?

I think you are missing something here: you get the port id from ethdev; you are only passing a pointer to receive it.
I think you should remove this change too.

> 
> >>   	printf("Attaching a new port...\n");
> >>
> >>   	if (identifier == NULL) {
> >> @@ -2125,16 +2126,22 @@ check_all_ports_link_status(uint32_t
> >> port_mask) static void  rmv_event_callback(void *arg)  {
> >> +	struct rte_eth_dev *dev;
> >> +
> >>   	int need_to_start = 0;
> >>   	int org_no_link_check = no_link_check;
> >>   	portid_t port_id = (intptr_t)arg;
> >>
> >>   	RTE_ETH_VALID_PORTID_OR_RET(port_id);
> >> +	dev = &rte_eth_devices[port_id];
> >> +
> >> +	printf("removing device %s\n", dev->device->name);
> >>
> >>   	if (!test_done && port_is_forwarding(port_id)) {
> >>   		need_to_start = 1;
> >>   		stop_packet_forwarding();
> >>   	}
> >> +
> > I don't think you need to change anything in this function.
> > You can add the print in the caller code.
> 
> ok, i am fine for your point.
> 
> >>   	no_link_check = 1;
> >>   	stop_port(port_id);
> >>   	no_link_check = org_no_link_check;
> > Suggestion for synchronization:
> > Don't register to ethdev RMV event if EAL hotplug is enabled.
> 
> i think that what you propose might be a chose right now, and might need we
> think more about the better for all, but do you agree it is better split it in
> another fix patch, to let it patch focus on the feature propose and implement?

So, are you suggesting inserting a bug and then fixing it? :)

My suggestion:
Add a prior patch that makes the ethdev RMV event depend on a user parameter (it can be your hotplug parameter and should be true by default).
In this patch, add one more mode to the parameter to enable hotplug handling by the EAL.

So in the end the options of the hotplug parameter can be:
0 - no hotplug handling.
1 - ethdev hotplug (should be the default)
2 - EAL hotplug
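
As a rough sketch of such a three-mode parameter (all demo_* names are
hypothetical, and how testpmd would actually parse the option is left
open):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical modes matching the 0/1/2 values listed above. */
enum demo_hotplug_mode {
	DEMO_HOTPLUG_NONE = 0,   /* no hotplug handling */
	DEMO_HOTPLUG_ETHDEV = 1, /* ethdev RMV event (default) */
	DEMO_HOTPLUG_EAL = 2     /* EAL device event monitor */
};

/*
 * Parse the option value. A NULL argument means the option was not
 * given, so the default (ethdev) applies. Returns 0 on success and
 * -1 on an invalid value, in which case *mode is left unchanged.
 */
static int
demo_parse_hotplug(const char *arg, enum demo_hotplug_mode *mode)
{
	char *end;
	long val;

	assert(mode != NULL);
	if (arg == NULL) {
		*mode = DEMO_HOTPLUG_ETHDEV;
		return 0;
	}
	val = strtol(arg, &end, 10);
	if (end == arg || *end != '\0' || val < 0 || val > 2)
		return -1;
	*mode = (enum demo_hotplug_mode)val;
	return 0;
}

/* Only one of the two mechanisms is registered for a given mode. */
static int
demo_use_ethdev_rmv(enum demo_hotplug_mode mode)
{
	return mode == DEMO_HOTPLUG_ETHDEV;
}

static int
demo_use_eal_monitor(enum demo_hotplug_mode mode)
{
	return mode == DEMO_HOTPLUG_EAL;
}
```

Because the two predicates are mutually exclusive, the ethdev RMV
callback and the EAL callback can never both run for the same removal.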

What do you think?

> >> @@ -2196,6 +2203,9 @@ static void
> >>   eth_dev_event_callback(char *device_name, enum rte_dev_event_type
> type,
> >>   			     __rte_unused void *arg)
> >>   {
> >> +	uint16_t port_id;
> >> +	int ret;
> >> +
> >>   	if (type >= RTE_DEV_EVENT_MAX) {
> >>   		fprintf(stderr, "%s called upon invalid event %d\n",
> >>   			__func__, type);
> >> @@ -2206,9 +2216,12 @@ eth_dev_event_callback(char *device_name,
> enum
> >> rte_dev_event_type type,
> >>   	case RTE_DEV_EVENT_REMOVE:
> >>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> >>   			device_name);
> >> -		/* TODO: After finish failure handle, begin to stop
> >> -		 * packet forward, stop port, close port, detach port.
> >> -		 */
> >> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> >> +		if (ret) {
> >> +			printf("can not get port by device %s!\n",
> >> device_name);
> >> +			return;
> >> +		}
> >> +		rmv_event_callback((void *)(intptr_t)port_id);
> >>   		break;
> >>   	case RTE_DEV_EVENT_ADD:
> >>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
> >> 2736,7 +2749,6 @@ main(int argc, char** argv)
> >>   			return -1;
> >>   		}
> >>   		eth_dev_event_callback_register();
> >> -
> >>   	}
> >>
> >>   	if (start_port(RTE_PORT_ALL) != 0)
> >> --
> >> 2.7.4



* Re: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
  2018-06-27  6:05               ` Matan Azrad
@ 2018-06-29 10:26                 ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-06-29 10:26 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Matan,


On 6/27/2018 2:05 PM, Matan Azrad wrote:
> Hi Guo
>
> From: Guo, Jia
>> Sent: Wednesday, June 27, 2018 6:56 AM
>> To: Matan Azrad <matan@mellanox.com>; stephen@networkplumber.org;
>> bruce.richardson@intel.com; ferruh.yigit@intel.com;
>> konstantin.ananyev@intel.com; gaetan.rivet@6wind.com;
>> jingjing.wu@intel.com; Thomas Monjalon <thomas@monjalon.net>;
>> Mordechay Haimovsky <motih@mellanox.com>; harry.van.haaren@intel.com;
>> qi.z.zhang@intel.com; shaopeng.he@intel.com; bernard.iremonger@intel.com
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
>> helin.zhang@intel.com
>> Subject: Re: [PATCH V3 4/4] app/testpmd: show example to handle hot unplug
>>
>> hi, mantan
>>
>>
>> On 6/27/2018 1:07 AM, Matan Azrad wrote:
>>> Hi Jeff
>>>
>>> Continue session from last version + more comments\question.
>>>
>>> From: Jeff Guo
>>>> Sent: Tuesday, June 26, 2018 6:36 PM
>>>> To: stephen@networkplumber.org; bruce.richardson@intel.com;
>>>> ferruh.yigit@intel.com; konstantin.ananyev@intel.com;
>>>> gaetan.rivet@6wind.com; jingjing.wu@intel.com; Thomas Monjalon
>>>> <thomas@monjalon.net>; Mordechay Haimovsky <motih@mellanox.com>;
>>>> Matan Azrad <matan@mellanox.com>; harry.van.haaren@intel.com;
>>>> qi.z.zhang@intel.com; shaopeng.he@intel.com;
>>>> bernard.iremonger@intel.com
>>>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org;
>>>> jia.guo@intel.com; helin.zhang@intel.com
>>>> Subject: [PATCH V3 4/4] app/testpmd: show example to handle hot
>>>> unplug
>>>>
>>>> Use testpmd for example, to show how an application smoothly handle
>>>> failure when device being hot unplug. If app have enabled the device
>>>> event monitor and register the hot plug event’s callback before
>>>> running, once app detect the removal event, the callback would be
>>>> called. It will first stop the packet forwarding, then stop the port,
>>>> close the port, and finally detach the port to remove the device out from the
>> device lists.
>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>>> ---
>>>> v3->v2:
>>>> delete some unused check
>>>> ---
>>>>    app/test-pmd/testpmd.c | 22 +++++++++++++++++-----
>>>>    1 file changed, 17 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>>> 24c1998..2ee5621 100644
>>>> --- a/app/test-pmd/testpmd.c
>>>> +++ b/app/test-pmd/testpmd.c
>>>> @@ -1951,9 +1951,10 @@ eth_dev_event_callback_unregister(void)
>>>>    void
>>>>    attach_port(char *identifier)
>>>>    {
>>>> -	portid_t pi = 0;
>>>>    	unsigned int socket_id;
>>>>
>>>> +	portid_t pi = rte_eth_dev_count_avail();
>>>> +
>>> I don't understand this change... can you explain?
>> What if 2 or more devices have already been attached? The new device
>> should not always be added as port 0, right?
> I think you are missing something here: you get the port id from ethdev, you just pass a pointer to receive it.
> I think you should remove this change too.

ok, seems i am missing something, let me check.

>>>>    	printf("Attaching a new port...\n");
>>>>
>>>>    	if (identifier == NULL) {
>>>> @@ -2125,16 +2126,22 @@ check_all_ports_link_status(uint32_t
>>>> port_mask) static void  rmv_event_callback(void *arg)  {
>>>> +	struct rte_eth_dev *dev;
>>>> +
>>>>    	int need_to_start = 0;
>>>>    	int org_no_link_check = no_link_check;
>>>>    	portid_t port_id = (intptr_t)arg;
>>>>
>>>>    	RTE_ETH_VALID_PORTID_OR_RET(port_id);
>>>> +	dev = &rte_eth_devices[port_id];
>>>> +
>>>> +	printf("removing device %s\n", dev->device->name);
>>>>
>>>>    	if (!test_done && port_is_forwarding(port_id)) {
>>>>    		need_to_start = 1;
>>>>    		stop_packet_forwarding();
>>>>    	}
>>>> +
>>> I don't think you need to change anything in this function.
>>> You can add the print in the caller code.
>> ok, i am fine for your point.
>>
>>>>    	no_link_check = 1;
>>>>    	stop_port(port_id);
>>>>    	no_link_check = org_no_link_check;
>>> Suggestion for synchronization:
>>> Don't register to ethdev RMV event if EAL hotplug is enabled.
>> I think what you propose might be a choice right now, and we might need to
>> think more about what is best for all, but do you agree it is better to split it
>> into another fix patch, so this patch can focus on proposing and implementing the feature?
> So, Are you suggesting to insert a bug and then to fix it ?:)
>
> My suggestion:
> Add a prior patch that makes the ethdev RMV event depend on a user parameter (can be your hotplug parameter and should be true by default).
> In this patch add one more mode to the parameter to enable hotplug by the EAL.
>
> So finally the options of hotplug parameter can be:
> 0 - for no hotplug handle.
> 1 - ethdev hotplug (should be the default)
> 2 - EAL hotplug
>
> What do you think?

sure, I think you know I don't want to add any bug here :)
I just want to keep it focused and clear.
Your proposal looks fine to me. Good idea, thanks.
Please check my v4 patch set.

>>>> @@ -2196,6 +2203,9 @@ static void
>>>>    eth_dev_event_callback(char *device_name, enum rte_dev_event_type
>> type,
>>>>    			     __rte_unused void *arg)
>>>>    {
>>>> +	uint16_t port_id;
>>>> +	int ret;
>>>> +
>>>>    	if (type >= RTE_DEV_EVENT_MAX) {
>>>>    		fprintf(stderr, "%s called upon invalid event %d\n",
>>>>    			__func__, type);
>>>> @@ -2206,9 +2216,12 @@ eth_dev_event_callback(char *device_name,
>> enum
>>>> rte_dev_event_type type,
>>>>    	case RTE_DEV_EVENT_REMOVE:
>>>>    		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>>>    			device_name);
>>>> -		/* TODO: After finish failure handle, begin to stop
>>>> -		 * packet forward, stop port, close port, detach port.
>>>> -		 */
>>>> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
>>>> +		if (ret) {
>>>> +			printf("can not get port by device %s!\n",
>>>> device_name);
>>>> +			return;
>>>> +		}
>>>> +		rmv_event_callback((void *)(intptr_t)port_id);
>>>>    		break;
>>>>    	case RTE_DEV_EVENT_ADD:
>>>>    		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
>>>> 2736,7 +2749,6 @@ main(int argc, char** argv)
>>>>    			return -1;
>>>>    		}
>>>>    		eth_dev_event_callback_register();
>>>> -
>>>>    	}
>>>>
>>>>    	if (start_port(RTE_PORT_ALL) != 0)
>>>> --
>>>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V4 0/9] hot plug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (7 preceding siblings ...)
  2018-06-26 15:36       ` [PATCH V3 1/4] bus/pci: handle device " Jeff Guo
@ 2018-06-29 10:30       ` Jeff Guo
  2018-06-29 10:30         ` [PATCH V4 1/9] bus: introduce hotplug failure handler Jeff Guo
                           ` (8 more replies)
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
                         ` (14 subsequent siblings)
  23 siblings, 9 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SR-IOV live migration in SDN/NFV. It
brings more flexibility and continuity to networking services in many
industry use cases. So let us see what DPDK, as an important networking
framework, can do to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and a hot plug/unplug API in the framework;
applications can use these to develop their hot plug solution.

Consider hot unplug. It can happen when a hardware device is removed
physically, or when software disables it. The app needs to call the ethdev
API to detach the device, which unplugs it at the bus level and makes
access to the device invalid. The problem is that removing the device from
the software lists is not instantaneous; if the data (fast) path still
reads or writes the device during that window, it causes an MMIO error and
the app crashes.

We already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle it, but there is still no fail-safe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling
solution. So there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the
user mode side. First, we can hardly have a gatekeeper for every MMIO
access across multiple PMDs. Second, no third-party tool such as
udev/driverctl specifically covers this hot plug failure processing.
Third, it is still a problem for an app to implement this for multiple user
mode PMDs. Here, a general hot plug failure handling mechanism in the DPDK
framework is proposed. It aims to guarantee that, when hot unplug occurs,
the system does not crash and the app is not interrupted, and that user
space can stop and release the relevant resources normally and then unplug
the device cleanly at the bus level.

The mechanism works as below:

First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly.
To do that, the functions below are introduced.
 - Add a new bus ops “handle_hot_unplug” to handle bus read/write errors.
   It is bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus specific ops “pci_handle_hot_unplug”. Based on the
   failure address, it remaps memory for the unplugged device.

For the data path, or other unexpected accesses from the control path,
when hot unplug occurs:
 - Implement a new sigbus handler, registered when device event
   monitoring starts. The handler is per process. By the nature of signal
   delivery, either a control path thread or a data path thread may
   receive the sigbus error, but all of them go through the common sigbus
   handler. Once an MMIO sigbus error is raised, it triggers the above hot
   unplug operation. The handler checks whether the sigbus was caused by
   hot unplug. If not, it reports the exception as the original sigbus
   handler would. If yes, it does the memory remapping.

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device
   resources on the kernel side: for example the fd sys file disappears
   and the irq is released. If the igb_uio driver then still tries to
   release these resources, it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not
   done automatically on the kernel side. If they are not handled, the
   leftover state affects the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and take the
   corresponding action.
   This patch proposes adding a rte_udev_state structure to
   rte_uio_pci_dev in the igb_uio kernel driver, which records the state
   of the uio device, such as probed/opened/released/removed/unplug. When
   an unexpected removal caused by hot unplug is detected, the driver
   disables the interrupt resources accordingly, and skips the release
   steps that the kernel has already handled, to avoid a double free or a
   NULL pointer kernel crash.
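
The idea of state-dependent release can be modelled in user space as
below. This is a toy model under stated assumptions, not the igb_uio
code: the enum values only mirror the state names listed above, and the
struct fields and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the states named in the cover letter. */
enum uio_state_sketch {
	UIO_SKETCH_PROBED,
	UIO_SKETCH_OPENED,
	UIO_SKETCH_RELEASED,
	UIO_SKETCH_REMOVED,
	UIO_SKETCH_UNPLUG,
};

struct uio_dev_sketch {
	enum uio_state_sketch state;
	bool irq_enabled;	/* must always be disabled by the driver */
	bool kernel_res_held;	/* freed by the kernel itself on unplug */
};

/* The device vanished: the kernel has already released its side. */
static void
uio_sketch_mark_unplugged(struct uio_dev_sketch *d)
{
	d->kernel_res_held = false;
	d->state = UIO_SKETCH_UNPLUG;
}

/*
 * Release path. Interrupts are disabled in every case; the kernel-side
 * resource is freed only if the kernel has not done so already.
 * Returns the number of kernel-side resources freed by this call.
 */
static int
uio_sketch_release(struct uio_dev_sketch *d)
{
	int freed = 0;

	d->irq_enabled = false;
	if (d->state != UIO_SKETCH_UNPLUG && d->kernel_res_held) {
		d->kernel_res_held = false;
		freed = 1;
	}
	d->state = UIO_SKETCH_RELEASED;
	return freed;
}

static void
uio_state_selfcheck(void)
{
	struct uio_dev_sketch a = { UIO_SKETCH_OPENED, true, true };
	struct uio_dev_sketch b = { UIO_SKETCH_OPENED, true, true };

	/* Normal close: the driver frees the resource itself. */
	assert(uio_sketch_release(&a) == 1 && !a.irq_enabled);

	/* Hot unplug first: release must not free a second time. */
	uio_sketch_mark_unplugged(&b);
	assert(uio_sketch_release(&b) == 0 && !b.irq_enabled);
}
```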

The mechanism can be used by the fail-safe driver and by apps that want a
hot plug solution. At this stage, only testpmd is used as a reference to
show how to use the mechanism.
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not break the running app/fail-safe driver, and does not
break other unrelated devices. The app can plug the device in again and
restart the data path as below.
 - Device plug in->bind igb_uio driver ->attached device->start port->
   start forwarding.

patchset history:
v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. Following the
recent RFC and feature design, this patch set focuses only on the hot unplug
failure handler. To keep the topic clear and focused, the previous patch
sets are summarized in the history as “v1(v21)”, and v2 here continues the
tracking from there.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

for the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (9):
  bus: introduce hotplug failure handler
  bus/pci: implement hotplug handler operation
  bus: introduce sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot plug
  igb_uio: fix uio release issue when hot unplug
  app/testpmd: show example to handle hot unplug
  app/testpmd: enable device hotplug monitoring

 app/test-pmd/parameters.c               | 20 ++++++--
 app/test-pmd/testpmd.c                  | 31 +++++++-----
 app/test-pmd/testpmd.h                  |  8 ++-
 doc/guides/testpmd_app_ug/run_app.rst   | 10 +++-
 drivers/bus/pci/pci_common.c            | 78 +++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        | 33 +++++++++++++
 drivers/bus/pci/private.h               | 12 +++++
 kernel/linux/igb_uio/igb_uio.c          | 50 +++++++++++++++++--
 lib/librte_eal/common/eal_common_bus.c  | 34 ++++++++++++-
 lib/librte_eal/common/eal_private.h     | 11 +++++
 lib/librte_eal/common/include/rte_bus.h | 31 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 88 ++++++++++++++++++++++++++++++++-
 12 files changed, 381 insertions(+), 25 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH V4 1/9] bus: introduce hotplug failure handler
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-07-03 22:21           ` Thomas Monjalon
  2018-06-29 10:30         ` [PATCH V4 2/9] bus/pci: implement hotplug handler operation Jeff Guo
                           ` (7 subsequent siblings)
  8 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a hardware device is removed physically or software disables it,
hot unplug occurs. The app needs to call the ethdev API to detach the
device, which unplugs it at the bus level and makes access to the device
invalid. But removing the device from the software lists is not
instantaneous; if the data path still reads or writes the device during
that window, it causes an MMIO error and the app crashes. So a hotplug
failure handling mechanism is needed to guarantee that the app does not
crash when a device is hot unplugged.

Handling device hot plug failure is bus-specific behavior, so this patch
introduces a bus ops so that each kind of bus can implement its own logic.
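
As a rough illustration of a per-bus op of this kind, the sketch below
uses simplified stand-in types. The struct and function names are
hypothetical and much smaller than the real rte_bus and rte_device; the
one point it tries to show is that a bus may leave the op unset, so a
caller should check the pointer before calling through it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures carry many more fields. */
struct device_sketch {
	const char *name;
};

typedef int (*hotplug_handler_sketch_t)(struct device_sketch *dev);

struct bus_sketch {
	const char *name;
	hotplug_handler_sketch_t hotplug_handler;	/* may be NULL */
};

/* Call the bus-specific handler if the bus provides one. */
static int
bus_handle_hotplug(const struct bus_sketch *bus, struct device_sketch *dev)
{
	if (bus == NULL || bus->hotplug_handler == NULL)
		return -1;	/* this bus has no hot plug failure handling */
	return bus->hotplug_handler(dev);
}

/* Toy handler: succeeds for any non-NULL device. */
static int
pci_like_handler(struct device_sketch *dev)
{
	return dev != NULL ? 0 : -1;
}

static void
bus_ops_selfcheck(void)
{
	struct device_sketch dev = { "0000:01:00.0" };
	struct bus_sketch with_op = { "pci", pci_like_handler };
	struct bus_sketch without_op = { "vdev", NULL };

	assert(bus_handle_hotplug(&with_op, &dev) == 0);
	assert(bus_handle_hotplug(&with_op, NULL) == -1);
	assert(bus_handle_hotplug(&without_op, &dev) == -1);
}
```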

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 lib/librte_eal/common/include/rte_bus.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..3642aeb 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation a specific hot plug handler, which is responsible
+ * for handle the failure when hot remove the device, guaranty the system
+ * would not crash in the case.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +224,8 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_handler_t hotplug_handler;
+						/**< handle hot plug on bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V4 2/9] bus/pci: implement hotplug handler operation
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
  2018-06-29 10:30         ` [PATCH V4 1/9] bus: introduce hotplug failure handler Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-06-29 10:30         ` [PATCH V4 3/9] bus: introduce sigbus handler Jeff Guo
                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the hotplug handler ops for the PCI bus. It remaps
the failed memory region to new dummy memory, so that MMIO reads and
writes no longer raise errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d8151b0..095cd4e 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -401,6 +401,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resources is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -430,6 +457,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_handler = pci_hotplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V4 3/9] bus: introduce sigbus handler
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
  2018-06-29 10:30         ` [PATCH V4 1/9] bus: introduce hotplug failure handler Jeff Guo
  2018-06-29 10:30         ` [PATCH V4 2/9] bus/pci: implement hotplug handler operation Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-07-10 21:55           ` Stephen Hemminger
  2018-06-29 10:30         ` [PATCH V4 4/9] bus/pci: implement sigbus handler operation Jeff Guo
                           ` (5 subsequent siblings)
  8 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged while the data path still reads or writes
it, a sigbus error occurs and needs to be handled. So a handler is needed
to capture the signal and handle it accordingly.

Handling the sigbus error is bus-specific behavior, so this patch
introduces a bus ops so that each kind of bus can implement its own logic.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 3642aeb..231bd3d 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -181,6 +181,20 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implementation a specific sigbus handler, which is responsible
+ * for handle the sigbus error which is original memory error, or specific
+ * memory error that caused of hot unplug.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	0 for success handle the sigbus.
+ *	1 for no handle the sigbus.
+ *	-1 for failed to handle the sigbus
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -226,6 +240,8 @@ struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_handler_t hotplug_handler;
 						/**< handle hot plug on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V4 4/9] bus/pci: implement sigbus handler operation
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-06-29 10:30         ` [PATCH V4 3/9] bus: introduce sigbus handler Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-06-29 10:30         ` [PATCH V4 5/9] bus: add helper to handle sigbus Jeff Guo
                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the sigbus handler ops for the PCI bus. It finds
the PCI device that has been hot removed and then handles the hot plug
failure for that device.
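
The address lookup can be sketched independently of DPDK as below. The
types and names are hypothetical simplifications (an array instead of the
bus device list, a fixed number of resources per device); the range test
is written as "a - start < len" so that it cannot overflow near the top
of the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RES_SKETCH 6

struct mem_res_sketch {
	void *addr;	/* mapped virtual address, NULL if unused */
	uint64_t len;	/* length in bytes, 0 if unused */
};

struct pci_dev_sketch {
	const char *name;
	struct mem_res_sketch res[MAX_RES_SKETCH];
};

/* Return the device owning fault_addr, or NULL if no device does. */
static const struct pci_dev_sketch *
find_dev_by_addr(const struct pci_dev_sketch *devs, size_t ndev,
		 const void *fault_addr)
{
	uintptr_t a = (uintptr_t)fault_addr;
	size_t d, i;

	for (d = 0; d < ndev; d++) {
		for (i = 0; i < MAX_RES_SKETCH; i++) {
			const struct mem_res_sketch *r = &devs[d].res[i];
			uintptr_t start = (uintptr_t)r->addr;

			if (r->len != 0 && a >= start && a - start < r->len)
				return &devs[d];
		}
	}
	return NULL;
}

static void
find_dev_selfcheck(void)
{
	/* One buffer: dev0 owns [0, 64), [64, 128) is a hole, dev1 the rest. */
	static unsigned char mem[256];
	struct pci_dev_sketch devs[2] = {
		{ .name = "0000:01:00.0", .res = { [0] = { mem, 64 } } },
		{ .name = "0000:02:00.0", .res = { [2] = { mem + 128, 128 } } },
	};

	assert(find_dev_by_addr(devs, 2, mem + 63) == &devs[0]);
	assert(find_dev_by_addr(devs, 2, mem + 64) == NULL);
	assert(find_dev_by_addr(devs, 2, mem + 128) == &devs[1]);
	assert(find_dev_by_addr(devs, 2, mem + 255) == &devs[1]);
}
```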

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 drivers/bus/pci/pci_common.c | 50 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 095cd4e..0f5b4af 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -400,6 +400,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* check the failure address belongs to which device. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_handler(struct rte_device *dev)
 {
@@ -428,6 +454,29 @@ pci_hotplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused of hot removal. */
+		ret = pci_hotplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+				"device %s", pdev->name);
+			ret = -1;
+			rte_errno = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -458,6 +507,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_handler = pci_hotplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V4 5/9] bus: add helper to handle sigbus
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-06-29 10:30         ` [PATCH V4 4/9] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-06-29 10:51           ` Ananyev, Konstantin
  2018-06-29 10:30         ` [PATCH V4 6/9] eal: add failure handle mechanism for hot plug Jeff Guo
                           ` (3 subsequent siblings)
  8 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a helper that iterates over all buses to find the
corresponding bus to handle the sigbus error.
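
A minimal sketch of such an iteration follows, assuming the return
convention of the sigbus op from this series (0 handled, 1 not handled by
this bus, -1 failed to handle). The array of function pointers and all
names are hypothetical; the series itself walks the bus list instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-bus op: 0 = handled, 1 = not ours, -1 = ours but handling failed. */
typedef int (*sigbus_op_sketch_t)(const void *failure_addr);

/*
 * Try each bus in turn and stop at the first one that claims the fault.
 * Returns 0 if some bus handled it, -1 if none claimed it or the claiming
 * bus failed.
 */
static int
dispatch_sigbus(const sigbus_op_sketch_t *ops, size_t n,
		const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (ops[i] == NULL)
			continue;	/* bus without a sigbus op */
		ret = ops[i](failure_addr);
		if (ret == 0)
			return 0;
		if (ret < 0)
			return -1;
		/* ret > 0: not this bus, keep looking */
	}
	return -1;
}

/* Toy ops standing in for real buses. */
static int
op_never_mine(const void *addr)
{
	(void)addr;
	return 1;
}

static int
op_claims_low(const void *addr)
{
	return (uintptr_t)addr < 0x1000 ? 0 : 1;
}

static int
op_fails_high(const void *addr)
{
	return (uintptr_t)addr >= 0x8000 ? -1 : 1;
}

static void
dispatch_selfcheck(void)
{
	const sigbus_op_sketch_t ops[] = {
		op_never_mine, NULL, op_claims_low, op_fails_high,
	};

	assert(dispatch_sigbus(ops, 4, (const void *)0x10) == 0);
	assert(dispatch_sigbus(ops, 4, (const void *)0x5000) == -1);
	assert(dispatch_sigbus(ops, 4, (const void *)0x9000) == -1);
}
```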

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 lib/librte_eal/common/eal_common_bus.c | 34 +++++++++++++++++++++++++++++++++-
 lib/librte_eal/common/eal_private.h    | 11 +++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..34c4f2d 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -220,7 +221,6 @@ rte_bus_find_by_device_name(const char *str)
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
 
-
 /*
  * Get iommu class of devices on the bus.
  */
@@ -242,3 +242,35 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	return !(bus->sigbus_handler && bus->sigbus_handler(failure_addr) <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+	int old_errno = rte_errno;
+	int ret = 0;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	if (bus == NULL) {
+		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
+		ret = -1;
+	} else if (rte_errno != 0) {
+		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
+		ret = -1;
+	}
+
+	/* if sigbus not be handled, return back old errno. */
+	if (ret)
+		rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..9517f2b 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,15 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+
+/**
+ * Iterate all buses to find the corresponding bus, to handle the sigbus error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 on success.
+ *	-1 on error
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH V4 6/9] eal: add failure handle mechanism for hot plug
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-06-29 10:30         ` [PATCH V4 5/9] bus: add helper to handle sigbus Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-06-29 10:49           ` Ananyev, Konstantin
  2018-06-29 10:30         ` [PATCH V4 7/9] igb_uio: fix uio release issue when hot unplug Jeff Guo
                           ` (2 subsequent siblings)
  8 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handler mechanism to handle the device
hot unplug event.

First a sigbus handler is registered. Once a sigbus error is captured, it
checks the failure address and remaps the invalid memory for the
corresponding device. Based on this mechanism, the application is
guaranteed not to crash when devices are hot unplugged.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
split patches to be small and clear.
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 88 ++++++++++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..c9dddab 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,24 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/* spinlock for device failure process */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +44,34 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (!ret)
+		RTE_LOG(INFO, EAL,
+			"Success to handle SIGBUS error for hotplug!\n");
+	else
+		rte_exit(EXIT_FAILURE,
+			 "A generic SIGBUS error, (rte_errno: %s)!",
+			 strerror(rte_errno));
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +186,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +213,48 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+			rte_spinlock_lock(&dev_failure_lock);
+			ret = bus->hotplug_handler(dev);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +274,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigaction(SIGBUS, &action, NULL);
+
 	monitor_started = true;
 
 	return 0;
@@ -220,5 +305,6 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4


* [PATCH V4 7/9] igb_uio: fix uio release issue when hot unplug
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
                           ` (5 preceding siblings ...)
  2018-06-29 10:30         ` [PATCH V4 6/9] eal: add failure handle mechanism for hot plug Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-07-03 12:12           ` Ferruh Yigit
  2018-06-29 10:30         ` [PATCH V4 8/9] app/testpmd: show example to handle " Jeff Guo
  2018-06-29 10:30         ` [PATCH V4 9/9] app/testpmd: enable device hotplug monitoring Jeff Guo
  8 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, the kernel releases the device resources
on the kernel side: for example, the device's sysfs file disappears and
the irq is released. If the igb_uio driver then still tries to release
these resources, it causes a kernel crash. On the other hand, some steps,
such as disabling interrupts, are not performed automatically on the
kernel side. If the driver does not handle them, the stale state can
affect interrupt resources used by other devices. So the igb_uio driver
has to check the hot-plug status and take the corresponding action.

This patch proposes adding an enum rte_udev_state, and a state field of
that type in struct rte_uio_pci_dev of the igb_uio kernel driver, to
record the state of the uio device: probed, opened, released, removed or
unplug. When an unexpected removal caused by hot unplug is detected, the
driver disables the interrupt resources accordingly, and skips the
release steps that the kernel has already performed, to avoid a double
free or NULL pointer dereference crashing the kernel.
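
A minimal user-space sketch of the state handling described above. The
names udev_model, model_release and model_remove are illustrative
assumptions and not part of igb_uio; the real driver runs in kernel
context and also tracks a reference count under udev->lock.

```c
#include <stdbool.h>

/* Simplified copy of the states recorded for a uio device. */
enum udev_state {
	UDEV_PROBED,
	UDEV_OPENED,
	UDEV_RELEASED,
	UDEV_REMOVED,
	UDEV_UNPLUG
};

/* Hypothetical stand-in for struct rte_uio_pci_dev. */
struct udev_model {
	enum udev_state state;
	bool irq_enabled;
};

/* Close path: returns true if interrupts were disabled by this call. */
static bool
model_release(struct udev_model *u)
{
	/* Already torn down by a hot unplug: nothing left to release. */
	if (u->state == UDEV_REMOVED)
		return false;

	u->irq_enabled = false;
	u->state = UDEV_RELEASED;
	return true;
}

/* Remove path: returns true if the normal full teardown should run. */
static bool
model_remove(struct udev_model *u)
{
	/* Device vanished while still open: only disable interrupts. */
	if (u->state == UDEV_OPENED || u->state == UDEV_UNPLUG) {
		model_release(u);
		u->state = UDEV_REMOVED;
		return false;
	}
	return true;
}
```

The point of the sketch is that a later close of the uio file after a
hot unplug becomes a no-op instead of touching freed resources.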

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
no change
---
 kernel/linux/igb_uio/igb_uio.c | 50 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index b3233f1..d301302 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+	RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static char *intr_mode;
@@ -194,12 +204,20 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
 	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
 	struct uio_info *info = &udev->info;
+	struct pci_dev *pdev = udev->pdev;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
+	/* check the uevent of the kobj */
+	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+			   (&pdev->dev.kobj)->name);
+		udev->state = RTE_UDEV_UNPLUG;
+	}
+
 	uio_event_notify(info);
 
 	/* Message signal mode, no share IRQ and automasked */
@@ -308,7 +326,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -330,24 +347,33 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
+
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED)
+		return 0;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
-		return 0;
+		return -1;
 	}
 
 	/* disable interrupts */
@@ -355,7 +381,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -557,6 +583,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	udev->state = RTE_UDEV_PROBED;
 	return 0;
 
 fail_remove_group:
@@ -573,11 +600,24 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
+
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	int ret;
+
+	/* handler hot unplug */
+	if (udev->state == RTE_UDEV_OPENNED ||
+		udev->state == RTE_UDEV_UNPLUG) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		udev->state = RTE_UDEV_REMOVED;
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
-	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	uio_unregister_device(&udev->info);
+	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
 	pci_disable_device(dev);
 	pci_set_drvdata(dev, NULL);
-- 
2.7.4


* [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
                           ` (6 preceding siblings ...)
  2018-06-29 10:30         ` [PATCH V4 7/9] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  2018-07-01  7:46           ` Matan Azrad
  2018-06-29 10:30         ` [PATCH V4 9/9] app/testpmd: enable device hotplug monitoring Jeff Guo
  8 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Use testpmd as an example to show how an application can smoothly handle
failure when a device is hot-unplugged. If the app has enabled the device
event monitor and registered the hot plug event callback before running,
the callback is called once the app detects the removal event. It first
stops packet forwarding, then stops the port, closes the port, and finally
detaches the port to remove the device from the device lists.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
remove some unused code
---
 app/test-pmd/testpmd.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 24c1998..42ed196 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2196,6 +2196,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2206,9 +2209,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2736,7 +2742,6 @@ main(int argc, char** argv)
 			return -1;
 		}
 		eth_dev_event_callback_register();
-
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* [PATCH V4 9/9] app/testpmd: enable device hotplug monitoring
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
                           ` (7 preceding siblings ...)
  2018-06-29 10:30         ` [PATCH V4 8/9] app/testpmd: show example to handle " Jeff Guo
@ 2018-06-29 10:30         ` Jeff Guo
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-06-29 10:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, there are 2 different hotplug mechanisms in DPDK: one is the
ethdev event + kernel driver hotplug solution, the other is the EAL
device event + pci uio driver hotplug solution. Each of them has a
different configuration and callback process in testpmd. In order to
avoid a race between them, this patch uses a new parameter
"--hotplug-mode" to replace the previous "--hot-plug" command parameter
and to select between these modes.

There are 3 hotplug modes: disable, eal, or ethdev (default).

To use the EAL device event monitor mode, start testpmd with the command
below. If this parameter is not set, ethdev hotplug mode is used by
default.

E.g. ./build/app/testpmd -c 0x3 -n 4 -- -i --hotplug-mode=eal
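
A minimal sketch of how the mode string could be mapped to the
HOTPLUG_MODE_* values, assuming a table-driven helper. The name
parse_hotplug_mode is hypothetical; the patch itself uses a strcmp
chain inside launch_args_parse() and calls rte_exit() on bad input.

```c
#include <stddef.h>
#include <string.h>

/* Same three modes as the enum added to testpmd.h by this patch. */
enum {
	HOTPLUG_MODE_DISABLE,
	HOTPLUG_MODE_EAL,
	HOTPLUG_MODE_ETHDEV,
};

/*
 * Hypothetical helper: translate the --hotplug-mode argument.
 * Returns 0 and stores the mode on success, -1 on an unknown string
 * (in which case *mode is left untouched).
 */
static int
parse_hotplug_mode(const char *arg, int *mode)
{
	static const struct {
		const char *name;
		int mode;
	} table[] = {
		{ "disable", HOTPLUG_MODE_DISABLE },
		{ "eal",     HOTPLUG_MODE_EAL },
		{ "ethdev",  HOTPLUG_MODE_ETHDEV },
	};
	size_t i;

	if (arg == NULL || mode == NULL)
		return -1;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcmp(arg, table[i].name) == 0) {
			*mode = table[i].mode;
			return 0;
		}
	}
	return -1;
}
```

Returning an error instead of exiting keeps the helper testable; the
caller can still decide to abort on -1.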

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v4->v3:
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug
---
 app/test-pmd/parameters.c             | 20 ++++++++++++++++----
 app/test-pmd/testpmd.c                | 18 +++++++++++-------
 app/test-pmd/testpmd.h                |  8 +++++++-
 doc/guides/testpmd_app_ug/run_app.rst | 10 ++++++++--
 4 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 7580762..601e13e 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -186,7 +186,8 @@ usage(char* progname)
 	printf("  --flow-isolate-all: "
 	       "requests flow API isolated mode on all ports at initialization time.\n");
 	printf("  --tx-offloads=0xXXXXXXXX: hexadecimal bitmask of TX queue offloads\n");
-	printf("  --hot-plug: enable hot plug for device.\n");
+	printf("  --hotplug-mode=N: set hotplug mode for device "
+	       "(N: disable or eal or ethdev (default)).\n");
 	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
 	printf("  --mlockall: lock all memory\n");
 	printf("  --no-mlockall: do not lock all memory\n");
@@ -621,7 +622,7 @@ launch_args_parse(int argc, char** argv)
 		{ "print-event",		1, 0, 0 },
 		{ "mask-event",			1, 0, 0 },
 		{ "tx-offloads",		1, 0, 0 },
-		{ "hot-plug",			0, 0, 0 },
+		{ "hotplug-mode",		1, 0, 0 },
 		{ "vxlan-gpe-port",		1, 0, 0 },
 		{ "mlockall",			0, 0, 0 },
 		{ "no-mlockall",		0, 0, 0 },
@@ -1139,8 +1140,19 @@ launch_args_parse(int argc, char** argv)
 					rte_exit(EXIT_FAILURE,
 						 "invalid mask-event argument\n");
 				}
-			if (!strcmp(lgopts[opt_idx].name, "hot-plug"))
-				hot_plug = 1;
+			if (!strcmp(lgopts[opt_idx].name, "hotplug-mode")) {
+				if (!strcmp(optarg, "disable"))
+					hotplug_mode = HOTPLUG_MODE_DISABLE;
+				else if (!strcmp(optarg, "eal"))
+					hotplug_mode = HOTPLUG_MODE_EAL;
+				else if (!strcmp(optarg, "ethdev"))
+					hotplug_mode = HOTPLUG_MODE_ETHDEV;
+				else
+					rte_exit(EXIT_FAILURE,
+						 "hotplug-mode %s invalid - must be: "
+						 "disable, eal, ethdev.\n",
+						 optarg);
+			}
 			if (!strcmp(lgopts[opt_idx].name, "mlockall"))
 				do_mlockall = 1;
 			if (!strcmp(lgopts[opt_idx].name, "no-mlockall"))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 42ed196..9269400 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -286,7 +286,7 @@ uint8_t lsc_interrupt = 1; /* enabled by default */
  */
 uint8_t rmv_interrupt = 1; /* enabled by default */
 
-uint8_t hot_plug = 0; /**< hotplug disabled by default. */
+uint8_t hotplug_mode = HOTPLUG_MODE_ETHDEV; /**< ethdev mode by default. */
 
 /*
  * Display or mask ether events
@@ -2043,7 +2043,7 @@ pmd_test_exit(void)
 		}
 	}
 
-	if (hot_plug) {
+	if (hotplug_mode == HOTPLUG_MODE_EAL) {
 		ret = rte_dev_event_monitor_stop();
 		if (ret)
 			RTE_LOG(ERR, EAL,
@@ -2181,9 +2181,13 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
 
 	switch (type) {
 	case RTE_ETH_EVENT_INTR_RMV:
-		if (rte_eal_alarm_set(100000,
-				rmv_event_callback, (void *)(intptr_t)port_id))
-			fprintf(stderr, "Could not set up deferred device removal\n");
+		if (hotplug_mode == HOTPLUG_MODE_ETHDEV) {
+			if (rte_eal_alarm_set(100000,
+					rmv_event_callback,
+					(void *)(intptr_t)port_id))
+				fprintf(stderr, "Could not set up deferred "
+					"device removal\n");
+		}
 		break;
 	default:
 		break;
@@ -2734,8 +2738,8 @@ main(int argc, char** argv)
 
 	init_config();
 
-	if (hot_plug) {
-		/* enable hot plug monitoring */
+	if (hotplug_mode == HOTPLUG_MODE_EAL) {
+		/* enable hotplug event monitoring */
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
 			rte_errno = EINVAL;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f51cd9d..e29ee2a 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -69,6 +69,12 @@ enum {
 	PORT_TOPOLOGY_LOOP,
 };
 
+enum {
+	HOTPLUG_MODE_DISABLE,
+	HOTPLUG_MODE_EAL,
+	HOTPLUG_MODE_ETHDEV,
+};
+
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 /**
  * The data structure associated with RX and TX packet burst statistics
@@ -335,7 +341,7 @@ extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
 extern uint8_t rmv_interrupt; /**< disabled by "--no-rmv-interrupt" parameter */
 extern uint32_t event_print_mask;
 /**< set by "--print-event xxxx" and "--mask-event xxxx parameters */
-extern uint8_t hot_plug; /**< enable by "--hot-plug" parameter */
+extern uint8_t hotplug_mode; /**< set by "--hotplug-mode" parameter */
 extern int do_mlockall; /**< set by "--mlockall" or "--no-mlockall" parameter */
 
 #ifdef RTE_LIBRTE_IXGBE_BYPASS
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f301c2b..09e2716 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -482,9 +482,15 @@ The commandline options are:
     Set the hexadecimal bitmask of TX queue offloads.
     The default value is 0.
 
-*   ``--hot-plug``
+*   ``--hotplug-mode``
 
-    Enable device event monitor machenism for hotplug.
+    Set the hotplug handle mode, that is ``disable`` or ``eal`` or ``ethdev`` (the default).
+
+    In ``disable`` mode, it will not handle the hotplug for device.
+
+    In ``eal`` mode, it will start device event monitor and register eth_dev_event_callback for hotplug process.
+
+    In ``ethdev`` mode, it will process RTE_ETH_EVENT_INTR_RMV event which is detected from ethdev.
 
 *   ``--vxlan-gpe-port=N``
 
-- 
2.7.4


* Re: [PATCH V4 6/9] eal: add failure handle mechanism for hot plug
  2018-06-29 10:30         ` [PATCH V4 6/9] eal: add failure handle mechanism for hot plug Jeff Guo
@ 2018-06-29 10:49           ` Ananyev, Konstantin
  2018-06-29 11:15             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-06-29 10:49 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

> 
> This patch introduces a failure handler mechanism to handle device
> hot plug removal event.
> 
> First register sigbus handler, once sigbus error be captured, will
> check the failure address and accordingly remap the invalid memory
> for the corresponding device. Bese on this mechanism, it could
> guaranty the application not to be crash when hot unplug devices.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v4->v3:
> split patches to be small and clear.
> ---
>  lib/librte_eal/linuxapp/eal/eal_dev.c | 88 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 87 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 1cf6aeb..c9dddab 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -14,15 +16,24 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> +#include <rte_spinlock.h>
> +#include <rte_errno.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };
>  static bool monitor_started;
> 
> +extern struct rte_bus_list rte_bus_list;
> +
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> +/* spinlock for device failure process */
> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
> +
>  static void dev_uev_handler(__rte_unused void *param);
> 
>  /* identify the system layer which reports this event. */
> @@ -33,6 +44,34 @@ enum eal_dev_event_subsystem {
>  	EAL_DEV_EVENT_SUBSYSTEM_MAX
>  };
> 
> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> +		(int)pthread_self(), info->si_addr);
> +
> +	rte_spinlock_lock(&dev_failure_lock);
> +	ret = rte_bus_sigbus_handler(info->si_addr);
> +	rte_spinlock_unlock(&dev_failure_lock);
> +	if (!ret)
> +		RTE_LOG(INFO, EAL,
> +			"Success to handle SIGBUS error for hotplug!\n");
> +	else
> +		rte_exit(EXIT_FAILURE,
> +			 "A generic SIGBUS error, (rte_errno: %s)!",
> +			 strerror(rte_errno));
> +}

As I said in comments for previous versions:
I think we need to distinguish why do we fail -
 1) address doesn't belong to any device,
 2) we failed to remap
For 1) we probably need to call previous sigbus handler.
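
A minimal sketch of that distinction, assuming the bus layer reports
three separate outcomes. All names here (sigbus_result, sigbus_next,
forward_to_previous, dispatch_sigbus, demo_prev_*) are hypothetical and
do not exist in the EAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes of asking the buses about a fault address. */
enum sigbus_result {
	SIGBUS_HANDLED,      /* address belonged to a device, remap done */
	SIGBUS_NOT_DEVICE,   /* case 1: address belongs to no device */
	SIGBUS_REMAP_FAILED  /* case 2: device found, remap failed */
};

enum sigbus_next { NEXT_RESUME, NEXT_FORWARDED, NEXT_FATAL };

/* Invoke the handler that was installed before ours. Returns 1 if a
 * user function was called, 0 if only the disposition was restored. */
static int
forward_to_previous(const struct sigaction *prev, int sig,
		    siginfo_t *info, void *ctx)
{
	if (prev->sa_flags & SA_SIGINFO) {
		prev->sa_sigaction(sig, info, ctx);
		return 1;
	}
	if (prev->sa_handler == SIG_DFL || prev->sa_handler == SIG_IGN) {
		/* No function to call: reinstall the old disposition so
		 * that the retried faulting access is handled by it. */
		sigaction(sig, prev, NULL);
		return 0;
	}
	prev->sa_handler(sig);
	return 1;
}

static enum sigbus_next
dispatch_sigbus(enum sigbus_result res, const struct sigaction *prev,
		int sig, siginfo_t *info, void *ctx)
{
	switch (res) {
	case SIGBUS_HANDLED:
		return NEXT_RESUME;
	case SIGBUS_NOT_DEVICE:
		(void)forward_to_previous(prev, sig, info, ctx);
		return NEXT_FORWARDED;
	default:
		return NEXT_FATAL; /* caller may exit, as the patch does */
	}
}

/* Demo-only previous handler that counts its invocations. */
static int demo_prev_calls;

static void
demo_prev_handler(int sig)
{
	(void)sig;
	demo_prev_calls++;
}

static void
demo_prev_init(struct sigaction *sa)
{
	memset(sa, 0, sizeof(*sa));
	sa->sa_handler = demo_prev_handler;
}
```

Only case 1 is forwarded here; whether case 2 should also be forwarded
is exactly the open question in this thread.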

> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
>  static int
>  dev_uev_socket_fd_create(void)
>  {
> @@ -147,6 +186,9 @@ dev_uev_handler(__rte_unused void *param)
>  	struct rte_dev_event uevent;
>  	int ret;
>  	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
> 
>  	memset(&uevent, 0, sizeof(struct rte_dev_event));
>  	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -171,13 +213,48 @@ dev_uev_handler(__rte_unused void *param)
>  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>  		uevent.devname, uevent.type, uevent.subsystem);
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					busname);
> +				return;
> +			}
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
> +					"bus (%s)\n", uevent.devname, busname);
> +				return;
> +			}
> +			rte_spinlock_lock(&dev_failure_lock);
> +			ret = bus->hotplug_handler(dev);
> +			rte_spinlock_unlock(&dev_failure_lock);

Ok, but this function is executed from interrupt thread, correct?
What would happen if user would do dev-detach() at the same time and dev would not be valid anymore?
Shouldn't we have a lock (per bus?) that we would grab before find_device() and release after hotplug_handler? 
Though in that case we probably need to revisit other bus ops too.
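
A minimal sketch of the locking pattern meant here, assuming a single
per-bus mutex and a toy device table. The demo_* names are hypothetical
and stand in for find_device(), hotplug_handler() and detach.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_DEVS 4

struct demo_dev {
	const char *name;
	int attached;
	int handled;
};

/* Toy device list of one bus; unused slots have a NULL name. */
static struct demo_dev demo_devs[DEMO_MAX_DEVS] = {
	{ "0000:01:00.0", 1, 0 },
	{ "0000:02:00.0", 1, 0 },
};

/* One lock per bus, shared by every operation on its device list. */
static pthread_mutex_t demo_bus_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup helper; the caller must hold demo_bus_lock. */
static struct demo_dev *
demo_find(const char *name)
{
	size_t i;

	for (i = 0; i < DEMO_MAX_DEVS; i++) {
		if (demo_devs[i].name != NULL && demo_devs[i].attached &&
		    strcmp(demo_devs[i].name, name) == 0)
			return &demo_devs[i];
	}
	return NULL;
}

/* Interrupt-thread side: lookup and handler run under the same lock,
 * so the device cannot be detached in between. */
static int
demo_handle_remove(const char *name)
{
	struct demo_dev *dev;
	int ret = -1;

	pthread_mutex_lock(&demo_bus_lock);
	dev = demo_find(name);
	if (dev != NULL) {
		dev->handled = 1;
		ret = 0;
	}
	pthread_mutex_unlock(&demo_bus_lock);
	return ret;
}

/* User side: detach takes the same lock before touching the list. */
static int
demo_detach(const char *name)
{
	struct demo_dev *dev;
	int ret = -1;

	pthread_mutex_lock(&demo_bus_lock);
	dev = demo_find(name);
	if (dev != NULL) {
		dev->attached = 0;
		ret = 0;
	}
	pthread_mutex_unlock(&demo_bus_lock);
	return ret;
}
```

With this pattern a handler that loses the race simply fails the lookup
instead of using a stale device pointer.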

> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
> +					"device (%s)\n", dev->name);
> +				return;
> +			}
> +		}
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
>  }
> 
>  int __rte_experimental
>  rte_dev_event_monitor_start(void)
>  {
> +	sigset_t mask;
> +	struct sigaction action;
>  	int ret;
> 
>  	if (monitor_started)
> @@ -197,6 +274,14 @@ rte_dev_event_monitor_start(void)
>  		return -1;
>  	}
> 
> +	/* register sigbus handler */
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGBUS);
> +	action.sa_flags = SA_SIGINFO;
> +	action.sa_mask = mask;
> +	action.sa_sigaction = sigbus_handler;
> +	sigaction(SIGBUS, &action, NULL);
> +

I still think we have to save (and restore at monitor_stop) previous sigbus handler.
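
A minimal sketch of that save/restore, assuming the old action is kept
in a file-scope variable. The sketch_* names are hypothetical; only
sigaction() and its third (old action) argument are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Action that was installed for SIGBUS before the monitor started. */
static struct sigaction sketch_old_sigbus;
static int sketch_sigbus_installed;

/* Placeholder for the real hotplug SIGBUS handler. */
static void
sketch_sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)info;
	(void)ctx;
}

/* Install our handler and remember the previous one. */
static int
sketch_monitor_start(void)
{
	struct sigaction action;

	if (sketch_sigbus_installed)
		return 0;

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	sigaddset(&action.sa_mask, SIGBUS);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = sketch_sigbus_handler;

	/* The third argument receives the previously installed action. */
	if (sigaction(SIGBUS, &action, &sketch_old_sigbus) != 0)
		return -1;

	sketch_sigbus_installed = 1;
	return 0;
}

/* Put back whatever was installed before the monitor started. */
static int
sketch_monitor_stop(void)
{
	if (!sketch_sigbus_installed)
		return 0;

	if (sigaction(SIGBUS, &sketch_old_sigbus, NULL) != 0)
		return -1;

	sketch_sigbus_installed = 0;
	return 0;
}
```

Zeroing the struct before use also avoids passing uninitialized
sa_flags bits to sigaction().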

>  	monitor_started = true;
> 
>  	return 0;
> @@ -220,5 +305,6 @@ rte_dev_event_monitor_stop(void)
>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
>  	return 0;
>  }
> --
> 2.7.4


* Re: [PATCH V4 5/9] bus: add helper to handle sigbus
  2018-06-29 10:30         ` [PATCH V4 5/9] bus: add helper to handle sigbus Jeff Guo
@ 2018-06-29 10:51           ` Ananyev, Konstantin
  2018-06-29 11:23             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-06-29 10:51 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> +int
> +rte_bus_sigbus_handler(const void *failure_addr)
> +{
> +	struct rte_bus *bus;
> +	int old_errno = rte_errno;
> +	int ret = 0;
> +
> +	rte_errno = 0;
> +
> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> +	if (bus == NULL) {
> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
> +		ret = -1;
> +	} else if (rte_errno != 0) {
> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
> +		ret = -1;
> +	}
> +
> +	/* if sigbus not be handled, return back old errno. */
> +	if (ret)
> +		rte_errno = old_errno;

Hmm, not sure why we need to set/restore rte_errno here?

> +
> +	return ret;
> +}


* Re: [PATCH V4 6/9] eal: add failure handle mechanism for hot plug
  2018-06-29 10:49           ` Ananyev, Konstantin
@ 2018-06-29 11:15             ` Guo, Jia
  2018-06-29 12:06               ` Ananyev, Konstantin
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-06-29 11:15 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Konstantin,


On 6/29/2018 6:49 PM, Ananyev, Konstantin wrote:
> Hi Jeff,
>
>> This patch introduces a failure handler mechanism to handle device
>> hot plug removal event.
>>
>> First register sigbus handler, once sigbus error be captured, will
>> check the failure address and accordingly remap the invalid memory
>> for the corresponding device. Bese on this mechanism, it could
>> guaranty the application not to be crash when hot unplug devices.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v4->v3:
>> split patches to be small and clear.
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 88 ++++++++++++++++++++++++++++++++++-
>>   1 file changed, 87 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 1cf6aeb..c9dddab 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,8 @@
>>
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <fcntl.h>
>> +#include <signal.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>
>> @@ -14,15 +16,24 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_eal.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_errno.h>
>>
>>   #include "eal_private.h"
>>
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>>   static bool monitor_started;
>>
>> +extern struct rte_bus_list rte_bus_list;
>> +
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>
>> +/* spinlock for device failure process */
>> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
>> +
>>   static void dev_uev_handler(__rte_unused void *param);
>>
>>   /* identify the system layer which reports this event. */
>> @@ -33,6 +44,34 @@ enum eal_dev_event_subsystem {
>>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
>>   };
>>
>> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
>> +				void *ctx __rte_unused)
>> +{
>> +	int ret;
>> +
>> +	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
>> +		(int)pthread_self(), info->si_addr);
>> +
>> +	rte_spinlock_lock(&dev_failure_lock);
>> +	ret = rte_bus_sigbus_handler(info->si_addr);
>> +	rte_spinlock_unlock(&dev_failure_lock);
>> +	if (!ret)
>> +		RTE_LOG(INFO, EAL,
>> +			"Success to handle SIGBUS error for hotplug!\n");
>> +	else
>> +		rte_exit(EXIT_FAILURE,
>> +			 "A generic SIGBUS error, (rte_errno: %s)!",
>> +			 strerror(rte_errno));
>> +}
> As I said in comments for previous versions:
> I think we need to distinguish why do we fail -
>   1) address doesn't belong to any device,
>   2) we failed to remap
> For 1) we probably need to call previous sigbus handler.

I see your point, but I think that in both case 1) and case 2) we need
to call the previous sigbus handler to report the memory error and to
stop anything from running afterwards.
As for the previous sigbus handler, I have not found an explicit way to
call it. I think the sigbus handler can be restored, but it would only
be used when the next error occurs, right?
If so, do you think it makes sense for me to just use rte_exit in place
of the original handler?

>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +	const void *_name)
>> +{
>> +	const char *name = _name;
>> +
>> +	return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>>   dev_uev_socket_fd_create(void)
>>   {
>> @@ -147,6 +186,9 @@ dev_uev_handler(__rte_unused void *param)
>>   	struct rte_dev_event uevent;
>>   	int ret;
>>   	char buf[EAL_UEV_MSG_LEN];
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	const char *busname;
>>
>>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>>   	memset(buf, 0, EAL_UEV_MSG_LEN);
>> @@ -171,13 +213,48 @@ dev_uev_handler(__rte_unused void *param)
>>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>>   		uevent.devname, uevent.type, uevent.subsystem);
>>
>> -	if (uevent.devname)
>> +	switch (uevent.subsystem) {
>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +		busname = "pci";
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (uevent.devname) {
>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>> +			bus = rte_bus_find_by_name(busname);
>> +			if (bus == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +					busname);
>> +				return;
>> +			}
>> +			dev = bus->find_device(NULL, cmp_dev_name,
>> +					       uevent.devname);
>> +			if (dev == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
>> +					"bus (%s)\n", uevent.devname, busname);
>> +				return;
>> +			}
>> +			rte_spinlock_lock(&dev_failure_lock);
>> +			ret = bus->hotplug_handler(dev);
>> +			rte_spinlock_unlock(&dev_failure_lock);
> Ok, but this function is executed from interrupt thread, correct?

yes.

> What would happen if user would do dev-detach() at the same time and dev would not be valid anymore?
> Shouldn't we have a lock (per bus?) that we would grab before find_device() and release after hotplug_handler?
> Though in that case we probably need to revisit other bus ops too.

Makes sense. I think that is the case, and the bus ops here need a lock
to synchronize.

>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
>> +					"device (%s)\n", dev->name);
>> +				return;
>> +			}
>> +		}
>>   		dev_callback_process(uevent.devname, uevent.type);
>> +	}
>>   }
>>
>>   int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>> +	sigset_t mask;
>> +	struct sigaction action;
>>   	int ret;
>>
>>   	if (monitor_started)
>> @@ -197,6 +274,14 @@ rte_dev_event_monitor_start(void)
>>   		return -1;
>>   	}
>>
>> +	/* register sigbus handler */
>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGBUS);
>> +	action.sa_flags = SA_SIGINFO;
>> +	action.sa_mask = mask;
>> +	action.sa_sigaction = sigbus_handler;
>> +	sigaction(SIGBUS, &action, NULL);
>> +
> I still think we have to save (and restore at monitor_stop) previous sigbus handler.

OK, I missed that. On monitor_stop we should definitely restore the
previous sigbus handler.

>>   	monitor_started = true;
>>
>>   	return 0;
>> @@ -220,5 +305,6 @@ rte_dev_event_monitor_stop(void)
>>   	close(intr_handle.fd);
>>   	intr_handle.fd = -1;
>>   	monitor_started = false;
>> +
>>   	return 0;
>>   }
>> --
>> 2.7.4


* Re: [PATCH V4 5/9] bus: add helper to handle sigbus
  2018-06-29 10:51           ` Ananyev, Konstantin
@ 2018-06-29 11:23             ` Guo, Jia
  2018-06-29 12:21               ` Ananyev, Konstantin
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-06-29 11:23 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

hi, konstantin


On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
>> +int
>> +rte_bus_sigbus_handler(const void *failure_addr)
>> +{
>> +	struct rte_bus *bus;
>> +	int old_errno = rte_errno;
>> +	int ret = 0;
>> +
>> +	rte_errno = 0;
>> +
>> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>> +	if (bus == NULL) {
>> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
>> +		ret = -1;
>> +	} else if (rte_errno != 0) {
>> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
>> +		ret = -1;
>> +	}
>> +
>> +	/* if sigbus not be handled, return back old errno. */
>> +	if (ret)
>> +		rte_errno = old_errno;
> Hmm, not sure why we need to set/restore rte_errno here?

Restoring old_errno is just meant to let the caller know that the generic
sigbus was not handled by the bus hotplug handler. That covers both finding
a bus handler that then failed and finding no handler at all. The caller can
then use the previous sigbus handler to process it.
That also answers your question in the other patch. Do you think that makes
sense?

>> +
>> +	return ret;
>> +}

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 6/9] eal: add failure handle mechanism for hot plug
  2018-06-29 11:15             ` Guo, Jia
@ 2018-06-29 12:06               ` Ananyev, Konstantin
  0 siblings, 0 replies; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-06-29 12:06 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Friday, June 29, 2018 12:15 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
> <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH V4 6/9] eal: add failure handle mechanism for hot plug
> 
> hi,konstantin
> 
> 
> On 6/29/2018 6:49 PM, Ananyev, Konstantin wrote:
> > Hi Jeff,
> >
> >> This patch introduces a failure handler mechanism to handle device
> >> hot plug removal event.
> >>
> >> First register sigbus handler, once sigbus error be captured, will
> >> check the failure address and accordingly remap the invalid memory
> >> for the corresponding device. Bese on this mechanism, it could
> >> guaranty the application not to be crash when hot unplug devices.
> >>
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >> ---
> >> v4->v3:
> >> split patches to be small and clear.
> >> ---
> >>   lib/librte_eal/linuxapp/eal/eal_dev.c | 88 ++++++++++++++++++++++++++++++++++-
> >>   1 file changed, 87 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> >> index 1cf6aeb..c9dddab 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> >> @@ -4,6 +4,8 @@
> >>
> >>   #include <string.h>
> >>   #include <unistd.h>
> >> +#include <fcntl.h>
> >> +#include <signal.h>
> >>   #include <sys/socket.h>
> >>   #include <linux/netlink.h>
> >>
> >> @@ -14,15 +16,24 @@
> >>   #include <rte_malloc.h>
> >>   #include <rte_interrupts.h>
> >>   #include <rte_alarm.h>
> >> +#include <rte_bus.h>
> >> +#include <rte_eal.h>
> >> +#include <rte_spinlock.h>
> >> +#include <rte_errno.h>
> >>
> >>   #include "eal_private.h"
> >>
> >>   static struct rte_intr_handle intr_handle = {.fd = -1 };
> >>   static bool monitor_started;
> >>
> >> +extern struct rte_bus_list rte_bus_list;
> >> +
> >>   #define EAL_UEV_MSG_LEN 4096
> >>   #define EAL_UEV_MSG_ELEM_LEN 128
> >>
> >> +/* spinlock for device failure process */
> >> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
> >> +
> >>   static void dev_uev_handler(__rte_unused void *param);
> >>
> >>   /* identify the system layer which reports this event. */
> >> @@ -33,6 +44,34 @@ enum eal_dev_event_subsystem {
> >>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
> >>   };
> >>
> >> +static void sigbus_handler(int signum __rte_unused, siginfo_t *info,
> >> +				void *ctx __rte_unused)
> >> +{
> >> +	int ret;
> >> +
> >> +	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> >> +		(int)pthread_self(), info->si_addr);
> >> +
> >> +	rte_spinlock_lock(&dev_failure_lock);
> >> +	ret = rte_bus_sigbus_handler(info->si_addr);
> >> +	rte_spinlock_unlock(&dev_failure_lock);
> >> +	if (!ret)
> >> +		RTE_LOG(INFO, EAL,
> >> +			"Success to handle SIGBUS error for hotplug!\n");
> >> +	else
> >> +		rte_exit(EXIT_FAILURE,
> >> +			 "A generic SIGBUS error, (rte_errno: %s)!",
> >> +			 strerror(rte_errno));
> >> +}
> > As I said in comments for previous versions:
> > I think we need to distinguish why do we fail -
> >   1) address doesn't belong to any device,
> >   2) we failed to remap
> > For 1) we probably need to call previous sigbus handler.
> 
> i know your point, but i think what ever 1) or 2), we should also need
> to call previous sigbus handler to show exception of the memory error,
> to cut down any other after try to run.

I don't agree.
For 1) - that error doesn't belong to us (DPDK), but the user app might know how to handle it.
So we just invoke the previously saved sigbus handler (if any).
For 2) - there is not much we can do but rte_exit().
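
To make the two failure cases concrete, here is a small sketch of the classification step on its own. Everything in it is hypothetical (`struct demo_region`, the `demo_*` names, the three result values); it only illustrates separating "address is not ours" from "address is ours but remap failed".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_sigbus_result {
	DEMO_SIGBUS_HANDLED = 0,   /* a device owns the address, remap succeeded */
	DEMO_SIGBUS_NOT_OURS = 1,  /* case 1: no device owns the address */
	DEMO_SIGBUS_FAILED = -1    /* case 2: a device owns it, remap failed */
};

/* Hypothetical description of one mapped device region. */
struct demo_region {
	uintptr_t start;
	size_t len;
	int (*remap)(struct demo_region *r);   /* returns 0 on success */
};

/* Decide which of the three outcomes applies to a faulting address. */
static enum demo_sigbus_result
demo_classify_fault(struct demo_region *regions, size_t n, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	size_t i;

	assert(regions != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct demo_region *r = &regions[i];

		if (a >= r->start && a - r->start < r->len)
			return r->remap(r) == 0 ?
				DEMO_SIGBUS_HANDLED : DEMO_SIGBUS_FAILED;
	}
	return DEMO_SIGBUS_NOT_OURS;
}

/* Stand-in remap callbacks used to exercise both outcomes. */
static int demo_remap_ok(struct demo_region *r) { (void)r; return 0; }
static int demo_remap_fail(struct demo_region *r) { (void)r; return -1; }
```

The caller would then forward DEMO_SIGBUS_NOT_OURS to the saved handler and treat DEMO_SIGBUS_FAILED as fatal.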

> and for the previous sigbus handler, i still not find a explicit call to
> use it, i think it the sigbus handler could be restore but only will use
> when next error occur, right?
> if so, do you think i just use a rte_exit to replace this origin handler
> is make sense?
> 
> >> +
> >> +static int cmp_dev_name(const struct rte_device *dev,
> >> +	const void *_name)
> >> +{
> >> +	const char *name = _name;
> >> +
> >> +	return strcmp(dev->name, name);
> >> +}
> >> +
> >>   static int
> >>   dev_uev_socket_fd_create(void)
> >>   {
> >> @@ -147,6 +186,9 @@ dev_uev_handler(__rte_unused void *param)
> >>   	struct rte_dev_event uevent;
> >>   	int ret;
> >>   	char buf[EAL_UEV_MSG_LEN];
> >> +	struct rte_bus *bus;
> >> +	struct rte_device *dev;
> >> +	const char *busname;
> >>
> >>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
> >>   	memset(buf, 0, EAL_UEV_MSG_LEN);
> >> @@ -171,13 +213,48 @@ dev_uev_handler(__rte_unused void *param)
> >>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
> >>   		uevent.devname, uevent.type, uevent.subsystem);
> >>
> >> -	if (uevent.devname)
> >> +	switch (uevent.subsystem) {
> >> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> >> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> >> +		busname = "pci";
> >> +		break;
> >> +	default:
> >> +		break;
> >> +	}
> >> +
> >> +	if (uevent.devname) {
> >> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> >> +			bus = rte_bus_find_by_name(busname);
> >> +			if (bus == NULL) {
> >> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> >> +					busname);
> >> +				return;
> >> +			}
> >> +			dev = bus->find_device(NULL, cmp_dev_name,
> >> +					       uevent.devname);
> >> +			if (dev == NULL) {
> >> +				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
> >> +					"bus (%s)\n", uevent.devname, busname);
> >> +				return;
> >> +			}
> >> +			rte_spinlock_lock(&dev_failure_lock);
> >> +			ret = bus->hotplug_handler(dev);
> >> +			rte_spinlock_unlock(&dev_failure_lock);
> > Ok, but this function is executed from interrupt thread, correct?
> 
> yes.
> 
> > What would happen if user would do dev-detach() at the same time and dev would not be valid anymore?
> > Shouldn't we have a lock (per bus?) that we would grab before find_device() and release after hotplug_handler?
> > Though in that case we probably need to revisit other bus ops too.
> 
> make sense, i think should be the case and need lock any bus ops here to
> sync.
> 
> >> +			if (ret) {
> >> +				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
> >> +					"device (%s)\n", dev->name);
> >> +				return;
> >> +			}
> >> +		}
> >>   		dev_callback_process(uevent.devname, uevent.type);
> >> +	}
> >>   }
> >>
> >>   int __rte_experimental
> >>   rte_dev_event_monitor_start(void)
> >>   {
> >> +	sigset_t mask;
> >> +	struct sigaction action;
> >>   	int ret;
> >>
> >>   	if (monitor_started)
> >> @@ -197,6 +274,14 @@ rte_dev_event_monitor_start(void)
> >>   		return -1;
> >>   	}
> >>
> >> +	/* register sigbus handler */
> >> +	sigemptyset(&mask);
> >> +	sigaddset(&mask, SIGBUS);
> >> +	action.sa_flags = SA_SIGINFO;
> >> +	action.sa_mask = mask;
> >> +	action.sa_sigaction = sigbus_handler;
> >> +	sigaction(SIGBUS, &action, NULL);
> >> +
> > I still think we have to save (and restore at monitor_stop) previous sigbus handler.
> 
> ok, i think i missing here, if monitor_stop, definitely should restore
> the previous sigbus handler.
> 
> >>   	monitor_started = true;
> >>
> >>   	return 0;
> >> @@ -220,5 +305,6 @@ rte_dev_event_monitor_stop(void)
> >>   	close(intr_handle.fd);
> >>   	intr_handle.fd = -1;
> >>   	monitor_started = false;
> >> +
> >>   	return 0;
> >>   }
> >> --
> >> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 5/9] bus: add helper to handle sigbus
  2018-06-29 11:23             ` Guo, Jia
@ 2018-06-29 12:21               ` Ananyev, Konstantin
  2018-06-29 12:52                 ` Gaëtan Rivet
  0 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-06-29 12:21 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Friday, June 29, 2018 12:23 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
> <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH V4 5/9] bus: add helper to handle sigbus
> 
> hi, konstantin
> 
> 
> On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
> >> +int
> >> +rte_bus_sigbus_handler(const void *failure_addr)
> >> +{
> >> +	struct rte_bus *bus;
> >> +	int old_errno = rte_errno;
> >> +	int ret = 0;
> >> +
> >> +	rte_errno = 0;
> >> +
> >> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> >> +	if (bus == NULL) {
> >> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
> >> +		ret = -1;
> >> +	} else if (rte_errno != 0) {
> >> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
> >> +		ret = -1;
> >> +	}
> >> +
> >> +	/* if sigbus not be handled, return back old errno. */
> >> +	if (ret)
> >> +		rte_errno = old_errno;
> > Hmm, not sure why we need to set/restore rte_errno here?
> 
> restore old_errno just use to let caller know that the generic sigbus
> still not handler by bus hotplug handler,  that involve find a bus
> handle but failed and can not find a hander,  and can corresponding use
> the previous sigbus handler to process it.
> that is also unwser your question in other patch. do you think that make
> sense?

Sorry, I still don't understand the intention.
Suppose rte_bus_find() returns NULL; in that case you'll set rte_errno
back to what it was before calling that function.
If the returned bus is not NULL, but bus_find() sets rte_errno,
you would still restore rte_errno?
What is the purpose?
Why do you need to touch rte_errno at all in that function?
Konstantin

> 
> >> +
> >> +	return ret;
> >> +}

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 5/9] bus: add helper to handle sigbus
  2018-06-29 12:21               ` Ananyev, Konstantin
@ 2018-06-29 12:52                 ` Gaëtan Rivet
  2018-07-03 11:24                   ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2018-06-29 12:52 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Wu,
	Jingjing, thomas, motih, matan, Van Haaren, Harry, Zhang, Qi Z,
	He, Shaopeng, Iremonger, Bernard, jblunck, shreyansh.jain, dev,
	Zhang, Helin

On Fri, Jun 29, 2018 at 12:21:39PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Guo, Jia
> > Sent: Friday, June 29, 2018 12:23 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
> > <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
> > <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> > <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> > <bernard.iremonger@intel.com>
> > Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
> > Subject: Re: [PATCH V4 5/9] bus: add helper to handle sigbus
> > 
> > hi, konstantin
> > 
> > 
> > On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
> > >> +int
> > >> +rte_bus_sigbus_handler(const void *failure_addr)
> > >> +{
> > >> +	struct rte_bus *bus;
> > >> +	int old_errno = rte_errno;
> > >> +	int ret = 0;
> > >> +
> > >> +	rte_errno = 0;
> > >> +
> > >> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> > >> +	if (bus == NULL) {
> > >> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
> > >> +		ret = -1;
> > >> +	} else if (rte_errno != 0) {
> > >> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
> > >> +		ret = -1;
> > >> +	}
> > >> +
> > >> +	/* if sigbus not be handled, return back old errno. */
> > >> +	if (ret)
> > >> +		rte_errno = old_errno;
> > > Hmm, not sure why we need to set/restore rte_errno here?
> > 
> > restore old_errno just use to let caller know that the generic sigbus
> > still not handler by bus hotplug handler,  that involve find a bus
> > handle but failed and can not find a hander,  and can corresponding use
> > the previous sigbus handler to process it.
> > that is also unwser your question in other patch. do you think that make
> > sense?
> 
> Sorry, still don't understand the intention.
> Suppose rte_bus_find() will return NULL, in that case you'll setup rte_errno
> to what it was before calling that function.
> If the returned bus is not NULL, but bus_find() set's an rte_errno,
> you still would restore rte_ernno?
> What is the prupose?
> Why do you need to touch rte_errno at all in that function?
> Konstantin
> 

The way it is written here does not work, but the intention is
to make sure that a previous error is still caught. Something like
this:

   int old_errno = rte_errno;
   
   rte_errno = 0;
   rte_eal_call();
   
   if (rte_errno)
       return -1;
   else {
       rte_errno = old_errno;
       return 0;
   }

If someone calls the function while rte_errno is already set, then an
earlier error would be hidden by setting rte_errno to 0 within the
function.
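
For reference, a self-contained variant of the pattern above, written against plain `errno` rather than rte_errno so it compiles outside DPDK. The `demo_*` names are hypothetical and the callback signature is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Run fn(arg) with errno cleared. If fn sets errno, report failure and
 * leave the new value in place; otherwise put the caller's errno back.
 */
static int
demo_call_preserving_errno(int (*fn)(void *), void *arg)
{
	int old_errno = errno;

	assert(fn != NULL);
	errno = 0;
	fn(arg);
	if (errno != 0)
		return -1;      /* new error wins, caller sees it in errno */
	errno = old_errno;      /* no new error: earlier one stays visible */
	return 0;
}

/* Stand-in callees: one succeeds silently, one fails with ENODEV. */
static int demo_ok(void *arg) { (void)arg; return 0; }
static int demo_fail(void *arg) { (void)arg; errno = ENODEV; return -1; }
```

Whether a library should do this at all is the open question here; the sketch only shows that the save/restore must happen on the success path, not the failure path.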

I'm not sure this is useful, but sometimes when using errno within a
library call I'm bothered that I am masking previous issues.

Should it be avoided?

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
  2018-06-29 10:30         ` [PATCH V4 8/9] app/testpmd: show example to handle " Jeff Guo
@ 2018-07-01  7:46           ` Matan Azrad
  2018-07-03  9:35             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Matan Azrad @ 2018-07-01  7:46 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Jeff

Good progress, thank you, but as I said for the previous version, this patch introduces a bug and the next one fixes it.
Patch 9 should come before patch 8, so that this patch just adds one more option for EAL hotplug.

Please see 1 more comment below.

From: Jeff Guo
> Use testpmd for example, to show how an application smoothly handle failure
> when device being hot unplug. If app have enabled the device event monitor
> and register the hot plug event’s callback before running, once app detect the
> removal event, the callback would be called. It will first stop the packet
> forwarding, then stop the port, close the port, and finally detach the port to
> remove the device out from the device lists.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v4->v3:
> remove some unused code
> ---
>  app/test-pmd/testpmd.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 24c1998..42ed196 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2196,6 +2196,9 @@ static void
>  eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  			     __rte_unused void *arg)
>  {
> +	uint16_t port_id;
> +	int ret;
> +
>  	if (type >= RTE_DEV_EVENT_MAX) {
>  		fprintf(stderr, "%s called upon invalid event %d\n",
>  			__func__, type);
> @@ -2206,9 +2209,12 @@ eth_dev_event_callback(char *device_name, enum
> rte_dev_event_type type,
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);

As you probably know, one rte_device may be associated with more than one ethdev port, so the ethdev port name can differ from the rte_device name.
It looks like we need a new ethdev API to get all the ports associated with one rte_device.
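
As a rough illustration of what such a lookup would have to do, assuming a hypothetical port table (`struct demo_port` and `demo_ports_of_device` are invented for this sketch and are not ethdev API): match ports by owning device rather than by name, and return every match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port table entry; not the ethdev data structure. */
struct demo_port {
	uint16_t port_id;
	const void *device;   /* owning device, compared by identity */
};

/*
 * Store up to max ids of the ports owned by device in out[] and
 * return the total number of matching ports (which may exceed max).
 */
static size_t
demo_ports_of_device(const struct demo_port *ports, size_t nb_ports,
		     const void *device, uint16_t *out, size_t max)
{
	size_t i, found = 0;

	assert(ports != NULL || nb_ports == 0);
	for (i = 0; i < nb_ports; i++) {
		if (ports[i].device != device)
			continue;
		if (found < max)
			out[found] = ports[i].port_id;
		found++;
	}
	return found;
}
```

The removal callback would then run its stop/close/detach sequence once per returned port instead of once per device name.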

> +		if (ret) {
> +			printf("can not get port by device %s!\n",
> device_name);
> +			return;
> +		}
> +		rmv_event_callback((void *)(intptr_t)port_id);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
> 2736,7 +2742,6 @@ main(int argc, char** argv)
>  			return -1;
>  		}
>  		eth_dev_event_callback_register();
> -
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
  2018-07-01  7:46           ` Matan Azrad
@ 2018-07-03  9:35             ` Guo, Jia
  2018-07-03 22:44               ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-07-03  9:35 UTC (permalink / raw)
  To: Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, Thomas Monjalon,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Matan,


On 7/1/2018 3:46 PM, Matan Azrad wrote:
> Hi Jeff
>
> A good advance, thank you, but as I said in previous version, this patch inserts a bug and the next one fixes it.
> Patch 9 should be before patch 8 while this patch just add 1 more option for EAL hotplug.

I agree that putting patch 9 before patch 8 would be better. Thanks.

> Please see 1 more comment below.
>
> From: Jeff Guo
>> Use testpmd for example, to show how an application smoothly handle failure
>> when device being hot unplug. If app have enabled the device event monitor
>> and register the hot plug event’s callback before running, once app detect the
>> removal event, the callback would be called. It will first stop the packet
>> forwarding, then stop the port, close the port, and finally detach the port to
>> remove the device out from the device lists.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v4->v3:
>> remove some unused code
>> ---
>>   app/test-pmd/testpmd.c | 13 +++++++++----
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 24c1998..42ed196 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -2196,6 +2196,9 @@ static void
>>   eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   			     __rte_unused void *arg)
>>   {
>> +	uint16_t port_id;
>> +	int ret;
>> +
>>   	if (type >= RTE_DEV_EVENT_MAX) {
>>   		fprintf(stderr, "%s called upon invalid event %d\n",
>>   			__func__, type);
>> @@ -2206,9 +2209,12 @@ eth_dev_event_callback(char *device_name, enum
>> rte_dev_event_type type,
>>   	case RTE_DEV_EVENT_REMOVE:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>   			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> As you probably know, 1 rte_device may be associated to more than one ethdev ports, so the ethdev port name can be different from rte_device name.
> Looks like we need a new ethdev API to get all the ports associated to one rte_device.

Agreed, it seems the old ethdev API has an issue when getting all the
ports by device name. We could check with the ethdev maintainers and fix it
in a separate ethdev patch later.

>> +		if (ret) {
>> +			printf("can not get port by device %s!\n",
>> device_name);
>> +			return;
>> +		}
>> +		rmv_event_callback((void *)(intptr_t)port_id);
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
>> 2736,7 +2742,6 @@ main(int argc, char** argv)
>>   			return -1;
>>   		}
>>   		eth_dev_event_callback_register();
>> -
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 5/9] bus: add helper to handle sigbus
  2018-06-29 12:52                 ` Gaëtan Rivet
@ 2018-07-03 11:24                   ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-07-03 11:24 UTC (permalink / raw)
  To: Gaëtan Rivet, Ananyev, Konstantin
  Cc: stephen, Richardson, Bruce, Yigit, Ferruh, Wu, Jingjing, thomas,
	motih, matan, Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng,
	Iremonger, Bernard, jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Gaëtan and Konstantin,

I am answering both of your questions below.

On 6/29/2018 8:52 PM, Gaëtan Rivet wrote:
> On Fri, Jun 29, 2018 at 12:21:39PM +0000, Ananyev, Konstantin wrote:
>>
>>> -----Original Message-----
>>> From: Guo, Jia
>>> Sent: Friday, June 29, 2018 12:23 PM
>>> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; stephen@networkplumber.org; Richardson, Bruce
>>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
>>> <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
>>> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
>>> <bernard.iremonger@intel.com>
>>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
>>> Subject: Re: [PATCH V4 5/9] bus: add helper to handle sigbus
>>>
>>> hi, konstantin
>>>
>>>
>>> On 6/29/2018 6:51 PM, Ananyev, Konstantin wrote:
>>>>> +int
>>>>> +rte_bus_sigbus_handler(const void *failure_addr)
>>>>> +{
>>>>> +	struct rte_bus *bus;
>>>>> +	int old_errno = rte_errno;
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	rte_errno = 0;
>>>>> +
>>>>> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>>>>> +	if (bus == NULL) {
>>>>> +		RTE_LOG(ERR, EAL, "No bus can handle the sigbus error!");
>>>>> +		ret = -1;
>>>>> +	} else if (rte_errno != 0) {
>>>>> +		RTE_LOG(ERR, EAL, "Failed to handle the sigbus error!");
>>>>> +		ret = -1;
>>>>> +	}
>>>>> +
>>>>> +	/* if sigbus not be handled, return back old errno. */
>>>>> +	if (ret)
>>>>> +		rte_errno = old_errno;
>>>> Hmm, not sure why we need to set/restore rte_errno here?
>>> restore old_errno just use to let caller know that the generic sigbus
>>> still not handler by bus hotplug handler,  that involve find a bus
>>> handle but failed and can not find a hander,  and can corresponding use
>>> the previous sigbus handler to process it.
>>> that is also unwser your question in other patch. do you think that make
>>> sense?
>> Sorry, still don't understand the intention.
>> Suppose rte_bus_find() will return NULL, in that case you'll setup rte_errno
>> to what it was before calling that function.
>> If the returned bus is not NULL, but bus_find() set's an rte_errno,
>> you still would restore rte_ernno?
>> What is the prupose?
>> Why do you need to touch rte_errno at all in that function?
>> Konstantin
>>
> The way it is written here does not work, but the intention is
> to make sure that a previous error is still catched. Something like
> that:
>
>     int old_errno = rte_errno;
>     
>     rte_errno = 0;
>     rte_eal_call();
>     
>     if (rte_errno)
>         return -1;
>     else {
>         rte_errno = old_errno;
>         return 0;
>     }
>
> If someone calls the function while rte_errno is already set, then an
> earlier error would be hidden by setting rte_errno to 0 within the
> function.
>
> I'm not sure this is useful, but sometimes when using errno within a
> library call I'm bothered that I am masking previous issues.
>
> Should it be avoided?

I agree with Konstantin about distinguishing between a handler that failed
and no handler at all,
and I agree with Gaëtan about restoring the errno if it does not belong to
the sigbus handler.
Could you check whether the code below fulfills that?

-1 means a bus was found but handling failed; use rte_exit.
1 means no bus was found; use the older handler to handle it.
0 means a bus was found and the new handler handled it successfully.

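static int
bus_handle_sigbus(const struct rte_bus *bus,
             const void *failure_addr)
{
     int ret;

     /* skip buses without a sigbus handler instead of calling NULL */
     if (bus->sigbus_handler == NULL)
         return 1;

     ret = bus->sigbus_handler(failure_addr);
     rte_errno = ret;
     /* 0 stops the bus iteration: handled (0) or failed (-1) */
     return !(ret <= 0);
}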

int
rte_bus_sigbus_handler(const void *failure_addr)
{
     struct rte_bus *bus;
     int ret = 0;
     int old_errno = rte_errno;
     rte_errno = 0;
     bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
     /* failed to handle the sigbus, pass the new errno. */
     if (bus && rte_errno == -1)
         return -1;
     else if (!bus)
         ret = 1;

     /* otherwise restore the old errno. */
     rte_errno = old_errno;

     return ret;
}

static void sigbus_handler(int signum, siginfo_t *info,
                 void *ctx __rte_unused)
{
     int ret;

     rte_spinlock_lock(&dev_failure_lock);
     ret = rte_bus_sigbus_handler(info->si_addr);
     rte_spinlock_unlock(&dev_failure_lock);
     if (ret == -1) {
         rte_exit(EXIT_FAILURE,
              "Failed to handle SIGBUS for hotplug, "
              "(rte_errno: %s)!", strerror(rte_errno));
     } else if (ret == 1) {
         if (sigbus_action_old.sa_handler)
             (*(sigbus_action_old.sa_handler))(signum);
         else
             rte_exit(EXIT_FAILURE,
                  "Failed to handle generic SIGBUS!");
     }
}
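
One detail in forwarding to the old handler: the saved `sa_handler` may be SIG_DFL or SIG_IGN, which are sentinel values and must not be called, and the old action may have been installed with SA_SIGINFO. A possible sketch of the forwarding step follows; `demo_chain_to_previous`, `demo_prev_handler` and `demo_usage` are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/*
 * Forward a signal to a previously saved action.
 * Returns 1 if a user handler was called, 0 if there was nothing
 * callable (SIG_DFL, SIG_IGN or a missing action).
 */
static int
demo_chain_to_previous(const struct sigaction *old, int signum,
		       siginfo_t *info, void *ctx)
{
	if (old == NULL)
		return 0;

	if (old->sa_flags & SA_SIGINFO) {
		/* the saved action uses the three-argument form */
		if (old->sa_sigaction == NULL)
			return 0;
		old->sa_sigaction(signum, info, ctx);
		return 1;
	}

	/* SIG_DFL and SIG_IGN are markers, not callable functions */
	if (old->sa_handler == SIG_DFL || old->sa_handler == SIG_IGN)
		return 0;
	old->sa_handler(signum);
	return 1;
}

/* Example "application" handler that just records the signal number. */
static volatile sig_atomic_t demo_last_signal;

static void
demo_prev_handler(int signum)
{
	demo_last_signal = signum;
}

/* Usage outside signal context: a real handler is called, SIG_IGN is not. */
static void
demo_usage(void)
{
	struct sigaction old;

	memset(&old, 0, sizeof(old));
	old.sa_handler = demo_prev_handler;
	assert(demo_chain_to_previous(&old, SIGBUS, NULL, NULL) == 1);

	old.sa_handler = SIG_IGN;
	assert(demo_chain_to_previous(&old, SIGBUS, NULL, NULL) == 0);
}
```

When the function returns 0 the caller still has to decide what to do; for SIGBUS with a default disposition that would normally mean terminating, as the rte_exit() branch above does.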

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 7/9] igb_uio: fix uio release issue when hot unplug
  2018-06-29 10:30         ` [PATCH V4 7/9] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-07-03 12:12           ` Ferruh Yigit
  0 siblings, 0 replies; 494+ messages in thread
From: Ferruh Yigit @ 2018-07-03 12:12 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 6/29/2018 11:30 AM, Jeff Guo wrote:
> When hot unplug device, the kernel will release the device resource in the
> kernel side, such as the fd sys file will disappear, and the irq will be
> released. At this time, if igb uio driver still try to release this
> resource, it will cause kernel crash. On the other hand, something like
> interrupt disabling do not automatically process in kernel side. If not
> handler it, this redundancy and dirty thing will affect the interrupt
> resource be used by other device. So the igb_uio driver have to check the
> hot plug status, and the corresponding process should be taken in igb uio
> driver.
> 
> This patch propose to add structure of rte_udev_state into rte_uio_pci_dev
> of igb_uio kernel driver, which will record the state of uio device, such
> as probed/opened/released/removed/unplug. When detect the unexpected
> removal which cause of hot unplug behavior, it will corresponding disable
> interrupt resource, while for the part of releasement which kernel have
> already handle, just skip it to avoid double free or null pointer kernel
> crash issue.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v4->v3:
> no change
> ---
>  kernel/linux/igb_uio/igb_uio.c | 50 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
> index b3233f1..d301302 100644
> --- a/kernel/linux/igb_uio/igb_uio.c
> +++ b/kernel/linux/igb_uio/igb_uio.c
> @@ -19,6 +19,15 @@
>  
>  #include "compat.h"
>  
> +/* uio pci device state */
> +enum rte_udev_state {
> +	RTE_UDEV_PROBED,
> +	RTE_UDEV_OPENNED,
> +	RTE_UDEV_RELEASED,
> +	RTE_UDEV_REMOVED,
> +	RTE_UDEV_UNPLUG
> +};
> +
>  /**
>   * A structure describing the private information for a uio device.
>   */
> @@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
>  	enum rte_intr_mode mode;
>  	struct mutex lock;
>  	int refcnt;
> +	enum rte_udev_state state;
>  };
>  
>  static char *intr_mode;
> @@ -194,12 +204,20 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
>  {
>  	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
>  	struct uio_info *info = &udev->info;
> +	struct pci_dev *pdev = udev->pdev;
>  
>  	/* Legacy mode need to mask in hardware */
>  	if (udev->mode == RTE_INTR_MODE_LEGACY &&
>  	    !pci_check_and_mask_intx(udev->pdev))
>  		return IRQ_NONE;
>  
> +	/* check the uevent of the kobj */
> +	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
> +		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
> +			   (&pdev->dev.kobj)->name);
> +		udev->state = RTE_UDEV_UNPLUG;
> +	}

I guess commit log says kernel can remove device, if so do we need any locking
before accessing dev?

> +
>  	uio_event_notify(info);
>  
>  	/* Message signal mode, no share IRQ and automasked */
> @@ -308,7 +326,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
>  #endif
>  }
>  
> -
>  /**
>   * This gets called while opening uio device file.
>   */
> @@ -330,24 +347,33 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
>  
>  	/* enable interrupts */
>  	err = igbuio_pci_enable_interrupts(udev);
> -	mutex_unlock(&udev->lock);
>  	if (err) {
>  		dev_err(&dev->dev, "Enable interrupt fails\n");
> +		pci_clear_master(dev);
>  		return err;
>  	}
> +	udev->state = RTE_UDEV_OPENNED;
> +	mutex_unlock(&udev->lock);
>  	return 0;
>  }
>  
> +/**
> + * This gets called while closing uio device file.
> + */
>  static int
>  igbuio_pci_release(struct uio_info *info, struct inode *inode)
>  {
> +
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
>  
> +	if (udev->state == RTE_UDEV_REMOVED)
> +		return 0;
> +
>  	mutex_lock(&udev->lock);
>  	if (--udev->refcnt > 0) {
>  		mutex_unlock(&udev->lock);
> -		return 0;
> +		return -1;
>  	}
>  
>  	/* disable interrupts */
> @@ -355,7 +381,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
>  
>  	/* stop the device from further DMA */
>  	pci_clear_master(dev);
> -
> +	udev->state = RTE_UDEV_RELEASED;
>  	mutex_unlock(&udev->lock);
>  	return 0;
>  }
> @@ -557,6 +583,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  			 (unsigned long long)map_dma_addr, map_addr);
>  	}
>  
> +	udev->state = RTE_UDEV_PROBED;
>  	return 0;
>  
>  fail_remove_group:
> @@ -573,11 +600,24 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
>  static void
>  igbuio_pci_remove(struct pci_dev *dev)
>  {
> +
>  	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
> +	int ret;
> +
> +	/* handler hot unplug */
> +	if (udev->state == RTE_UDEV_OPENNED ||
> +		udev->state == RTE_UDEV_UNPLUG) {
> +		dev_notice(&dev->dev, "Unexpected removal!\n");
> +		ret = igbuio_pci_release(&udev->info, NULL);
> +		if (ret)
> +			return;
> +		udev->state = RTE_UDEV_REMOVED;
> +		return;
> +	}
>  
>  	mutex_destroy(&udev->lock);
> -	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
>  	uio_unregister_device(&udev->info);
> +	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
>  	igbuio_pci_release_iomem(&udev->info);
>  	pci_disable_device(dev);
>  	pci_set_drvdata(dev, NULL);
> 


* Re: [PATCH V4 1/9] bus: introduce hotplug failure handler
  2018-06-29 10:30         ` [PATCH V4 1/9] bus: introduce hotplug failure handler Jeff Guo
@ 2018-07-03 22:21           ` Thomas Monjalon
  2018-07-04  7:16             ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-07-03 22:21 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, jblunck,
	shreyansh.jain, helin.zhang

29/06/2018 12:30, Jeff Guo:
>  /**
> + * Implementation a specific hot plug handler, which is responsible
> + * for handle the failure when hot remove the device, guaranty the system
> + * would not crash in the case.
> + * @param dev
> + *	Pointer of the device structure.
> + *
> + * @return
> + *	0 on success.
> + *	!0 on error.
> + */
> +typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);

[...]
> @@ -211,6 +224,8 @@ struct rte_bus {
>  	rte_bus_parse_t parse;       /**< Parse a device name */
>  	struct rte_bus_conf conf;    /**< Bus configuration */
>  	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
> +	rte_bus_hotplug_handler_t hotplug_handler;
> +						/**< handle hot plug on bus */

The name is misleading.
It is to handle unplugging but is called "hotplug".

In order to demonstrate how the handler is used, you should
introduce the code using this handler in the same patch.


* Re: [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
  2018-07-03  9:35             ` Guo, Jia
@ 2018-07-03 22:44               ` Thomas Monjalon
  2018-07-04  3:48                 ` Guo, Jia
  2018-07-04  7:06                 ` Matan Azrad
  0 siblings, 2 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-07-03 22:44 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, shreyansh.jain, helin.zhang

03/07/2018 11:35, Guo, Jia:
> On 7/1/2018 3:46 PM, Matan Azrad wrote:
> > From: Jeff Guo
> >> --- a/app/test-pmd/testpmd.c
> >> +++ b/app/test-pmd/testpmd.c
> >> @@ -2206,9 +2209,12 @@ eth_dev_event_callback(char *device_name, enum
> >> rte_dev_event_type type,
> >>   	case RTE_DEV_EVENT_REMOVE:
> >>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> >>   			device_name);
> >> -		/* TODO: After finish failure handle, begin to stop
> >> -		 * packet forward, stop port, close port, detach port.
> >> -		 */
> >> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> > As you probably know, 1 rte_device may be associated to more than one ethdev ports, so the ethdev port name can be different from rte_device name.
> > Looks like we need a new ethdev API to get all the ports associated to one rte_device.
> 
> Agreed, it seems that the old ethdev API has some issues when getting all
> ports by device name. We could check with the ethdev maintainer and fix it
> in a separate ethdev patch later.

This ethdev function could return an error if several ports match.

Ideally, we should not use this function at all.
If you want to manage an ethdev port, why are you using an EAL event?
There is an ethdev callback mechanism for port removal.


* Re: [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
  2018-07-03 22:44               ` Thomas Monjalon
@ 2018-07-04  3:48                 ` Guo, Jia
  2018-07-04  7:06                 ` Matan Azrad
  1 sibling, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-07-04  3:48 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Matan Azrad, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu,
	Mordechay Haimovsky, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, shreyansh.jain, helin.zhang


Hi, Thomas


On 7/4/2018 6:44 AM, Thomas Monjalon wrote:
> 03/07/2018 11:35, Guo, Jia:
>> On 7/1/2018 3:46 PM, Matan Azrad wrote:
>>> From: Jeff Guo
>>>> --- a/app/test-pmd/testpmd.c
>>>> +++ b/app/test-pmd/testpmd.c
>>>> @@ -2206,9 +2209,12 @@ eth_dev_event_callback(char *device_name, enum
>>>> rte_dev_event_type type,
>>>>    	case RTE_DEV_EVENT_REMOVE:
>>>>    		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>>>    			device_name);
>>>> -		/* TODO: After finish failure handle, begin to stop
>>>> -		 * packet forward, stop port, close port, detach port.
>>>> -		 */
>>>> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
>>> As you probably know, 1 rte_device may be associated to more than one ethdev ports, so the ethdev port name can be different from rte_device name.
>>> Looks like we need a new ethdev API to get all the ports associated to one rte_device.
>> Agreed, it seems that the old ethdev API has some issues when getting all
>> ports by device name. We could check with the ethdev maintainer and fix it
>> in a separate ethdev patch later.
> This ethdev function could return an error if several ports match.
>
> Ideally, we should not use this function at all.
> If you want to manage an ethdev port, why are you using an EAL event?
> There is an ethdev callback mechanism for port removal.
>
>

I think the problem is how to manage all the ethdev ports associated
with one rte_device. The easy way is to let the device event callback
check these ports. I will modify it in the next version.


* Re: [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
  2018-07-03 22:44               ` Thomas Monjalon
  2018-07-04  3:48                 ` Guo, Jia
@ 2018-07-04  7:06                 ` Matan Azrad
  2018-07-05  7:54                   ` Guo, Jia
  1 sibling, 1 reply; 494+ messages in thread
From: Matan Azrad @ 2018-07-04  7:06 UTC (permalink / raw)
  To: Thomas Monjalon, Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, Mordechay Haimovsky, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, shreyansh.jain,
	helin.zhang

Hi Thomas, Guo

From: Thomas Monjalon
> 03/07/2018 11:35, Guo, Jia:
> > On 7/1/2018 3:46 PM, Matan Azrad wrote:
> > > From: Jeff Guo
> > >> --- a/app/test-pmd/testpmd.c
> > >> +++ b/app/test-pmd/testpmd.c
> > >> @@ -2206,9 +2209,12 @@ eth_dev_event_callback(char *device_name, enum
> > >> rte_dev_event_type type,
> > >>   	case RTE_DEV_EVENT_REMOVE:
> > >>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
> > >>   			device_name);
> > >> -		/* TODO: After finish failure handle, begin to stop
> > >> -		 * packet forward, stop port, close port, detach port.
> > >> -		 */
> > >> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> > > As you probably know, 1 rte_device may be associated to more than one ethdev ports, so the ethdev port name can be different from rte_device name.
> > > Looks like we need a new ethdev API to get all the ports associated to one rte_device.
> >
> > Agreed, it seems that the old ethdev API has some issues when getting all
> > ports by device name. We could check with the ethdev maintainer and fix it
> > in a separate ethdev patch later.
> 
> This ethdev function could return an error if several ports match.
> 

Just to clarify:

The ethdev name may be different from the rte_device name of a port.
rte_eth_dev_get_port_by_name() searches by the ethdev name, not by the rte_device name.

> Ideally, we should not use this function at all.
> If you want to manage an ethdev port, why are you using an EAL event?
> There is an ethdev callback mechanism for port removal.

So, it looks like the EAL event should trigger an ethdev event for all the ports associated with this rte_device.
I think the PMD is best placed to do it, so maybe a PMD that wants to support hot unplug should register for the EAL event and trigger an ethdev RMV event from the EAL callback.
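
The fan-out described above can be sketched roughly as follows. This is not
code from this thread or from DPDK: every demo_* name is hypothetical, the
port table stands in for whatever bookkeeping a PMD keeps, and the counter
stands in for raising an ethdev RMV event on a port. It only illustrates one
device-level removal being turned into one notification per associated port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PORTS 8
#define DEMO_NAME_LEN 32

/* Hypothetical per-port record kept by a PMD. */
struct demo_port {
	int in_use;
	char device_name[DEMO_NAME_LEN]; /* name of the owning device */
	int rmv_events;                  /* stands in for an ethdev RMV event */
};

static struct demo_port demo_ports[DEMO_MAX_PORTS];

/* Associate a port with the device it belongs to. */
int demo_port_attach(int port_id, const char *device_name)
{
	if (port_id < 0 || port_id >= DEMO_MAX_PORTS || device_name == NULL)
		return -1;
	if (strlen(device_name) >= DEMO_NAME_LEN)
		return -1;
	demo_ports[port_id].in_use = 1;
	strcpy(demo_ports[port_id].device_name, device_name);
	demo_ports[port_id].rmv_events = 0;
	return 0;
}

/* Number of removal notifications a port has received so far. */
int demo_port_rmv_events(int port_id)
{
	assert(port_id >= 0 && port_id < DEMO_MAX_PORTS);
	return demo_ports[port_id].rmv_events;
}

/*
 * Device-level removal callback: notify every port that belongs to the
 * removed device. Returns the number of ports notified, or -1 on bad input.
 */
int demo_device_removed(const char *device_name)
{
	int port_id;
	int notified = 0;

	if (device_name == NULL)
		return -1;
	for (port_id = 0; port_id < DEMO_MAX_PORTS; port_id++) {
		if (!demo_ports[port_id].in_use)
			continue;
		if (strcmp(demo_ports[port_id].device_name, device_name) != 0)
			continue;
		demo_ports[port_id].rmv_events++;
		notified++;
	}
	return notified;
}
```

With two ports attached to the same device name, a single call to
demo_device_removed() for that name notifies both, and ports of other devices
are left untouched.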

What do you think?


 


* Re: [PATCH V4 1/9] bus: introduce hotplug failure handler
  2018-07-03 22:21           ` Thomas Monjalon
@ 2018-07-04  7:16             ` Guo, Jia
  2018-07-04  7:55               ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-07-04  7:16 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, jblunck,
	shreyansh.jain, helin.zhang



On 7/4/2018 6:21 AM, Thomas Monjalon wrote:
> 29/06/2018 12:30, Jeff Guo:
>>   /**
>> + * Implementation a specific hot plug handler, which is responsible
>> + * for handle the failure when hot remove the device, guaranty the system
>> + * would not crash in the case.
>> + * @param dev
>> + *	Pointer of the device structure.
>> + *
>> + * @return
>> + *	0 on success.
>> + *	!0 on error.
>> + */
>> +typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
> [...]
>> @@ -211,6 +224,8 @@ struct rte_bus {
>>   	rte_bus_parse_t parse;       /**< Parse a device name */
>>   	struct rte_bus_conf conf;    /**< Bus configuration */
>>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>> +	rte_bus_hotplug_handler_t hotplug_handler;
>> +						/**< handle hot plug on bus */
> The name is misleading.
> It is to handle unplugging but is called "hotplug".

OK. I prefer hotplug_failure_handler over hot_unplug_handler, since
it is more explicit about failure handling, and clearer.

> In order to demonstrate how the handler is used, you should
> introduce the code using this handler in the same patch.
>

Sorry, I checked the history of rte_bus.h, and the usual way is to introduce
the ops first, then implement them in a specific bus, and then add the usage.
I think that way is clear and makes sense. What do you think?
Anyway, I will check whether anything in the commit log is misleading.


* Re: [PATCH V4 1/9] bus: introduce hotplug failure handler
  2018-07-04  7:16             ` Guo, Jia
@ 2018-07-04  7:55               ` Thomas Monjalon
  2018-07-05  6:23                 ` Guo, Jia
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-07-04  7:55 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, jblunck,
	shreyansh.jain, helin.zhang

04/07/2018 09:16, Guo, Jia:
> 
> On 7/4/2018 6:21 AM, Thomas Monjalon wrote:
> > 29/06/2018 12:30, Jeff Guo:
> >>   /**
> >> + * Implementation a specific hot plug handler, which is responsible
> >> + * for handle the failure when hot remove the device, guaranty the system
> >> + * would not crash in the case.
> >> + * @param dev
> >> + *	Pointer of the device structure.
> >> + *
> >> + * @return
> >> + *	0 on success.
> >> + *	!0 on error.
> >> + */
> >> +typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
> > [...]
> >> @@ -211,6 +224,8 @@ struct rte_bus {
> >>   	rte_bus_parse_t parse;       /**< Parse a device name */
> >>   	struct rte_bus_conf conf;    /**< Bus configuration */
> >>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
> >> +	rte_bus_hotplug_handler_t hotplug_handler;
> >> +						/**< handle hot plug on bus */
> > The name is misleading.
> > It is to handle unplugging but is called "hotplug".
> 
> OK. I prefer hotplug_failure_handler over hot_unplug_handler, since
> it is more explicit about failure handling, and clearer.
> 
> > In order to demonstrate how the handler is used, you should
> > introduce the code using this handler in the same patch.
> >
> 
> Sorry, I checked the history of rte_bus.h, and the usual way is to introduce
> the ops first, then implement them in a specific bus, and then add the usage.
> I think that way is clear and makes sense. What do you think?
> Anyway, I will check whether anything in the commit log is misleading.

I think it is better to call ops when they are introduced,
and implement the ops in second step.


* Re: [PATCH V4 1/9] bus: introduce hotplug failure handler
  2018-07-04  7:55               ` Thomas Monjalon
@ 2018-07-05  6:23                 ` Guo, Jia
  2018-07-05  8:30                   ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Guo, Jia @ 2018-07-05  6:23 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, jblunck,
	shreyansh.jain, helin.zhang



On 7/4/2018 3:55 PM, Thomas Monjalon wrote:
> 04/07/2018 09:16, Guo, Jia:
>> On 7/4/2018 6:21 AM, Thomas Monjalon wrote:
>>> 29/06/2018 12:30, Jeff Guo:
>>>>    /**
>>>> + * Implementation a specific hot plug handler, which is responsible
>>>> + * for handle the failure when hot remove the device, guaranty the system
>>>> + * would not crash in the case.
>>>> + * @param dev
>>>> + *	Pointer of the device structure.
>>>> + *
>>>> + * @return
>>>> + *	0 on success.
>>>> + *	!0 on error.
>>>> + */
>>>> +typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
>>> [...]
>>>> @@ -211,6 +224,8 @@ struct rte_bus {
>>>>    	rte_bus_parse_t parse;       /**< Parse a device name */
>>>>    	struct rte_bus_conf conf;    /**< Bus configuration */
>>>>    	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>>> +	rte_bus_hotplug_handler_t hotplug_handler;
>>>> +						/**< handle hot plug on bus */
>>> The name is misleading.
>>> It is to handle unplugging but is called "hotplug".
>> OK. I prefer hotplug_failure_handler over hot_unplug_handler, since
>> it is more explicit about failure handling, and clearer.
>>
>>> In order to demonstrate how the handler is used, you should
>>> introduce the code using this handler in the same patch.
>>>
>> Sorry, I checked the history of rte_bus.h, and the usual way is to introduce
>> the ops first, then implement them in a specific bus, and then add the usage.
>> I think that way is clear and makes sense. What do you think?
>> Anyway, I will check whether anything in the commit log is misleading.
> I think it is better to call ops when they are introduced,
> and implement the ops in second step.
>

Hi, Thomas

Sorry, but I want to detail the relationship between the ops and the API as
below, to see if we can find a better sequence.

Patch num:

1: introduce ops hotplug_failure_handler

2: implement ops hotplug_failure_handler

3: introduce ops sigbus_handler

4: implement ops sigbus_handler

5: introduce helper rte_bus_sigbus_handler to call the ops sigbus_handler

6: introduce the mechanism to call helper rte_bus_sigbus_handler and
call hotplug_failure_handler.

If I follow what you said, should I change the sequence to
6->5->3->4->1->2? I don't think that makes sense, and it might be more
confusing.

I think it is better to introduce each op item by item, so that when
the caller patch is introduced, the functionality is ready to be used
by that patch.


If I did not get your point and you have a better sequence in mind,
please let me know explicitly. Thanks.


* [PATCH V5 0/7] hot plug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (8 preceding siblings ...)
  2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
@ 2018-07-05  7:38       ` Jeff Guo
  2018-07-05  7:38         ` [PATCH V5 1/7] bus: add hotplug failure handler Jeff Guo
                           ` (5 more replies)
  2018-07-05  8:21       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
                         ` (13 subsequent siblings)
  23 siblings, 6 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  7:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SR-IOV live migration in SDN/NFV. It can
bring greater flexibility and continuity to networking services in many
industry use cases. So let us see what DPDK, as an important networking
framework, can do to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and a hot plug/unplug API in the framework; apps can
use these to develop their hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ethdev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. The problem is that the removal of the
device from the software lists is not instantaneous; if the data (fast) path
still reads from or writes to the device during that time, it will cause an
MMIO error and crash the app.

It seems that we have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle it, but we still do not have a failsafe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution.
So there is a gap in the DPDK hot plug solution right now.

Also, we know that the kernel only guarantees hot plug on the kernel side, not
on the user mode side. Firstly, we can hardly have a gatekeeper for every MMIO
access across multiple PMDs. Secondly, third-party tools such as
udev/driverctl do not specifically cover this hot plug failure processing.
Thirdly, it is still hard for an app to implement this itself for multiple
user mode PMDs. Here, a general hot plug failure handling mechanism in the
DPDK framework is proposed. It aims to guarantee that, when a hot unplug
occurs, the system will not crash and the app will not break, so that user
space can stop normally, release any relevant resources, and then cleanly
unplug the device at the bus level.

The mechanism works as follows:

First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and then handles the failure accordingly.
To do that, the following functionality is introduced.
 - Add a new bus op “hotplug_failure_handler” to handle bus read/write errors;
   it is bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus specific op “pci_hotplug_failure_handler”. Based on
   the failure address, it remaps memory for the corresponding unplugged device.
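
For readers new to the bus ops table, the dispatch pattern behind such a
per-bus op can be sketched as below. This is a stand-alone illustration, not
the rte_bus code: the demo_* types and functions are hypothetical, and the
"remapped" flag only stands in for the real remap work done by the PCI bus.

```c
#include <assert.h>
#include <stddef.h>

struct demo_device;

/* Same shape as the op in this series: one device in, 0 or !0 out. */
typedef int (*demo_hotplug_failure_handler_t)(struct demo_device *dev);

/* Hypothetical, cut-down bus: a name plus one optional op. */
struct demo_bus {
	const char *name;
	demo_hotplug_failure_handler_t hotplug_failure_handler;
};

struct demo_device {
	const char *name;
	struct demo_bus *bus;
	int remapped; /* stands in for "BARs now point at dummy memory" */
};

/* Bus-specific logic: here it only records that the remap happened. */
static int demo_pci_hotplug_failure_handler(struct demo_device *dev)
{
	assert(dev != NULL);
	dev->remapped = 1;
	return 0;
}

/* A bus that implements the op, and one that leaves it unset. */
struct demo_bus demo_pci_bus = {
	.name = "pci",
	.hotplug_failure_handler = demo_pci_hotplug_failure_handler,
};

struct demo_bus demo_vdev_bus = {
	.name = "vdev",
	.hotplug_failure_handler = NULL,
};

/* Generic caller: dispatch to the bus op if the bus provides one. */
int demo_handle_hotplug_failure(struct demo_device *dev)
{
	if (dev == NULL || dev->bus == NULL)
		return -1;
	if (dev->bus->hotplug_failure_handler == NULL)
		return -1; /* this bus has no failure handling */
	return dev->bus->hotplug_failure_handler(dev);
}
```

The generic caller never needs to know how a particular bus recovers; a bus
without the op simply reports an error to the caller.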

For the data path, or other unexpected access from the control path, when a
hot unplug occurs:
 - Implement a new sigbus handler, which is registered when device event
   monitoring starts. The handler is per process. Because of how signals are
   delivered, either a control path thread or a data path thread may receive
   the sigbus error, but both go to the common sigbus handler. Once an MMIO
   sigbus error is raised, it triggers the hot unplug operation above. The
   sigbus is checked to see whether it was caused by the hot unplug; if not,
   the exception is reported as the original sigbus handler would do. If so,
   the memory is remapped.
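
A minimal user-space sketch of that flow, assuming Linux and plain POSIX
signals, is shown below. It is not the code from this series: the hp_* names
and the small region table are hypothetical. The mmap() call uses the same
flags as pci_uio_remap_resource() in patch 2/7. Note that mmap() is not on the
POSIX list of async-signal-safe functions, so calling it from a handler is a
pragmatic choice rather than a guaranteed one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping: one entry per mapped device region (BAR). */
struct hp_region {
	void *addr;
	size_t len;
};

#define HP_MAX_REGIONS 8
static struct hp_region hp_regions[HP_MAX_REGIONS];
static int hp_nb_regions;
static struct sigaction hp_old_action; /* handler that was installed before */

int hp_register_region(void *addr, size_t len)
{
	if (addr == NULL || len == 0 || hp_nb_regions >= HP_MAX_REGIONS)
		return -1;
	hp_regions[hp_nb_regions].addr = addr;
	hp_regions[hp_nb_regions].len = len;
	hp_nb_regions++;
	return 0;
}

/* Find the registered region that contains the faulting address, if any. */
struct hp_region *hp_find_region(const void *fault_addr)
{
	uintptr_t a = (uintptr_t)fault_addr;
	int i;

	for (i = 0; i < hp_nb_regions; i++) {
		uintptr_t start = (uintptr_t)hp_regions[i].addr;

		if (a >= start && a - start < hp_regions[i].len)
			return &hp_regions[i];
	}
	return NULL;
}

/* Replace the region with anonymous memory at the same virtual address. */
int hp_remap_region(struct hp_region *r)
{
	void *p;

	assert(r != NULL);
	p = mmap(r->addr, r->len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}

static void hp_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	struct hp_region *r = hp_find_region(info->si_addr);

	(void)ctx;
	if (r != NULL && hp_remap_region(r) == 0)
		return; /* the faulting access is retried on the dummy memory */

	/* Not caused by a tracked device: fall back to the previous handling. */
	sigaction(signum, &hp_old_action, NULL);
	raise(signum);
}

int hp_install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = hp_sigbus_handler;
	sa.sa_flags = SA_SIGINFO; /* needed to receive si_addr */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, &hp_old_action);
}
```

An app would register each mapped region once, install the handler before
starting the data path, and let any sigbus outside the registered regions fall
through to the previously installed action.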

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side; for example, the fd sys file disappears and the irq is
   released. At this time, if the igb_uio driver still tries to release these
   resources, it will cause a kernel crash.
   On the other hand, something like disabling interrupts is not done
   automatically on the kernel side. If it is not handled, this leftover state
   will affect the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status, and the
   corresponding processing should be done in the igb_uio driver.
   This patch proposes to add an rte_udev_state field to struct
   rte_uio_pci_dev of the igb_uio kernel driver, which records the state of
   the uio device, such as probed/opened/released/removed/unplug. When an
   unexpected removal caused by a hot unplug is detected, the driver disables
   the interrupt resources accordingly, while the part of the release that the
   kernel has already handled is skipped, to avoid a double free or a null
   pointer kernel crash.
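
The state tracking described above can be modelled in user space roughly as
follows. This is only a sketch of the idea, not the igb_uio code: the demo_*
names, the refcount handling and the counters are assumptions made for
illustration, and the real driver additionally takes a mutex and touches PCI
resources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the uio device states listed above. */
enum demo_udev_state {
	DEMO_UDEV_PROBED,
	DEMO_UDEV_OPENED,
	DEMO_UDEV_RELEASED,
	DEMO_UDEV_REMOVED,
	DEMO_UDEV_UNPLUG
};

struct demo_udev {
	enum demo_udev_state state;
	int refcnt;        /* number of open file handles */
	int irq_enabled;   /* stands in for the interrupt resources */
	int release_count; /* how many times the release work really ran */
};

void demo_probe(struct demo_udev *u)
{
	assert(u != NULL);
	u->state = DEMO_UDEV_PROBED;
	u->refcnt = 0;
	u->irq_enabled = 0;
	u->release_count = 0;
}

void demo_open(struct demo_udev *u)
{
	if (++u->refcnt == 1)
		u->irq_enabled = 1;
	u->state = DEMO_UDEV_OPENED;
}

/* Release work: disable interrupts once the last user is gone. */
int demo_release(struct demo_udev *u)
{
	/* Removal already cleaned up; running again would double free. */
	if (u->state == DEMO_UDEV_REMOVED)
		return 0;
	if (--u->refcnt > 0)
		return 0;
	u->irq_enabled = 0;
	u->release_count++;
	u->state = DEMO_UDEV_RELEASED;
	return 0;
}

/* Device removal: if the device is still open, it is a hot unplug. */
void demo_remove(struct demo_udev *u)
{
	if (u->state == DEMO_UDEV_OPENED || u->state == DEMO_UDEV_UNPLUG) {
		u->refcnt = 1; /* force the release work to run exactly once */
		demo_release(u);
	}
	u->state = DEMO_UDEV_REMOVED;
}
```

The point of the model is the ordering: when removal arrives while the device
is still open, the release work runs once from the removal path, and the later
close from user space becomes a no-op.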

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd as an example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process will not break the running app/fail-safe driver, and will not
break other unrelated devices. The app can then plug in the device and restart
the data path again as follows.
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver 

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. The recent RFC
and feature design focus only on the hot unplug failure handler in this patch
set, so to keep this topic clear and focused, the previous patch set history
is summarized as “v1(v21)”, and v2 here goes ahead for further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 ++++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  36 ++++++++++-
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  31 +++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 111 +++++++++++++++++++++++++++++++-
 8 files changed, 358 insertions(+), 5 deletions(-)

-- 
2.7.4


* [PATCH V5 1/7] bus: add hotplug failure handler
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
@ 2018-07-05  7:38         ` Jeff Guo
  2018-07-06 15:17           ` He, Shaopeng
  2018-07-05  7:38         ` [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
                           ` (4 subsequent siblings)
  5 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  7:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, if the app still continues to access the device
by MMIO, it will cause a memory failure and result in a system crash.

This patch introduces a bus op to handle device hotplug failure. It is
bus-specific behavior, so each kind of bus can implement its own logic
case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
change ops name to be more clear
refine doc and commit log
---
 lib/librte_eal/common/include/rte_bus.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..8a993cf 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a bus-specific hotplug failure handler, which is responsible
+ * for handling the failure when a device is hot-removed, to guarantee that
+ * the system does not crash in that case.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +224,8 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+					/**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4


* [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
  2018-07-05  7:38         ` [PATCH V5 1/7] bus: add hotplug failure handler Jeff Guo
@ 2018-07-05  7:38         ` Jeff Guo
  2018-07-06 15:17           ` He, Shaopeng
  2018-07-05  7:38         ` [PATCH V5 3/7] bus: add sigbus handler Jeff Guo
                           ` (3 subsequent siblings)
  5 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  7:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the hotplug failure handler op for the PCI bus.
It remaps new dummy memory over the failed memory to avoid MMIO
read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
refine log and commit log
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..bc3bcac 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resources is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_failure_handler = pci_hotplug_failure_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4


* [PATCH V5 3/7] bus: add sigbus handler
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
  2018-07-05  7:38         ` [PATCH V5 1/7] bus: add hotplug failure handler Jeff Guo
  2018-07-05  7:38         ` [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
@ 2018-07-05  7:38         ` Jeff Guo
  2018-07-06 15:17           ` He, Shaopeng
  2018-07-05  7:38         ` [PATCH V5 4/7] bus/pci: implement sigbus handler operation Jeff Guo
                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  7:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged while the data path is still reading from
or writing to it, a sigbus error occurs and must be handled. A handler is
therefore needed to capture the signal and handle it accordingly.

This patch introduces a bus op to handle the sigbus error. The behavior
is bus-specific, so each kind of bus can implement its own logic.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
refine log and commit log
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 8a993cf..d753575 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -181,6 +181,20 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Implementation of a bus-specific sigbus handler, which is responsible
+ * for handling a sigbus error that is either a generic memory error or a
+ * memory error caused by hot unplug.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus was handled successfully.
+ *	1 if no bus handled the sigbus.
+ *	-1 if handling the sigbus failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -226,6 +240,8 @@ struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
 					/**< handle hotplug failure on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4


* [PATCH V5 4/7] bus/pci: implement sigbus handler operation
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-07-05  7:38         ` [PATCH V5 3/7] bus: add sigbus handler Jeff Guo
@ 2018-07-05  7:38         ` Jeff Guo
  2018-07-06 15:18           ` He, Shaopeng
  2018-07-05  7:38         ` [PATCH V5 5/7] bus: add helper to handle sigbus Jeff Guo
  2018-07-05  7:38         ` [PATCH V5 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
  5 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  7:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the sigbus handler op for the PCI bus. It finds the
PCI device that has been hot-unplugged and then handles the hotplug
failure for that device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
no change
---
 drivers/bus/pci/pci_common.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index bc3bcac..f065271 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* Find the device that the failure address belongs to. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot removal. */
+		ret = pci_hotplug_failure_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_failure_handler = pci_hotplug_failure_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4


* [PATCH V5 5/7] bus: add helper to handle sigbus
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-07-05  7:38         ` [PATCH V5 4/7] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-07-05  7:38         ` Jeff Guo
  2018-07-06 15:22           ` He, Shaopeng
  2018-07-08 13:30           ` Andrew Rybchenko
  2018-07-05  7:38         ` [PATCH V5 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
  5 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  7:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a helper that iterates over all buses to find the one
that can handle the sigbus error.
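
For readers following the return-code convention (0 = handled, 1 = no bus
handles it, -1 = handling failed), the iteration can be sketched in plain C
as below. This is only an illustration, not the code of this patch: the
struct bus_sketch type, dispatch_sigbus() and the three example handlers are
hypothetical names, and the sketch skips buses that have no handler.

```c
#include <stddef.h>

/* Hypothetical per-bus handler: 0 = handled, 1 = not mine, -1 = failed. */
typedef int (*sigbus_handler_fn)(const void *failure_addr);

/* Hypothetical, minimal stand-in for a bus with an optional handler. */
struct bus_sketch {
	const char *name;
	sigbus_handler_fn sigbus_handler; /* may be NULL */
};

/* Example handlers, present only so the sketch can be exercised. */
static int sketch_not_mine(const void *addr) { (void)addr; return 1; }
static int sketch_handled(const void *addr) { (void)addr; return 0; }
static int sketch_failed(const void *addr) { (void)addr; return -1; }

/*
 * Ask each bus in turn. Stop at the first bus that either handled the
 * fault (0) or tried and failed (-1). Return 1 if no bus claimed it.
 */
static int
dispatch_sigbus(const struct bus_sketch *buses, size_t n,
		const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		/* a bus without a handler can never claim the fault */
		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret <= 0)
			return ret;
	}
	return 1;
}
```

Checking the function pointer before calling it keeps buses without a
sigbus handler from being dereferenced during the iteration.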

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
refine the errno restore logic
---
 lib/librte_eal/common/eal_common_bus.c | 36 +++++++++++++++++++++++++++++++++-
 lib/librte_eal/common/eal_private.h    | 12 ++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..c9f3566 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -220,7 +221,6 @@ rte_bus_find_by_device_name(const char *str)
 	return rte_bus_find(NULL, bus_can_parse, name);
 }
 
-
 /*
  * Get iommu class of devices on the bus.
  */
@@ -242,3 +242,37 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	ret = bus->sigbus_handler(failure_addr);
+	rte_errno = ret;
+
+	return !(bus->sigbus_handler && ret <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* failed to handle the sigbus, pass the new errno. */
+	if (bus && rte_errno == -1)
+		return -1;
+	else if (!bus)
+		ret = 1;
+
+	/* otherwise restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..a91c4b5 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+
+/**
+ * Iterate over all buses to find the bus that can handle the sigbus error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus was handled successfully.
+ *	-1 if handling the sigbus failed.
+ *	 1 if no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4


* [PATCH V5 6/7] eal: add failure handle mechanism for hotplug
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-07-05  7:38         ` [PATCH V5 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-05  7:38         ` Jeff Guo
  2018-07-06 15:22           ` He, Shaopeng
  2018-07-08 13:46           ` Andrew Rybchenko
  5 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  7:38 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for the device hot
unplug event.

A sigbus handler is registered first. Once a sigbus error is captured,
the handler checks the failure address and remaps the invalid memory of
the corresponding device. With this mechanism, the application can be
kept from crashing when a device is hot-unplugged.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
add sigbus old handler recover.
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 111 +++++++++++++++++++++++++++++++++-
 1 file changed, 110 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..a22cb9a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,28 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/* spinlock for device failure process */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +48,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hotplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +205,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname;
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +232,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			rte_spinlock_lock(&dev_failure_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hotplug_failure_handler(dev);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +295,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +323,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4


* Re: [PATCH V4 8/9] app/testpmd: show example to handle hot unplug
  2018-07-04  7:06                 ` Matan Azrad
@ 2018-07-05  7:54                   ` Guo, Jia
  0 siblings, 0 replies; 494+ messages in thread
From: Guo, Jia @ 2018-07-05  7:54 UTC (permalink / raw)
  To: Matan Azrad, Thomas Monjalon
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, Mordechay Haimovsky, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, shreyansh.jain,
	helin.zhang



On 7/4/2018 3:06 PM, Matan Azrad wrote:
> Hi Thomas, Guo
>
> From: Thomas Monjalon
>> 03/07/2018 11:35, Guo, Jia:
>>> On 7/1/2018 3:46 PM, Matan Azrad wrote:
>>>> From: Jeff Guo
>>>>> --- a/app/test-pmd/testpmd.c
>>>>> +++ b/app/test-pmd/testpmd.c
>>>>> @@ -2206,9 +2209,12 @@ eth_dev_event_callback(char
>> *device_name,
>>>>> enum rte_dev_event_type type,
>>>>>    	case RTE_DEV_EVENT_REMOVE:
>>>>>    		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>>>>    			device_name);
>>>>> -		/* TODO: After finish failure handle, begin to stop
>>>>> -		 * packet forward, stop port, close port, detach port.
>>>>> -		 */
>>>>> +		ret = rte_eth_dev_get_port_by_name(device_name,
>> &port_id);
>>>> As you probably know, 1 rte_device may be associated to more than one
>> ethdev ports, so the ethdev port name can be different from rte_device
>> name.
>>>> Looks like we need a new ethdev API to get all the ports associated to
>> one rte_device.
>>> agree, seems that the the old ethdev API have some issue when got all
>>> port by device name. we could check with ethdev maintainer and fix it
>>> by specific ethdev patch later.
>> This ethdev function could return an error if several ports match.
>>
> Just to clarify:
>
> The  ethdev name may be different from the rte_device name of a port,
> The rte_eth_dev_get_port_by_name() searches the ethdev name and not the rte_device name.
>
>> Ideally, we should not use this function at all.
>> If you want to manage an ethdev port, why are you using an EAL event?
>> There is an ethdev callback mechanism for port removal.
> So, looks like the EAL event should trigger an ethdev event for all the ports associated to this rte_device.
> I think that the best one to do it is the PMD, so maybe the PMD(which wants to support hot unplug) should register to the EAL event and to trigger an ethdev RMV event from the EAL callback.
>
> What do you think?
>

I think Matan gives a constructive option for combining the usage of the
EAL event and the ethdev event, but I am not sure which is the best one.
So let this discussion go on. I will remove patches 8/9 and 9/9 so that
the patch set focuses on the hotplug failure mechanism,
and will use another patch set to cover the event management example in
testpmd.

>   
>


* [PATCH V5 0/7] hot plug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (9 preceding siblings ...)
  2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
@ 2018-07-05  8:21       ` Jeff Guo
  2018-07-05  8:21         ` [PATCH V5 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
                         ` (12 subsequent siblings)
  23 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  8:21 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, whether it is used for
fail-safe handling of datacenter devices or for SR-IOV live migration in
SDN/NFV. It brings higher flexibility and continuity to networking services
in many industry use cases. So let us see what DPDK, as an important
networking framework, can do to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework.
Apps can use these to develop their own hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
removed physically, or when the software disables it. The app needs to call
the ethdev API to detach the device, which unplugs the device at the bus
level and makes access to the device invalid. The problem is that the removal
of the device from the software lists is not instantaneous. If the data
(fast) path still reads from or writes to the device during that time, it
causes an MMIO error and the app crashes.

We already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a fail-safe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD driver failure handling
solution. So there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the
user mode side. First, we can hardly have a gatekeeper for every MMIO access
across multiple PMD drivers. Second, no third-party tools such as
udev/driverctl specifically cover this hot plug failure processing. Third,
it is still unclear whether an app can feasibly implement this for multiple
user mode PMD drivers. Therefore a general hot plug failure handling
mechanism in the DPDK framework is proposed here. It aims to guarantee that,
when hot unplug occurs, the system does not crash and the app is not
interrupted, so that user space can stop normally, release any relevant
resources, and then cleanly unplug the device at the bus level.

The mechanism works as follows:

First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To
do that, the following functionality is introduced.
 - Add a new bus op “hotplug_failure_handler” to handle bus read/write
   errors. It is bus-specific and each kind of bus can implement its own
   logic.
 - Implement the PCI bus specific op “pci_hotplug_failure_handler”. Based on
   the failure address, it remaps memory for the corresponding unplugged
   device.
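
To illustrate the "based on the failure address" step, here is a small
standalone sketch of the address lookup. It is only a sketch under
assumptions: struct mapped_region and find_region_by_addr() are hypothetical
names and do not exist in DPDK; the real series walks the mem_resource
entries of every device on the PCI bus.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped BAR of a device. */
struct mapped_region {
	const char *dev_name; /* owner of the mapping */
	void *addr;           /* start of the mapping, NULL if unmapped */
	size_t len;           /* length of the mapping in bytes */
};

/*
 * Return the region that contains fault_addr, or NULL if the address does
 * not belong to any known mapping (which would mean a generic SIGBUS).
 */
static const struct mapped_region *
find_region_by_addr(const struct mapped_region *regions, size_t n,
		    const void *fault_addr)
{
	uintptr_t fault = (uintptr_t)fault_addr;
	size_t i;

	for (i = 0; i < n; i++) {
		uintptr_t start = (uintptr_t)regions[i].addr;

		/* skip entries that were never mapped */
		if (regions[i].addr == NULL || regions[i].len == 0)
			continue;
		/* written as a subtraction so start + len cannot overflow */
		if (fault >= start && fault - start < regions[i].len)
			return &regions[i];
	}
	return NULL;
}
```

A NULL result corresponds to the "no bus handles it" case, and a match is
the device whose memory should be remapped.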

For data path accesses, or other unexpected accesses from the control path,
when hot unplug occurs:
 - Implement a new sigbus handler, which is registered when device event
   monitoring starts. The handler is per process. The sigbus error may be
   raised in either a control path thread or a data path thread, but in both
   cases it goes to the common sigbus handler. Once an MMIO sigbus error is
   raised, it triggers the hot unplug operation above. The handler checks
   whether the sigbus was caused by hot unplug. If not, it reports the
   exception through the original sigbus handler. If so, it remaps the
   memory.
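
The handler flow described above can be sketched with plain POSIX calls as
follows. This is a minimal illustration, not the code of this series: all
sketch_* names and the single guarded mapping are hypothetical, and it
assumes Linux, where calling mmap() from a signal handler works in practice
although POSIX does not list it as async-signal-safe.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical record of the single mapping this sketch protects. */
static void *guarded_addr;
static size_t guarded_len;
static volatile sig_atomic_t remap_count;
static struct sigaction old_sigbus_action;

static void
sketch_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t start = (uintptr_t)guarded_addr;

	(void)signum;
	(void)ctx;
	if (guarded_addr != NULL && fault >= start &&
	    fault - start < guarded_len) {
		/* Fault inside the guarded mapping: overlay it in place
		 * with anonymous memory so later accesses are harmless. */
		if (mmap(guarded_addr, guarded_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
			 -1, 0) != MAP_FAILED) {
			remap_count++;
			return; /* the faulting access is retried */
		}
	}
	/* Not our fault address, or the remap failed: restore the previous
	 * action so that the retried access is handled by it instead. */
	sigaction(SIGBUS, &old_sigbus_action, NULL);
}

/* Remember the mapping and install the handler; returns 0 on success. */
static int
sketch_guard_mapping(void *addr, size_t len)
{
	struct sigaction action;

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = sketch_sigbus_handler;
	guarded_addr = addr;
	guarded_len = len;
	return sigaction(SIGBUS, &action, &old_sigbus_action);
}
```

One way to try it without hardware is to mmap() a file with MAP_SHARED,
truncate the file to zero length, call sketch_guard_mapping() on the mapping
and then read from it. The read raises SIGBUS, the handler overlays the
range, and the retried read sees zero-filled memory.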

For the control path and the igb_uio release:
 - When a device is hot-unplugged, the kernel releases the device resources
   on the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb_uio driver still tries to release these resources at
   that point, it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not
   done automatically on the kernel side. If they are not handled, the
   leftover state affects the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and take the
   corresponding action.
   This patch proposes adding a rte_udev_state field to the rte_uio_pci_dev
   structure of the igb_uio kernel driver. It records the state of the uio
   device, such as probed/opened/released/removed/unplug. When an unexpected
   removal caused by hot unplug is detected, the driver disables the
   interrupt resources accordingly, while skipping the release steps that the
   kernel has already handled, to avoid double free or NULL pointer kernel
   crashes.
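
The state tracking can be pictured with a small user-space model. This is
only a sketch of the idea, not kernel code: struct udev_sketch,
sketch_release() and sketch_remove() are hypothetical names, and the real
driver also takes a mutex and tears down sysfs and uio resources.

```c
#include <stdbool.h>

/* Mirrors the states named above; everything else is hypothetical. */
enum udev_state {
	UDEV_PROBED,
	UDEV_OPENED,
	UDEV_RELEASED,
	UDEV_REMOVED,
	UDEV_UNPLUG
};

struct udev_sketch {
	enum udev_state state;
	bool irq_enabled;  /* stands in for the interrupt resources */
	int release_calls; /* counts how often the release work ran */
};

/* close() path: do nothing if remove() already released the device. */
static void
sketch_release(struct udev_sketch *u)
{
	if (u->state == UDEV_REMOVED)
		return;
	u->irq_enabled = false;
	u->release_calls++;
	u->state = UDEV_RELEASED;
}

/* remove() path: an opened or unplugged device went away unexpectedly. */
static void
sketch_remove(struct udev_sketch *u)
{
	if (u->state == UDEV_OPENED || u->state == UDEV_UNPLUG) {
		/* release on behalf of user space, then mark as removed */
		sketch_release(u);
		u->state = UDEV_REMOVED;
		return;
	}
	/* the normal teardown of an already released device would go here */
}
```

With this ordering, a close() that arrives after an unexpected removal
finds the REMOVED state and does not repeat the release work.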

The mechanism can be used by the fail-safe driver and by apps that want a
hot plug solution. Taking testpmd as an example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not break the running app/fail-safe driver, and does not
affect other, unrelated devices. The app can then plug the device in again
and restart the data path as follows:
 - Device plug in->bind igb_uio driver ->attached device->start port->
   start forwarding.

patchset history:
v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb_uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed for several rounds in public,
the scope has changed and the patch set has been split many times. Following
the recent RFC and feature design, this patch set focuses only on the hot
unplug failure handler. To keep the topic clear and focused, the previous
patch set history is summarized as “v1(v21)”, and this series continues
from v2 for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->18:
note for limitation of multiple hotplug, fix some typos, squeeze patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 ++++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  36 ++++++++++-
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  31 +++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 111 +++++++++++++++++++++++++++++++-
 8 files changed, 358 insertions(+), 5 deletions(-)

-- 
2.7.4


* [PATCH V5 7/7] igb_uio: fix uio release issue when hot unplug
  2018-07-05  8:21       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
@ 2018-07-05  8:21         ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-05  8:21 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, the kernel releases the device resources on
the kernel side: for example, the fd sys file disappears and the irq is
released. If the igb_uio driver still tries to release these resources at
that point, it causes a kernel crash. On the other hand, some things, such
as disabling interrupts, are not done automatically on the kernel side. If
they are not handled, the leftover state affects the interrupt resources
used by other devices. So the igb_uio driver has to check the hotplug
status and take the corresponding action.

This patch proposes adding a rte_udev_state field to the rte_uio_pci_dev
structure of the igb_uio kernel driver. It records the state of the uio
device, such as probed/opened/released/removed/unplug. When an unexpected
removal caused by hot unplug is detected, the driver disables the
interrupt resources accordingly, while skipping the release steps that the
kernel has already handled, to avoid double free or NULL pointer kernel
crashes.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v5->v4:
add lock for udev state
---
 kernel/linux/igb_uio/igb_uio.c | 51 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..adc8cea 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+	RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
 	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
 	struct uio_info *info = &udev->info;
+	struct pci_dev *pdev = udev->pdev;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
+	mutex_lock(&udev->lock);
+	/* check the uevent of the kobj */
+	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+			   (&pdev->dev.kobj)->name);
+		udev->state = RTE_UDEV_UNPLUG;
+	}
+	mutex_unlock(&udev->lock);
+
 	uio_event_notify(info);
 
 	/* Message signal mode, no share IRQ and automasked */
@@ -309,7 +329,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -331,20 +350,29 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED)
+		return 0;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +384,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -562,6 +590,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +610,20 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	int ret;
+
+	/* handler hot unplug */
+	if (udev->state == RTE_UDEV_OPENNED ||
+		udev->state == RTE_UDEV_UNPLUG) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-- 
2.7.4


* Re: [PATCH V4 1/9] bus: introduce hotplug failure handler
  2018-07-05  6:23                 ` Guo, Jia
@ 2018-07-05  8:30                   ` Thomas Monjalon
  0 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-07-05  8:30 UTC (permalink / raw)
  To: Guo, Jia
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, jblunck,
	shreyansh.jain, helin.zhang

05/07/2018 08:23, Guo, Jia:
> 
> On 7/4/2018 3:55 PM, Thomas Monjalon wrote:
> > 04/07/2018 09:16, Guo, Jia:
> >> On 7/4/2018 6:21 AM, Thomas Monjalon wrote:
> >>> 29/06/2018 12:30, Jeff Guo:
> >>>>    /**
> >>>> + * Implementation a specific hot plug handler, which is responsible
> >>>> + * for handle the failure when hot remove the device, guaranty the system
> >>>> + * would not crash in the case.
> >>>> + * @param dev
> >>>> + *	Pointer of the device structure.
> >>>> + *
> >>>> + * @return
> >>>> + *	0 on success.
> >>>> + *	!0 on error.
> >>>> + */
> >>>> +typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
> >>> [...]
> >>>> @@ -211,6 +224,8 @@ struct rte_bus {
> >>>>    	rte_bus_parse_t parse;       /**< Parse a device name */
> >>>>    	struct rte_bus_conf conf;    /**< Bus configuration */
> >>>>    	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
> >>>> +	rte_bus_hotplug_handler_t hotplug_handler;
> >>>> +						/**< handle hot plug on bus */
> >>> The name is misleading.
> >>> It is to handle unplugging but is called "hotplug".
> >> ok, so i prefer hotplug_failure_handler than hot_unplug_handler, since
> >> it is more explicit for failure handle, and more clearly.
> >>
> >>> In order to demonstrate how the handler is used, you should
> >>> introduce the code using this handler in the same patch.
> >>>
> >> sorry, i check the history of rte_bus.h, and the way is introduce ops at
> >> first, second implement in specific bus, then come across the usage.
> >> I think that way clear and make sense. what do you think?
> >> Anyway, i will check the commit log if is there any misleading.
> > I think it is better to call ops when they are introduced,
> > and implement the ops in second step.
> >
> 
> Hi Thomas,
> 
> Sorry, but I would like to detail the relationship between the ops and
> the API below, to see whether we can find a better sequence.
> 
> Patch num:
> 
> 1: introduce ops hotplug_failure_handler
> 
> 2: implement ops hotplug_failure_handler
> 
> 3: introduce ops sigbus_handler
> 
> 4: implement ops sigbus_handler
> 
> 5: introduce helper rte_bus_sigbus_handler to call the ops sigbus_handler
> 
> 6: introduce the mechanism to call helper rte_bus_sigbus_handler and 
> call hotplug_failure_handler.
> 
> If I follow your suggestion, would I reorder the patches as
> 6->5->3->4->1->2? I don't think that makes sense, and it might be more
> confusing.
> 
> I think it is better to introduce each op first, item by item, so that
> when the caller patch is introduced, the functionality is ready for
> that patch to use.
> 
> If I have missed your point and you have a better sequence in mind,
> please let me know explicitly. Thanks.

The main concern is to be able to understand each patch separately.

When introducing a new op, we need to understand how it will be used.
But actually, no need to change patch organization,
you just need to provide a clear doxygen documentation,
and introduce the context in the commit log.


* Re: [PATCH V5 1/7] bus: add hotplug failure handler
  2018-07-05  7:38         ` [PATCH V5 1/7] bus: add hotplug failure handler Jeff Guo
@ 2018-07-06 15:17           ` He, Shaopeng
  0 siblings, 0 replies; 494+ messages in thread
From: He, Shaopeng @ 2018-07-06 15:17 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, July 5, 2018 3:39 PM
> 
> When device be hotplug out, if app still continue to access device by mmio,
> it will cause of memory failure and result the system crash.
> 
> This patch introduces a bus ops to handle device hotplug failure, it is a
> bus specific behavior,so that each kind of bus can implement its own logic
> case by case.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Minor comment: there should be a space after "behavior,".

Acked-by: Shaopeng He <shaopeng.he@intel.com>


* Re: [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-05  7:38         ` [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
@ 2018-07-06 15:17           ` He, Shaopeng
  2018-07-09  5:29             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: He, Shaopeng @ 2018-07-06 15:17 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, July 5, 2018 3:39 PM
> 
[...]
> +	switch (pdev->kdrv) {
> +	case RTE_KDRV_IGB_UIO:
> +	case RTE_KDRV_UIO_GENERIC:
> +	case RTE_KDRV_NIC_UIO:
> +		/* mmio resources is invalid, remap it to be safe. */

Better to keep consistent as: mmio resource is

[...]

Would it help if pci_uio_remap_resource also remapped the UIO event and control fds?
That would make it easier for the upper-layer application to deal with the unplug event.
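
For reference, the general idea of remapping an address range onto
anonymous memory (what the quoted pci_uio_remap_resource comment below
describes) can be sketched as follows. This is only an illustration
under my own assumptions, not the code from the patch: the helper name
demo_remap_anonymous is hypothetical, and it uses nothing beyond the
standard mmap() call.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on strict glibc builds */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of DPDK): replace whatever is mapped at
 * [addr, addr + len) with zero-filled anonymous memory, keeping the same
 * virtual address. After this, loads and stores to the range no longer
 * reach the removed device, so they cannot raise SIGBUS.
 *
 * addr must be page aligned. Returns 0 on success, -1 on failure.
 */
int
demo_remap_anonymous(void *addr, size_t len)
{
	void *p;

	/* MAP_FIXED atomically replaces the existing mapping at addr. */
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED || p != addr)
		return -1;

	return 0;
}
```

The point of keeping the virtual address is that pointers already held
by the data path stay valid; only the backing changes.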

> +/* remap the PCI resource of a PCI device in anonymous virtual memory */
> +int
> +pci_uio_remap_resource(struct rte_pci_device *dev)
> +{
> +	int i;
> +	void *map_address;

Acked-by: Shaopeng He <shaopeng.he@intel.com>


* Re: [PATCH V5 3/7] bus: add sigbus handler
  2018-07-05  7:38         ` [PATCH V5 3/7] bus: add sigbus handler Jeff Guo
@ 2018-07-06 15:17           ` He, Shaopeng
  0 siblings, 0 replies; 494+ messages in thread
From: He, Shaopeng @ 2018-07-06 15:17 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> -----Original Message-----
> From: Guo, Jia
> 
> When device be hotplug out, if data path still read/write device, the
> sigbus error will occur, this error need to be handled. So a handler
> need to be here to capture the signal and handle it correspondingly.
> 
> This patch introduces a bus ops to handle sigbus error, it is a bus
> specific behavior,so that each kind of bus can implement its own logic
> case by case.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Acked-by: Shaopeng He <shaopeng.he@intel.com>


* Re: [PATCH V5 4/7] bus/pci: implement sigbus handler operation
  2018-07-05  7:38         ` [PATCH V5 4/7] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-07-06 15:18           ` He, Shaopeng
  0 siblings, 0 replies; 494+ messages in thread
From: He, Shaopeng @ 2018-07-06 15:18 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, July 5, 2018 3:39 PM
> 
> This patch implements the ops of sigbus handler for PCI bus, it is
> functional to find the corresponding pci device which is be hotplug out.

"which has been hotplugged out"?

> and then handle the hotplug failure for this device.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Acked-by: Shaopeng He <shaopeng.he@intel.com>


* Re: [PATCH V5 5/7] bus: add helper to handle sigbus
  2018-07-05  7:38         ` [PATCH V5 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-06 15:22           ` He, Shaopeng
  2018-07-09  5:31             ` Jeff Guo
  2018-07-08 13:30           ` Andrew Rybchenko
  1 sibling, 1 reply; 494+ messages in thread
From: He, Shaopeng @ 2018-07-06 15:22 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, July 5, 2018 3:39 PM
> 
> This patch aim to add a helper to iterate all buses to find the
> corresponding bus to handle the sigbus error.
> 

[...]

> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> +	/* failed to handle the sigbus, pass the new errno. */
> +	if (bus && rte_errno == -1)
> +		return -1;
> +	else if (!bus)
> +		ret = 1;

Changing the order of the comparisons would make the code a little shorter:
	if (!bus)
		ret = 1;
	else if (rte_errno == -1)
		return -1;
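
To show that the reordered checks give the same result as the original
ones, here is a small standalone sketch. The function name and its two
integer parameters are hypothetical stand-ins for "bus != NULL" and the
value of rte_errno after the search; the errno save/restore from the
patch is left out.

```c
/*
 * Hypothetical stand-in for the tail of rte_bus_sigbus_handler():
 *   found - non-zero if rte_bus_find() returned a bus
 *   err   - value left in rte_errno by the bus callback
 * Returns 1 if no bus claimed the address, -1 if a bus claimed it but
 * failed to handle it, 0 if it was handled.
 */
int
demo_sigbus_result(int found, int err)
{
	if (!found)
		return 1;
	else if (err == -1)
		return -1;

	return 0;
}
```

Testing "!bus" first means the second branch no longer needs to repeat
the bus check.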

[...]

Acked-by: Shaopeng He <shaopeng.he@intel.com>


* Re: [PATCH V5 6/7] eal: add failure handle mechanism for hotplug
  2018-07-05  7:38         ` [PATCH V5 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-06 15:22           ` He, Shaopeng
  2018-07-08 13:46           ` Andrew Rybchenko
  1 sibling, 0 replies; 494+ messages in thread
From: He, Shaopeng @ 2018-07-06 15:22 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, July 5, 2018 3:39 PM
> 
> This patch introduces a failure handler mechanism to handle device
> hot plug removal event.
> 
> First register sigbus handler, once sigbus error be captured, will
> check the failure address and accordingly remap the invalid memory
> for the corresponding device. Bese on this mechanism, it could

"Based on this mechanism"?

> guaranty the application not to be crash when hotplug out device.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Acked-by: Shaopeng He <shaopeng.he@intel.com>


* Re: [PATCH V5 5/7] bus: add helper to handle sigbus
  2018-07-05  7:38         ` [PATCH V5 5/7] bus: add helper to handle sigbus Jeff Guo
  2018-07-06 15:22           ` He, Shaopeng
@ 2018-07-08 13:30           ` Andrew Rybchenko
  2018-07-09  5:33             ` Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Andrew Rybchenko @ 2018-07-08 13:30 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 05.07.2018 10:38, Jeff Guo wrote:
> This patch aim to add a helper to iterate all buses to find the
> corresponding bus to handle the sigbus error.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v5->v4:
> refine the errno restore logic
> ---
>   lib/librte_eal/common/eal_common_bus.c | 36 +++++++++++++++++++++++++++++++++-
>   lib/librte_eal/common/eal_private.h    | 12 ++++++++++++
>   2 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 0943851..c9f3566 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -37,6 +37,7 @@
>   #include <rte_bus.h>
>   #include <rte_debug.h>
>   #include <rte_string_fns.h>
> +#include <rte_errno.h>
>   
>   #include "eal_private.h"
>   
> @@ -220,7 +221,6 @@ rte_bus_find_by_device_name(const char *str)
>   	return rte_bus_find(NULL, bus_can_parse, name);
>   }
>   
> -

Unrelated change.

>   /*
>    * Get iommu class of devices on the bus.
>    */
> @@ -242,3 +242,37 @@ rte_bus_get_iommu_class(void)
>   	}
>   	return mode;
>   }
> +
> +static int
> +bus_handle_sigbus(const struct rte_bus *bus,
> +			const void *failure_addr)
> +{
> +	int ret;
> +
> +	ret = bus->sigbus_handler(failure_addr);

Shouldn't bus->sigbus_handler be checked here against NULL?
It looks like not all buses implement it.
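
A minimal sketch of the kind of check I mean, written against a
hypothetical demo_bus structure rather than the real struct rte_bus
(the out-parameter handler_ret also stands in for the rte_errno
assignment in the patch, and the two sample handlers exist only for
illustration):

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rte_bus; only the op used here. */
struct demo_bus {
	int (*sigbus_handler)(const void *failure_addr);
};

/* Sample handlers: 0 = handled, positive = address is not on this bus. */
int demo_handler_ok(const void *addr) { (void)addr; return 0; }
int demo_handler_not_mine(const void *addr) { (void)addr; return 1; }

/*
 * Comparator in the style used with rte_bus_find(): return 0 to stop
 * the iteration on this bus, non-zero to continue with the next one.
 * *handler_ret is written only when a handler was actually called.
 */
int
demo_bus_handle_sigbus(const struct demo_bus *bus, const void *failure_addr,
		       int *handler_ret)
{
	int ret;

	/* The bus does not implement the op: skip it, never call NULL. */
	if (bus->sigbus_handler == NULL)
		return 1;

	ret = bus->sigbus_handler(failure_addr);
	*handler_ret = ret;

	/* ret <= 0: handled or failed on this bus, so stop searching. */
	return ret > 0;
}
```

With the check done first, the "bus->sigbus_handler &&" part of the
return expression is no longer needed.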

> +	rte_errno = ret;
> +
> +	return !(bus->sigbus_handler && ret <= 0);
> +}
> +
> +int
> +rte_bus_sigbus_handler(const void *failure_addr)
> +{
> +	struct rte_bus *bus;
> +
> +	int ret = 0;
> +	int old_errno = rte_errno;
> +	rte_errno = 0;
> +
> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> +	/* failed to handle the sigbus, pass the new errno. */
> +	if (bus && rte_errno == -1)
> +		return -1;
> +	else if (!bus)
> +		ret = 1;
> +
> +	/* otherwise restore the old errno. */
> +	rte_errno = old_errno;
> +
> +	return ret;
> +}
> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index bdadc4d..a91c4b5 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
>    */
>   void dev_callback_process(char *device_name, enum rte_dev_event_type event);
>   
> +
> +/**
> + * Iterate all buses to find the corresponding bus, to handle the sigbus error.
> + * @param failure_addr
> + *	Pointer of the fault address of the sigbus error.
> + *
> + * @return
> + *	 0 success to handle the sigbus.
> + *	-1 failed to handle the sigbus
> + *	 1 no bus can handler the sigbus
> + */
> +int rte_bus_sigbus_handler(const void *failure_addr);

Empty line is missing after the function.

>   #endif /* _EAL_PRIVATE_H_ */


* Re: [PATCH V5 6/7] eal: add failure handle mechanism for hotplug
  2018-07-05  7:38         ` [PATCH V5 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
  2018-07-06 15:22           ` He, Shaopeng
@ 2018-07-08 13:46           ` Andrew Rybchenko
  2018-07-09  5:40             ` Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Andrew Rybchenko @ 2018-07-08 13:46 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 05.07.2018 10:38, Jeff Guo wrote:
> This patch introduces a failure handler mechanism to handle device
> hot plug removal event.
>
> First register sigbus handler, once sigbus error be captured, will
> check the failure address and accordingly remap the invalid memory
> for the corresponding device. Bese on this mechanism, it could
> guaranty the application not to be crash when hotplug out device.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v5->v4:
> add sigbus old handler recover.
> ---
>   lib/librte_eal/linuxapp/eal/eal_dev.c | 111 +++++++++++++++++++++++++++++++++-
>   1 file changed, 110 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 1cf6aeb..a22cb9a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
>   
>   #include <string.h>
>   #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>   #include <sys/socket.h>
>   #include <linux/netlink.h>
>   
> @@ -14,15 +16,28 @@
>   #include <rte_malloc.h>
>   #include <rte_interrupts.h>
>   #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> +#include <rte_spinlock.h>
> +#include <rte_errno.h>
>   
>   #include "eal_private.h"
>   
>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>   static bool monitor_started;
>   
> +extern struct rte_bus_list rte_bus_list;
> +

Shouldn't rte_bus.h provide it?

>   #define EAL_UEV_MSG_LEN 4096
>   #define EAL_UEV_MSG_ELEM_LEN 128
>   
> +/* spinlock for device failure process */
> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;

It would be useful to explain why the lock is needed and when
it should be obtained/released. Which resources are protected
by the lock?

> +
> +static struct sigaction sigbus_action_old;
> +
> +static int sigbus_need_recover;
> +
>   static void dev_uev_handler(__rte_unused void *param);
>   
>   /* identify the system layer which reports this event. */
> @@ -33,6 +48,49 @@ enum eal_dev_event_subsystem {
>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
>   };
>   
> +static void
> +sigbus_action_recover(void)
> +{
> +	if (sigbus_need_recover) {
> +		sigaction(SIGBUS, &sigbus_action_old, NULL);
> +		sigbus_need_recover = 0;
> +	}
> +}
> +
> +static void sigbus_handler(int signum, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> +		(int)pthread_self(), info->si_addr);
> +
> +	rte_spinlock_lock(&dev_failure_lock);
> +	ret = rte_bus_sigbus_handler(info->si_addr);
> +	rte_spinlock_unlock(&dev_failure_lock);
> +	if (ret == -1) {
> +		rte_exit(EXIT_FAILURE,
> +			 "Failed to handle SIGBUS for hotplug, "
> +			 "(rte_errno: %s)!", strerror(rte_errno));
> +	} else if (ret == 1) {
> +		if (sigbus_action_old.sa_handler)
> +			(*(sigbus_action_old.sa_handler))(signum);
> +		else
> +			rte_exit(EXIT_FAILURE,
> +				 "Failed to handle generic SIGBUS!");
> +	}
> +
> +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
> +}
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
>   static int
>   dev_uev_socket_fd_create(void)
>   {
> @@ -147,6 +205,9 @@ dev_uev_handler(__rte_unused void *param)
>   	struct rte_dev_event uevent;
>   	int ret;
>   	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname;
>   
>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>   	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -171,13 +232,50 @@ dev_uev_handler(__rte_unused void *param)
>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>   		uevent.devname, uevent.type, uevent.subsystem);
>   
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
> +			rte_spinlock_lock(&dev_failure_lock);
> +			bus = rte_bus_find_by_name(busname);

It looks like busname could be uninitialized here.
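
One possible way to avoid that, sketched with a hypothetical enum and
helper (the names below are illustrative, not the ones in eal_dev.c):
map the subsystem to a bus name in one place and return NULL for
anything unknown, so the caller can bail out before calling
rte_bus_find_by_name().

```c
#include <stddef.h>

/* Hypothetical mirror of enum eal_dev_event_subsystem. */
enum demo_subsystem {
	DEMO_SUBSYSTEM_PCI,
	DEMO_SUBSYSTEM_UIO,
	DEMO_SUBSYSTEM_VFIO,
	DEMO_SUBSYSTEM_MAX
};

/*
 * Return the bus name for a uevent subsystem, or NULL when the
 * subsystem is not mapped to any bus. The caller must check for NULL
 * before looking the bus up.
 */
const char *
demo_subsystem_to_busname(enum demo_subsystem subsystem)
{
	switch (subsystem) {
	case DEMO_SUBSYSTEM_PCI:
	case DEMO_SUBSYSTEM_UIO:
		return "pci";
	default:
		return NULL;
	}
}
```

Simply initializing busname to NULL at its declaration and checking it
before the lookup would work equally well.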

> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					busname);
> +				return;
> +			}
> +
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
> +					"bus (%s)\n", uevent.devname, busname);
> +				return;
> +			}
> +
> +			ret = bus->hotplug_failure_handler(dev);
> +			rte_spinlock_unlock(&dev_failure_lock);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
> +					"device (%s)\n", dev->name);
> +				return;
> +			}
> +		}
>   		dev_callback_process(uevent.devname, uevent.type);
> +	}
>   }
>   
>   int __rte_experimental
>   rte_dev_event_monitor_start(void)
>   {
> +	sigset_t mask;
> +	struct sigaction action;
>   	int ret;
>   
>   	if (monitor_started)
> @@ -197,6 +295,14 @@ rte_dev_event_monitor_start(void)
>   		return -1;
>   	}
>   
> +	/* register sigbus handler */
> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGBUS);
> +	action.sa_flags = SA_SIGINFO;
> +	action.sa_mask = mask;
> +	action.sa_sigaction = sigbus_handler;
> +	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
> +
>   	monitor_started = true;
>   
>   	return 0;
> @@ -217,8 +323,11 @@ rte_dev_event_monitor_stop(void)
>   		return ret;
>   	}
>   
> +	sigbus_action_recover();
> +
>   	close(intr_handle.fd);
>   	intr_handle.fd = -1;
>   	monitor_started = false;
> +
>   	return 0;
>   }


* Re: [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-06 15:17           ` He, Shaopeng
@ 2018-07-09  5:29             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  5:29 UTC (permalink / raw)
  To: He, Shaopeng, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Shaopeng,


On 7/6/2018 11:17 PM, He, Shaopeng wrote:
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Thursday, July 5, 2018 3:39 PM
>>
> [...]
>> +	switch (pdev->kdrv) {
>> +	case RTE_KDRV_IGB_UIO:
>> +	case RTE_KDRV_UIO_GENERIC:
>> +	case RTE_KDRV_NIC_UIO:
>> +		/* mmio resources is invalid, remap it to be safe. */
> Better to keep consistent as: mmio resource is

ok.

> [...]
>
> Is it helpful that pci_uio_remap_resource could also remap UIO event and control fd?
> So, up-layer application will be easier to deal with the un-plug event.

The fd should only be removed after the device has been closed, because
the fd is still needed to disable the interrupt when the driver is
uninitialized. Removing the fd is therefore left to
pci_uio_unmap_resource, which does it when the device is detached.

>> +/* remap the PCI resource of a PCI device in anonymous virtual memory */
>> +int
>> +pci_uio_remap_resource(struct rte_pci_device *dev)
>> +{
>> +	int i;
>> +	void *map_address;
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>


* Re: [PATCH V5 5/7] bus: add helper to handle sigbus
  2018-07-06 15:22           ` He, Shaopeng
@ 2018-07-09  5:31             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  5:31 UTC (permalink / raw)
  To: He, Shaopeng, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Shaopeng,

Thanks for your review.


On 7/6/2018 11:22 PM, He, Shaopeng wrote:
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Thursday, July 5, 2018 3:39 PM
>>
>> This patch aim to add a helper to iterate all buses to find the
>> corresponding bus to handle the sigbus error.
>>
> [...]
>
>> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>> +	/* failed to handle the sigbus, pass the new errno. */
>> +	if (bus && rte_errno == -1)
>> +		return -1;
>> +	else if (!bus)
>> +		ret = 1;
> Change the compare order, code will be a little bit shorter?
> 	if (!bus)
> 		ret = 1;
> 	else if (rte_errno == -1)
> 		return -1;
>
> [...]

Makes sense.

> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>


* Re: [PATCH V5 5/7] bus: add helper to handle sigbus
  2018-07-08 13:30           ` Andrew Rybchenko
@ 2018-07-09  5:33             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  5:33 UTC (permalink / raw)
  To: Andrew Rybchenko, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Andrew,

Thanks for your review.

On 7/8/2018 9:30 PM, Andrew Rybchenko wrote:
> On 05.07.2018 10:38, Jeff Guo wrote:
>> This patch aim to add a helper to iterate all buses to find the
>> corresponding bus to handle the sigbus error.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v5->v4:
>> refine the errno restore logic
>> ---
>>   lib/librte_eal/common/eal_common_bus.c | 36 +++++++++++++++++++++++++++++++++-
>>   lib/librte_eal/common/eal_private.h    | 12 ++++++++++++
>>   2 files changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
>> index 0943851..c9f3566 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -37,6 +37,7 @@
>>   #include <rte_bus.h>
>>   #include <rte_debug.h>
>>   #include <rte_string_fns.h>
>> +#include <rte_errno.h>
>>   
>>   #include "eal_private.h"
>>   
>> @@ -220,7 +221,6 @@ rte_bus_find_by_device_name(const char *str)
>>       return rte_bus_find(NULL, bus_can_parse, name);
>>   }
>>   
>> -
>
> Unrelated change.
>

OK. I am fine with leaving that to a separate patch.

>>   /*
>>    * Get iommu class of devices on the bus.
>>    */
>> @@ -242,3 +242,37 @@ rte_bus_get_iommu_class(void)
>>       }
>>       return mode;
>>   }
>> +
>> +static int
>> +bus_handle_sigbus(const struct rte_bus *bus,
>> +            const void *failure_addr)
>> +{
>> +    int ret;
>> +
>> +    ret = bus->sigbus_handler(failure_addr);
>
> Shouldn't bus->sigbus_handler be checked here against NULL?
> It looks like not all buses implement it.
>

Yes, it should be as you said.

>> +    rte_errno = ret;
>> +
>> +    return !(bus->sigbus_handler && ret <= 0);
>> +}
>> +
>> +int
>> +rte_bus_sigbus_handler(const void *failure_addr)
>> +{
>> +    struct rte_bus *bus;
>> +
>> +    int ret = 0;
>> +    int old_errno = rte_errno;
>> +    rte_errno = 0;
>> +
>> +    bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>> +    /* failed to handle the sigbus, pass the new errno. */
>> +    if (bus && rte_errno == -1)
>> +        return -1;
>> +    else if (!bus)
>> +        ret = 1;
>> +
>> +    /* otherwise restore the old errno. */
>> +    rte_errno = old_errno;
>> +
>> +    return ret;
>> +}
>> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
>> index bdadc4d..a91c4b5 100644
>> --- a/lib/librte_eal/common/eal_private.h
>> +++ b/lib/librte_eal/common/eal_private.h
>> @@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
>>    */
>>   void dev_callback_process(char *device_name, enum rte_dev_event_type event);
>>   
>> +
>> +/**
>> + * Iterate all buses to find the corresponding bus, to handle the sigbus error.
>> + * @param failure_addr
>> + *    Pointer of the fault address of the sigbus error.
>> + *
>> + * @return
>> + *     0 success to handle the sigbus.
>> + *    -1 failed to handle the sigbus
>> + *     1 no bus can handler the sigbus
>> + */
>> +int rte_bus_sigbus_handler(const void *failure_addr);
>
> Empty line is missing after the function.
>

ok.

>>   #endif /* _EAL_PRIVATE_H_ */
>


* Re: [PATCH V5 6/7] eal: add failure handle mechanism for hotplug
  2018-07-08 13:46           ` Andrew Rybchenko
@ 2018-07-09  5:40             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  5:40 UTC (permalink / raw)
  To: Andrew Rybchenko, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 7/8/2018 9:46 PM, Andrew Rybchenko wrote:
> On 05.07.2018 10:38, Jeff Guo wrote:
>> This patch introduces a failure handler mechanism to handle device
>> hot plug removal event.
>>
>> First register sigbus handler, once sigbus error be captured, will
>> check the failure address and accordingly remap the invalid memory
>> for the corresponding device. Bese on this mechanism, it could
>> guaranty the application not to be crash when hotplug out device.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v5->v4:
>> add sigbus old handler recover.
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 111 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 110 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 1cf6aeb..a22cb9a 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,8 @@
>>   
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <fcntl.h>
>> +#include <signal.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>   
>> @@ -14,15 +16,28 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_eal.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_errno.h>
>>   
>>   #include "eal_private.h"
>>   
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>>   static bool monitor_started;
>>   
>> +extern struct rte_bus_list rte_bus_list;
>> +
>
> Shouldn't rte_bus.h provide it?
>

I think rte_bus.h provides the rte_bus_list structure type, the variable
itself is defined in eal_common_bus.c, and I use it via extern in
eal_dev.c.

>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>   
>> +/* spinlock for device failure process */
>> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
>
> It would be useful to explain why the lock is needed and when
> it should be obtained/released. Which resources are protected
> by the lock?
>

Makes sense. This lock should protect both bus and device access. I will
document that explicitly to make it more readable.

>> +
>> +static struct sigaction sigbus_action_old;
>> +
>> +static int sigbus_need_recover;
>> +
>>   static void dev_uev_handler(__rte_unused void *param);
>>   
>>   /* identify the system layer which reports this event. */
>> @@ -33,6 +48,49 @@ enum eal_dev_event_subsystem {
>>       EAL_DEV_EVENT_SUBSYSTEM_MAX
>>   };
>>   
>> +static void
>> +sigbus_action_recover(void)
>> +{
>> +    if (sigbus_need_recover) {
>> +        sigaction(SIGBUS, &sigbus_action_old, NULL);
>> +        sigbus_need_recover = 0;
>> +    }
>> +}
>> +
>> +static void sigbus_handler(int signum, siginfo_t *info,
>> +                void *ctx __rte_unused)
>> +{
>> +    int ret;
>> +
>> +    RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
>> +        (int)pthread_self(), info->si_addr);
>> +
>> +    rte_spinlock_lock(&dev_failure_lock);
>> +    ret = rte_bus_sigbus_handler(info->si_addr);
>> +    rte_spinlock_unlock(&dev_failure_lock);
>> +    if (ret == -1) {
>> +        rte_exit(EXIT_FAILURE,
>> +             "Failed to handle SIGBUS for hotplug, "
>> +             "(rte_errno: %s)!", strerror(rte_errno));
>> +    } else if (ret == 1) {
>> +        if (sigbus_action_old.sa_handler)
>> +            (*(sigbus_action_old.sa_handler))(signum);
>> +        else
>> +            rte_exit(EXIT_FAILURE,
>> +                 "Failed to handle generic SIGBUS!");
>> +    }
>> +
>> +    RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
>> +}
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +    const void *_name)
>> +{
>> +    const char *name = _name;
>> +
>> +    return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>>   dev_uev_socket_fd_create(void)
>>   {
>> @@ -147,6 +205,9 @@ dev_uev_handler(__rte_unused void *param)
>>       struct rte_dev_event uevent;
>>       int ret;
>>       char buf[EAL_UEV_MSG_LEN];
>> +    struct rte_bus *bus;
>> +    struct rte_device *dev;
>> +    const char *busname;
>>         memset(&uevent, 0, sizeof(struct rte_dev_event));
>>       memset(buf, 0, EAL_UEV_MSG_LEN);
>> @@ -171,13 +232,50 @@ dev_uev_handler(__rte_unused void *param)
>>       RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, 
>> subsystem:%d)\n",
>>           uevent.devname, uevent.type, uevent.subsystem);
>>   -    if (uevent.devname)
>> +    switch (uevent.subsystem) {
>> +    case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +    case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +        busname = "pci";
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +
>> +    if (uevent.devname) {
>> +        if (uevent.type == RTE_DEV_EVENT_REMOVE) {
>> +            rte_spinlock_lock(&dev_failure_lock);
>> +            bus = rte_bus_find_by_name(busname);
>
> It looks like busname could be uninitialized here.
>

I think you are correct.

>> +            if (bus == NULL) {
>> +                RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +                    busname);
>> +                return;
>> +            }
>> +
>> +            dev = bus->find_device(NULL, cmp_dev_name,
>> +                           uevent.devname);
>> +            if (dev == NULL) {
>> +                RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
>> +                    "bus (%s)\n", uevent.devname, busname);
>> +                return;
>> +            }
>> +
>> +            ret = bus->hotplug_failure_handler(dev);
>> +            rte_spinlock_unlock(&dev_failure_lock);
>> +            if (ret) {
>> +                RTE_LOG(ERR, EAL, "Can not handle hotplug for "
>> +                    "device (%s)\n", dev->name);
>> +                return;
>> +            }
>> +        }
>>           dev_callback_process(uevent.devname, uevent.type);
>> +    }
>>   }
>>     int __rte_experimental
>>   rte_dev_event_monitor_start(void)
>>   {
>> +    sigset_t mask;
>> +    struct sigaction action;
>>       int ret;
>>         if (monitor_started)
>> @@ -197,6 +295,14 @@ rte_dev_event_monitor_start(void)
>>           return -1;
>>       }
>>   +    /* register sigbus handler */
>> +    sigemptyset(&mask);
>> +    sigaddset(&mask, SIGBUS);
>> +    action.sa_flags = SA_SIGINFO;
>> +    action.sa_mask = mask;
>> +    action.sa_sigaction = sigbus_handler;
>> +    sigbus_need_recover = !sigaction(SIGBUS, &action, 
>> &sigbus_action_old);
>> +
>>       monitor_started = true;
>>         return 0;
>> @@ -217,8 +323,11 @@ rte_dev_event_monitor_stop(void)
>>           return ret;
>>       }
>>   +    sigbus_action_recover();
>> +
>>       close(intr_handle.fd);
>>       intr_handle.fd = -1;
>>       monitor_started = false;
>> +
>>       return 0;
>>   }
>


* [PATCH v6 0/7] hotplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (10 preceding siblings ...)
  2018-07-05  8:21       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
@ 2018-07-09  6:51       ` Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 1/7] bus: add hotplug failure handler Jeff Guo
                           ` (6 more replies)
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                         ` (11 subsequent siblings)
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SR-IOV live migration in SDN/NFV. It
brings higher flexibility and continuity to networking services in multiple
industry use cases. So let us see what DPDK, as an important networking
framework, can do to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework. Apps
can use these to develop their hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
physically removed, or when the software disables it. The app needs to call
the ethdev API to detach the device, which unplugs the device at the bus level
and makes access to the device invalid. The problem is that removing the
device from the software lists is not instantaneous. If the data (fast) path
still reads/writes the device during that window, it causes an MMIO error and
the app crashes.

It seems we already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle it, but we still lack a failsafe driver
(or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution. So there
is a gap in the DPDK hot plug solution right now.

Also, we know that the kernel only guarantees hot plug on the kernel side, not
on the user mode side. First, we can hardly have a gatekeeper for every MMIO
access across multiple PMDs. Second, no third-party tools such as
udev/driverctl specifically cover this hot plug failure processing. Third, the
feasibility of each app implementing this for multiple user mode PMDs is still
a problem. Here, a general hot plug failure handling mechanism in the DPDK
framework is proposed. It aims to guarantee that, when a hot unplug occurs, the
system does not crash and the app is not interrupted, so that user space can
stop normally, release any relevant resources, and then cleanly unplug the
device at the bus level.

The mechanism works as below:

First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To do
that, the following functionality is introduced.
 - Add a new bus ops “hotplug_failure_handler” to handle bus read/write errors.
   It is bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus specific ops “pci_hotplug_failure_handler”. Based on
   the failure address, it remaps memory for the corresponding unplugged
   device.

For the data path, or other unexpected control from the control path, when a
hot unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   either a control path thread or a data path thread may receive the sigbus
   error, but both end up in the common sigbus handler. Once an MMIO sigbus
   error is raised, it triggers the above hot unplug operation. The handler
   checks whether the sigbus was caused by the hot unplug. If not, it passes
   the exception on to the original sigbus handler. If yes, it does the
   memory remapping.

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb_uio driver then still tries to release these
   resources, it causes a kernel crash.
   On the other hand, some things, such as disabling the interrupt, are not
   done automatically on the kernel side. If they are not handled, the stale
   state affects the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and take the
   corresponding action.
   This patch set proposes adding a rte_udev_state structure to
   rte_uio_pci_dev in the igb_uio kernel driver, to record the state of the
   uio device, such as probed/opened/released/removed/unplug. When an
   unexpected removal caused by hot unplug is detected, the driver disables
   the interrupt resources accordingly, while for the parts of the release
   that the kernel has already handled, it just skips them to avoid a double
   free or a null pointer kernel crash.

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd as an example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not break the running app/fail-safe driver, and does not
break other unrelated devices. The app can then plug the device back in and
restart the data path as below:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed for several rounds in public,
the scope has changed and the patch set has been split many times. Following
the recent RFC and feature design, this patch set focuses only on the hot
unplug failure handler. To keep this topic clear and focused, the previous
patch sets are summarized in the history as “v1(v21)”, and tracking continues
from v2 here.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 +++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 +++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  42 ++++++++++++
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  33 +++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 114 +++++++++++++++++++++++++++++++-
 8 files changed, 370 insertions(+), 4 deletions(-)

-- 
2.7.4


* [PATCH v6 1/7] bus: add hotplug failure handler
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-09  6:51         ` Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, if the app still continues to access the
device by MMIO, it causes a memory failure and crashes the system.

This patch introduces a bus ops to handle device hotplug failure. It is a
bus-specific behavior, so each kind of bus can implement its own logic case
by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v6->v5:
refine some description of bus ops
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..e3a55a8 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation of a specific hotplug failure handler, responsible for
+ * handling the failure when the device is hot unplugged from the bus. When
+ * a hotplug removal event is detected, this function can be called to handle
+ * the failure and guarantee that the system does not crash in that case.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +225,8 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+					/**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4


* [PATCH v6 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 1/7] bus: add hotplug failure handler Jeff Guo
@ 2018-07-09  6:51         ` Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the hotplug failure handler ops for the PCI bus.
It remaps the failed memory region to new dummy memory at the same
address, to avoid MMIO read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v6->v5:
refine some typo
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..d7abe6c 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_failure_handler = pci_hotplug_failure_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4


* [PATCH v6 3/7] bus: add sigbus handler
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 1/7] bus: add hotplug failure handler Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
@ 2018-07-09  6:51         ` Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 4/7] bus/pci: implement sigbus handler operation Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, if the data path still reads/writes the
device, a sigbus error occurs and needs to be handled. So a handler is
needed to capture the signal and handle it accordingly.

This patch introduces a bus ops to handle the sigbus error. It is a
bus-specific behavior, so each kind of bus can implement its own logic
case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v6->v5:
refine some description of bus ops
---
 lib/librte_eal/common/include/rte_bus.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e3a55a8..216ad1e 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Implementation of a specific sigbus handler, responsible for handling
+ * a sigbus error, which is either a generic memory error or a memory
+ * error caused by hot unplug. When a sigbus error is captured, this
+ * function can be called to handle it.
+ * @param failure_addr
+ *	Fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus was handled successfully.
+ *	1 if no bus handles the sigbus.
+ *	-1 if handling the sigbus failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -227,6 +242,8 @@ struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
 					/**< handle hotplug failure on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4


* [PATCH v6 4/7] bus/pci: implement sigbus handler operation
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-07-09  6:51         ` [PATCH v6 3/7] bus: add sigbus handler Jeff Guo
@ 2018-07-09  6:51         ` Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the sigbus handler ops for the PCI bus. It finds
the PCI device that has been hot unplugged, and then calls the bus ops
of the hotplug failure handler to handle the failure for that device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v6->v5:
refine some typo
---
 drivers/bus/pci/pci_common.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d7abe6c..37ad266 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* find which device the failure address belongs to. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot removal. */
+		ret = pci_hotplug_failure_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_failure_handler = pci_hotplug_failure_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4


* [PATCH v6 5/7] bus: add helper to handle sigbus
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-07-09  6:51         ` [PATCH v6 4/7] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-07-09  6:51         ` Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
  2018-07-09  6:51         ` [PATCH v6 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a helper that iterates over all buses to find the
bus that can handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v6->v5:
refine some coding style.
---
 lib/librte_eal/common/eal_common_bus.c | 42 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 12 ++++++++++
 2 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..8856adc 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler) {
+		RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
+			"bus (%s)\n", bus->name);
+		return -1;
+	}
+
+	ret = bus->sigbus_handler(failure_addr);
+	rte_errno = ret;
+
+	return !(bus->sigbus_handler && ret <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* failed to handle the sigbus, pass the new errno. */
+	if (!bus)
+		ret = 1;
+	else if (rte_errno == -1)
+		return -1;
+
+	/* otherwise restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+/**
+ * Iterate over all buses to find the one that can handle the sigbus error.
+ * @param failure_addr
+ *	Fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus was handled successfully.
+ *	-1 if handling the sigbus failed.
+ *	 1 if no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4


* [PATCH v6 6/7] eal: add failure handle mechanism for hotplug
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-07-09  6:51         ` [PATCH v6 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-09  6:51         ` Jeff Guo
  2018-07-09  7:42           ` Gaëtan Rivet
  2018-07-09  6:51         ` [PATCH v6 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for the device
hotplug removal event.

First, it registers a sigbus handler when the device event monitor is
enabled. Once a sigbus error is captured, it checks the failure address and
remaps the invalid memory for the corresponding device. Based on this
mechanism, the application is guaranteed not to crash when the device is
hot unplugged.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v6->v5:
refine some doc and coding style
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 114 +++++++++++++++++++++++++++++++++-
 1 file changed, 113 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..cb30729 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,31 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
 
+extern struct rte_bus_list rte_bus_list;
+
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device failure process, protect the bus and the device
+ * to avoid race condition.
+ */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +51,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hotplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +208,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +235,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			rte_spinlock_lock(&dev_failure_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hotplug_failure_handler(dev);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +298,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +326,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v6 7/7] igb_uio: fix uio release issue when hot unplug
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (5 preceding siblings ...)
  2018-07-09  6:51         ` [PATCH v6 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-09  6:51         ` Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  6:51 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, the kernel releases the device's resources on
the kernel side: for example, the fd sys file disappears and the irq is
released. If the igb_uio driver then still tries to release these resources,
it causes a kernel crash. On the other hand, some steps, such as disabling
interrupts, are not performed automatically by the kernel. If they are not
handled, the leftover state can affect interrupt resources used by other
devices. So the igb_uio driver has to check the hotplug status and take the
corresponding action.

This patch proposes adding an enum rte_udev_state field to rte_uio_pci_dev
in the igb_uio kernel driver to record the state of the uio device:
probed/opened/released/removed/unplug. When an unexpected removal caused by
hot unplug is detected, the driver disables the interrupt resources
accordingly, and skips the release steps the kernel has already performed,
to avoid a double free or a NULL pointer kernel crash.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v6->v5:
no change
---
 kernel/linux/igb_uio/igb_uio.c | 51 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..adc8cea 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+	RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
 	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
 	struct uio_info *info = &udev->info;
+	struct pci_dev *pdev = udev->pdev;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
+	mutex_lock(&udev->lock);
+	/* check the uevent of the kobj */
+	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+			   (&pdev->dev.kobj)->name);
+		udev->state = RTE_UDEV_UNPLUG;
+	}
+	mutex_unlock(&udev->lock);
+
 	uio_event_notify(info);
 
 	/* Message signal mode, no share IRQ and automasked */
@@ -309,7 +329,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -331,20 +350,29 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED)
+		return 0;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +384,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -562,6 +590,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +610,20 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	int ret;
+
+	/* handler hot unplug */
+	if (udev->state == RTE_UDEV_OPENNED ||
+		udev->state == RTE_UDEV_UNPLUG) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 6/7] eal: add failure handle mechanism for hotplug
  2018-07-09  6:51         ` [PATCH v6 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-09  7:42           ` Gaëtan Rivet
  2018-07-09  8:12             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2018-07-09  7:42 UTC (permalink / raw)
  To: Jeff Guo
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, jblunck,
	shreyansh.jain, dev, helin.zhang

Hi Jeff,

On Mon, Jul 09, 2018 at 02:51:21PM +0800, Jeff Guo wrote:
> This patch introduces a failure handle mechanism to handle device
> hotplug removal event.
> 
> First it can register sigbus handler when enable device event monitor. Once
> sigbus error be captured, it will check the failure address and accordingly
> remap the invalid memory for the corresponding device. Besed on this
> mechanism, it could guaranty the application not crash when the device be
> hotplug out.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> ---
> v6->v5:
> refine some doc and coding style
> ---
>  lib/librte_eal/linuxapp/eal/eal_dev.c | 114 +++++++++++++++++++++++++++++++++-
>  1 file changed, 113 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 1cf6aeb..cb30729 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
>  
>  #include <string.h>
>  #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
>  
> @@ -14,15 +16,31 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> +#include <rte_spinlock.h>
> +#include <rte_errno.h>
>  
>  #include "eal_private.h"
>  
>  static struct rte_intr_handle intr_handle = {.fd = -1 };
>  static bool monitor_started;
>  
> +extern struct rte_bus_list rte_bus_list;
> +

Where do you use the rte_bus_list? It seems the reference is a remnant
from a previous version.

You do not seem to need a direct access on rte_bus_list,
as you call rte_bus_find instead.

Why do you need this extern? I think its absence is motivated: to keep the
bus list private and force users to access it through standard exposed ways.

Regards,
-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v6 6/7] eal: add failure handle mechanism for hotplug
  2018-07-09  7:42           ` Gaëtan Rivet
@ 2018-07-09  8:12             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09  8:12 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, jblunck,
	shreyansh.jain, dev, helin.zhang

Hi Gaëtan,


On 7/9/2018 3:42 PM, Gaëtan Rivet wrote:
> Hi Jeff,
>
> On Mon, Jul 09, 2018 at 02:51:21PM +0800, Jeff Guo wrote:
>> This patch introduces a failure handle mechanism to handle device
>> hotplug removal event.
>>
>> First it can register sigbus handler when enable device event monitor. Once
>> sigbus error be captured, it will check the failure address and accordingly
>> remap the invalid memory for the corresponding device. Besed on this
>> mechanism, it could guaranty the application not crash when the device be
>> hotplug out.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>> ---
>> v6->v5:
>> refine some doc and coding style
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 114 +++++++++++++++++++++++++++++++++-
>>   1 file changed, 113 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 1cf6aeb..cb30729 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,8 @@
>>   
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <fcntl.h>
>> +#include <signal.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>   
>> @@ -14,15 +16,31 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_eal.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_errno.h>
>>   
>>   #include "eal_private.h"
>>   
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>>   static bool monitor_started;
>>   
>> +extern struct rte_bus_list rte_bus_list;
>> +
> Where do you use the rte_bus_list? It seems the reference is a remnant
> from a previous version.
>
> You do not seem to need a direct access on rte_bus_list,
> as you call rte_bus_find instead.
>
> Why do you need this extern? I think its absence is motivated: to keep the
> bus list private and force users to access it through standard exposed ways.
>
> Regards,

I think that is an oversight on my part. I will delete it. Thanks for pointing it out.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v7 0/7] hotplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (11 preceding siblings ...)
  2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-09 11:56       ` Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
                           ` (6 more replies)
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                         ` (10 subsequent siblings)
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SR-IOV live migration in SDN/NFV. It
brings higher flexibility and continuity to networking services in multiple
industry use cases. So let us see what DPDK, as an important networking
framework, can do to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework; apps
can use these to develop their hot plug solution.

Let's look at the hot unplug case. It can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ether dev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. The problem is that removing the device
from the software lists is not instantaneous; if the data (fast) path still
reads/writes the device during that window, it causes an MMIO error and the
app crashes.

It seems we already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a fail-safe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD driver failure handling
solution. So there is a gap in the DPDK hot plug solution right now.

Also, we know that the kernel only guarantees hot plug on the kernel side, not
on the user mode side. First, we can hardly have a gatekeeper for every MMIO
access across multiple PMD drivers. Second, no third-party tools such as
udev/driverctl specifically cover this hot plug failure processing. Third,
the feasibility of implementing it in the app for multiple user mode PMD
drivers is still a problem. Here, a general hot plug failure handling
mechanism in the DPDK framework is proposed. It aims to guarantee that, when a
hot unplug occurs, the system does not crash and the app does not break, so
that user space can stop normally, release any relevant resources, and then
cleanly unplug the device at the bus level.

The mechanism works as below:

First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To
do that, the following functionality is introduced.
 - Add a new bus ops “hotplug_failure_handler” to handle bus read/write
   errors. It is bus-specific and each kind of bus can implement its own
   logic.
 - Implement the PCI bus specific ops “pci_hotplug_failure_handler”. Based on
   the failure address, it remaps memory for the corresponding unplugged
   device.
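
The remap step described above can be sketched in isolation as below. This is
only an illustration under assumptions, not the code from the patch set: the
helper name demo_remap_anonymous is hypothetical, and an ordinary anonymous
mapping stands in for a device BAR.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of the patch set): replace the mapping at
 * [addr, addr + len) with fresh anonymous memory at the same address, so
 * that later loads and stores hit dummy memory instead of faulting.
 * Returns 0 on success, -1 on failure.
 */
int
demo_remap_anonymous(void *addr, size_t len)
{
	void *p;

	if (addr == NULL || len == 0)
		return -1;

	/* MAP_FIXED replaces whatever was previously mapped at addr. */
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}
```

After the call, the old contents are gone and the range reads as zeroes, which
is why the data path can keep running until the port is stopped and detached.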

For the data path, or other unexpected control-path accesses, when a hot
unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. By the nature of signal delivery,
   either a control path thread or a data path thread may receive the sigbus
   error, but all of them go to the common sigbus handler. Once an MMIO sigbus
   error occurs, it triggers the hot unplug operation above. The handler
   checks whether the sigbus was caused by the hot unplug. If not, it falls
   back to the original sigbus handler; if so, it remaps the memory.
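
The dispatch logic of such a per-process handler could look roughly like the
sketch below. All demo_* names are hypothetical and a fixed region table
stands in for the bus device list; the 0/1/-1 return convention follows the
sigbus ops description in this series. Note that mmap() is not on the POSIX
async-signal-safe list, so calling it from a signal handler is an assumption
that holds in practice on Linux rather than a portable guarantee.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for one mapped device region (e.g. a PCI BAR). */
struct demo_region {
	void *addr;
	size_t len;
};

#define DEMO_MAX_REGIONS 8
struct demo_region demo_regions[DEMO_MAX_REGIONS];
static struct sigaction demo_old_action;

/* 0: fault was in a device region and got remapped; 1: not ours; -1: error */
int
demo_handle_fault(const void *fault_addr)
{
	uintptr_t a = (uintptr_t)fault_addr;
	int i;

	for (i = 0; i < DEMO_MAX_REGIONS; i++) {
		uintptr_t base = (uintptr_t)demo_regions[i].addr;
		size_t len = demo_regions[i].len;
		void *p;

		if (len == 0 || a < base || a - base >= len)
			continue;
		/* Overlay the dead region with anonymous memory. */
		p = mmap(demo_regions[i].addr, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		return p == MAP_FAILED ? -1 : 0;
	}
	return 1;
}

static void
demo_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	int ret = demo_handle_fault(info->si_addr);

	if (ret == 0)
		return; /* the faulting access is retried on the dummy memory */

	/* Generic sigbus: hand it to whoever was registered before us. */
	if (ret == 1 && (demo_old_action.sa_flags & SA_SIGINFO)) {
		demo_old_action.sa_sigaction(signum, info, ctx);
		return;
	}
	if (ret == 1 && demo_old_action.sa_handler != SIG_DFL &&
	    demo_old_action.sa_handler != SIG_IGN) {
		demo_old_action.sa_handler(signum);
		return;
	}
	/* Nobody can handle it: restore the default action and re-raise. */
	signal(signum, SIG_DFL);
	raise(signum);
}

/* Install the handler and remember the previous action for chaining. */
int
demo_install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = demo_sigbus_handler;
	return sigaction(SIGBUS, &sa, &demo_old_action);
}
```

Zero-initialising the struct sigaction and checking SA_SIGINFO before chaining
are choices of this sketch; they keep the previous handler callable whichever
way it was registered.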

For the control path and the igb uio release:
 - When a device is hot-unplugged, the kernel releases the device's resources
   on the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb_uio driver then still tries to release these
   resources, it causes a kernel crash.
   On the other hand, some steps, such as disabling interrupts, are not
   performed automatically by the kernel. If they are not handled, the
   leftover state can affect interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and take the
   corresponding action.
   This patch proposes adding an enum rte_udev_state field to rte_uio_pci_dev
   in the igb_uio kernel driver to record the state of the uio device:
   probed/opened/released/removed/unplug. When an unexpected removal caused
   by hot unplug is detected, the driver disables the interrupt resources
   accordingly, and skips the release steps the kernel has already performed,
   to avoid a double free or a NULL pointer kernel crash.
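
The state tracking described above can be modelled in user space as below.
This is a simplified sketch, not the kernel driver code: the demo_* names and
the irq_enabled/torn_down flags are hypothetical stand-ins for the interrupt
resources and the normal remove teardown, and locking is left out.

```c
/* User-space model of the uio device states described above. */
enum demo_udev_state {
	DEMO_UDEV_PROBED,
	DEMO_UDEV_OPENED,
	DEMO_UDEV_RELEASED,
	DEMO_UDEV_REMOVED,
	DEMO_UDEV_UNPLUG
};

struct demo_udev {
	enum demo_udev_state state;
	int refcnt;
	int irq_enabled; /* stand-in for the interrupt resources */
	int torn_down;   /* stand-in for the normal remove teardown */
};

int
demo_open(struct demo_udev *u)
{
	if (++u->refcnt > 1)
		return 0; /* already opened by someone else */
	u->irq_enabled = 1;
	u->state = DEMO_UDEV_OPENED;
	return 0;
}

int
demo_release(struct demo_udev *u)
{
	/* Device already gone: the removal path did the cleanup. */
	if (u->state == DEMO_UDEV_REMOVED)
		return 0;
	if (--u->refcnt > 0)
		return 0;
	u->irq_enabled = 0;
	u->state = DEMO_UDEV_RELEASED;
	return 0;
}

void
demo_remove(struct demo_udev *u)
{
	/* Still open, or unplug seen: this is an unexpected removal. */
	if (u->state == DEMO_UDEV_OPENED || u->state == DEMO_UDEV_UNPLUG) {
		demo_release(u);
		u->state = DEMO_UDEV_REMOVED;
		return; /* skip the teardown the kernel already performed */
	}
	u->torn_down = 1;
}
```

The point of the model is the ordering: an unexpected remove disables the
interrupts once, and the later close from user space becomes a no-op.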

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd for example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not break the running app/fail-safe driver, and does not
affect other, unrelated devices. The app can then plug the device back in
and restart the data path as below.
 - Device plug in->bind igb_uio driver ->attached device->start port->
   start forwarding.

patchset history:
v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed over several rounds in public,
the scope has changed and the patch set has been split many times. The recent
RFC and feature design focus only on the hot unplug failure handler in this
patch set. To keep this topic clear and focused, the previous patch set
history is summarized as “v1(v21)”, and v2 here continues from it for further
tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue, fix attach port issue for multiple devices case.
combine rmv callback functions into only one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squash patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 ++++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  42 ++++++++++++
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  33 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 112 +++++++++++++++++++++++++++++++-
 8 files changed, 368 insertions(+), 4 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v7 1/7] bus: add hotplug failure handler
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-09 11:56         ` Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, if the app still accesses the device through
MMIO, it causes a memory failure and results in a system crash.

This patch introduces a bus ops to handle device hotplug failure. It is
bus-specific behavior, so each kind of bus can implement its own logic case
by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..e3a55a8 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation of a bus-specific hotplug failure handler, responsible for
+ * handling the failure when the device is hot-unplugged from the bus. When
+ * a hotplug removal event is detected, this function can be called to handle
+ * the failure and guarantee that the system does not crash in that case.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +225,8 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+					/**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
@ 2018-07-09 11:56         ` Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the hotplug failure handler ops for the PCI bus. It
remaps new dummy memory over the failed memory region to avoid MMIO
read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..d7abe6c 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_failure_handler = pci_hotplug_failure_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 3/7] bus: add sigbus handler
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
@ 2018-07-09 11:56         ` Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 4/7] bus/pci: implement sigbus handler operation Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, if the data path still reads/writes the
device, a sigbus error occurs and needs to be handled. So a handler is
needed to capture the signal and handle it accordingly.

This patch introduces a bus ops to handle sigbus errors. It is bus-specific
behavior, so each kind of bus can implement its own logic case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 lib/librte_eal/common/include/rte_bus.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e3a55a8..216ad1e 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Implementation of a bus-specific sigbus handler, responsible for handling
+ * a sigbus error that is either a generic memory error or a specific memory
+ * error caused by hot unplug. When a sigbus error is captured, this function
+ * can be called to handle it.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus was handled successfully.
+ *	1 if no bus handled the sigbus.
+ *	-1 if handling the sigbus failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -227,6 +242,8 @@ struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
 					/**< handle hotplug failure on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 4/7] bus/pci: implement sigbus handler operation
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-07-09 11:56         ` [PATCH v7 3/7] bus: add sigbus handler Jeff Guo
@ 2018-07-09 11:56         ` Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the sigbus handler ops for the PCI bus. It finds the
PCI device that has been hot-unplugged and then calls the bus ops of the
hotplug failure handler to handle the failure for that device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 drivers/bus/pci/pci_common.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d7abe6c..37ad266 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* Find the device that the failure address belongs to. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot removal. */
+		ret = pci_hotplug_failure_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_failure_handler = pci_hotplug_failure_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 5/7] bus: add helper to handle sigbus
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-07-09 11:56         ` [PATCH v7 4/7] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-07-09 11:56         ` Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a helper that iterates over all buses to find the
corresponding bus to handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
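The three-way return convention used by the helper (0 handled, -1 failed,
1 nobody claimed the address) can be sketched independently of the EAL as
below. This is a simplified model, assuming a plain array of handlers
instead of rte_bus_find(); dispatch_sigbus() and the bus_* example handlers
are hypothetical names, and a bus without a handler is simply skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Per-bus handler: 0 = handled, 1 = not my address, -1 = handling failed. */
typedef int (*sigbus_handler_fn)(const void *failure_addr);

/*
 * Try each handler in turn. Stop at the first bus that claims the address
 * (returns 0 or a negative value) and propagate its result as 0 or -1.
 * Return 1 if no bus claims the address.
 */
int
dispatch_sigbus(const sigbus_handler_fn *handlers, size_t n,
		const void *failure_addr)
{
	size_t i;

	assert(handlers != NULL);
	for (i = 0; i < n; i++) {
		int ret;

		if (handlers[i] == NULL)
			continue; /* bus without a sigbus handler */
		ret = handlers[i](failure_addr);
		if (ret <= 0)
			return ret < 0 ? -1 : 0;
	}
	return 1;
}

/* Example handlers, used only to exercise the dispatcher. */
int bus_not_mine(const void *addr) { (void)addr; return 1; }
int bus_handled(const void *addr) { (void)addr; return 0; }
int bus_failed(const void *addr) { (void)addr; return -1; }
```

Keeping "not mine" distinct from "failed" is what lets the signal handler
decide between chaining to the previous SIGBUS action and exiting.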
 lib/librte_eal/common/eal_common_bus.c | 42 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 12 ++++++++++
 2 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..8856adc 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler) {
+		RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
+			"bus (%s)\n", bus->name);
+		return -1;
+	}
+
+	ret = bus->sigbus_handler(failure_addr);
+	rte_errno = ret;
+
+	return !(bus->sigbus_handler && ret <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* failed to handle the sigbus, pass the new errno. */
+	if (!bus)
+		ret = 1;
+	else if (rte_errno == -1)
+		return -1;
+
+	/* otherwise restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+/**
+ * Iterate over all buses to find the one that can handle the sigbus error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 the sigbus error was handled successfully.
+ *	-1 handling the sigbus error failed.
+ *	 1 no bus can handle the sigbus error.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 6/7] eal: add failure handle mechanism for hotplug
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-07-09 11:56         ` [PATCH v7 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-09 11:56         ` Jeff Guo
  2018-07-09 11:56         ` [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for device hot-unplug
events.

A sigbus handler is registered when the device event monitor is enabled.
Once a sigbus error is captured, the handler checks the failure address and
remaps the invalid memory of the corresponding device. Based on this
mechanism, the application is protected from crashing when a device is
hot-unplugged.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v7->v6:
delete some unused part.
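
The register/restore pattern described above can be sketched with plain
POSIX calls as below. This is a minimal stand-alone model, not the EAL code:
sigbus_install(), sigbus_restore() and on_sigbus() are hypothetical names,
and the handler only records the signal number instead of remapping memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static struct sigaction old_action;       /* action in place before install */
static int installed;                     /* non-zero while our handler is set */
static volatile sig_atomic_t last_signal; /* written by the handler */

/* SA_SIGINFO-style handler; info->si_addr would carry the fault address. */
static void
on_sigbus(int signum, siginfo_t *info, void *ctx)
{
	(void)info;
	(void)ctx;
	last_signal = signum;
}

/* Install the handler and remember the previous action. 0 on success. */
int
sigbus_install(void)
{
	struct sigaction action;

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	sigaddset(&action.sa_mask, SIGBUS);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = on_sigbus;
	if (sigaction(SIGBUS, &action, &old_action) != 0)
		return -1;
	installed = 1;
	return 0;
}

/* Put the previous action back, but only if we replaced it. */
void
sigbus_restore(void)
{
	if (installed) {
		sigaction(SIGBUS, &old_action, NULL);
		installed = 0;
	}
}
```

Zeroing the whole struct sigaction before filling it in avoids passing
uninitialized fields to sigaction().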
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 112 +++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..0de3fb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,6 +16,10 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -23,6 +29,16 @@ static bool monitor_started;
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device failure process, protect the bus and the device
+ * to avoid race condition.
+ */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +49,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hotplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	} else {
+		RTE_LOG(INFO, EAL, "Handled SIGBUS for hotplug\n");
+	}
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +206,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +233,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			rte_spinlock_lock(&dev_failure_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				rte_spinlock_unlock(&dev_failure_lock);
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				rte_spinlock_unlock(&dev_failure_lock);
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+			ret = bus->hotplug_failure_handler(dev);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +296,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +324,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (5 preceding siblings ...)
  2018-07-09 11:56         ` [PATCH v7 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-09 11:56         ` Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 11:56 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, the kernel releases the device resources on
the kernel side: for example, the fd sys file disappears and the irq is
released. At this time, if the igb uio driver still tries to release these
resources, it causes a kernel crash. On the other hand, some things, such as
disabling interrupts, are not handled automatically on the kernel side. If
they are not handled, the leftover state affects the interrupt resources
used by other devices. So the igb_uio driver has to check the hotplug
status, and the corresponding processing should be done in the igb uio
driver.

This patch proposes adding rte_udev_state to rte_uio_pci_dev in the igb_uio
kernel driver, to record the state of the uio device, such as
probed/opened/released/removed/unplug. When an unexpected removal caused by
a hot unplug is detected, the driver disables the interrupt resources
accordingly, and skips the release steps that the kernel has already
handled, to avoid a double free or a null pointer kernel crash.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
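The state handling described above can be modelled in user space as below.
This is only an illustrative sketch of the intended transitions, not kernel
code: struct udev_model, model_release() and model_remove() are hypothetical
names, and locking and reference counting are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of the uio device states; names are hypothetical. */
enum udev_state {
	UDEV_PROBED,
	UDEV_OPENED,
	UDEV_RELEASED,
	UDEV_REMOVED,
	UDEV_UNPLUG
};

struct udev_model {
	enum udev_state state;
	int releases; /* how many times resources were actually torn down */
};

/* Close path: skip the teardown if the remove path already performed it. */
int
model_release(struct udev_model *u)
{
	assert(u != NULL);
	if (u->state == UDEV_REMOVED)
		return 0;
	u->releases++;
	u->state = UDEV_RELEASED;
	return 0;
}

/* Remove path: if still open (or flagged as unplugged), tear down here. */
void
model_remove(struct udev_model *u)
{
	assert(u != NULL);
	if (u->state == UDEV_OPENED || u->state == UDEV_UNPLUG) {
		model_release(u);
		u->state = UDEV_REMOVED;
	}
}
```

In this model a close that arrives after an unexpected removal does nothing,
which is the double-release case the patch is trying to avoid.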
 kernel/linux/igb_uio/igb_uio.c | 51 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..adc8cea 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+	RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
 	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
 	struct uio_info *info = &udev->info;
+	struct pci_dev *pdev = udev->pdev;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
+	mutex_lock(&udev->lock);
+	/* check the uevent of the kobj */
+	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+			   (&pdev->dev.kobj)->name);
+		udev->state = RTE_UDEV_UNPLUG;
+	}
+	mutex_unlock(&udev->lock);
+
 	uio_event_notify(info);
 
 	/* Message signal mode, no share IRQ and automasked */
@@ -309,7 +329,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -331,20 +350,29 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED)
+		return 0;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +384,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -562,6 +590,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +610,20 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	int ret;
+
+	/* handle hot unplug */
+	if (udev->state == RTE_UDEV_OPENNED ||
+		udev->state == RTE_UDEV_UNPLUG) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 0/7] hotplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (12 preceding siblings ...)
  2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-09 12:00       ` Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
                           ` (6 more replies)
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
                         ` (9 subsequent siblings)
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:00 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SR-IOV live migration in SDN/NFV. It
brings higher flexibility and continuity to networking services in multiple
industry use cases. So let us see how DPDK, as an important networking
framework, can help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework. Apps
can use these to develop their own hot plug solution.

Let’s look at the hot unplug case. It can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ether dev API to detach the device, which unplugs the device at the bus level
and makes access to the device invalid. The problem is that removing the
device from the software lists is not instantaneous. If the data (fast) path
still reads from or writes to the device during that time, it causes an MMIO
error and the app crashes.

We already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but there is no failsafe driver
(or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution yet. So
there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the
user mode side. Firstly, we can hardly have a gatekeeper for every MMIO
access across multiple PMDs. Secondly, no third-party tool such as
udev/driverctl specifically covers this hot plug failure processing. Thirdly,
it is still hard for each app to implement this for multiple user mode PMDs.
Here, a general hot plug failure handling mechanism in the DPDK framework is
proposed. It aims to guarantee that, when a hot unplug occurs, the system does
not crash and the app is not broken, so that user space can stop normally,
release any relevant resources, and then cleanly unplug the device at the bus
level.

The mechanism works as below:

Firstly, the app enables the device event monitor and registers the hot plug
event’s callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and then handles the failure accordingly.
In order to do that, the functionality below is brought in.
 - Add a new bus ops “hotplug_failure_handler” to handle bus read/write errors.
   It is bus-specific and each kind of bus can implement its own logic.
 - Implement the pci bus specific ops “pci_hotplug_failure_handler”. Based on
   the failure address, it remaps memory for the corresponding unplugged
   device.

For the data path, or other unexpected accesses from the control path, when a
hot unplug occurs:
 - Implement a new sigbus handler, which is registered when device event
   monitoring starts. The handler is per process. Because of how signals are
   delivered, either a control path thread or a data path thread may receive
   the sigbus error, but both go to the common sigbus handler. Once the MMIO
   sigbus error is raised, it triggers the above hot unplug operation. The
   handler checks whether the sigbus was caused by the hot unplug. If not, the
   exception is reported through the original sigbus handler. If yes, the
   memory is remapped.
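
Falling back to the original sigbus handler can be sketched as below. A
previously installed action may use either the one-argument or the
three-argument handler form, selected by SA_SIGINFO, so the sketch checks
the flag before calling. This is an assumption-laden illustration, not the
code of this series: chain_to_previous() and example_plain_handler() are
hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/*
 * Forward a signal to a previously installed action.
 * Returns 1 if a previous handler was called, 0 if there was none
 * (SIG_DFL or SIG_IGN), in which case the caller decides what to do.
 */
int
chain_to_previous(const struct sigaction *prev, int signum,
		  siginfo_t *info, void *ctx)
{
	assert(prev != NULL);
	if (prev->sa_flags & SA_SIGINFO) {
		/* three-argument form */
		if (prev->sa_sigaction == NULL)
			return 0;
		prev->sa_sigaction(signum, info, ctx);
		return 1;
	}
	/* one-argument form, or no handler at all */
	if (prev->sa_handler == SIG_DFL || prev->sa_handler == SIG_IGN)
		return 0;
	prev->sa_handler(signum);
	return 1;
}

/* Example previous handler, used only to exercise the function. */
volatile sig_atomic_t seen;

void
example_plain_handler(int signum)
{
	seen = signum;
}
```

When the function returns 0 the caller still has to pick a policy for a
generic sigbus, for example exiting with an error.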

For the control path and the igb uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. At this time, if the igb uio driver still tries to release these
   resources, it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not
   handled automatically on the kernel side. If they are not handled, the
   leftover state affects the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status, and the
   corresponding processing should be done in the igb uio driver.
   This patch proposes adding rte_udev_state to rte_uio_pci_dev in the
   igb_uio kernel driver, to record the state of the uio device, such as
   probed/opened/released/removed/unplug. When an unexpected removal caused by
   a hot unplug is detected, the driver disables the interrupt resources
   accordingly, and skips the release steps that the kernel has already
   handled, to avoid a double free or a null pointer kernel crash.

The mechanism can be used by the fail-safe driver and by apps that want to
use the hot plug solution. Take testpmd as an example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not break the running app/fail-safe driver, and does not
break other unrelated devices. The app can then plug in the device and
restart the data path as below.
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed over several rounds in public,
the scope has changed and the patch set has been split many times. The recent
RFC and feature design focus only on the hot unplug failure handler in this
patch set. So, to keep this topic clear and focused, the previous patch set
history is summarized as “v1(v21)”, and tracking continues from v2 here.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

for the prior patch history, please check the patch set "add device event monitor framework"

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue when hot unplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  51 ++++++++++++++-
 lib/librte_eal/common/eal_common_bus.c  |  42 ++++++++++++
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  33 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 112 +++++++++++++++++++++++++++++++-
 8 files changed, 368 insertions(+), 4 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v7 1/7] bus: add hotplug failure handler
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-09 12:01         ` Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:01 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, if the app still accesses the device through
MMIO, it causes a memory failure and can crash the system.

This patch introduces a bus ops to handle device hotplug failure. It is a
bus specific behavior, so each kind of bus can implement its own logic case
by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v7->v6:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..e3a55a8 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hotplug failure handler, which is responsible
+ * for handling the failure when the device is hot-unplugged from the bus.
+ * When a hotplug removal event is detected, this function can be called to
+ * handle the failure and keep the system from crashing in that case.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +225,8 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+					/**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
@ 2018-07-09 12:01         ` Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:01 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the hotplug failure handler ops for the PCI bus. It
remaps new dummy memory over the failed memory range to avoid MMIO
read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v7->v6:
no change
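
The remapping idea can be tried in isolation with the sketch below, which
replaces an existing mapping with zero-filled anonymous memory at the same
virtual address. It is a minimal Linux-only illustration, assuming the
caller passes the start and length of a mapping it owns; remap_anonymous()
is a hypothetical name and not part of this patch.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace the mapping at [addr, addr + len) with zero-filled anonymous
 * memory at the same virtual address. Returns 0 on success, -1 on failure.
 */
int
remap_anonymous(void *addr, size_t len)
{
	void *p;

	if (addr == NULL || len == 0)
		return -1;
	/* MAP_FIXED discards whatever was mapped in this range before. */
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}
```

After a successful call, loads and stores to the old addresses hit ordinary
memory instead of the vanished device, so reads return whatever was last
written there (zero at first) rather than raising SIGBUS.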
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..d7abe6c 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_failure_handler = pci_hotplug_failure_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successfully remapped resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4


* [PATCH v7 3/7] bus: add sigbus handler
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
@ 2018-07-09 12:01         ` Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 4/7] bus/pci: implement sigbus handler operation Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:01 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged while the data path is still reading from
or writing to it, a sigbus error occurs. This error needs to be handled,
so a handler is required to capture the signal and process it accordingly.

This patch introduces a bus op to handle the sigbus error. The handling is
bus-specific, so each kind of bus can implement its own logic.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v7->v6:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e3a55a8..216ad1e 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Bus-specific sigbus handler. It is responsible for handling a sigbus
+ * error, which is either a generic memory error or a memory error caused
+ * by a hot unplug. When a sigbus error is captured, this function can be
+ * called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus was handled successfully.
+ *	1 if the sigbus is not handled by this bus.
+ *	-1 if handling the sigbus failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -227,6 +242,8 @@ struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
 					/**< handle hotplug failure on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4


* [PATCH v7 4/7] bus/pci: implement sigbus handler operation
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-07-09 12:01         ` [PATCH v7 3/7] bus: add sigbus handler Jeff Guo
@ 2018-07-09 12:01         ` Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:01 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the sigbus handler op for the PCI bus. It finds
the PCI device that has been hot-unplugged and then calls the bus's
hotplug failure handler op to handle the failure for that device.
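
The lookup described above comes down to checking whether the fault address
falls inside one of a device's mapped resources. A minimal standalone sketch
of that check follows. The demo_* types and names are hypothetical stand-ins,
not the DPDK structures, and the range test is written as an offset
comparison so that base + len cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RESOURCE 6 /* assumed BAR count, for illustration only */

/* Hypothetical stand-in for a mapped BAR. */
struct demo_resource {
	void *addr;   /* virtual address of the mapping, NULL if unused */
	uint64_t len; /* length of the mapping in bytes */
};

/* Hypothetical stand-in for a PCI device. */
struct demo_device {
	const char *name;
	struct demo_resource res[DEMO_MAX_RESOURCE];
};

/* Return non-zero if addr lies within [r->addr, r->addr + r->len). */
static int
demo_addr_in_resource(const struct demo_resource *r, const void *addr)
{
	uintptr_t base = (uintptr_t)r->addr;
	uintptr_t p = (uintptr_t)addr;

	if (r->addr == NULL || r->len == 0)
		return 0;
	return p >= base && (uint64_t)(p - base) < r->len;
}

/* Return the first device owning the fault address, or NULL if none does. */
static const struct demo_device *
demo_find_device_by_addr(const struct demo_device *devs, size_t n,
			 const void *failure_addr)
{
	size_t d;
	int i;

	for (d = 0; d < n; d++)
		for (i = 0; i < DEMO_MAX_RESOURCE; i++)
			if (demo_addr_in_resource(&devs[d].res[i],
						  failure_addr))
				return &devs[d];
	return NULL;
}
```

A NULL result corresponds to the "generic sigbus, no device on this bus"
case, for which the handler in the patch returns 1.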

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v7->v6:
no change
---
 drivers/bus/pci/pci_common.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d7abe6c..37ad266 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* Find the device whose mapped resource contains the failure address. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot removal. */
+		ret = pci_hotplug_failure_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_failure_handler = pci_hotplug_failure_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4


* [PATCH v7 5/7] bus: add helper to handle sigbus
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-07-09 12:01         ` [PATCH v7 4/7] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-07-09 12:01         ` Jeff Guo
  2018-07-09 13:48           ` Andrew Rybchenko
  2018-07-09 12:01         ` [PATCH v7 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
  2018-07-09 12:01         ` [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:01 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a helper that iterates over all buses to find the
one that can handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v7->v6:
no change
---
 lib/librte_eal/common/eal_common_bus.c | 42 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 12 ++++++++++
 2 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..8856adc 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler) {
+		RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
+			"bus (%s)\n", bus->name);
+		return -1;
+	}
+
+	ret = bus->sigbus_handler(failure_addr);
+	rte_errno = ret;
+
+	return !(bus->sigbus_handler && ret <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* failed to handle the sigbus, pass the new errno. */
+	if (!bus)
+		ret = 1;
+	else if (rte_errno == -1)
+		return -1;
+
+	/* otherwise restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+/**
+ * Iterate over all buses to find the one that can handle the sigbus error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus was handled successfully.
+ *	-1 if handling the sigbus failed.
+ *	 1 if no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4


* [PATCH v7 6/7] eal: add failure handle mechanism for hotplug
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-07-09 12:01         ` [PATCH v7 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-09 12:01         ` Jeff Guo
  2018-07-09 13:50           ` Andrew Rybchenko
  2018-07-09 12:01         ` [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:01 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for device hot-unplug
events.

A sigbus handler is registered when the device event monitor is enabled.
Once a sigbus error is captured, the handler checks the failure address and
remaps the invalid memory of the corresponding device. Based on this
mechanism, the application can be kept from crashing when the device is
hot-unplugged.
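
The capture-and-remap idea can be tried outside DPDK. The sketch below is a
Linux-only illustration under stated assumptions: a temporary file mapping
stands in for a device BAR, truncating the file stands in for the hot unplug,
and all demo_* names are hypothetical. Calling mmap() from a signal handler is
not guaranteed async-signal-safe by POSIX; it is used here because the patch
relies on the same approach.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t demo_sigbus_count;
static void *demo_region;      /* start of the "device" mapping */
static size_t demo_region_len; /* length of the "device" mapping */

/* Remap the region as anonymous memory if the fault address is inside it. */
static void
demo_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)demo_region;

	(void)signum;
	(void)ctx;
	if (fault < base || fault - base >= demo_region_len)
		_exit(EXIT_FAILURE); /* generic SIGBUS, not ours */
	if (mmap(demo_region, demo_region_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(EXIT_FAILURE);
	demo_sigbus_count++;
}

/* Return the number of recovered SIGBUS faults, or -1 on error. */
static int
demo_survive_unplug(void)
{
	char path[] = "/tmp/sigbus_demo_XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile unsigned char *p;
	unsigned char value;
	int fd, rc;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}
	demo_region_len = (size_t)page;
	demo_region = mmap(NULL, demo_region_len, PROT_READ | PROT_WRITE,
			   MAP_SHARED, fd, 0);
	if (demo_region == MAP_FAILED) {
		close(fd);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = demo_sigbus_handler;
	if (sigaction(SIGBUS, &sa, &old) != 0) {
		munmap(demo_region, demo_region_len);
		close(fd);
		return -1;
	}

	demo_sigbus_count = 0;
	p = demo_region;
	p[0] = 0xab;              /* works while the backing store exists */
	rc = ftruncate(fd, 0);    /* simulate the hot unplug */
	value = p[0];             /* faults; handler remaps; read is retried */

	sigaction(SIGBUS, &old, NULL); /* restore the previous handler */
	munmap(demo_region, demo_region_len);
	close(fd);
	if (rc != 0 || value != 0)
		return -1;
	return (int)demo_sigbus_count;
}
```

After the remap the faulting load is restarted and reads zero-filled
anonymous memory, so the data path keeps running until the device can be
detached cleanly.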

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v7->v6:
delete some unused part.
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 112 +++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..0de3fb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,6 +16,10 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -23,6 +29,16 @@ static bool monitor_started;
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device failure process, protect the bus and the device
+ * to avoid race condition.
+ */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +49,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&dev_failure_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&dev_failure_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hotplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Successfully handled SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +206,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +233,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			rte_spinlock_lock(&dev_failure_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hotplug_failure_handler(dev);
+			rte_spinlock_unlock(&dev_failure_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +296,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +324,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4


* [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (5 preceding siblings ...)
  2018-07-09 12:01         ` [PATCH v7 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-09 12:01         ` Jeff Guo
  2018-07-09 22:44           ` Stephen Hemminger
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-09 12:01 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, the kernel releases the device resources on
the kernel side: for example, the fd sys file disappears and the irq is
released. If the igb_uio driver then still tries to release these
resources, it causes a kernel crash. On the other hand, some steps, such as
disabling interrupts, are not performed automatically on the kernel side. If
they are not handled, the stale state affects the interrupt resources used
by other devices. So the igb_uio driver has to check the hotplug status and
take the corresponding action.

This patch proposes adding an enum rte_udev_state field to rte_uio_pci_dev
in the igb_uio kernel driver. It records the state of the uio device:
probed, opened, released, removed or unplugged. When an unexpected removal
caused by a hot unplug is detected, the driver disables the interrupt
resources accordingly and skips the release steps that the kernel has
already performed, to avoid a double free or a NULL pointer kernel crash.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 kernel/linux/igb_uio/igb_uio.c | 51 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..adc8cea 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+	RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
 	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
 	struct uio_info *info = &udev->info;
+	struct pci_dev *pdev = udev->pdev;
 
 	/* Legacy mode need to mask in hardware */
 	if (udev->mode == RTE_INTR_MODE_LEGACY &&
 	    !pci_check_and_mask_intx(udev->pdev))
 		return IRQ_NONE;
 
+	mutex_lock(&udev->lock);
+	/* check the uevent of the kobj */
+	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+			   (&pdev->dev.kobj)->name);
+		udev->state = RTE_UDEV_UNPLUG;
+	}
+	mutex_unlock(&udev->lock);
+
 	uio_event_notify(info);
 
 	/* Message signal mode, no share IRQ and automasked */
@@ -309,7 +329,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -331,20 +350,29 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED)
+		return 0;
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +384,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -562,6 +590,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +610,20 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	int ret;
+
+	/* handle hot unplug */
+	if (udev->state == RTE_UDEV_OPENNED ||
+		udev->state == RTE_UDEV_UNPLUG) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-- 
2.7.4


* Re: [PATCH v7 5/7] bus: add helper to handle sigbus
  2018-07-09 12:01         ` [PATCH v7 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-09 13:48           ` Andrew Rybchenko
  2018-07-10  8:22             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Andrew Rybchenko @ 2018-07-09 13:48 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 09.07.2018 15:01, Jeff Guo wrote:
> This patch aim to add a helper to iterate all buses to find the
> corresponding bus to handle the sigbus error.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> ---
> v7->v6:
> no change
> ---
>   lib/librte_eal/common/eal_common_bus.c | 42 ++++++++++++++++++++++++++++++++++
>   lib/librte_eal/common/eal_private.h    | 12 ++++++++++
>   2 files changed, 54 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> index 0943851..8856adc 100644
> --- a/lib/librte_eal/common/eal_common_bus.c
> +++ b/lib/librte_eal/common/eal_common_bus.c
> @@ -37,6 +37,7 @@
>   #include <rte_bus.h>
>   #include <rte_debug.h>
>   #include <rte_string_fns.h>
> +#include <rte_errno.h>
>   
>   #include "eal_private.h"
>   
> @@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
>   	}
>   	return mode;
>   }
> +
> +static int
> +bus_handle_sigbus(const struct rte_bus *bus,
> +			const void *failure_addr)
> +{
> +	int ret;
> +
> +	if (!bus->sigbus_handler) {
> +		RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
> +			"bus (%s)\n", bus->name);

It is not an error. It is OK that some buses cannot handle SIGBUS.

> +		return -1;
> +	}
> +
> +	ret = bus->sigbus_handler(failure_addr);
> +	rte_errno = ret;
> +
> +	return !(bus->sigbus_handler && ret <= 0);

There is no point in checking bus->sigbus_handler here. It is already
checked above.
So, it should be just:
    return ret > 0;
I.e. we should continue the search if the address is not handled by any
device on the bus (we should stop if it is handled (ret == 0) or if
handling failed (ret < 0)).
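
To make the stop/continue convention concrete, here is a small standalone
sketch. It is not the DPDK code: the demo_* names are hypothetical, and the
handler result is passed back through a context structure, which is only one
possible alternative to storing it in rte_errno. As with a find-style
comparison callback, returning 0 stops the search and non-zero continues it.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rte_bus. */
struct demo_bus {
	const char *name;
	int (*sigbus_handler)(const void *failure_addr); /* may be NULL */
};

/* Carries the fault address in and the handler result out. */
struct demo_sigbus_ctx {
	const void *failure_addr;
	int result;
};

/* Comparison callback: 0 stops the search, non-zero continues it. */
static int
demo_bus_handle_sigbus(const struct demo_bus *bus, struct demo_sigbus_ctx *ctx)
{
	int ret;

	if (bus->sigbus_handler == NULL)
		return 1; /* not an error, this bus cannot handle SIGBUS */

	ret = bus->sigbus_handler(ctx->failure_addr);
	ctx->result = ret;
	return ret > 0; /* stop on handled (0) or failed (< 0) */
}

/* Return 0 if handled, -1 if handling failed, 1 if no bus claimed it. */
static int
demo_sigbus_dispatch(const struct demo_bus *buses, size_t n,
		     const void *failure_addr)
{
	struct demo_sigbus_ctx ctx = { failure_addr, 1 };
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_bus_handle_sigbus(&buses[i], &ctx) == 0)
			return ctx.result < 0 ? -1 : 0;
	return 1;
}

/* Sample handlers covering the three documented return values. */
static int demo_handler_ok(const void *a) { (void)a; return 0; }
static int demo_handler_not_mine(const void *a) { (void)a; return 1; }
static int demo_handler_fail(const void *a) { (void)a; return -1; }
```

With this shape the dispatcher never needs a negative value in an errno-style
variable; whether that fits the existing rte_bus_find() signature, which takes
const data, is left open here.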

> +}
> +
> +int
> +rte_bus_sigbus_handler(const void *failure_addr)
> +{
> +	struct rte_bus *bus;
> +
> +	int ret = 0;
> +	int old_errno = rte_errno;
> +
> +	rte_errno = 0;
> +
> +	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> +	/* failed to handle the sigbus, pass the new errno. */
> +	if (!bus)
> +		ret = 1;
> +	else if (rte_errno == -1)

I still think it is bad to keep a negative value in rte_errno here.

> +		return -1;
> +
> +	/* otherwise restore the old errno. */
> +	rte_errno = old_errno;
> +
> +	return ret;
> +}
> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index bdadc4d..2337e71 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
>    */
>   void dev_callback_process(char *device_name, enum rte_dev_event_type event);
>   
> +/**
> + * Iterate all buses to find the corresponding bus, to handle the sigbus error.
> + * @param failure_addr
> + *	Pointer of the fault address of the sigbus error.
> + *
> + * @return
> + *	 0 success to handle the sigbus.
> + *	-1 failed to handle the sigbus
> + *	 1 no bus can handler the sigbus
> + */
> +int rte_bus_sigbus_handler(const void *failure_addr);
> +
>   #endif /* _EAL_PRIVATE_H_ */


* Re: [PATCH v7 6/7] eal: add failure handle mechanism for hotplug
  2018-07-09 12:01         ` [PATCH v7 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-09 13:50           ` Andrew Rybchenko
  2018-07-10  8:23             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Andrew Rybchenko @ 2018-07-09 13:50 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 09.07.2018 15:01, Jeff Guo wrote:
> This patch introduces a failure handle mechanism to handle device
> hotplug removal event.
>
> First it can register sigbus handler when enable device event monitor. Once
> sigbus error be captured, it will check the failure address and accordingly
> remap the invalid memory for the corresponding device. Besed on this
> mechanism, it could guaranty the application not crash when the device be
> hotplug out.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> ---
> v7->v6:
> delete some unused part.
> ---
>   lib/librte_eal/linuxapp/eal/eal_dev.c | 112 +++++++++++++++++++++++++++++++++-
>   1 file changed, 111 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 1cf6aeb..0de3fb7 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
>   
>   #include <string.h>
>   #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>   #include <sys/socket.h>
>   #include <linux/netlink.h>
>   
> @@ -14,6 +16,10 @@
>   #include <rte_malloc.h>
>   #include <rte_interrupts.h>
>   #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> +#include <rte_spinlock.h>
> +#include <rte_errno.h>
>   
>   #include "eal_private.h"
>   
> @@ -23,6 +29,16 @@ static bool monitor_started;
>   #define EAL_UEV_MSG_LEN 4096
>   #define EAL_UEV_MSG_ELEM_LEN 128
>   
> +/*
> + * spinlock for device failure process, protect the bus and the device
> + * to avoid race condition.
> + */
> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;

Sorry, it is still too vague why the lock is required. The comment is just
generic wording. Please add details and describe the circumstances in which
the lock is required.


* Re: [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug
  2018-07-09 12:01         ` [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
@ 2018-07-09 22:44           ` Stephen Hemminger
  2018-07-10  8:28             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-07-09 22:44 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang

On Mon,  9 Jul 2018 20:01:06 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> @@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
>  {
>  	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
>  	struct uio_info *info = &udev->info;
> +	struct pci_dev *pdev = udev->pdev;
>  
>  	/* Legacy mode need to mask in hardware */
>  	if (udev->mode == RTE_INTR_MODE_LEGACY &&
>  	    !pci_check_and_mask_intx(udev->pdev))
>  		return IRQ_NONE;
>  
> +	mutex_lock(&udev->lock);
> +	/* check the uevent of the kobj */
> +	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
> +		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
> +			   (&pdev->dev.kobj)->name);
> +		udev->state = RTE_UDEV_UNPLUG;
> +	}
> +	mutex_unlock(&udev->lock);

Did you run with lockdep?
I don't think you can safely acquire a mutex (would sleep) in IRQ context.
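
For illustration only, one way to record a state change from a context that
must not sleep is an atomic store instead of a mutex. The sketch below is a
user-space analogy written with C11 atomics. It is not kernel code and not
what the patch does, and the demo_* names are hypothetical. In the driver
itself the equivalent would presumably be an atomic variable or a spinlock.

```c
#include <stdatomic.h>

/* Hypothetical mirror of the device states used in the patch. */
enum demo_udev_state {
	DEMO_UDEV_PROBED,
	DEMO_UDEV_OPENED,
	DEMO_UDEV_RELEASED,
	DEMO_UDEV_REMOVED,
	DEMO_UDEV_UNPLUG
};

struct demo_udev {
	_Atomic int state;
};

/* Stands in for the interrupt path: never blocks, only stores the state. */
static void
demo_irq_mark_unplug(struct demo_udev *udev, int remove_uevent_sent)
{
	if (remove_uevent_sent)
		atomic_store(&udev->state, DEMO_UDEV_UNPLUG);
}

/* Stands in for the remove path, which runs where sleeping is allowed. */
static int
demo_needs_hot_unplug_cleanup(struct demo_udev *udev)
{
	int state = atomic_load(&udev->state);

	return state == DEMO_UDEV_OPENED || state == DEMO_UDEV_UNPLUG;
}
```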


* Re: [PATCH v7 5/7] bus: add helper to handle sigbus
  2018-07-09 13:48           ` Andrew Rybchenko
@ 2018-07-10  8:22             ` Jeff Guo
  2018-07-10  8:40               ` Gaëtan Rivet
  0 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-10  8:22 UTC (permalink / raw)
  To: Andrew Rybchenko, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 7/9/2018 9:48 PM, Andrew Rybchenko wrote:
> On 09.07.2018 15:01, Jeff Guo wrote:
>> This patch aim to add a helper to iterate all buses to find the
>> corresponding bus to handle the sigbus error.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>> ---
>> v7->v6:
>> no change
>> ---
>>   lib/librte_eal/common/eal_common_bus.c | 42 
>> ++++++++++++++++++++++++++++++++++
>>   lib/librte_eal/common/eal_private.h    | 12 ++++++++++
>>   2 files changed, 54 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/eal_common_bus.c 
>> b/lib/librte_eal/common/eal_common_bus.c
>> index 0943851..8856adc 100644
>> --- a/lib/librte_eal/common/eal_common_bus.c
>> +++ b/lib/librte_eal/common/eal_common_bus.c
>> @@ -37,6 +37,7 @@
>>   #include <rte_bus.h>
>>   #include <rte_debug.h>
>>   #include <rte_string_fns.h>
>> +#include <rte_errno.h>
>>     #include "eal_private.h"
>>   @@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
>>       }
>>       return mode;
>>   }
>> +
>> +static int
>> +bus_handle_sigbus(const struct rte_bus *bus,
>> +            const void *failure_addr)
>> +{
>> +    int ret;
>> +
>> +    if (!bus->sigbus_handler) {
>> +        RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
>> +            "bus (%s)\n", bus->name);
>
> It is not an error. It is OK that some buses cannot handle SIGBUS.
>

Yes, agreed, it is not an error.

>> +        return -1;
>> +    }
>> +
>> +    ret = bus->sigbus_handler(failure_addr);
>> +    rte_errno = ret;
>> +
>> +    return !(bus->sigbus_handler && ret <= 0);
>
> There is no point to check bus->sigbus_handler here. It is already 
> checked above.
> So, it should be just:
>    return ret > 0;
> I.e. we should continue search if the address is not handled by any 
> device
> on the bus (we should stop if it is handled (ret==0) or failed to to 
> handle
> (ret < 0)).
>

I will modify it, thanks.

>> +}
>> +
>> +int
>> +rte_bus_sigbus_handler(const void *failure_addr)
>> +{
>> +    struct rte_bus *bus;
>> +
>> +    int ret = 0;
>> +    int old_errno = rte_errno;
>> +
>> +    rte_errno = 0;
>> +
>> +    bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>> +    /* failed to handle the sigbus, pass the new errno. */
>> +    if (!bus)
>> +        ret = 1;
>> +    else if (rte_errno == -1)
>
> I'm still thinking it is bad to keep negative value in rte_errno here.
>

I think rte_errno is simply not used by the caller when -1 is returned: if a
bus is found but handling fails, the caller calls rte_exit() whatever the
rte_errno value is. Only a return value of 1 leads to the original sigbus
handler, which does care about the errno.

>> +        return -1;
>> +
>> +    /* otherwise restore the old errno. */
>> +    rte_errno = old_errno;
>> +
>> +    return ret;
>> +}
>> diff --git a/lib/librte_eal/common/eal_private.h 
>> b/lib/librte_eal/common/eal_private.h
>> index bdadc4d..2337e71 100644
>> --- a/lib/librte_eal/common/eal_private.h
>> +++ b/lib/librte_eal/common/eal_private.h
>> @@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
>>    */
>>   void dev_callback_process(char *device_name, enum 
>> rte_dev_event_type event);
>>   +/**
>> + * Iterate all buses to find the corresponding bus, to handle the 
>> sigbus error.
>> + * @param failure_addr
>> + *    Pointer of the fault address of the sigbus error.
>> + *
>> + * @return
>> + *     0 success to handle the sigbus.
>> + *    -1 failed to handle the sigbus
>> + *     1 no bus can handler the sigbus
>> + */
>> +int rte_bus_sigbus_handler(const void *failure_addr);
>> +
>>   #endif /* _EAL_PRIVATE_H_ */
>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 6/7] eal: add failure handle mechanism for hotplug
  2018-07-09 13:50           ` Andrew Rybchenko
@ 2018-07-10  8:23             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10  8:23 UTC (permalink / raw)
  To: Andrew Rybchenko, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, helin.zhang



On 7/9/2018 9:50 PM, Andrew Rybchenko wrote:
> On 09.07.2018 15:01, Jeff Guo wrote:
>> This patch introduces a failure handle mechanism to handle device
>> hotplug removal event.
>>
>> First it can register sigbus handler when enable device event 
>> monitor. Once
>> sigbus error be captured, it will check the failure address and 
>> accordingly
>> remap the invalid memory for the corresponding device. Besed on this
>> mechanism, it could guaranty the application not crash when the 
>> device be
>> hotplug out.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>> ---
>> v7->v6:
>> delete some unused part.
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_dev.c | 112 
>> +++++++++++++++++++++++++++++++++-
>>   1 file changed, 111 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
>> b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 1cf6aeb..0de3fb7 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,8 @@
>>     #include <string.h>
>>   #include <unistd.h>
>> +#include <fcntl.h>
>> +#include <signal.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>   @@ -14,6 +16,10 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_eal.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_errno.h>
>>     #include "eal_private.h"
>>   @@ -23,6 +29,16 @@ static bool monitor_started;
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>   +/*
>> + * spinlock for device failure process, protect the bus and the device
>> + * to avoid race condition.
>> + */
>> +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
>
> Sorry, it is still too vague why the lock is required. It is just generic
> words. Please, add details and describe circumstance when it is
> required.

Let me check whether I can add more detail there.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug
  2018-07-09 22:44           ` Stephen Hemminger
@ 2018-07-10  8:28             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10  8:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang

Hi Stephen,

Thanks for your review.

On 7/10/2018 6:44 AM, Stephen Hemminger wrote:
> On Mon,  9 Jul 2018 20:01:06 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> @@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
>>   {
>>   	struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
>>   	struct uio_info *info = &udev->info;
>> +	struct pci_dev *pdev = udev->pdev;
>>   
>>   	/* Legacy mode need to mask in hardware */
>>   	if (udev->mode == RTE_INTR_MODE_LEGACY &&
>>   	    !pci_check_and_mask_intx(udev->pdev))
>>   		return IRQ_NONE;
>>   
>> +	mutex_lock(&udev->lock);
>> +	/* check the uevent of the kobj */
>> +	if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
>> +		dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
>> +			   (&pdev->dev.kobj)->name);
>> +		udev->state = RTE_UDEV_UNPLUG;
>> +	}
>> +	mutex_unlock(&udev->lock);
> Did you run with lockdep?
> I don't think you can safely acquire a mutex (would sleep) in IRQ context.
I think lockdep would help me check that. But since only the uio remove
function checks the unplug status, I can move this check into the uio remove
function. There is no need to keep it in the irq handler any more since, as
you said, acquiring a mutex in IRQ context is not safe.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 5/7] bus: add helper to handle sigbus
  2018-07-10  8:22             ` Jeff Guo
@ 2018-07-10  8:40               ` Gaëtan Rivet
  2018-07-10 10:07                 ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Gaëtan Rivet @ 2018-07-10  8:40 UTC (permalink / raw)
  To: Jeff Guo
  Cc: Andrew Rybchenko, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	wenzhuo.lu, jblunck, shreyansh.jain, dev, helin.zhang

On Tue, Jul 10, 2018 at 04:22:23PM +0800, Jeff Guo wrote:
> 
> 
> On 7/9/2018 9:48 PM, Andrew Rybchenko wrote:
> > On 09.07.2018 15:01, Jeff Guo wrote:
> > > This patch aim to add a helper to iterate all buses to find the
> > > corresponding bus to handle the sigbus error.
> > > 
> > > Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > > Acked-by: Shaopeng He <shaopeng.he@intel.com>
> > > ---
> > > v7->v6:
> > > no change
> > > ---
> > >   lib/librte_eal/common/eal_common_bus.c | 42
> > > ++++++++++++++++++++++++++++++++++
> > >   lib/librte_eal/common/eal_private.h    | 12 ++++++++++
> > >   2 files changed, 54 insertions(+)
> > > 
> > > diff --git a/lib/librte_eal/common/eal_common_bus.c
> > > b/lib/librte_eal/common/eal_common_bus.c
> > > index 0943851..8856adc 100644
> > > --- a/lib/librte_eal/common/eal_common_bus.c
> > > +++ b/lib/librte_eal/common/eal_common_bus.c
> > > @@ -37,6 +37,7 @@
> > >   #include <rte_bus.h>
> > >   #include <rte_debug.h>
> > >   #include <rte_string_fns.h>
> > > +#include <rte_errno.h>
> > >     #include "eal_private.h"
> > >   @@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
> > >       }
> > >       return mode;
> > >   }
> > > +
> > > +static int
> > > +bus_handle_sigbus(const struct rte_bus *bus,
> > > +            const void *failure_addr)
> > > +{
> > > +    int ret;
> > > +
> > > +    if (!bus->sigbus_handler) {
> > > +        RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
> > > +            "bus (%s)\n", bus->name);
> > 
> > It is not an error. It is OK that some buses cannot handle SIGBUS.
> > 
> 
> yes, it is.
> 
> > > +        return -1;
> > > +    }
> > > +
> > > +    ret = bus->sigbus_handler(failure_addr);
> > > +    rte_errno = ret;
> > > +
> > > +    return !(bus->sigbus_handler && ret <= 0);
> > 
> > There is no point to check bus->sigbus_handler here. It is already
> > checked above.
> > So, it should be just:
> >    return ret > 0;
> > I.e. we should continue search if the address is not handled by any
> > device
> > on the bus (we should stop if it is handled (ret==0) or failed to to
> > handle
> > (ret < 0)).
> > 
> 
> i will modify it, thanks.
> 

Why is rte_errno set here?
rte_errno is meant to be set by the bus or device on error. You do not have to
modify it.
ret would already be < 0 on error.

At most, you could do something like:

if (ret < 0 && rte_errno == 0)
    rte_errno = ENOTSUP;

Or something similar, with a non-descriptive error hinting that the
developer didn't seem to care about setting errno to something
meaningful (so only partially respecting the API).

> > > +}
> > > +
> > > +int
> > > +rte_bus_sigbus_handler(const void *failure_addr)
> > > +{
> > > +    struct rte_bus *bus;
> > > +
> > > +    int ret = 0;
> > > +    int old_errno = rte_errno;
> > > +
> > > +    rte_errno = 0;
> > > +
> > > +    bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
> > > +    /* failed to handle the sigbus, pass the new errno. */
> > > +    if (!bus)
> > > +        ret = 1;
> > > +    else if (rte_errno == -1)
> > 
> > I'm still thinking it is bad to keep negative value in rte_errno here.
> > 
> 
> i think the rte_errno just no used for the caller if return -1. Since if
> find bus but process failed, will use rte_exit to process whatever the
> rte_errno value. Only return 1 means use the origin sigbus handler that will
> care about the errno.
> 

With the changes above, the check should be something like:

    if (bus == NULL)
        return 1;
    else if (rte_errno != 0)
        return -rte_errno;

    rte_errno = old_errno;
    return 0;

Which would avoid resetting rte_errno on top of whichever value a dev
would have used, and having it set to a negative non-errno value.

(Please do not just use this as-is, if you think this is not a good idea
just tell us why or how you would prefer to do it. I'm only proposing a
way that I think would work.)
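
To make the proposal above concrete, here is a minimal standalone sketch. It
does not use the real EAL: struct mock_bus, mock_errno and mock_bus_find are
hypothetical stand-ins for struct rte_bus, rte_errno and rte_bus_find, and
the four example handlers exist only to exercise the logic. Restoring the old
errno in the "no bus" case is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rte_bus and rte_errno. */
struct mock_bus {
    const char *name;
    /* 0: handled, > 0: address not on this bus, < 0: failed. */
    int (*sigbus_handler)(const void *failure_addr);
};

static int mock_errno;

/* Comparison callback: returning 0 stops the search on this bus. */
static int
bus_handle_sigbus(const struct mock_bus *bus, const void *failure_addr)
{
    int ret;

    if (bus->sigbus_handler == NULL)
        return -1; /* no handler is not an error: keep searching */

    ret = bus->sigbus_handler(failure_addr);
    /* The handler failed without setting errno: use a generic code. */
    if (ret < 0 && mock_errno == 0)
        mock_errno = ENOTSUP;

    return ret > 0;
}

/* Minimal model of a bus search: first bus for which cmp returns 0. */
static const struct mock_bus *
mock_bus_find(const struct mock_bus *buses, size_t n, const void *addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (bus_handle_sigbus(&buses[i], addr) == 0)
            return &buses[i];
    return NULL;
}

/* 0: handled, 1: no bus claims the address, < 0: -errno from the bus. */
int
mock_bus_sigbus_handler(const struct mock_bus *buses, size_t n,
                        const void *failure_addr)
{
    int old_errno = mock_errno;

    mock_errno = 0;
    if (mock_bus_find(buses, n, failure_addr) == NULL) {
        mock_errno = old_errno; /* assumption: nothing to report */
        return 1;
    }
    if (mock_errno != 0)
        return -mock_errno;

    mock_errno = old_errno;
    return 0;
}

/* Example handlers, only used to exercise the sketch. */
int handler_ok(const void *a) { (void)a; return 0; }
int handler_not_mine(const void *a) { (void)a; return 1; }
int handler_fail_eio(const void *a) { (void)a; mock_errno = EIO; return -1; }
int handler_fail_silent(const void *a) { (void)a; return -1; }
```

With this shape the errno value stays a positive errno code, and the caller
can tell "not a device address" (1) apart from "device found but handling
failed" (negative).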

Regards,
-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v7 5/7] bus: add helper to handle sigbus
  2018-07-10  8:40               ` Gaëtan Rivet
@ 2018-07-10 10:07                 ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 10:07 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: Andrew Rybchenko, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	wenzhuo.lu, jblunck, shreyansh.jain, dev, helin.zhang



On 7/10/2018 4:40 PM, Gaëtan Rivet wrote:
> On Tue, Jul 10, 2018 at 04:22:23PM +0800, Jeff Guo wrote:
>>
>> On 7/9/2018 9:48 PM, Andrew Rybchenko wrote:
>>> On 09.07.2018 15:01, Jeff Guo wrote:
>>>> This patch aim to add a helper to iterate all buses to find the
>>>> corresponding bus to handle the sigbus error.
>>>>
>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>>>> ---
>>>> v7->v6:
>>>> no change
>>>> ---
>>>>    lib/librte_eal/common/eal_common_bus.c | 42
>>>> ++++++++++++++++++++++++++++++++++
>>>>    lib/librte_eal/common/eal_private.h    | 12 ++++++++++
>>>>    2 files changed, 54 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/common/eal_common_bus.c
>>>> b/lib/librte_eal/common/eal_common_bus.c
>>>> index 0943851..8856adc 100644
>>>> --- a/lib/librte_eal/common/eal_common_bus.c
>>>> +++ b/lib/librte_eal/common/eal_common_bus.c
>>>> @@ -37,6 +37,7 @@
>>>>    #include <rte_bus.h>
>>>>    #include <rte_debug.h>
>>>>    #include <rte_string_fns.h>
>>>> +#include <rte_errno.h>
>>>>      #include "eal_private.h"
>>>>    @@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
>>>>        }
>>>>        return mode;
>>>>    }
>>>> +
>>>> +static int
>>>> +bus_handle_sigbus(const struct rte_bus *bus,
>>>> +            const void *failure_addr)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    if (!bus->sigbus_handler) {
>>>> +        RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
>>>> +            "bus (%s)\n", bus->name);
>>> It is not an error. It is OK that some buses cannot handle SIGBUS.
>>>
>> yes, it is.
>>
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    ret = bus->sigbus_handler(failure_addr);
>>>> +    rte_errno = ret;
>>>> +
>>>> +    return !(bus->sigbus_handler && ret <= 0);
>>> There is no point to check bus->sigbus_handler here. It is already
>>> checked above.
>>> So, it should be just:
>>>     return ret > 0;
>>> I.e. we should continue search if the address is not handled by any
>>> device
>>> on the bus (we should stop if it is handled (ret==0) or failed to to
>>> handle
>>> (ret < 0)).
>>>
>> i will modify it, thanks.
>>
> Why is rte_errno set here?
> rte_errno is meant by the bus dev to be set on error. You do not have to
> modify it.
> ret would already be <0 on error.
>
> At most, you could do something like:
>
> if (ret < 0 && rte_errno == 0)
>      rte_errno = ENOTSUP;
>
> Or something akin, with a non-descriptive error hinting that the
> developper didn't seem to care about setting errno to something
> meaningful (so only partially respecting the API).

The purpose of setting rte_errno here is to pass the handler's status through
to the caller, "rte_bus_sigbus_handler", so that it has a chance to check the
result of the search.

>>>> +}
>>>> +
>>>> +int
>>>> +rte_bus_sigbus_handler(const void *failure_addr)
>>>> +{
>>>> +    struct rte_bus *bus;
>>>> +
>>>> +    int ret = 0;
>>>> +    int old_errno = rte_errno;
>>>> +
>>>> +    rte_errno = 0;
>>>> +
>>>> +    bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
>>>> +    /* failed to handle the sigbus, pass the new errno. */
>>>> +    if (!bus)
>>>> +        ret = 1;
>>>> +    else if (rte_errno == -1)
>>> I'm still thinking it is bad to keep negative value in rte_errno here.
>>>
>> i think the rte_errno just no used for the caller if return -1. Since if
>> find bus but process failed, will use rte_exit to process whatever the
>> rte_errno value. Only return 1 means use the origin sigbus handler that will
>> care about the errno.
>>
> With the changes above, the check should be something like:
>
>      if (bus == NULL)
>          return 1;
>      else if (rte_errno != 0)
>          return -rte_errno;
>
>      rte_errno = old_errno;
>      return 0;
>
> Which would avoid resetting rte_errno on top of whichever value a dev
> would have used, and having it set to a negative non-errno value.
>
> (Please do not just use this as-is, if you think this is not a good idea
> just tell us why or how you would prefer to do it. I'm only proposing a
> way that I think would work.)
>
> Regards,

I think the remaining problem is to find a better way. I agree that rte_errno
should be kept meaningful as far as possible.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v8 0/7] hotplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (13 preceding siblings ...)
  2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-10 11:03       ` Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 1/7] bus: add hotplug failure handler Jeff Guo
                           ` (6 more replies)
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
                         ` (8 subsequent siblings)
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, used either for fail-safe of
datacenter devices or for SR-IOV live migration in SDN/NFV. It can bring
higher flexibility and continuity to networking services in many industry
use cases. So let us see how DPDK, as an important networking framework, can
help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework; apps
can use these to develop their hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
removed physically, or when software disables it. The app needs to call the
ether dev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. The problem is that the removal of the
device from the software lists is not instantaneous. If the data (fast) path
still reads/writes the device during this time, it causes an MMIO error and
the app crashes.

It seems that we have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a fail-safe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD driver failure handling
solution. So there is a gap in the DPDK hot plug solution right now.

Also, we know that the kernel only guarantees hot plug on the kernel side, not
on the user mode side. Firstly, we can hardly have a gatekeeper for every MMIO
access across multiple PMD drivers. Secondly, no third-party tools such as
udev/driverctl specifically cover this hot plug failure processing. Thirdly,
it is still hard for an app to implement this itself for multiple user mode
PMD drivers. Here, a general hot plug failure handling mechanism in the DPDK
framework is proposed. It aims to guarantee that, when hot unplug occurs, the
system will not crash and the app will not break, and that user space can
normally stop and release any relevant resources, then cleanly unplug the
device at the bus level.

The mechanism works as below:

Firstly, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once hot unplug occurs, the
mechanism detects the removal event and then handles the failure accordingly.
In order to do that, the functionality below is brought in.
 - Add a new bus ops “hotplug_failure_handler” to handle bus read/write
   errors. It is bus-specific and each kind of bus can implement its own logic.
 - Implement the pci bus specific ops “pci_hotplug_failure_handler”. Based on
   the failure address, it remaps memory for the corresponding unplugged
   device.
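
The remap step can be sketched in isolation as below. This is only an
illustration under assumptions, not the code of this patch set:
remap_region_anonymous and map_fake_bar are hypothetical helpers, and the
"BAR" is plain anonymous memory standing in for a uio mapping. The point is
that mmap with MAP_FIXED replaces whatever was mapped at the address with
zero-filled anonymous memory, so later loads and stores no longer fault.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace [addr, addr + len) with zero-filled
 * anonymous memory at the same virtual address. addr must be page
 * aligned. Returns 0 on success, -1 on failure.
 */
int
remap_region_anonymous(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return (p == MAP_FAILED || p != addr) ? -1 : 0;
}

/* Hypothetical stand-in for a BAR mapped into the process. */
void *
map_fake_bar(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

After the remap, reads return zeros rather than device state, so a driver
that keeps polling registers will not crash but will also not see real
hardware values. The app is still expected to stop and detach the port.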

For data path accesses, or other unexpected control path accesses, when hot
unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. By the nature of signal delivery,
   either the control path thread or the data path thread may receive the
   sigbus error, but both go to the common sigbus handler. Once an MMIO sigbus
   error occurs, it triggers the above hot unplug operation. The handler
   checks whether the sigbus was caused by hot unplug: if not, it reports the
   exception as the original sigbus handler would; if yes, it remaps the
   memory.
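
The decision made in the sigbus handler can be sketched as follows. This is a
standalone illustration under assumptions, not the EAL implementation: it
tracks a single hypothetical device region (dev_base/dev_len) instead of
walking the buses, and install_sigbus_handler and sigbus_remap_count are
made-up names. Calling mmap from a signal handler is not on the POSIX list of
async-signal-safe functions; the sketch relies on it being a plain system
call on Linux.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: the single device region tracked by this sketch. */
static void *dev_base;
static size_t dev_len;
static volatile sig_atomic_t remap_count;

static void
sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
    uintptr_t fault = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)dev_base;

    (void)ctx;
    if (dev_base != NULL && fault >= base && fault - base < dev_len) {
        /*
         * Fault inside the device region: treat it as hot unplug and
         * back the region with anonymous memory, so the faulting
         * access is retried successfully when the handler returns.
         */
        if (mmap(dev_base, dev_len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                 -1, 0) != MAP_FAILED) {
            remap_count++;
            return;
        }
    }
    /* Generic sigbus: restore the default action and re-raise. */
    signal(signum, SIG_DFL);
    raise(signum);
}

/* Register the region to protect and install the handler. */
int
install_sigbus_handler(void *base, size_t len)
{
    struct sigaction sa;

    dev_base = base;
    dev_len = len;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO; /* needed to receive si_addr */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

int
sigbus_remap_count(void)
{
    return (int)remap_count;
}
```

One way to try it without hardware is to map one page of an empty temporary
file with MAP_SHARED and read from it: on Linux the access beyond the end of
the file raises SIGBUS with si_addr inside the mapping, which plays the role
of the unplugged device here.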

For the control path and the igb uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb uio driver still tries to release these resources at
   this point, it causes a kernel crash.
   On the other hand, some things, like disabling interrupts, are not done
   automatically on the kernel side. If they are not handled, the stale state
   will affect the use of the interrupt resource by other devices.
   So the igb_uio driver has to check the hot plug status and take the
   corresponding action.
   This patch proposes to add a rte_udev_state structure to rte_uio_pci_dev
   in the igb_uio kernel driver, which records the state of the uio device,
   such as probed/opened/released/removed/unplug. When an unexpected removal
   caused by hot unplug is detected, the driver disables the interrupt
   resource accordingly, while it skips the release steps the kernel has
   already handled, to avoid double free or null pointer kernel crashes.
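
The state check described above can be modelled in user space as below. This
is only a toy model to show the intent, not kernel code and not the igb_uio
patch: apart from the unplug state, which mirrors RTE_UDEV_UNPLUG, every
name and field is hypothetical, and "irq_allocated" stands for any resource
the kernel frees by itself on unplug.

```c
#include <assert.h>

/* Toy model of the uio device states named in the cover letter. */
enum model_udev_state {
    MODEL_UDEV_PROBED,
    MODEL_UDEV_OPENED,
    MODEL_UDEV_RELEASED,
    MODEL_UDEV_REMOVED,
    MODEL_UDEV_UNPLUG,
};

struct model_udev {
    enum model_udev_state state;
    int intr_enabled;  /* not cleaned up by the kernel on unplug */
    int irq_allocated; /* freed by the kernel itself on unplug */
    int double_free;   /* counts frees of an already freed irq */
};

void
model_open(struct model_udev *u)
{
    u->state = MODEL_UDEV_OPENED;
    u->intr_enabled = 1;
    u->irq_allocated = 1;
    u->double_free = 0;
}

/* The kernel side of a hot unplug: the irq is gone, interrupts are not. */
void
model_hot_unplug(struct model_udev *u)
{
    u->irq_allocated = 0;
    u->state = MODEL_UDEV_UNPLUG;
}

/* Release path with the state check. */
void
model_release(struct model_udev *u)
{
    /* Always done: the kernel does not disable interrupts for us. */
    u->intr_enabled = 0;

    /* Skip what the kernel already released on hot unplug. */
    if (u->state == MODEL_UDEV_UNPLUG)
        return;

    if (!u->irq_allocated)
        u->double_free++; /* would be a crash in a real driver */
    u->irq_allocated = 0;
    u->state = MODEL_UDEV_RELEASED;
}
```

Without the early return, a release after model_hot_unplug would bump
double_free, which is the model's equivalent of the kernel crash the cover
letter describes.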

The mechanism can be used by the fail-safe driver and by apps which want to
use the hot plug solution. Take testpmd for example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process will not break the running app/fail-safe driver, and will not
break other unrelated devices. The app can then plug in the device and
restart the data path as below.
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed for several rounds in public,
the scope has changed and the patch set has been split many times. The recent
RFC and feature design focus only on the hot unplug failure handler in this
patch set. So, to make this topic clearer and more focused, the previous
patch set history is summarized as “v1(v21)”, and the v2 here goes ahead for
further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handle to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix uio release issue for hotplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  69 +++++++++++++++----
 lib/librte_eal/common/eal_common_bus.c  |  43 ++++++++++++
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  33 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 113 +++++++++++++++++++++++++++++++-
 8 files changed, 377 insertions(+), 15 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v8 1/7] bus: add hotplug failure handler
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-10 11:03         ` Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, if the app continues to access the device by
MMIO, it causes a memory failure and the system crashes.

This patch introduces a bus ops to handle device hotplug failure. It is a
bus-specific behavior, so each kind of bus can implement its own logic case
by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v8->v7:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..e3a55a8 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation a specific hotplug failure handler, which is responsible
+ * for handle the failure when the device be hotplug out from the bus. When
+ * hotplug removal event be detected, it could call this function to handle
+ * failure and guaranty the system would not crash in the case.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +225,8 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+					/**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v8 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 1/7] bus: add hotplug failure handler Jeff Guo
@ 2018-07-10 11:03         ` Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the hotplug failure handler ops for the PCI bus. It
remaps new dummy memory over the failed memory region to avoid MMIO
read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v8->v7:
no change
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..d7abe6c 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_failure_handler = pci_hotplug_failure_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v8 3/7] bus: add sigbus handler
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 1/7] bus: add hotplug failure handler Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
@ 2018-07-10 11:03         ` Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 4/7] bus/pci: implement sigbus handler operation Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot unplugged, if the data path still reads/writes the
device, a sigbus error occurs, and this error needs to be handled. So a
handler is needed to capture the signal and handle it accordingly.

This patch introduces a bus ops to handle sigbus errors. It is a bus-specific
behavior, so each kind of bus can implement its own logic case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v8->v7:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e3a55a8..216ad1e 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Bus-specific sigbus handler. It is responsible for handling a sigbus
+ * error, which is either a generic memory error or a memory error caused
+ * by a hot unplug. When a sigbus error is captured, this function can be
+ * called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus error was handled successfully.
+ *	1 if no bus handled the sigbus error.
+ *	-1 if handling the sigbus error failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -227,6 +242,8 @@ struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
 					/**< handle hotplug failure on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v8 4/7] bus/pci: implement sigbus handler operation
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-07-10 11:03         ` [PATCH v8 3/7] bus: add sigbus handler Jeff Guo
@ 2018-07-10 11:03         ` Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the sigbus handler op for the PCI bus. It finds the
PCI device that has been hot-unplugged, then calls the bus's hotplug
failure handler op to handle the failure for that device.
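
The lookup can be sketched in isolation as a half-open range check over
each device's memory resources. The demo_ structs below are simplified
stand-ins for rte_pci_device and its mem_resource array, not the real
definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_RESOURCE 6

/* Simplified stand-ins for the PCI device and its BAR mappings. */
struct demo_mem_resource {
	void *addr;   /* start of the mapping, NULL if unused */
	uint64_t len; /* length of the mapping in bytes */
};

struct demo_pci_device {
	const char *name;
	struct demo_mem_resource mem_resource[DEMO_MAX_RESOURCE];
};

/* Return the device whose mapping contains failure_addr, or NULL. */
static const struct demo_pci_device *
demo_find_device_by_addr(const struct demo_pci_device *devs, size_t n,
			 const void *failure_addr)
{
	uintptr_t a = (uintptr_t)failure_addr;
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < DEMO_MAX_RESOURCE; j++) {
			const struct demo_mem_resource *r =
				&devs[i].mem_resource[j];
			uintptr_t base = (uintptr_t)r->addr;

			if (r->addr == NULL || r->len == 0)
				continue;
			/* Half-open range [base, base + len). */
			if (a >= base && a - base < r->len)
				return &devs[i];
		}
	}
	return NULL;
}
```

Comparing the offset (a - base) against the length avoids computing
base + len, which could wrap for a mapping at the top of the address space.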

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v8->v7:
no change
---
 drivers/bus/pci/pci_common.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d7abe6c..37ad266 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* Find the device whose memory resource contains the failure address. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error was caused by hot removal. */
+		ret = pci_hotplug_failure_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_failure_handler = pci_hotplug_failure_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v8 5/7] bus: add helper to handle sigbus
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-07-10 11:03         ` [PATCH v8 4/7] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-07-10 11:03         ` Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 7/7] igb_uio: fix uio release issue " Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a helper that iterates over all buses to find the one
that can handle the sigbus error.
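
The iteration can be sketched without the EAL bus list as a walk over an
array of buses that stops at the first bus claiming the address. The demo_
types and the three example handlers are hypothetical. The real helper goes
through rte_bus_find() and rte_errno instead.

```c
#include <stddef.h>

typedef int (*demo_sigbus_handler_t)(const void *failure_addr);

/* Hypothetical stand-in for struct rte_bus. */
struct demo_bus {
	const char *name;
	demo_sigbus_handler_t sigbus_handler; /* NULL if unsupported */
};

/* Example handlers: decline, accept, and claim-but-fail. */
static int demo_decline(const void *addr) { (void)addr; return 1; }
static int demo_accept(const void *addr) { (void)addr; return 0; }
static int demo_fail(const void *addr) { (void)addr; return -1; }

/*
 * 0: a bus handled the error, -1: a bus claimed the address but failed,
 * 1: no bus could handle it.
 */
static int
demo_bus_sigbus_handler(const struct demo_bus *buses, size_t n,
			const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue; /* this bus has no sigbus op */
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret == 0)
			return 0;
		if (ret < 0)
			return -1;
		/* ret > 0: not this bus, keep looking */
	}
	return 1;
}
```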

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v8->v7:
refine errno process in sigbus handler
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 12 ++++++++++
 2 files changed, 55 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* The bus was found but handling failed; make sure rte_errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* No bus could handle the failure address. */
+	if (!bus)
+		return 1;
+	/* A bus was found but handling failed; pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+/**
+ * Iterate over all buses to find the one that can handle the sigbus error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus error was handled successfully.
+ *	-1 if handling the sigbus error failed.
+ *	 1 if no bus can handle the sigbus error.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v8 6/7] eal: add failure handle mechanism for hotplug
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-07-10 11:03         ` [PATCH v8 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-10 11:03         ` Jeff Guo
  2018-07-10 11:03         ` [PATCH v8 7/7] igb_uio: fix uio release issue " Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for device hot-unplug
events.

A sigbus handler is registered when the device event monitor is enabled.
Once a sigbus error is captured, the handler checks the failure address and
remaps the invalid memory of the corresponding device. This mechanism is
intended to keep the application from crashing when a device is
hot-unplugged.
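
For the remap step, one common way on Linux to make a stale mapping
accessible again is to map fresh anonymous pages over the same address
range with MAP_FIXED. The sketch below shows only that technique. Whether
the PCI bus code in this series does exactly this is not shown in this
message, and demo_remap_anonymous() is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Replace [addr, addr + len) with zero-filled anonymous memory at the
 * same virtual address. addr must be page aligned.
 * Returns 0 on success, -1 on failure.
 */
static int
demo_remap_anonymous(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	return p == addr ? 0 : -1;
}
```

After such a remap, reads from the range return zeros and writes are
silently absorbed, so a data path that is still running no longer faults
while the device is being detached.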

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v8->v7:
refine sigbus handle
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 113 +++++++++++++++++++++++++++++++++-
 1 file changed, 112 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..1643b33 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,6 +16,10 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -23,6 +29,17 @@ static bool monitor_started;
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * Spinlock for device failure handling. Take this lock before accessing a
+ * bus or a device, for example when handling sigbus on a bus or handling a
+ * memory failure for a device. It protects the bus and the device from races.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +50,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] caught SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hotplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Successfully handled SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +207,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +234,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hotplug_failure_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +297,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +325,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v8 7/7] igb_uio: fix uio release issue for hotplug
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (5 preceding siblings ...)
  2018-07-10 11:03         ` [PATCH v8 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-10 11:03         ` Jeff Guo
  2018-07-10 21:48           ` Stephen Hemminger
  2018-07-10 21:52           ` Stephen Hemminger
  6 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-10 11:03 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, the kernel releases the device resources:
the fd sys file disappears and the irq is released. If the igb_uio driver
then tries to release these resources again, the kernel crashes. On the
other hand, the kernel does not disable interrupts automatically. If this
is left unhandled, the stale interrupt state affects interrupt resources
used by other devices. The igb_uio driver therefore has to check the
hotplug status and handle it accordingly.

This patch adds enum rte_udev_state to struct rte_uio_pci_dev in the
igb_uio driver to record the state of the uio device: probed, opened,
released or removed. When an unexpected removal caused by a hot unplug is
detected, the driver disables the interrupt resources accordingly. The
parts of the release that the kernel has already handled are skipped, to
avoid double-free or NULL pointer crashes.
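
The intended state flow can be modeled in user space as a small transition
function. This is only a simplified model with hypothetical demo_ names: it
ignores the refcount and the kobject uevent check used in the driver, and
free_now merely marks the point where the private data would be freed.

```c
/* Mirrors the four states described above. */
enum demo_udev_state {
	DEMO_UDEV_PROBED,
	DEMO_UDEV_OPENED,
	DEMO_UDEV_RELEASED,
	DEMO_UDEV_REMOVED,
};

enum demo_udev_event {
	DEMO_EV_OPEN,    /* uio device file opened */
	DEMO_EV_RELEASE, /* uio device file closed */
	DEMO_EV_REMOVE,  /* PCI remove callback */
};

/*
 * Return the next state. *free_now is set to 1 when the private data may
 * be freed in this step, and to 0 otherwise.
 */
static enum demo_udev_state
demo_udev_next(enum demo_udev_state s, enum demo_udev_event ev, int *free_now)
{
	*free_now = 0;
	switch (ev) {
	case DEMO_EV_OPEN:
		return s == DEMO_UDEV_REMOVED ? s : DEMO_UDEV_OPENED;
	case DEMO_EV_RELEASE:
		if (s == DEMO_UDEV_REMOVED) {
			/* Removal came first: release does the final free. */
			*free_now = 1;
			return s;
		}
		return DEMO_UDEV_RELEASED;
	case DEMO_EV_REMOVE:
		if (s == DEMO_UDEV_OPENED)
			/* Unexpected removal: defer the free to release. */
			return DEMO_UDEV_REMOVED;
		/* Normal removal: nothing is open, free right away. */
		*free_now = 1;
		return s;
	}
	return s;
}
```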

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v8->v7:
change enum of udev state, refine code to release udev resource
---
 kernel/linux/igb_uio/igb_uio.c | 69 +++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 14 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..d126371 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,14 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +36,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -309,6 +318,17 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
+/* Unmap previously ioremap'd resources */
+static void
+igbuio_pci_release_iomem(struct uio_info *info)
+{
+	int i;
+
+	for (i = 0; i < MAX_UIO_MAPS; i++) {
+		if (info->mem[i].internal_addr)
+			iounmap(info->mem[i].internal_addr);
+	}
+}
 
 /**
  * This gets called while opening uio device file.
@@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED) {
+		mutex_destroy(&udev->lock);
+		igbuio_pci_release_iomem(&udev->info);
+		pci_disable_device(dev);
+		pci_set_drvdata(dev, NULL);
+		kfree(udev);
+		return 0;
+	}
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +391,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -414,18 +449,6 @@ igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
 	return 0;
 }
 
-/* Unmap previously ioremap'd resources */
-static void
-igbuio_pci_release_iomem(struct uio_info *info)
-{
-	int i;
-
-	for (i = 0; i < MAX_UIO_MAPS; i++) {
-		if (info->mem[i].internal_addr)
-			iounmap(info->mem[i].internal_addr);
-	}
-}
-
 static int
 igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)
 {
@@ -562,6 +585,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +605,21 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	struct pci_dev *pdev = udev->pdev;
+	int ret;
+
+	/* handle unexpected removal */
+	if (udev->state == RTE_UDEV_OPENNED ||
+	    (&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v8 7/7] igb_uio: fix uio release issue for hotplug
  2018-07-10 11:03         ` [PATCH v8 7/7] igb_uio: fix uio release issue " Jeff Guo
@ 2018-07-10 21:48           ` Stephen Hemminger
  2018-07-11  3:10             ` Jeff Guo
  2018-07-10 21:52           ` Stephen Hemminger
  1 sibling, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-07-10 21:48 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang

On Tue, 10 Jul 2018 19:03:27 +0800
Jeff Guo <jia.guo@intel.com> wrote:

>  
> +/* uio pci device state */
> +enum rte_udev_state {
> +	RTE_UDEV_PROBED,
> +	RTE_UDEV_OPENNED,
> +	RTE_UDEV_RELEASED,
> +	RTE_UDEV_REMOVED,
> +};
> +

The states here are a little confusing, especially since pci_release
seems to take different actions based on the state. And there is nothing
preventing races between unexpected removal (PCI) and removing the
device from being used by igb_uio.

Would it be possible to only use state variable from the kernel PCI
layer where the value is consistent?

Also there is refcounting in PCI layer (and locking). Could that
be used instead?

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v8 7/7] igb_uio: fix uio release issue for hotplug
  2018-07-10 11:03         ` [PATCH v8 7/7] igb_uio: fix uio release issue " Jeff Guo
  2018-07-10 21:48           ` Stephen Hemminger
@ 2018-07-10 21:52           ` Stephen Hemminger
  2018-07-11  2:46             ` Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-07-10 21:52 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang

On Tue, 10 Jul 2018 19:03:27 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> When hotplug out device, the device resource will be released in kernel.
> The fd sys file will disappear, and the irq will be released. At this time,
> if igb uio driver still try to release this resource, it will cause kernel
> crash. On the other hand, interrupt disabling do not automatically be
> processed in kernel. If not handle it, this redundancy and dirty thing will
> affect the interrupt resource be used by other device. So the igb_uio
> driver have to check the hotplug status, and the corresponding process
> should be taken in igb uio driver.
> 
> This patch propose to add enum rte_udev_state into struct rte_uio_pci_dev
> of igb uio driver, which will record the state of uio device, such as
> probed/opened/released/removed. When detect the unexpected removal which
> cause of hotplug out behavior, it will corresponding disable interrupt
> resource. For the part of releasement which kernel have already handle,
> just skip it to avoid double free or null pointer crash issue.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

PCI hotplug management is an important and potentially error-prone
area of DPDK.

I realize that English is not your native language, but the commit messages
for this are hard to read. Perhaps you can get a volunteer or other person
in the community to reword them. The commit logs and comments contain
important information about the documentation of the code.

How does VFIO handle hotplug? We should direct all users to use VFIO
since it is supported and secure. Igb uio has always been a slightly
dangerous (as they say "running with scissors") way of accessing devices.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 3/9] bus: introduce sigbus handler
  2018-06-29 10:30         ` [PATCH V4 3/9] bus: introduce sigbus handler Jeff Guo
@ 2018-07-10 21:55           ` Stephen Hemminger
  2018-07-11  2:15             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-07-10 21:55 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, jblunck, shreyansh.jain, dev,
	helin.zhang

On Fri, 29 Jun 2018 18:30:42 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> When device be hotplug, if data path still read/write device, the sigbus
> error will occur, this error need to be handled. So a handler need to be
> here to capture the signal and handle it correspondingly.
> 
> To handle sigbus error is a bus-specific behavior, this patch introduces
> a bus ops so that each kind of bus can implement its own logic.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v4->v3:
> split patches to be small and clear.
> ---
>  lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 3642aeb..231bd3d 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -181,6 +181,20 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>  typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
>  
>  /**
> + * Implementation a specific sigbus handler, which is responsible
> + * for handle the sigbus error which is original memory error, or specific
> + * memory error that caused of hot unplug.
> + * @param failure_addr
> + *	Pointer of the fault address of the sigbus error.
> + *
> + * @return
> + *	0 for success handle the sigbus.
> + *	1 for no handle the sigbus.
> + *	-1 for failed to handle the sigbus
> + */
> +typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
> +
> +/**
>   * Bus scan policies
>   */
>  enum rte_bus_scan_mode {
> @@ -226,6 +240,8 @@ struct rte_bus {
>  	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>  	rte_bus_hotplug_handler_t hotplug_handler;
>  						/**< handle hot plug on bus */
> +	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
> +
>  };
>  
>  /**

One issue with handling sigbus is that you are going to trap program errors
as well as hotplug. How can you distinguish between a removed device and a
buggy userspace program (or worse, a compromised program)?

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH V4 3/9] bus: introduce sigbus handler
  2018-07-10 21:55           ` Stephen Hemminger
@ 2018-07-11  2:15             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11  2:15 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, jblunck, shreyansh.jain, dev,
	helin.zhang



On 7/11/2018 5:55 AM, Stephen Hemminger wrote:
> On Fri, 29 Jun 2018 18:30:42 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> When device be hotplug, if data path still read/write device, the sigbus
>> error will occur, this error need to be handled. So a handler need to be
>> here to capture the signal and handle it correspondingly.
>>
>> To handle sigbus error is a bus-specific behavior, this patch introduces
>> a bus ops so that each kind of bus can implement its own logic.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v4->v3:
>> split patches to be small and clear.
>> ---
>>   lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
>> index 3642aeb..231bd3d 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -181,6 +181,20 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>>   typedef int (*rte_bus_hotplug_handler_t)(struct rte_device *dev);
>>   
>>   /**
>> + * Implementation a specific sigbus handler, which is responsible
>> + * for handle the sigbus error which is original memory error, or specific
>> + * memory error that caused of hot unplug.
>> + * @param failure_addr
>> + *	Pointer of the fault address of the sigbus error.
>> + *
>> + * @return
>> + *	0 for success handle the sigbus.
>> + *	1 for no handle the sigbus.
>> + *	-1 for failed to handle the sigbus
>> + */
>> +typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
>> +
>> +/**
>>    * Bus scan policies
>>    */
>>   enum rte_bus_scan_mode {
>> @@ -226,6 +240,8 @@ struct rte_bus {
>>   	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
>>   	rte_bus_hotplug_handler_t hotplug_handler;
>>   						/**< handle hot plug on bus */
>> +	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
>> +
>>   };
>>   
>>   /**
> One issue with handling sigbus is that you are going to trap program errors
> as well as hotplug. How can you distinguish between removed device and a
> buggy userspace program (or worse comprimised program)?
That is a problem I have considered in this mechanism, and it is handled
in another patch. The handler first checks whether the fault address
belongs to a device's MMIO resource. If it does, the new sigbus handler for
hotplug is used. If not, the error comes from a buggy user space program
and the generic sigbus handler is used to handle it.
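
A minimal sketch of that dispatch with POSIX sigaction() follows. The
demo_ names are hypothetical, and demo_addr_is_mmio() is a placeholder for
the real MMIO range lookup; here it always answers "no".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

static struct sigaction demo_old_action; /* action in place before install */
static int demo_installed;

/* Placeholder: 1 if addr lies inside a device MMIO mapping, else 0. */
static int demo_addr_is_mmio(const void *addr) { (void)addr; return 0; }

static void
demo_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	if (demo_addr_is_mmio(info->si_addr)) {
		/* Hotplug path: repair the mapping, then return to retry. */
		return;
	}
	/* Not a device address: hand over to the previous disposition. */
	if (demo_old_action.sa_flags & SA_SIGINFO) {
		demo_old_action.sa_sigaction(signum, info, ctx);
	} else if (demo_old_action.sa_handler != SIG_DFL &&
		   demo_old_action.sa_handler != SIG_IGN) {
		demo_old_action.sa_handler(signum);
	} else {
		/*
		 * No previous handler: restore the old disposition. For a
		 * synchronous fault the access is retried after return and
		 * the signal is then delivered with that disposition.
		 */
		sigaction(signum, &demo_old_action, NULL);
	}
}

/* Install the handler and remember the previous action. 0 or -1. */
static int
demo_sigbus_install(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = demo_sigbus_handler;
	if (sigaction(SIGBUS, &sa, &demo_old_action) != 0)
		return -1;
	demo_installed = 1;
	return 0;
}

/* Put the previous action back if the handler was installed. */
static void
demo_sigbus_restore(void)
{
	if (demo_installed) {
		sigaction(SIGBUS, &demo_old_action, NULL);
		demo_installed = 0;
	}
}
```

Checking SA_SIGINFO on the saved action matters because a previous handler
may have been registered through either sa_handler or sa_sigaction.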

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v8 7/7] igb_uio: fix uio release issue for hotplug
  2018-07-10 21:52           ` Stephen Hemminger
@ 2018-07-11  2:46             ` Jeff Guo
  2018-07-11 10:01               ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-11  2:46 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang



On 7/11/2018 5:52 AM, Stephen Hemminger wrote:
> On Tue, 10 Jul 2018 19:03:27 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> When hotplug out device, the device resource will be released in kernel.
>> The fd sys file will disappear, and the irq will be released. At this time,
>> if igb uio driver still try to release this resource, it will cause kernel
>> crash. On the other hand, interrupt disabling do not automatically be
>> processed in kernel. If not handle it, this redundancy and dirty thing will
>> affect the interrupt resource be used by other device. So the igb_uio
>> driver have to check the hotplug status, and the corresponding process
>> should be taken in igb uio driver.
>>
>> This patch propose to add enum rte_udev_state into struct rte_uio_pci_dev
>> of igb uio driver, which will record the state of uio device, such as
>> probed/opened/released/removed. When detect the unexpected removal which
>> cause of hotplug out behavior, it will corresponding disable interrupt
>> resource. For the part of releasement which kernel have already handle,
>> just skip it to avoid double free or null pointer crash issue.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> The PCI hotplug management is an important and potentially error prone
> error of DPDK.
>
> I realize that English is not your native language, but the commit messages
> for this are hard to read. Perhaps you can get a volunteer or other person
> in the community to reword them. The commit logs and comments contain
> important information about the documentation of the code.

Yes. I think it is probably some specific parts rather than the whole
thing that confused you, but it is definitely my job to make it as easy as
possible for reviewers to understand what I am proposing before they ack
it, especially for complex cases. I will do my best to check my wording. I
am also planning to spend time in English-speaking countries and at
conferences to improve my English : )

> How does VFIO handle hotplug? We should direct all users to use VFIO
> since it is supported and secure. Igb uio has always been a slightly
> dangerous (as they say "running with scissors") way of accessing devices.

You are suggesting VFIO as a replacement for igb_uio here; maybe we should
check whether igb_uio should remain an option or not.
If we can fix all the igb_uio issues, it should remain optional; if only
vfio proves stable, we should go to vfio only.
What do others think?

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v8 7/7] igb_uio: fix uio release issue for hotplug
  2018-07-10 21:48           ` Stephen Hemminger
@ 2018-07-11  3:10             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11  3:10 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang



On 7/11/2018 5:48 AM, Stephen Hemminger wrote:
> On Tue, 10 Jul 2018 19:03:27 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>>   
>> +/* uio pci device state */
>> +enum rte_udev_state {
>> +	RTE_UDEV_PROBED,
>> +	RTE_UDEV_OPENNED,
>> +	RTE_UDEV_RELEASED,
>> +	RTE_UDEV_REMOVED,
>> +};
>> +
> The states here are a little confusing. especially since pci_release
> seems to take different actions based on the state. And there is nothing
> preventing races between unexpected removal (PCI), and removing the
> device from being used by igb_uio.

These states are managed only inside igb_uio, and only record the status of
igb_uio.
RTE_UDEV_REMOVED is the key state that needs special handling. It means the
PCI device has been removed and the kernel came directly to igb_uio remove
without going through igb_uio release first. This state is for hot unplug
and needs specific processing. It does not affect the normal flow: for a
normal igb_uio remove, udev has already been released, so no state needs to
be recorded.

> Would it be possible to only use state variable from the kernel PCI
> layer where the value is consistent?

The only state I care about here is the hot-remove state. I checked that
kobj->state_remove_uevent_sent could be used in igb_uio, but apart from
that I still cannot find a specific kernel PCI state that identifies it.
If we can find one, that would be best.
Any other comments?

> Also there is refcounting in PCI layer (and locking). Could that
> be used instead?

It might be, but if the state is enough here we might not need to consider
using it. If I am missing anything, let me know.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v8 7/7] igb_uio: fix uio release issue for hotplug
  2018-07-11  2:46             ` Jeff Guo
@ 2018-07-11 10:01               ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:01 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang



On 7/11/2018 10:46 AM, Jeff Guo wrote:
>
>
> On 7/11/2018 5:52 AM, Stephen Hemminger wrote:
>> On Tue, 10 Jul 2018 19:03:27 +0800
>> Jeff Guo <jia.guo@intel.com> wrote:
>>
>>> When a device is hot-unplugged, the kernel releases the device resources:
>>> the fd sysfs file disappears and the irq is released. If the igb_uio
>>> driver then tries to release these resources again, the kernel crashes.
>>> On the other hand, the kernel does not disable the interrupt
>>> automatically. If this is not handled, the stale interrupt state can
>>> affect interrupt resources used by other devices. So the igb_uio driver
>>> has to check the hotplug status and do the corresponding processing
>>> itself.
>>>
>>> This patch proposes adding enum rte_udev_state to struct rte_uio_pci_dev
>>> of the igb_uio driver, to record the state of the uio device, such as
>>> probed/opened/released/removed. When an unexpected removal caused by
>>> hot unplug is detected, the driver disables the interrupt resource
>>> accordingly. The parts of the release that the kernel has already handled
>>> are skipped, to avoid a double free or NULL pointer crash.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> The PCI hotplug management is an important and potentially error-prone
>> area of DPDK.
>>
>> I realize that English is not your native language, but the commit 
>> messages
>> for this are hard to read. Perhaps you can get a volunteer or other 
>> person
>> in the community to reword them. The commit logs and comments contain
>> important information about the documentation of the code.
>
> Yes. I think it is probably specific parts rather than the whole thing that
> confused you, but it is definitely my job to make it easy for reviewers to
> understand what I am proposing before they ack it, especially for complex
> cases. I will do my best to check my wording. I am also planning to spend
> time in English-speaking countries and at conferences to improve my
> English : )
>
>> How does VFIO handle hotplug? We should direct all users to use VFIO
>> since it is supported and secure. Igb uio has always been a slightly
>> dangerous (as they say "running with scissors") way of accessing 
>> devices.
>
> You are suggesting VFIO as a replacement for igb_uio here; maybe we should
> check whether igb_uio should remain an option or not.
> If we can fix all the igb_uio issues, it should remain optional; if only
> vfio proves stable, we should go to vfio only.
> What do others think?

In addition, enabling PCI hotplug for vfio is still planned as a next step,
because the vfio and uio frameworks differ in how uevents are processed.
With vfio, while user space controls the device it holds the resource, so
the kernel is blocked from sending out the uevent.
Anyway, that is another story. We hope this patch set just fixes the
hotplug failure issue for the igb_uio case.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v9 0/7] hotplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (14 preceding siblings ...)
  2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-11 10:41       ` Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 1/7] bus: add hotplug failure handler Jeff Guo
                           ` (7 more replies)
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
                         ` (7 subsequent siblings)
  23 siblings, 8 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SR-IOV live migration in SDN/NFV. It
brings greater flexibility and continuity to networking services in multiple
industry use cases. So let us see what DPDK, as an important networking
framework, can do to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework; apps
can use these to develop their own hot plug solutions.

Consider the case of hot unplug. It can happen when a hardware device is
physically removed, or when software disables it. The app needs to call the
ether dev API to detach the device, which unplugs the device at the bus level
and makes access to the device invalid. The problem is that removing the
device from the software lists is not instantaneous; if the data (fast) path
still reads or writes the device during that window, it causes an MMIO error
and the app crashes.

We already have the fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a failsafe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD driver failure handling
solution. So there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the user
mode side. First, we can hardly have a gatekeeper for every MMIO access across
multiple PMD drivers. Second, no third-party tools such as udev/driverctl
specifically cover this hot plug failure processing. Third, it is hard for
each app to implement this itself for multiple user mode PMD drivers. Here, a
general hot plug failure handling mechanism in the DPDK framework is proposed.
It aims to guarantee that, when hot unplug occurs, the system does not crash
and the app is not interrupted, so that user space can stop and release any
relevant resources normally and then cleanly unplug the device at the bus
level.

The mechanism works as follows:

First, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To do
that, the following functionality is introduced.
 - Add a new bus ops “hotplug_failure_handler” to handle bus read/write
   errors. It is bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus specific ops “pci_hotplug_failure_handler”. It remaps
   the memory of the unplugged device, so that later accesses to the failed
   addresses no longer fault.
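
As an illustration of the remap step, here is a minimal user-space sketch. It
is not the patch code: remap_region_anonymous() and fake_bar_map() are
hypothetical names, and an ordinary anonymous mapping stands in for a real
mapped BAR. It only shows that mmap() with MAP_FIXED replaces an existing
mapping at the same virtual address with zero-filled memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with zero-filled
 * anonymous memory. addr is assumed to be page aligned, as a mapped BAR is.
 * Returns 0 on success, -1 on failure.
 */
static int
remap_region_anonymous(void *addr, size_t len)
{
	void *p;

	if (addr == NULL || len == 0)
		return -1;

	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* With MAP_FIXED a successful mmap() returns exactly addr. */
	assert(p == addr);
	return 0;
}

/* Hypothetical stand-in for a mapped BAR: a plain anonymous mapping. */
static void *
fake_bar_map(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

After the remap, reads from the old BAR address return zeros and writes are
silently absorbed, so the data path sees bogus register values instead of
faulting. That is presumably acceptable only because the port is stopped and
detached shortly afterwards.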

For the data path, or other unexpected control path access, when hot unplug
occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   either a control path thread or a data path thread may receive the sigbus
   error, but both end up in the common sigbus handler. When an MMIO sigbus
   error occurs, it triggers the hot unplug operation above. The handler
   checks whether the sigbus was caused by hot unplug. If not, the exception
   is reported through the original sigbus handler. If so, the memory is
   remapped.
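
The handler logic described above could be sketched roughly as follows. This
is a simplified, single-region illustration under stated assumptions, not the
eal_dev.c code: region_base/region_len, addr_in_region() and
sigbus_handler_install() are hypothetical names, and the real series looks the
address up through the bus ops instead of a single global region. Note also
that mmap() is not on the POSIX list of async-signal-safe functions, so
calling it from a signal handler is a pragmatic choice rather than a
guaranteed one.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical registry: one page-aligned MMIO-like region we can recover. */
static void *region_base;
static size_t region_len;

static struct sigaction sigbus_action_old;
static volatile sig_atomic_t sigbus_recovered;

static int
addr_in_region(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t base = (uintptr_t)region_base;

	return region_base != NULL && a >= base && a - base < region_len;
}

static void
sigbus_handler(int signo, siginfo_t *info, void *ctx)
{
	(void)ctx;

	if (signo == SIGBUS && addr_in_region(info->si_addr) &&
	    mmap(region_base, region_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != MAP_FAILED) {
		/* Recovered: the faulting access is retried on return. */
		sigbus_recovered = 1;
		return;
	}

	/*
	 * Not ours (or the remap failed): put the previous action back. For
	 * a real fault the instruction faults again on return, and the
	 * previous action, typically the default one, then applies.
	 */
	sigaction(SIGBUS, &sigbus_action_old, NULL);
}

/* Register the handler and remember the previous SIGBUS action. */
static int
sigbus_handler_install(void *base, size_t len)
{
	struct sigaction sa;

	region_base = base;
	region_len = len;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = sigbus_handler;
	return sigaction(SIGBUS, &sa, &sigbus_action_old);
}
```

The fault address comes from si_addr, which is why SA_SIGINFO is required.
Falling back by restoring the old action keeps unrelated SIGBUS errors fatal,
matching the "report the exception through the original handler" behavior.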

For the control path and the igb_uio release:
 - When a device is hot-unplugged, the kernel releases the device resources
   on the kernel side: for example, the fd sysfs file disappears and the irq
   is released. If the igb_uio driver then tries to release these resources
   again, the kernel crashes.
   On the other hand, some things, such as disabling the interrupt, are not
   done automatically on the kernel side. If this is not handled, the stale
   interrupt state can affect interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and do the
   corresponding processing itself.
   This patch proposes adding enum rte_udev_state to rte_uio_pci_dev of the
   igb_uio kernel driver, to record the state of the uio device, such as
   probed/opened/released/removed/unplug. When an unexpected removal caused
   by hot unplug is detected, the driver disables the interrupt resource
   accordingly, while the parts of the release that the kernel has already
   handled are skipped, to avoid a double free or NULL pointer kernel crash.
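
To make the intended ordering concrete, here is a small user-space model of
the state tracking. It is only a sketch of the idea described above, not the
igb_uio code: struct uio_dev_model and the model_*() functions are
hypothetical, and a counter stands in for "resources released" so that a
double release becomes visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the probed/opened/released/removed states. */
enum uio_dev_state {
	UIO_DEV_PROBED,
	UIO_DEV_OPENED,
	UIO_DEV_RELEASED,
	UIO_DEV_REMOVED,
};

struct uio_dev_model {
	enum uio_dev_state state;
	bool irq_enabled;
	int release_count;	/* must never exceed 1 */
};

/* User space opens the uio device: interrupts get enabled. */
static void
model_open(struct uio_dev_model *d)
{
	d->irq_enabled = true;
	d->state = UIO_DEV_OPENED;
}

/* User space closes the uio fd. */
static void
model_release(struct uio_dev_model *d)
{
	/* Already torn down by a hot unplug: skip to avoid a double free. */
	if (d->state == UIO_DEV_REMOVED)
		return;

	d->irq_enabled = false;
	d->release_count++;
	d->state = UIO_DEV_RELEASED;
}

/* PCI core removes the device. */
static void
model_remove(struct uio_dev_model *d)
{
	if (d->state == UIO_DEV_OPENED) {
		/*
		 * Unexpected removal: the fd is still open. The kernel side
		 * resources go away now, and the interrupt has to be
		 * disabled explicitly.
		 */
		d->irq_enabled = false;
		d->release_count++;
	}
	d->state = UIO_DEV_REMOVED;
}
```

In the hot unplug order (open, remove, release) and in the normal order (open,
release, remove) the resources are released exactly once and the interrupt
ends up disabled.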

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd as an example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not interrupt the running app/fail-safe driver, and does not
break other unrelated devices. The app can then plug the device back in and
restart the data path as below.
 - Device plug in->bind igb_uio driver ->attached device->start port->
   start forwarding.

patchset history:
v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage by testpmd
to another patch.
change the hotplug failure handler name
refine the sigbus handle logic
add lock for udev state in igb uio driver

v4->v3:
split patches to be small and clear
change to use new parameter "--hotplug-mode" in testpmd
to identify the eal hotplug and ethdev hotplug

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler
distinguish handling of generic sigbus and hotplug sigbus

v2->v1(v21):
refine some doc and commit log
fix igb uio kernel issue for control path failure
rebase testpmd code

Since the hot plug solution has been discussed several times in public, the
scope has changed and the patch set has been split many times. Following the
recent RFC and feature design, this patch set focuses only on the hot unplug
failure handler. To keep the topic clear and focused, the previous patch set
history is summarized as “v1(v21)”, and v2 here continues from it for further
tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops
sync failure handling to fix multiple process issue
fix attach port issue for multiple devices case.
combine rmv callback functions into only one.

v20->v19:
clean the code
refine the remap logic for multiple device.
remove the auto binding

v19->v18:
note the limitation of multiple hotplug, fix some typos, squash patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the earlier patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hotplug failure handler
  bus/pci: implement hotplug failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler operation
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix unexpected remove issue for hotplug

 drivers/bus/pci/pci_common.c            |  77 ++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 ++++++++++
 drivers/bus/pci/private.h               |  12 ++++
 kernel/linux/igb_uio/igb_uio.c          |  69 +++++++++++++++----
 lib/librte_eal/common/eal_common_bus.c  |  43 ++++++++++++
 lib/librte_eal/common/eal_private.h     |  12 ++++
 lib/librte_eal/common/include/rte_bus.h |  33 ++++++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 113 +++++++++++++++++++++++++++++++-
 8 files changed, 377 insertions(+), 15 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v9 1/7] bus: add hotplug failure handler
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-07-11 10:41         ` Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, if the app continues to access the device via
MMIO, it causes a memory failure and crashes the system.

This patch introduces a bus ops to handle device hotplug failure. It is
bus-specific behavior, so each kind of bus can implement its own logic case
by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v9->v8:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..e3a55a8 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a bus-specific hotplug failure handler, which is responsible
+ * for handling the failure when a device is hot-unplugged from the bus.
+ * When a hotplug removal event is detected, this function can be called to
+ * handle the failure and guarantee that the system does not crash.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +225,8 @@ struct rte_bus {
 	rte_bus_parse_t parse;       /**< Parse a device name */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+					/**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v9 2/7] bus/pci: implement hotplug failure handler ops
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 1/7] bus: add hotplug failure handler Jeff Guo
@ 2018-07-11 10:41         ` Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 3/7] bus: add sigbus handler Jeff Guo
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the hotplug failure handler ops for the PCI bus.
It remaps new dummy memory over the failed memory region, to avoid
MMIO read/write errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v9->v8:
no change
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..d7abe6c 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hotplug_failure_handler = pci_hotplug_failure_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successfully remapped resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v9 3/7] bus: add sigbus handler
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 1/7] bus: add hotplug failure handler Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
@ 2018-07-11 10:41         ` Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 4/7] bus/pci: implement sigbus handler operation Jeff Guo
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, if the data path still reads or writes the
device, a sigbus error occurs and needs to be handled. So a handler is
needed to capture the signal and handle it accordingly.

This patch introduces a bus ops to handle the sigbus error. It is
bus-specific behavior, so each kind of bus can implement its own logic
case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v9->v8:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e3a55a8..216ad1e 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a bus-specific sigbus handler, which is responsible for handling
+ * a sigbus error that is either a generic memory error or a specific memory
+ * error caused by hot unplug. When a sigbus error is captured, this function
+ * can be called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus was handled successfully.
+ *	1 if no bus handled the sigbus.
+ *	-1 if handling the sigbus failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -227,6 +242,8 @@ struct rte_bus {
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
 					/**< handle hotplug failure on bus */
+	rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v9 4/7] bus/pci: implement sigbus handler operation
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-07-11 10:41         ` [PATCH v9 3/7] bus: add sigbus handler Jeff Guo
@ 2018-07-11 10:41         ` Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the sigbus handler ops for the PCI bus. It finds
the PCI device that has been hot-unplugged, and then calls the bus
hotplug failure handler ops to handle the failure for that device.
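
The address lookup can be illustrated with a standalone sketch. The types and
names below (struct demo_device, demo_find_device_by_addr(), and so on) are
made up for illustration and only imitate the shape of rte_pci_device and its
mem_resource array; they are not DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_RESOURCE 6

/* Hypothetical stand-ins for a PCI memory resource and a PCI device. */
struct demo_resource {
	void *addr;	/* virtual address the BAR is mapped at */
	uint64_t len;	/* length of the mapping in bytes */
};

struct demo_device {
	const char *name;
	struct demo_resource mem_resource[DEMO_MAX_RESOURCE];
};

/* Return the device whose mapped range contains failure_addr, or NULL. */
static const struct demo_device *
demo_find_device_by_addr(const struct demo_device *devs, size_t ndevs,
			 const void *failure_addr)
{
	uintptr_t fault = (uintptr_t)failure_addr;
	size_t d, i;

	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < DEMO_MAX_RESOURCE; i++) {
			const struct demo_resource *r =
				&devs[d].mem_resource[i];
			uintptr_t base = (uintptr_t)r->addr;

			/* Skip BARs that are not mapped. */
			if (r->addr == NULL || r->len == 0)
				continue;
			/* Half-open range check: [base, base + len). */
			if (fault >= base && fault - base < r->len)
				return &devs[d];
		}
	}
	return NULL;
}
```

Writing the upper bound as "fault - base < len" rather than
"fault < base + len" avoids any overflow concern in the addition. Skipping
unmapped BARs keeps a NULL address with a non-zero length from matching low
fault addresses.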

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v9->v8:
no change
---
 drivers/bus/pci/pci_common.c | 49 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d7abe6c..37ad266 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/* Find the device that the failure address belongs to. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot removal. */
+		ret = pci_hotplug_failure_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hotplug_failure_handler = pci_hotplug_failure_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v9 5/7] bus: add helper to handle sigbus
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-07-11 10:41         ` [PATCH v9 4/7] bus/pci: implement sigbus handler operation Jeff Guo
@ 2018-07-11 10:41         ` Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch adds a helper that iterates over all buses to find the
corresponding bus to handle the sigbus error.
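
The return-code convention (0 handled, 1 no bus owns the address, -1 handling
failed) can be shown with a simplified dispatcher. This sketch uses a plain
array of function pointers instead of rte_bus_find(), and the demo_* handlers
and dispatch_sigbus() are hypothetical; it only illustrates the intended
tri-state result, not the errno handling in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*sigbus_handler_fn)(const void *failure_addr);

/* Hypothetical bus owning [0x1000, 0x2000): handling succeeds. */
static int
demo_bus_low(const void *failure_addr)
{
	uintptr_t a = (uintptr_t)failure_addr;

	return (a >= 0x1000 && a < 0x2000) ? 0 : 1;
}

/* Hypothetical bus owning [0x2000, 0x3000): handling always fails. */
static int
demo_bus_high(const void *failure_addr)
{
	uintptr_t a = (uintptr_t)failure_addr;

	return (a >= 0x2000 && a < 0x3000) ? -1 : 1;
}

/*
 * Try each bus in turn. Returns 0 if a bus handled the fault, -1 if a bus
 * owns the address but failed to handle it, 1 if no bus owns the address.
 */
static int
dispatch_sigbus(const sigbus_handler_fn *handlers, size_t n,
		const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (handlers[i] == NULL)
			continue;	/* bus without a sigbus handler */
		ret = handlers[i](failure_addr);
		if (ret == 1)
			continue;	/* address not owned by this bus */
		return ret < 0 ? -1 : 0;
	}
	return 1;
}
```

A caller such as the process-wide sigbus handler can then treat 1 as "generic
sigbus, fall back to the original action" and -1 as a hotplug failure that
could not be recovered.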

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v9->v8:
no change
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 12 ++++++++++
 2 files changed, 55 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* bus found but handling failed, make sure errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* no bus was found for this address. */
+	if (!bus)
+		return 1;
+	/* bus found but handling failed, pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+/**
+ * Iterate over all buses to find the bus that can handle the sigbus error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 sigbus handled successfully.
+ *	-1 failed to handle the sigbus.
+ *	 1 no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v9 6/7] eal: add failure handle mechanism for hotplug
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-07-11 10:41         ` [PATCH v9 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-07-11 10:41         ` Jeff Guo
  2018-07-11 10:41         ` [PATCH v9 7/7] igb_uio: fix unexpected remove issue " Jeff Guo
  2018-07-11 15:46         ` [PATCH v9 0/7] hotplug failure handle mechanism Stephen Hemminger
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch introduces a failure handling mechanism for device hot-unplug
events.

A sigbus handler is registered when the device event monitor is started.
Once a sigbus error is captured, the handler checks the failure address and
remaps the invalid memory of the corresponding device. Based on this
mechanism, the application is meant to be kept from crashing when a device
is hotplugged out.
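
The three-way result that the sigbus handler acts on (0 handled, -1 owned
by a bus but not repaired, 1 not owned by any bus) can be sketched as
below. This is only an illustration: struct demo_bus, the repair hooks and
the address ranges are hypothetical stand-ins, not the rte_bus API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus: an address range plus a repair hook. */
struct demo_bus {
	uintptr_t base;
	size_t len;
	int (*repair)(const void *failure_addr); /* 0 on success */
};

static int repair_ok(const void *addr) { (void)addr; return 0; }
static int repair_fail(const void *addr) { (void)addr; return -1; }

/* Two made-up buses: the first can repair its range, the second cannot. */
static const struct demo_bus demo_buses[] = {
	{ 0x1000, 0x1000, repair_ok },
	{ 0x4000, 0x1000, repair_fail },
};

/*
 * Returns  0 if a bus owned the address and repaired it,
 *         -1 if a bus owned the address but the repair failed,
 *          1 if no bus owns the address (a generic SIGBUS).
 */
static int
demo_sigbus_dispatch(const struct demo_bus *buses, size_t n,
		     const void *failure_addr)
{
	uintptr_t addr = (uintptr_t)failure_addr;
	size_t i;

	assert(buses != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* skip buses whose range does not contain the address */
		if (addr < buses[i].base ||
		    addr - buses[i].base >= buses[i].len)
			continue;
		return buses[i].repair(failure_addr) == 0 ? 0 : -1;
	}
	return 1;
}
```

Keeping "not mine" (1) distinct from "mine but failed" (-1) is what lets the
caller hand a generic SIGBUS on to the previously installed handler.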

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v9->v8:
no change
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 113 +++++++++++++++++++++++++++++++++-
 1 file changed, 112 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..1643b33 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,6 +16,10 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -23,6 +29,17 @@ static bool monitor_started;
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device failure handle, if try to access bus or device,
+ * such as handle sigbus on bus or handle memory failure for device just use
+ * this lock. It could protect the bus and the device to avoid race condition.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +50,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hotplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +207,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +234,50 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hotplug_failure_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
 rte_dev_event_monitor_start(void)
 {
+	sigset_t mask;
+	struct sigaction action;
 	int ret;
 
 	if (monitor_started)
@@ -197,6 +297,14 @@ rte_dev_event_monitor_start(void)
 		return -1;
 	}
 
+	/* register sigbus handler */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
 	monitor_started = true;
 
 	return 0;
@@ -217,8 +325,11 @@ rte_dev_event_monitor_stop(void)
 		return ret;
 	}
 
+	sigbus_action_recover();
+
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v9 7/7] igb_uio: fix unexpected remove issue for hotplug
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (5 preceding siblings ...)
  2018-07-11 10:41         ` [PATCH v9 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-07-11 10:41         ` Jeff Guo
  2018-07-12  1:57           ` He, Shaopeng
  2018-07-11 15:46         ` [PATCH v9 0/7] hotplug failure handle mechanism Stephen Hemminger
  7 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-07-11 10:41 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hotplugged out, the PCI resources are released in the
kernel, the uio fd disappears, and the irq is released. If the igb_uio
driver then still tries to access or release these resources, it causes a
kernel crash.

On the other hand, uio_remove will be called unexpectedly before
uio_release. The uio_remove procedure frees resources that uio_release
needs, so there is no chance to disable the interrupt, which is done inside
the uio_release procedure. This affects later use of the interrupt.

So the case of unexpected removal by hot unplug should be identified and
handled accordingly.

This patch proposes adding enum rte_udev_state to struct rte_uio_pci_dev,
to keep the state of the uio device as probed/opened/released/removed.

This patch also checks the kobject’s state_remove_uevent_sent flag to detect
an unexpected removal, which means hot unplug. Once a hot unplug is
detected, uio_release is called as soon as possible and the uio state is
set to “removed”. After that, uio_release checks this state: if the uio
device has already been removed, it only frees the leftover uio resources.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v9->v8:
refine commit log to be more readable.
---
 kernel/linux/igb_uio/igb_uio.c | 69 +++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 14 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..d126371 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,14 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +36,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -309,6 +318,17 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
+/* Unmap previously ioremap'd resources */
+static void
+igbuio_pci_release_iomem(struct uio_info *info)
+{
+	int i;
+
+	for (i = 0; i < MAX_UIO_MAPS; i++) {
+		if (info->mem[i].internal_addr)
+			iounmap(info->mem[i].internal_addr);
+	}
+}
 
 /**
  * This gets called while opening uio device file.
@@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED) {
+		mutex_destroy(&udev->lock);
+		igbuio_pci_release_iomem(&udev->info);
+		pci_disable_device(dev);
+		pci_set_drvdata(dev, NULL);
+		kfree(udev);
+		return 0;
+	}
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +391,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -414,18 +449,6 @@ igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
 	return 0;
 }
 
-/* Unmap previously ioremap'd resources */
-static void
-igbuio_pci_release_iomem(struct uio_info *info)
-{
-	int i;
-
-	for (i = 0; i < MAX_UIO_MAPS; i++) {
-		if (info->mem[i].internal_addr)
-			iounmap(info->mem[i].internal_addr);
-	}
-}
-
 static int
 igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)
 {
@@ -562,6 +585,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +605,21 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	struct pci_dev *pdev = udev->pdev;
+	int ret;
+
+	/* handle unexpected removal */
+	if (udev->state == RTE_UDEV_OPENNED ||
+	    (&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v9 0/7] hotplug failure handle mechanism
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
                           ` (6 preceding siblings ...)
  2018-07-11 10:41         ` [PATCH v9 7/7] igb_uio: fix unexpected remove issue " Jeff Guo
@ 2018-07-11 15:46         ` Stephen Hemminger
  2018-07-12  3:14           ` Jeff Guo
  7 siblings, 1 reply; 494+ messages in thread
From: Stephen Hemminger @ 2018-07-11 15:46 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang

On Wed, 11 Jul 2018 18:41:50 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> As we know, hot plug is an importance feature, either use for the datacenter
> device’s fail-safe, or use for SRIOV Live Migration in SDN/NFV. It could bring
> the higher flexibility and continuality to the networking services in multiple
> use cases in industry. So let we see, dpdk as an importance networking
> framework, what can it help to implement hot plug solution for users.
> 
> We already have a general device event detect mechanism, failsafe driver,
> bonding driver and hot plug/unplug api in framework, app could use these to
> develop their hot plug solution.

I like seeing a better solution to hot plug. But it is worth mentioning
that for the Hyper-V netvsc driver this is mostly unnecessary. The Hyper-V
host notifies the network driver directly about availability of SRIOV
device, and the netvsc device driver can use that to do its own VF
management. It doesn't really need (or want) to be using a general
PCI solution.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v9 7/7] igb_uio: fix unexpected remove issue for hotplug
  2018-07-11 10:41         ` [PATCH v9 7/7] igb_uio: fix unexpected remove issue " Jeff Guo
@ 2018-07-12  1:57           ` He, Shaopeng
  0 siblings, 0 replies; 494+ messages in thread
From: He, Shaopeng @ 2018-07-12  1:57 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, Iremonger, Bernard, arybchenko,
	Lu, Wenzhuo
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


> -----Original Message-----
> From: Guo, Jia
> Sent: Wednesday, July 11, 2018 6:42 PM
> 
> When device be hotplug out, the pci resource will be released in kernel,
> the uio fd will disappear, and the irq will be released. At this time,
> if igb uio driver still try to access or release these resource, it will
> cause kernel crash.
> 
> On the other hand, uio_remove will be called unexpectedly before
> uio_release. The uio_remove procedure will free resources which are needed
> by uio_release. So there is no chance to disable interrupt which is defined
> inside uio_release procedure. This will affect later usage of interrupt.
> 
> So the case of unexpectedly removal by hot unplug should be identify and
> correspondingly processed.
> 
> This patch propose to add enum rte_udev_state in struct rte_uio_pci_dev,
> that will keep the state of uio device as probed/opened/released/removed.
> 
> This patch also checks kobject’s remove_uevent_sent state to detect the
> unexpectedly removal status which means hot unplug. Once hot unplug be
> detected, it will call uio_release as soon as possible and set the uio
> status to be “removed”. After that, uio will check this status in
> uio_release function, if uio have already been removed, it will only free
> the dirty uio resource.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Even though we prefer vfio to igb_uio, as vfio is a safer and more standard way
of accessing devices, it's still good to have this bug fix to avoid a kernel crash
and memory leak.

Later on, we might further enhance igb_uio by introducing a mechanism similar to
the one vfio-pci currently uses (sending an event to the upper-layer application in
the middle of the PCI remove process), so the upper-layer application can close the
device more gracefully. Or, we can suggest using vfio, and leave igb_uio as it is.

Thanks,
--Shaopeng

Acked-by: Shaopeng He <shaopeng.he@intel.com>


^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v9 0/7] hotplug failure handle mechanism
  2018-07-11 15:46         ` [PATCH v9 0/7] hotplug failure handle mechanism Stephen Hemminger
@ 2018-07-12  3:14           ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-07-12  3:14 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu, jblunck,
	shreyansh.jain, dev, helin.zhang



On 7/11/2018 11:46 PM, Stephen Hemminger wrote:
> On Wed, 11 Jul 2018 18:41:50 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> As we know, hot plug is an importance feature, either use for the datacenter
>> device’s fail-safe, or use for SRIOV Live Migration in SDN/NFV. It could bring
>> the higher flexibility and continuality to the networking services in multiple
>> use cases in industry. So let we see, dpdk as an importance networking
>> framework, what can it help to implement hot plug solution for users.
>>
>> We already have a general device event detect mechanism, failsafe driver,
>> bonding driver and hot plug/unplug api in framework, app could use these to
>> develop their hot plug solution.
> I like seeing a better solution to hot plug. But it is worth mentioning
> that for the Hyper-V netvsc driver this is mostly unnecessary. The Hyper-V
> host notifies the network driver directly about availability of SRIOV
> device, and the netvsc device driver can use that to do its own VF
> management. It doesn't really need (or want) to be using a general
> PCI solution.

I am not familiar with the Hyper-V netvsc driver, but I am very interested in
analyzing it more deeply to see what it can teach us about finding the best
solution for hotplug. But some customers who, like me, have not yet had the
chance to use Hyper-V will still encounter the hotplug issue when using KVM
or other platforms.
So let's say that the topic here aims to fix the issue for that group of
customers and developers, and to let hotplug be used in more diverse
scenarios.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v10 0/8] hotplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (15 preceding siblings ...)
  2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
@ 2018-08-17 10:48       ` Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 1/8] bus: add memory failure handler Jeff Guo
                           ` (8 more replies)
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
                         ` (6 subsequent siblings)
  23 siblings, 9 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use-cases like the datacenter device's
fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
flexibility and continuity to networking services in multiple use-cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, failsafe driver,
and hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
and we need to consider 2 different solutions for uio and vfio.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified
and to detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hotplug in igb_uio. It aims to guarantee that, when a hot unplug occurs,
the system will not crash and the application will not exit unexpectedly.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot unplug occurs, the mechanism will detect the
removal event and then do the failure handling accordingly. In order to
do that, the below functionality will be required:
 - Add a new bus ops “memory_failure_handler” to handle bus read/write
   errors.
 - Implement pci bus specific ops “pci_memory_failure_handler”. Based on
   the failure address, it remaps memory for the corresponding unplugged
   device.
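
The remap step above can be sketched in isolation as follows. This is a
minimal sketch, not the bus code: remap_region_anonymous() and demo_remap()
are hypothetical names, and an ordinary anonymous mapping stands in for a
mapped BAR. It only shows that mmap() with MAP_FIXED replaces whatever was
mapped at an address, so pointers held by the data path stay valid.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Replace the mapping at [addr, addr + len) with fresh anonymous memory.
 * MAP_FIXED makes the kernel drop whatever was mapped there before.
 * Returns 0 on success, -1 on failure.
 */
static int
remap_region_anonymous(void *addr, size_t len)
{
	void *p;

	if (addr == NULL || len == 0)
		return -1;

	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* With MAP_FIXED a successful mmap() returns exactly addr. */
	assert(p == addr);
	return 0;
}

/*
 * Hypothetical stand-in for a mapped BAR: an anonymous mapping filled with
 * a marker byte. Returns 0 if, after the remap, the region is still at the
 * same address and reads back as zeroes.
 */
static int
demo_remap(void)
{
	size_t len = 4096;
	unsigned char *bar;
	int ok;

	bar = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bar == MAP_FAILED)
		return -1;
	memset(bar, 0xAB, len);

	if (remap_region_anonymous(bar, len) != 0) {
		munmap(bar, len);
		return -1;
	}

	ok = (bar[0] == 0 && bar[len - 1] == 0);
	munmap(bar, len);
	return ok ? 0 : -1;
}
```

Reads from the remapped region return zeroes and writes are silently
absorbed, which is what lets the data path keep running until the port is
stopped and detached.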

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, that is responsible for handling
   the sigbus error which is either an original memory error, or a specific
   memory error that is caused by a hot unplug. When a sigbus error is
   captured, this function is called to handle it.
 - Implement PCI bus specific ops “pci_sigbus_handler”. It iterates over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find
   a bus to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   It monitors the sigbus error with a per-process handler.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both call the
   common sigbus handling. When a sigbus is captured, it calls the above
   API to find a bus to handle it.
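
The sigbus path above can be tried out with a self-contained sketch such
as the one below. It is not the EAL code: all demo_* names and the single
tracked region are hypothetical. A shared file mapping stands in for the
device, and truncating the file stands in for the unplug, relying on the
fact that touching a shared mapping beyond the end of its file raises
SIGBUS. Note that mmap() is not on the POSIX list of async-signal-safe
functions; calling it from the handler is an assumption that holds on
Linux, where it is a plain system call.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the one "device" region we track. */
static unsigned char *dev_base;
static size_t dev_len;
static volatile sig_atomic_t faults_handled;
static struct sigaction old_action;

static void
demo_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)dev_base;

	(void)ctx;
	if (dev_base != NULL && fault >= base && fault - base < dev_len) {
		/* The fault is ours: back the region with anonymous pages. */
		if (mmap(dev_base, dev_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
			 -1, 0) == MAP_FAILED)
			_exit(EXIT_FAILURE);
		faults_handled++;
		return; /* the faulting access is retried and now succeeds */
	}
	/* Not ours: restore the old disposition; the retry faults again. */
	sigaction(signum, &old_action, NULL);
}

/* Returns 0 if exactly one SIGBUS was caught and the read then succeeded. */
static int
demo_sigbus_recovery(void)
{
	char path[] = "/tmp/sigbus_demoXXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa;
	unsigned char value;
	int fd;

	assert(page > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}
	dev_len = (size_t)page;
	dev_base = mmap(NULL, dev_len, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (dev_base == MAP_FAILED) {
		dev_base = NULL;
		close(fd);
		return -1;
	}
	dev_base[0] = 0x5a;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = demo_sigbus_handler;
	if (sigaction(SIGBUS, &sa, &old_action) != 0 ||
	    ftruncate(fd, 0) != 0) { /* truncation is the "unplug" */
		munmap(dev_base, dev_len);
		dev_base = NULL;
		close(fd);
		return -1;
	}

	/* This read faults; the handler remaps and the read is retried. */
	value = *(volatile unsigned char *)dev_base;

	sigaction(SIGBUS, &old_action, NULL);
	munmap(dev_base, dev_len);
	dev_base = NULL;
	close(fd);
	return (faults_handled == 1 && value == 0) ? 0 : -1;
}
```

Because the disposition is per-process, whichever thread touches the dead
mapping runs the same handler, which matches the note above about control
path and data path threads sharing the common sigbus handling.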

The mechanism could be used by app or PMDs. For example, the whole process
of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will be in an upcoming patch set.

patchset history:
v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish handling of
generic sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure rebase testpmd code.

Since the hot plug solution has been discussed several times in public,
the scope has changed and the patch set has been split many times. Following
the recent RFC and feature design, this patch set focuses only on the hot
unplug failure handler. So, to make this topic clearer and more focused,
the previous patch set history is summarized as “v1(v21)”, and the v2 here
goes ahead for further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (8):
  bus: add memory failure handler
  bus/pci: implement memory failure handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hotplug
  igb_uio: fix unexpected remove issue for hotplug
  testpmd: use hotplug failure handle mechanism

 app/test-pmd/testpmd.c                  |  27 +++++-
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 kernel/linux/igb_uio/igb_uio.c          |  69 +++++++++++---
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
 lib/librte_eal/common/eal_private.h     |  38 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 ++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 159 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 13 files changed, 523 insertions(+), 20 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v10 1/8] bus: add memory failure handler
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 2/8] bus/pci: implement memory failure handler ops Jeff Guo
                           ` (7 subsequent siblings)
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A memory failure and system crash can be caused if a device is hotplugged
out but the application can still access the device by MMIO.

This patch introduces bus ops to handle memory failures of illegal access,
especially for hotplug. Each bus can implement its own case-dependent
logic to handle the memory failures.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v10->v9:
modify bus ops name
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..2606451 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific memory failure handler, which is responsible for
+ * handling the failure of illegal memory access, especially for hotplug. When
+ * a hotplug-out event is detected, this function can be called to handle
+ * the memory failure and avoid a system crash.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_memory_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_memory_failure_handler_t memory_failure_handler;
+					/**< handle memory failure on the bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v10 2/8] bus/pci: implement memory failure handler ops
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 1/8] bus: add memory failure handler Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 3/8] bus: add sigbus handler Jeff Guo
                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle memory failures on the PCI bus. It
avoids MMIO read/write errors by creating a new dummy memory to remap the
memory where the failure is.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v10->v9:
change pci bus ops name
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..759ccc3 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_memory_failure_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* mmio resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.memory_failure_handler = pci_memory_failure_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v10 3/8] bus: add sigbus handler
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 1/8] bus: add memory failure handler Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 2/8] bus/pci: implement memory failure handler ops Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 4/8] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (5 subsequent siblings)
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hotplugged out, a sigbus error will occur if the datapath
still reads from or writes to the device. A handler is required here to
capture the sigbus signal and handle it appropriately.

This patch introduces bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.
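
A minimal sketch of the idea, using simplified stand-in types (`demo_bus`,
`demo_sigbus_handler_t`, and the address window are hypothetical and are not
the DPDK definitions): each bus carries an optional callback that receives
the faulting address and reports 0 (handled), 1 (the address does not belong
to this bus) or -1 (the address belongs to this bus but handling failed).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-bus sigbus callback type. */
typedef int (*demo_sigbus_handler_t)(const void *failure_addr);

/* Hypothetical, heavily reduced bus descriptor. */
struct demo_bus {
	const char *name;
	demo_sigbus_handler_t sigbus_handler; /* may be NULL */
};

/* Address window owned by the example bus (illustrative values only). */
static const uintptr_t demo_win_start = 0x1000;
static const uintptr_t demo_win_len = 0x1000;

/* Set to 1 to simulate a recovery attempt that fails. */
static int demo_recovery_fails;

static int
demo_pci_sigbus_handler(const void *failure_addr)
{
	uintptr_t addr = (uintptr_t)failure_addr;

	if (addr < demo_win_start || addr - demo_win_start >= demo_win_len)
		return 1;	/* generic sigbus, not caused by this bus */
	if (demo_recovery_fails)
		return -1;	/* ours, but recovery failed */
	return 0;		/* ours, and handled */
}

static struct demo_bus demo_pci_bus = {
	.name = "pci",
	.sigbus_handler = demo_pci_sigbus_handler,
};
```

A bus that has no device memory to protect would simply leave the callback
unset, so callers are expected to check it for NULL before calling it.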

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v10->v9:
refine commit log
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 2606451..ddb29dd 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_memory_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * a sigbus error that is either a generic memory error or a memory error
+ * caused by a device being hotplugged out. When a sigbus error is captured,
+ * this function can be called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus error was handled successfully.
+ *	1 if no bus handled the sigbus error.
+ *	-1 if handling the sigbus error failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_memory_failure_handler_t memory_failure_handler;
 					/**< handle memory failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v10 4/8] bus/pci: implement sigbus handler ops
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
                           ` (2 preceding siblings ...)
  2018-08-17 10:48         ` [PATCH v10 3/8] bus: add sigbus handler Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 5/8] bus: add helper to handle sigbus Jeff Guo
                           ` (4 subsequent siblings)
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hotplugged out and calls the relevant ops of the
memory failure handler to handle the failure of the device.
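
The lookup described above can be sketched as follows. This is only an
illustration under assumptions: `bar_region`, `bar_dev` and
`bar_find_dev_by_addr` are hypothetical names, and the devices live in a
plain array instead of the PCI bus device list. The range check is written
as `fault - base < len` so that it cannot overflow for a BAR mapped near the
top of the address space.

```c
#include <stddef.h>
#include <stdint.h>

#define BAR_MAX_RESOURCE 6

/* Hypothetical stand-in for one mapped BAR. */
struct bar_region {
	void *addr;	/* virtual address of the mapping, NULL if unused */
	uint64_t len;	/* length of the mapping in bytes */
};

/* Hypothetical stand-in for a PCI device with its mapped BARs. */
struct bar_dev {
	const char *name;
	struct bar_region bar[BAR_MAX_RESOURCE];
};

/* Return the device whose BAR mapping contains failure_addr, or NULL. */
static const struct bar_dev *
bar_find_dev_by_addr(const struct bar_dev *devs, size_t nb_devs,
		     const void *failure_addr)
{
	uintptr_t fault = (uintptr_t)failure_addr;
	size_t d, b;

	for (d = 0; d < nb_devs; d++) {
		for (b = 0; b < BAR_MAX_RESOURCE; b++) {
			uintptr_t base = (uintptr_t)devs[d].bar[b].addr;

			if (devs[d].bar[b].len == 0)
				continue;	/* skip empty BAR */
			if (fault >= base &&
			    (uint64_t)(fault - base) < devs[d].bar[b].len)
				return &devs[d];
		}
	}
	return NULL;
}
```

A NULL result corresponds to the "generic sigbus" case, where no device on
the bus owns the faulting address.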

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v10->v9:
refine doc.
---
 drivers/bus/pci/pci_common.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 759ccc3..b8f3244 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/**
+ * Find the device that encountered the failure by iterating over all devices
+ * on the PCI bus and checking whether the memory failure address lies within
+ * the range of any device's BARs.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_memory_failure_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_memory_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hotplug-out. */
+		ret = pci_memory_failure_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle failure for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.memory_failure_handler = pci_memory_failure_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v10 5/8] bus: add helper to handle sigbus
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
                           ` (3 preceding siblings ...)
  2018-08-17 10:48         ` [PATCH v10 4/8] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  2018-08-17 10:48         ` Jeff Guo
                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate through all buses to find the
relevant bus to handle the sigbus error.
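
As a rough illustration of the return convention (0 handled, -1 handling
failed, 1 no bus claims the address), the dispatch can be sketched with a
plain loop. This is a simplification: the names `disp_bus`, `disp_handler_t`
and `disp_handle_sigbus` are hypothetical, the buses are kept in an array,
and the error is returned directly rather than being signalled through
rte_errno as the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-bus sigbus callback type. */
typedef int (*disp_handler_t)(const void *failure_addr);

struct disp_bus {
	const char *name;
	disp_handler_t sigbus_handler;	/* NULL if the bus has no handler */
};

/*
 * Ask each bus in turn to handle the fault.
 * Returns 0 if a bus handled it, -1 if a bus claimed it but failed,
 * and 1 if no bus claimed the address.
 */
static int
disp_handle_sigbus(const struct disp_bus *buses, size_t nb_buses,
		   const void *failure_addr)
{
	size_t i;

	for (i = 0; i < nb_buses; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret == 0)
			return 0;
		if (ret < 0)
			return -1;
		/* ret > 0: not this bus, keep looking */
	}
	return 1;
}

/* Example handlers with made-up address windows (illustrative only). */
static int
disp_owns_low(const void *addr)
{
	return (uintptr_t)addr < 0x1000 ? 0 : 1;
}

static int
disp_fails_mid(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	return (a >= 0x1000 && a < 0x2000) ? -1 : 1;
}
```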

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v10->v9:
refine commit log
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 12 ++++++++++
 2 files changed, 55 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* bus found but handling failed: make sure rte_errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* no bus could handle the error. */
+	if (!bus)
+		return 1;
+	/* bus found but handling failed: pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..168430e 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,16 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the bus that can handle the sigbus error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus error was handled successfully.
+ *	-1 if handling the sigbus error failed.
+ *	 1 if no bus can handle the sigbus error.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v10 6/8] eal: add failure handle mechanism for hotplug
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
                           ` (5 preceding siblings ...)
  2018-08-17 10:48         ` Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 7/8] igb_uio: fix unexpected remove issue " Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 8/8] testpmd: use hotplug failure handle mechanism Jeff Guo
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism can initially register the sigbus handler after the device
event monitor is enabled. When a sigbus error is captured, the mechanism
will check the failure address and accordingly remap the invalid memory
for the corresponding device. This can prevent the application from
crashing when a device is hotplugged out.

With this patch, users can call the newly added APIs below to enable or
disable the device failure handling mechanism:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable
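
The capture-and-remap idea can be tried outside DPDK with a small Linux-only
sketch. Everything here is an assumption made for illustration: the `remap_*`
names are hypothetical, a truncated temporary file stands in for a BAR whose
device has gone away, and only one region is tracked. Note also that mmap()
is not on the POSIX list of async-signal-safe functions, so calling it from
a signal handler relies on Linux behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one watched mapping (stands in for a BAR). */
static void *remap_region;
static size_t remap_region_len;
static struct sigaction remap_old_action;
static volatile sig_atomic_t remap_hits;

static void
remap_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)remap_region;

	(void)signum;
	(void)ctx;
	/* Fault inside the watched region: back it with anonymous memory. */
	if (remap_region != NULL && fault >= base &&
	    fault - base < remap_region_len &&
	    mmap(remap_region, remap_region_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != MAP_FAILED) {
		remap_hits++;
		return;	/* the faulting access is retried on the new pages */
	}
	/* Generic SIGBUS: restore the old disposition; the retried access
	 * faults again and is then delivered there. */
	sigaction(SIGBUS, &remap_old_action, NULL);
}

static int
remap_enable(void *addr, size_t len)
{
	struct sigaction action;

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = remap_sigbus_handler;
	remap_region = addr;
	remap_region_len = len;
	return sigaction(SIGBUS, &action, &remap_old_action);
}

static int
remap_disable(void)
{
	remap_region = NULL;
	return sigaction(SIGBUS, &remap_old_action, NULL);
}

/* Provoke a SIGBUS by shrinking a mapped file, then read through it.
 * Returns the byte read after recovery, or -1 if the setup failed. */
static int
remap_demo(void)
{
	char path[] = "/tmp/remap_demo_XXXXXX";
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	void *map;
	int value = -1;

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	/* Shrinking the file invalidates the page, like a removed device. */
	if (remap_enable(map, len) == 0 && ftruncate(fd, 0) == 0)
		value = *(volatile unsigned char *)map;
	remap_disable();
	munmap(map, len);
	close(fd);
	return value;
}
```

Calling `remap_demo()` should return the value read from the freshly mapped
anonymous page instead of terminating the process. Restoring the previous
disposition for faults outside the watched region keeps generic SIGBUS
behaviour unchanged, which mirrors the intent of falling back to the old
handler in the patch.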

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v10->v9:
add new APIs to enable/disable hotplug handling
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 ++++++
 lib/librte_eal/common/include/rte_dev.h |  26 ++++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 159 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 231 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..95dc1e0 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added failure handle mechanism for hotplug.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable``
+  enable and disable the failure handle mechanism for hotplug.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..ae1c558 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 168430e..3cf0357 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -316,4 +316,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..fa5cb9b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,31 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * Spinlock for device failure handling. Take this lock before accessing a bus
+ * or device, for example when handling sigbus on a bus or a memory failure for
+ * a device. It protects the bus and the device against race conditions.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +51,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hotplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +208,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +235,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->memory_failure_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+					"device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +319,63 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int
+rte_dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int
+rte_dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+	sigbus_need_recover = 1;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL, "fail to register sigbus handler for devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL, "fail to unregister sigbus handler for devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 344a43d..996e709 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -274,6 +274,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_enable;
+	rte_dev_hotplug_handle_disable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v10 7/8] igb_uio: fix unexpected remove issue for hotplug
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
                           ` (6 preceding siblings ...)
  2018-08-17 10:48         ` [PATCH v10 6/8] eal: add failure handle mechanism for hotplug Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  2018-09-27 15:07           ` Ferruh Yigit
  2018-10-18  6:27           ` [PATCH v1] igb_uio: fix unexpected removal for hot-unplug Jeff Guo
  2018-08-17 10:48         ` [PATCH v10 8/8] testpmd: use hotplug failure handle mechanism Jeff Guo
  8 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hotplugged out, the PCI resource is released in the
kernel, the UIO file descriptor will disappear and the irq will be
released. After this, a kernel crash will be caused if the igb uio driver
tries to access or release these resources.

In addition, uio_remove will be called unexpectedly before uio_release
when a device is hotplugged out, and the uio_remove procedure will
free resources that are required by uio_release. This later affects the
usage of interrupts, as there is no way to disable the interrupt, which is
done in uio_release.

To prevent this, the hotplug removal needs to be identified and processed
accordingly in igb uio driver.

This patch proposes the addition of enum rte_udev_state in the
rte_uio_pci_dev struct. This will store the state of the uio device as one
of the following: probed/opened/released/removed.

This patch also checks the kobject's remove_uevent_sent state to detect if
the removal status is hotplug-out. Once a hotplug-out is detected, it will
call uio_release and set the uio status to "removed". After that, uio will
check the status in the uio_release function. If uio has already been
removed, it will only free the dirty uio resource.
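
The ordering problem and the state tracking can be modelled in user space.
This is only a sketch under assumptions: `udev_sketch` and the `udev_*`
functions are hypothetical, and two booleans stand in for the interrupt and
memory resources that the real driver manages.

```c
#include <stdbool.h>

/* Simplified model of the uio device states named above. */
enum udev_state {
	UDEV_PROBED,
	UDEV_OPENED,
	UDEV_RELEASED,
	UDEV_REMOVED,
};

struct udev_sketch {
	enum udev_state state;
	bool irq_enabled;	/* stands in for interrupt and DMA setup */
	bool mem_mapped;	/* stands in for BAR mappings and private data */
};

static void
udev_probe(struct udev_sketch *u)
{
	u->irq_enabled = false;
	u->mem_mapped = true;
	u->state = UDEV_PROBED;
}

static void
udev_open(struct udev_sketch *u)
{
	u->irq_enabled = true;
	u->state = UDEV_OPENED;
}

static void
udev_release(struct udev_sketch *u)
{
	if (u->state == UDEV_REMOVED) {
		/* Device already gone: only free what remove left behind. */
		u->mem_mapped = false;
		return;
	}
	u->irq_enabled = false;
	u->state = UDEV_RELEASED;
}

static void
udev_remove(struct udev_sketch *u, bool hot_unplugged)
{
	if (u->state == UDEV_OPENED || hot_unplugged) {
		/* Unexpected removal: quiesce through the release path and
		 * defer the final free until the file is closed. */
		udev_release(u);
		u->state = UDEV_REMOVED;
		return;
	}
	/* Orderly removal: release already ran, free everything now. */
	u->mem_mapped = false;
}
```

In the hot-unplug order (probe, open, remove, release) the interrupt is
disabled during remove while the memory stays valid until the later release,
so release never touches freed resources.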

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v10->v9:
refine commit log.
---
 kernel/linux/igb_uio/igb_uio.c | 69 +++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 14 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..d126371 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,14 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +36,7 @@ struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -309,6 +318,17 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
+/* Unmap previously ioremap'd resources */
+static void
+igbuio_pci_release_iomem(struct uio_info *info)
+{
+	int i;
+
+	for (i = 0; i < MAX_UIO_MAPS; i++) {
+		if (info->mem[i].internal_addr)
+			iounmap(info->mem[i].internal_addr);
+	}
+}
 
 /**
  * This gets called while opening uio device file.
@@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED) {
+		mutex_destroy(&udev->lock);
+		igbuio_pci_release_iomem(&udev->info);
+		pci_disable_device(dev);
+		pci_set_drvdata(dev, NULL);
+		kfree(udev);
+		return 0;
+	}
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +391,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -414,18 +449,6 @@ igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
 	return 0;
 }
 
-/* Unmap previously ioremap'd resources */
-static void
-igbuio_pci_release_iomem(struct uio_info *info)
-{
-	int i;
-
-	for (i = 0; i < MAX_UIO_MAPS; i++) {
-		if (info->mem[i].internal_addr)
-			iounmap(info->mem[i].internal_addr);
-	}
-}
-
 static int
 igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)
 {
@@ -562,6 +585,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +605,21 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	struct pci_dev *pdev = udev->pdev;
+	int ret;
+
+	/* handle unexpected removal */
+	if (udev->state == RTE_UDEV_OPENNED ||
+	    (&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v10 8/8] testpmd: use hotplug failure handle mechanism
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
                           ` (7 preceding siblings ...)
  2018-08-17 10:48         ` [PATCH v10 7/8] igb_uio: fix unexpected remove issue " Jeff Guo
@ 2018-08-17 10:48         ` Jeff Guo
  8 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-08-17 10:48 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how to use the failure handle
mechanism for hotplug in an application.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v10->v9:
use new APIs to manage hotplug handling.
---
 app/test-pmd/testpmd.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ee48db2..12fc497 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2098,14 +2098,22 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
 		ret = eth_dev_event_callback_unregister();
 		if (ret)
+			return;
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2784,14 +2792,23 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = eth_dev_event_callback_register();
+		if (ret)
+			return -1;
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v10 7/8] igb_uio: fix unexpected remove issue for hotplug
  2018-08-17 10:48         ` [PATCH v10 7/8] igb_uio: fix unexpected remove issue " Jeff Guo
@ 2018-09-27 15:07           ` Ferruh Yigit
  2018-10-18  5:51             ` Jeff Guo
  2018-10-18  6:27           ` [PATCH v1] igb_uio: fix unexpected removal for hot-unplug Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Ferruh Yigit @ 2018-09-27 15:07 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 8/17/2018 11:48 AM, Jeff Guo wrote:
> When a device is hotplugged out, the PCI resource is released in the
> kernel, the UIO file descriptor will disappear and the irq will be
> released. After this, a kernel crash will be caused if the igb uio driver
> tries to access or release these resources.
> 
> In addition, uio_remove will be called unexpectedly before uio_release
> when the device is hotplugged out, and the uio_remove procedure will
> free resources that are required by uio_release. This will later affect the
> usage of interrupts, as there is no way to disable the interrupt, which is
> done in uio_release.
> 
> To prevent this, the hotplug removal needs to be identified and processed
> accordingly in igb uio driver.
> 
> This patch proposes the addition of enum rte_udev_state in the
> rte_uio_pci_dev struct. This will store the state of the uio device as one
> of the following: probed/opened/released/removed.
> 
> This patch also checks the kobject's remove_uevent_sent state to detect if
> the removal status is hotplug-out. Once a hotplug-out is detected, it will
> call uio_release and set the uio status to "removed". After that, uio will
> check the status in the uio_release function. If uio has already been
> removed, it will only free the dirty uio resource.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>

<...>

> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
>  
>  	/* enable interrupts */
>  	err = igbuio_pci_enable_interrupts(udev);
> -	mutex_unlock(&udev->lock);
>  	if (err) {
>  		dev_err(&dev->dev, "Enable interrupt fails\n");
> +		pci_clear_master(dev);

Why is pci_clear_master required here?

btw, some parts of this patch conflict with [1], which removes the mutex and uses
atomic refcnt operations, but introducing the state seems to need a mutex.

[1]
igb_uio: fix refcount if open returns error
https://patches.dpdk.org/patch/44732/

> +		mutex_unlock(&udev->lock);
>  		return err;
>  	}
> +	udev->state = RTE_UDEV_OPENNED;
> +	mutex_unlock(&udev->lock);
>  	return 0;
>  }
>  
> +/**
> + * This gets called while closing uio device file.
> + */
>  static int
>  igbuio_pci_release(struct uio_info *info, struct inode *inode)
>  {
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
>  
> +	if (udev->state == RTE_UDEV_REMOVED) {
> +		mutex_destroy(&udev->lock);
> +		igbuio_pci_release_iomem(&udev->info);
> +		pci_disable_device(dev);
> +		pci_set_drvdata(dev, NULL);
> +		kfree(udev);
> +		return 0;

This branch is taken when pci_remove is called before pci_release.
- At this stage, is "dev" still valid, since pci_remove() has already been called?
- In this path uio_unregister_device() is missing; who unregisters uio?
- sysfs_remove_group() is also missing; it is not clear whether it was forgotten
or left out on purpose. What do you think about moving the common part of
pci_remove into a new function and calling it both in pci_remove and here?

And as a matter of logic, can we make pci_remove clean up everything, instead of
doing some cleanup here? Like:
pci_remove:
- calls pci_release
- instead of returning, keeps doing the pci_remove work
- sets state to REMOVED

pci_release:
- if state is REMOVED, returns without doing anything

btw, even after uio_unregister_device(), how does pci_release get called?


It would help to share the crash backtrace in the commit log, to describe the
problem in more detail.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v11 0/7] hot-unplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (16 preceding siblings ...)
  2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
@ 2018-09-30 10:24       ` Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
                           ` (6 more replies)
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
                         ` (5 subsequent siblings)
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use cases such as datacenter device
fail-safe and SR-IOV live migration in SDN/NFV. It can bring higher
flexibility and continuity to networking services in multiple use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, a failsafe driver,
and a hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we still do not
have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
and we need to consider two different solutions, for uio pci and vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified
and detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once a hot-unplug occurs, the mechanism will detect the
removal event and then do the failure handling accordingly. In order to
do that, the functionality below is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement the pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it remaps memory for the unplugged device, based on the failure
   address. For vfio pci, it could be implemented separately, case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, which is responsible for handling
   the sigbus error, which is either an original memory error or a specific
   memory error that is caused by a hot unplug. When a sigbus error is
   captured, this function is called to handle it.
 - Implement the PCI bus specific ops “pci_sigbus_handler”. It iterates over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
   bus to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   They monitor the sigbus error with a handler which is per-process.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both will call the
   common sigbus handling. When a sigbus is captured, the above API is called
   to find a bus to handle it.
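
As a rough illustration of the flow above (sigbus captured, failure address
checked, memory remapped), the following Linux-only sketch fakes a BAR with a
file mapping and fakes the unplug by truncating the file. Everything here
(bar_addr, sigbus_action, simulate_hot_unplug) is hypothetical and is not the
EAL code of this patch set. It also assumes that calling mmap() from a signal
handler is acceptable, which POSIX does not guarantee.

```c
#define _GNU_SOURCE	/* MAP_ANONYMOUS, mkstemp under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for one mapped BAR of one device. */
static void *bar_addr;
static size_t bar_len;
static volatile sig_atomic_t faults_handled;

/* Per-process SIGBUS handler; it runs on whichever thread took the fault. */
static void sigbus_action(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)bar_addr;

	(void)sig;
	(void)ctx;
	/* A fault outside the BAR is a generic memory error: give up. */
	if (fault < base || fault >= base + bar_len)
		_exit(EXIT_FAILURE);
	/* Hot-unplug case: put anonymous memory where the BAR used to be. */
	if (mmap(bar_addr, bar_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(EXIT_FAILURE);
	faults_handled++;
}

/* Returns 0 if a read of the "unplugged" BAR survived via the handler. */
int simulate_hot_unplug(void)
{
	char path[] = "/tmp/fake_bar_XXXXXX";
	struct sigaction sa, old;
	volatile uint32_t *reg;
	uint32_t val;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	faults_handled = 0;
	bar_len = (size_t)sysconf(_SC_PAGESIZE);
	assert(bar_len > 0);
	if (ftruncate(fd, (off_t)bar_len) != 0)
		return -1;
	bar_addr = mmap(NULL, bar_len, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (bar_addr == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_action;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return -1;

	/* "Unplug": the backing store vanishes while the mapping remains. */
	if (ftruncate(fd, 0) != 0)
		return -1;
	reg = bar_addr;
	val = *reg;	/* faults, the handler remaps, the load is retried */

	sigaction(SIGBUS, &old, NULL);
	munmap(bar_addr, bar_len);
	close(fd);
	return (faults_handled == 1 && val == 0) ? 0 : -1;
}
```

The faulting load is restarted after the handler returns, so the data path
keeps running and simply reads zeros from the dummy memory until the
application detaches the port.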

The mechanism could be used by apps or PMDs. For example, the whole process
of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a following patch set.

patchset history:
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move the igb_uio fixing part, since it is a random issue and should be considered
a kernel driver defect, not included in this failure handler mechanism.

v10->v9:
modify the api name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish handling of generic
sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed for several rounds in public,
the scope has changed and the patch set has been split many times. Coming
to the recent RFC and feature design, this patch set focuses only on the hot
unplug failure handler, so in order to make this topic clearer and more
focused, the previous patch sets are summarized in history as “v1(v21)”, and
the v2 here goes ahead for further tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback functions into only one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->v18:
note for limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

for the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  testpmd: use hot-unplug failure handle mechanism

 app/test-pmd/testpmd.c                  |  39 ++++++--
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 481 insertions(+), 9 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v11 1/7] bus: add hot-unplug handler
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
@ 2018-09-30 10:24         ` Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot-unplug failure and app crash can be caused when a device is
hot-unplugged but the application still tries to access the device
by reading or writing the BARs, which are already invalid but
have not yet been unmapped or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handling the failure when a device is hot-unplugged. When the
+ * hot-unplug event is detected, this function can be called to handle
+ * the hot-unplug failure and avoid an app crash.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_hot_unplug_handler_t hot_unplug_handler;
+				/**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
@ 2018-09-30 10:24         ` Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it can avoid BAR read/write errors by creating
new dummy memory and remapping it over the memory where the failure is. For
VFIO or other kernel drivers, a specific function could be implemented to
handle hot-unplug case by case.
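
For reference, the remapping idea in isolation looks roughly like the sketch
below. remap_as_dummy() is a hypothetical helper name used only for
illustration; it assumes addr is page-aligned, as MAP_FIXED requires.

```c
#define _GNU_SOURCE	/* MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with zero-filled
 * anonymous memory at the same virtual address, so that later loads and
 * stores through old pointers no longer fault. Returns 0 on success.
 */
int remap_as_dummy(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL && len > 0);
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}
```

MAP_FIXED atomically discards the old mapping, so pointers that the data
path already holds stay valid; they just refer to dummy memory afterwards.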

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change the ops name
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..d286234 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* BARs resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hot_unplug_handler = pci_hot_unplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successfully remapped resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the rte_pci_device structure.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v11 3/7] bus: add sigbus handler
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
@ 2018-09-30 10:24         ` Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 4/7] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads/writes to the device. A handler is required here to capture
the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change some commit log
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..201454a 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * the sigbus error, which is either an original memory error or a specific
+ * memory error caused by a device being hot-unplugged. When a sigbus error
+ * is captured, this function can be called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus was handled successfully.
+ *	1 if no bus handled the sigbus.
+ *	-1 if handling the sigbus failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_hot_unplug_handler_t hot_unplug_handler;
 				/**< handle hot-unplug failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v11 4/7] bus/pci: implement sigbus handler ops
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
                           ` (2 preceding siblings ...)
  2018-09-30 10:24         ` [PATCH v11 3/7] bus: add sigbus handler Jeff Guo
@ 2018-09-30 10:24         ` Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.
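
The lookup itself is just a range check of the failure address against each
BAR. A stand-alone sketch with hypothetical types (fake_bar, fake_dev) rather
than the real rte_pci_device could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_BARS 6

/* Hypothetical, simplified stand-ins for the PCI device and its BARs. */
struct fake_bar {
	void *addr;	/* virtual address the BAR is mapped at */
	uint64_t len;	/* length of the mapping, 0 if the BAR is unused */
};

struct fake_dev {
	const char *name;
	struct fake_bar bar[FAKE_MAX_BARS];
};

/* Return the device whose BAR range contains the fault address, or NULL. */
const struct fake_dev *
find_dev_by_addr(const struct fake_dev *devs, size_t ndevs, const void *fault)
{
	uintptr_t a = (uintptr_t)fault;
	size_t d, i;

	assert(devs != NULL || ndevs == 0);
	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < FAKE_MAX_BARS; i++) {
			uintptr_t base = (uintptr_t)devs[d].bar[i].addr;
			uint64_t len = devs[d].bar[i].len;

			if (len == 0)
				continue;
			/* half-open range [base, base + len) */
			if (a >= base && a - base < len)
				return &devs[d];
		}
	}
	return NULL;
}
```

A NULL result corresponds to the "generic sigbus error" case, where the
address does not belong to any device on the bus.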

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change commit log.
---
 drivers/bus/pci/pci_common.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..f313fe9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/**
+ * Find the device which encountered the failure, by iterating over all
+ * devices on the PCI bus to check if the memory failure address is located
+ * in the range of the BARs of the device.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v11 5/7] bus: add helper to handle sigbus
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
                           ` (3 preceding siblings ...)
  2018-09-30 10:24         ` [PATCH v11 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-09-30 10:24         ` Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.
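
A simplified model of that iteration, using the 0 / 1 / -1 return convention
of the sigbus ops, is sketched below. The types and the three sample handlers
are hypothetical; the real helper goes through rte_bus_find() and rte_errno
instead of a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Same convention as the bus ops: 0 handled, 1 not mine, -1 failed. */
typedef int (*sigbus_handler_t)(const void *failure_addr);

struct fake_bus {
	const char *name;
	sigbus_handler_t sigbus_handler;	/* may be NULL */
};

/* Sample handlers, for illustration only. */
int not_mine(const void *addr) { (void)addr; return 1; }
int handled(const void *addr) { (void)addr; return 0; }
int failed(const void *addr) { (void)addr; return -1; }

/*
 * Ask each bus in turn. Stop at the first bus that claims the address,
 * whether it handled the fault (0) or failed to (-1). Return 1 when no
 * bus claims it, i.e. a generic sigbus.
 */
int dispatch_sigbus(const struct fake_bus *buses, size_t n, const void *addr)
{
	size_t i;

	assert(buses != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(addr);
		if (ret <= 0)
			return ret < 0 ? -1 : 0;
	}
	return 1;
}
```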

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change some words.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 13 ++++++++++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* bus found but handling failed, make sure errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* can not find bus. */
+	if (!bus)
+		return 1;
+	/* find bus but handle failed, pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 the sigbus was handled successfully.
+ *	-1 handling the sigbus failed.
+ *	 1 no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
                           ` (4 preceding siblings ...)
  2018-09-30 10:24         ` [PATCH v11 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-09-30 10:24         ` Jeff Guo
  2018-09-30 10:24         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism can initially register the sigbus handler after the device
event monitor is enabled. When a sigbus event is captured, it will check
the failure address and accordingly handle the memory failure of the
corresponding device by invoking the hot-unplug handler. This can prevent
the application from crashing when a device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable/disable
the device hotplug handle mechanism. Note that only the hot-unplug handler
is implemented in these functions; other hotplug handlers, such as a
handler for hotplug binding, could be added in the future if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable
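
One plausible shape for such an enable/disable pair is to install a SIGBUS
action on enable and restore the previous one on disable. The sketch below is
an assumption about the general technique only, with hypothetical names
(hotplug_handle_enable, hotplug_handle_disable, hotplug_handler_installed);
it is not the EAL implementation.

```c
#define _GNU_SOURCE	/* siginfo_t, SA_SIGINFO under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static struct sigaction saved_action;	/* action in place before enable */
static bool handler_enabled;

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
	/* A real handler would dispatch on info->si_addr here. */
	(void)sig;
	(void)info;
	(void)ctx;
}

int hotplug_handle_enable(void)
{
	struct sigaction sa;

	if (handler_enabled)
		return 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	/* remember the old action so that disable can put it back */
	if (sigaction(SIGBUS, &sa, &saved_action) != 0)
		return -1;
	handler_enabled = true;
	return 0;
}

int hotplug_handle_disable(void)
{
	if (!handler_enabled)
		return 0;
	if (sigaction(SIGBUS, &saved_action, NULL) != 0)
		return -1;
	handler_enabled = false;
	return 0;
}

/* Query the kernel for the current SIGBUS action (for checking only). */
bool hotplug_handler_installed(void)
{
	struct sigaction cur;
	int rc = sigaction(SIGBUS, NULL, &cur);

	assert(rc == 0);
	(void)rc;
	return (cur.sa_flags & SA_SIGINFO) && cur.sa_sigaction == on_sigbus;
}
```

Saving and restoring the previous action keeps any SIGBUS handler that the
application installed before enabling hotplug handling.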

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change some words and change the invoked func name.
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 +++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 234 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..ae1c558 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..9f9e1cf 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,32 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device hot-unplug failure handling. If it try to access bus or
+ * device, such as handle sigbus on bus or handle memory failure for device
+ * just need to use this lock. It could protect the bus and the device to avoid
+ * race condition.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +52,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hot-unplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +209,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +236,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hot_unplug_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
+					"for device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +320,65 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int
+rte_dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int
+rte_dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+	sigbus_need_recover = 1;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL, "fail to register sigbus handler for "
+			"devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL, "fail to unregister sigbus handler for "
+			"devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..a3255aa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_enable;
+	rte_dev_hotplug_handle_disable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4


* [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
                           ` (5 preceding siblings ...)
  2018-09-30 10:24         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-09-30 10:24         ` Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 10:24 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how an application can
smoothly handle failures when a device is hot-unplugged. Besides enabling
the device event monitor and registering the hotplug event callback, the
application also needs to enable the hotplug handle mechanism before
running. Once the application detects the removal event, the hot-unplug
callback is called. It first stops packet forwarding, then stops the port,
closes the port, and finally detaches the port to clean up the device and
release its resources.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v11->v10:
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
---
 app/test-pmd/testpmd.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..bfef483 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2093,14 +2093,22 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
 		ret = eth_dev_event_callback_unregister();
 		if (ret)
+			return;
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2244,6 +2252,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2793,23 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = eth_dev_event_callback_register();
+		if (ret)
+			return -1;
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* [PATCH v11 0/7] hot-unplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (17 preceding siblings ...)
  2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
@ 2018-09-30 11:29       ` Jeff Guo
  2018-09-30 11:29         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
                           ` (7 more replies)
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
                         ` (4 subsequent siblings)
  23 siblings, 8 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:29 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use-cases like the datacenter device's
fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
flexibility and continuity to networking services in multiple use-cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, a failsafe driver,
and a hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we still lack an
“eal event + hotplug handler for pci PMD + failsafe” implementation, and we
need to consider two different solutions for uio pci and vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then accordingly do the failure handling. In order to
do that, the below functionality will be required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it remaps, based on the failure address, the memory of the corresponding
   device that was unplugged. For vfio pci, it could be implemented
   separately, case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, that is responsible for handling
   the sigbus error which is either an original memory error, or a specific
   memory error that is caused by a hot unplug. When a sigbus error is
   captured, it will call this function to handle sigbus error.
 - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate all buses to find a bus
   to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   They monitor the sigbus error with a handler which is per-process.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both will call the
   common sigbus handler. When a sigbus is captured, that handler calls the
   above "rte_bus_sigbus_handler" to find the bus that can handle it.
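
The per-process handler in the last item can be sketched with plain POSIX
sigaction(). This is only an illustration under stated assumptions, not the
code from this series: the names sigbus_handler_install,
sigbus_handler_restore, on_sigbus and sigbus_seen are hypothetical, and the
handler only records that a SIGBUS arrived instead of looking up a bus.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Hypothetical names for illustration; not the DPDK implementation. */
static struct sigaction old_action;       /* action that was active before */
static int handler_installed;             /* 1 while our handler is active */
static volatile sig_atomic_t sigbus_seen; /* set by the handler */

static void on_sigbus(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)ctx;
	/*
	 * info->si_addr is the faulting address for a real fault; a real
	 * handler would pass it to the bus lookup. Here we only set a flag.
	 */
	(void)info;
	sigbus_seen = 1;
}

/* Install the SIGBUS handler and remember the previous action. */
int sigbus_handler_install(void)
{
	struct sigaction action;

	if (handler_installed)
		return 0; /* keep the originally saved action intact */

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	sigaddset(&action.sa_mask, SIGBUS);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = on_sigbus;

	if (sigaction(SIGBUS, &action, &old_action) != 0)
		return -1;
	handler_installed = 1;
	return 0;
}

/* Put the previous SIGBUS action back, if ours is installed. */
int sigbus_handler_restore(void)
{
	if (!handler_installed)
		return 0;
	if (sigaction(SIGBUS, &old_action, NULL) != 0)
		return -1;
	handler_installed = 0;
	return 0;
}
```

The sketch checks the return value of sigaction() directly and zeroes the
struct before use, so both calls report a failure to the caller. The signal
disposition is shared by all threads of the process, which is why one
install covers both control path and data path threads.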

The mechanism could be used by app or PMDs. For example, the whole process
of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insertion and binding; it only
implements the igb_uio failure handler. The vfio hotplug failure handler
will come in a following patch set.

patchset history:
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
add experimental tag.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move out the igb_uio fixing part, since it is a random issue and should be
considered a kernel driver defect rather than part of this failure handler
mechanism.

v10->v9:
modify the api name and exposure out for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish between handling
a generic sigbus and a hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed for several rounds in public,
the scope has changed and the patch set has been split many times. As of
the recent RFC and feature design, this patch set focuses only on the hot
unplug failure handler. To make this topic clearer and more focused, the
previous patch set history is summarized as “v1(v21)”, and the v2 here
continues from it for further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  testpmd: use hot-unplug failure handle mechanism

 app/test-pmd/testpmd.c                  |  39 ++++++--
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 481 insertions(+), 9 deletions(-)

-- 
2.7.4


* [PATCH v11 1/7] bus: add hot-unplug handler
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
@ 2018-09-30 11:29         ` Jeff Guo
  2018-09-30 11:29         ` [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:29 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot-unplug failure and an application crash can occur when a device is
hot-unplugged but the application still tries to access the device by
reading from or writing to the BARs, which are already invalid but have
not yet been unmapped or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.
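
As a rough illustration of an optional per-bus op, the sketch below uses a
function pointer that a bus may leave NULL. All names here (struct demo_bus,
struct demo_device, demo_pci_hot_unplug, bus_handle_hot_unplug) are
hypothetical stand-ins, not the rte_bus definitions added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device and a bus; not the DPDK structs. */
struct demo_device {
	const char *name;
	int remapped; /* set once the failure has been handled */
};

typedef int (*demo_hot_unplug_handler_t)(struct demo_device *dev);

struct demo_bus {
	const char *name;
	demo_hot_unplug_handler_t hot_unplug_handler; /* may be NULL */
};

/* Example bus-specific handler: pretend the BARs were remapped. */
int demo_pci_hot_unplug(struct demo_device *dev)
{
	assert(dev != NULL);
	dev->remapped = 1;
	return 0;
}

/*
 * Dispatch a hot-unplug event to the bus that owns the device.
 * Returns 0 on success and -1 if the arguments are invalid, the bus does
 * not implement the op, or the op itself fails.
 */
int bus_handle_hot_unplug(const struct demo_bus *bus, struct demo_device *dev)
{
	if (bus == NULL || dev == NULL)
		return -1;
	if (bus->hot_unplug_handler == NULL)
		return -1; /* this bus has no hot-unplug support */
	return bus->hot_unplug_handler(dev) == 0 ? 0 : -1;
}
```

Checking the pointer before the call matters because only buses that opt in
set the op; every other bus leaves it NULL.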

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handle the failure when device be hot-unplugged. When the event of
+ * hot-unplug be detected, it could call this function to handle
+ * the hot-unplug failure and avoid app crash.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_hot_unplug_handler_t hot_unplug_handler;
+				/**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4


* [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
  2018-09-30 11:29         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
@ 2018-09-30 11:29         ` Jeff Guo
  2018-09-30 11:29         ` [PATCH v11 3/7] bus: add sigbus handler Jeff Guo
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:29 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by creating new dummy memory
and remapping it over the memory where the failure is. For VFIO or other
kernel drivers, a specific function can be implemented to handle
hot-unplug case by case.
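
The remapping idea can be shown in isolation with mmap() and MAP_FIXED. This
is a minimal sketch, assuming a Linux-like system where MAP_ANONYMOUS is
available; remap_anonymous is a hypothetical helper name and the region in
the usage note is ordinary memory, not a real BAR.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with zero-filled
 * anonymous memory at the same virtual address, so later loads and stores
 * to that range no longer fault. addr must be page aligned.
 * Returns 0 on success, -1 on failure.
 */
int remap_anonymous(void *addr, size_t len)
{
	void *p;

	if (addr == NULL || len == 0)
		return -1;

	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* With MAP_FIXED a successful call returns the requested address. */
	assert(p == addr);
	return 0;
}
```

After the remap, reads from the range return zeros instead of device data,
so the mapping is only a safety net that keeps the process alive until the
application stops using the device.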

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change the ops name
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..d286234 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* BARs resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hot_unplug_handler = pci_hot_unplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4


* [PATCH v11 3/7] bus: add sigbus handler
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
  2018-09-30 11:29         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
  2018-09-30 11:29         ` [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
@ 2018-09-30 11:29         ` Jeff Guo
  2018-09-30 11:30         ` [PATCH v11 4/7] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:29 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads from or writes to the device. A handler is required here to
capture the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change some commit log
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..201454a 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * the sigbus error which is either original memory error, or specific memory
+ * error that caused of device be hot-unplugged. When sigbus error be captured,
+ * it could call this function to handle sigbus error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	0 for success handle the sigbus.
+ *	1 for no bus handle the sigbus.
+ *	-1 for failed to handle the sigbus
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_hot_unplug_handler_t hot_unplug_handler;
 				/**< handle hot-unplug failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4


* [PATCH v11 4/7] bus/pci: implement sigbus handler ops
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
                           ` (2 preceding siblings ...)
  2018-09-30 11:29         ` [PATCH v11 3/7] bus: add sigbus handler Jeff Guo
@ 2018-09-30 11:30         ` Jeff Guo
  2018-09-30 11:30         ` [PATCH v11 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.
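
The device lookup amounts to a range check of the fault address against
each mapped BAR. The following is a minimal sketch with hypothetical types
(struct demo_bar, struct demo_pci_dev) and a plain array in place of the PCI
bus device list; it is not the function added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_BARS 6

/* Hypothetical, simplified stand-ins for the PCI device structures. */
struct demo_bar {
	void *addr;   /* virtual address of the mapping, NULL if unused */
	uint64_t len; /* length of the mapping in bytes */
};

struct demo_pci_dev {
	const char *name;
	struct demo_bar bar[DEMO_MAX_BARS];
};

/* Return 1 if the fault address lies inside the mapped BAR, else 0. */
static int addr_in_bar(const struct demo_bar *bar, const void *fault)
{
	uintptr_t start = (uintptr_t)bar->addr;
	uintptr_t f = (uintptr_t)fault;

	if (bar->addr == NULL || bar->len == 0)
		return 0;
	/* Compare the offset so that start + len cannot overflow. */
	return f >= start && (uint64_t)(f - start) < bar->len;
}

/* Return the first device owning the fault address, or NULL if none. */
const struct demo_pci_dev *
find_dev_by_addr(const struct demo_pci_dev *devs, size_t ndevs,
		 const void *fault)
{
	size_t i, j;

	assert(devs != NULL || ndevs == 0);
	for (i = 0; i < ndevs; i++)
		for (j = 0; j < DEMO_MAX_BARS; j++)
			if (addr_in_bar(&devs[i].bar[j], fault))
				return &devs[i];
	return NULL;
}
```

A NULL result corresponds to the "generic sigbus" case described above: the
fault did not come from any known device mapping.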

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change commit log.
---
 drivers/bus/pci/pci_common.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..f313fe9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/**
+ * find the device which encounter the failure, by iterate over all device on
+ * PCI bus to check if the memory failure address is located in the range
+ * of the BARs of the device.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused of hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+				"device %s", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4


* [PATCH v11 5/7] bus: add helper to handle sigbus
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
                           ` (3 preceding siblings ...)
  2018-09-30 11:30         ` [PATCH v11 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-09-30 11:30         ` Jeff Guo
  2018-09-30 11:30         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change some words.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 13 ++++++++++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* Bus found but handling failed: make sure rte_errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* can not find bus. */
+	if (!bus)
+		return 1;
+	/* find bus but handle failed, pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 the sigbus was handled successfully.
+ *	-1 a bus claimed the sigbus but failed to handle it.
+ *	 1 no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4
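
The three-way return convention of rte_bus_sigbus_handler() (0 handled, -1
claimed but failed, 1 nobody owns the address) can be sketched without the
bus framework. Everything below is a hypothetical illustration: struct
fake_bus, buses_handle_sigbus() and the example handlers are not DPDK names,
and a plain array replaces the rte_bus_find() iteration used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bus callback: 0 handled, -1 mine but failed, 1 not mine. */
typedef int (*sigbus_fn)(const void *failure_addr);

struct fake_bus {
	const char *name;
	sigbus_fn sigbus_handler; /* NULL if the bus has no handler */
};

/*
 * Ask each bus in turn. Stop at the first bus that claims the address
 * (handler returns 0 or a negative value); report 1 if none does.
 */
static int
buses_handle_sigbus(const struct fake_bus *buses, size_t n,
		    const void *failure_addr)
{
	size_t i;

	assert(buses != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret <= 0)
			return ret < 0 ? -1 : 0;
	}
	return 1;
}

/* Example handlers used only to exercise the dispatcher. */
static int handler_not_mine(const void *a) { (void)a; return 1; }
static int handler_ok(const void *a) { (void)a; return 0; }
static int handler_fail(const void *a) { (void)a; return -1; }
```

In the patch the same decision is spread over two functions: the comparison
callback returns "ret > 0" so that rte_bus_find() stops on 0 or -1, and
rte_errno is then used to tell those two cases apart.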


* [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
                           ` (4 preceding siblings ...)
  2018-09-30 11:30         ` [PATCH v11 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-09-30 11:30         ` Jeff Guo
  2018-09-30 19:46           ` Ananyev, Konstantin
  2018-09-30 11:30         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
  2018-10-01  9:00         ` [PATCH v11 0/7] " Stephen Hemminger
  7 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism registers the sigbus handler after the device event monitor
is enabled. When a sigbus event is captured, it checks the failure address
and handles the memory failure of the corresponding device by invoking the
hot-unplug handler. This can prevent the application from crashing when a
device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable or
disable the device hotplug handling mechanism. Note that only the
hot-unplug handler is implemented in these functions; other hotplug
handlers, such as one for hotplug binding, could be added in the future
if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v11->v10:
change some words and change the invoked func name.
add experimental tag
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 +++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 234 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..14b18d8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,32 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * Spinlock for device hot-unplug failure handling. Any code that accesses
+ * a bus or device, such as handling sigbus on a bus or handling a memory
+ * failure for a device, must take this lock. It protects the bus and the
+ * device against race conditions.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +52,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hot-unplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +209,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +236,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hot_unplug_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
+					"for device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +320,65 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+	sigbus_need_recover = 1;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL, "fail to register sigbus handler for "
+			"devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL, "fail to unregister sigbus handler for "
+			"devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..a3255aa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_enable;
+	rte_dev_hotplug_handle_disable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4
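
One detail of the RTE_DEV_EVENT_REMOVE branch in dev_uev_handler() as posted:
the two early returns (bus not found, device not found) leave the function
with failure_handle_lock still held. A common way to release a lock on every
exit path is a single unlock label. The sketch below is only an illustration
under stated assumptions: struct fake_lock is a single-threaded stand-in for
rte_spinlock_t, and handle_remove() with its three integer inputs is a
hypothetical replacement for the real bus and device lookups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded stand-in for rte_spinlock_t. */
struct fake_lock {
	int held;
};

static void
fake_lock_take(struct fake_lock *l)
{
	assert(!l->held); /* taking it twice would deadlock a real spinlock */
	l->held = 1;
}

static void
fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = 0;
}

static struct fake_lock failure_lock;

/*
 * Mirrors the shape of the REMOVE branch: look up the bus, look up the
 * device, run the handler. have_bus, have_dev and handler_ret stand in
 * for the results of the real lookups and of the hot-unplug handler.
 */
static int
handle_remove(int have_bus, int have_dev, int handler_ret)
{
	int ret = -1;

	fake_lock_take(&failure_lock);
	if (!have_bus)
		goto unlock; /* bus not found */
	if (!have_dev)
		goto unlock; /* device not found */
	ret = handler_ret;   /* result of the hot-unplug handler */
unlock:
	fake_lock_release(&failure_lock);
	return ret;
}
```

With this shape, a second removal event after a failed lookup can still take
the lock instead of spinning forever.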


* [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
                           ` (5 preceding siblings ...)
  2018-09-30 11:30         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-09-30 11:30         ` Jeff Guo
  2018-10-01  9:00         ` [PATCH v11 0/7] " Stephen Hemminger
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-09-30 11:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how an application can
smoothly handle the failure when a device is hot-unplugged. Besides
enabling the device event monitor and registering the hotplug event’s
callback, the application also needs to enable the hotplug handling
mechanism before running. Once the application detects the removal event,
the hot-unplug callback is called. It first stops packet forwarding, then
stops the port, closes the port, and finally detaches the port to clean up
the device and release its resources.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v11->v10:
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
---
 app/test-pmd/testpmd.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..bfef483 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2093,14 +2093,22 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
 		ret = eth_dev_event_callback_unregister();
 		if (ret)
+			return;
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2244,6 +2252,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2793,23 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = eth_dev_event_callback_register();
+		if (ret)
+			return -1;
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* Re: [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug
  2018-09-30 11:30         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-09-30 19:46           ` Ananyev, Konstantin
  2018-10-02  4:01             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-09-30 19:46 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard, arybchenko,
	Lu, Wenzhuo, Burakov, Anatoly
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

> 
> The mechanism registers the sigbus handler after the device event monitor
> is enabled. When a sigbus event is captured, it checks the failure address
> and handles the memory failure of the corresponding device by invoking the
> hot-unplug handler. This can prevent the application from crashing when a
> device is hot-unplugged.
> 
> With this patch, users can call the newly added APIs below to enable or
> disable the device hotplug handling mechanism. Note that only the
> hot-unplug handler is implemented in these functions; other hotplug
> handlers, such as one for hotplug binding, could be added in the future
> if needed:
>   - rte_dev_hotplug_handle_enable
>   - rte_dev_hotplug_handle_disable
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> ---
> v11->v10:
> change some words and change the invoked func name.
> add experimental tag
> ---
>  doc/guides/rel_notes/release_18_08.rst  |   5 +
>  lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>  lib/librte_eal/common/eal_private.h     |  26 +++++
>  lib/librte_eal/common/include/rte_dev.h |  26 +++++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   2 +
>  6 files changed, 234 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
> index 321fa84..fe0e60f 100644
> --- a/doc/guides/rel_notes/release_18_08.rst
> +++ b/doc/guides/rel_notes/release_18_08.rst
> @@ -117,6 +117,11 @@ New Features
> 
>    Added support for chained mbufs (input and output).
> 
> +* **Added hot-unplug handle mechanism.**
> +
> +  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
> +  for enabling or disabling hotplug handle mechanism.
> +
> 
>  API Changes
>  -----------
> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
> index 1c6c51b..255d611 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_dev.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
> @@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
>  	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>  	return -1;
>  }
> +
> +int __rte_experimental
> +rte_dev_hotplug_handle_enable(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> +
> +int __rte_experimental
> +rte_dev_hotplug_handle_disable(void)
> +{
> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
> +	return -1;
> +}
> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index a2d1528..637f20d 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
>   */
>  int rte_bus_sigbus_handler(const void *failure_addr);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Register the sigbus handler.
> + *
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_sigbus_handler_register(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Unregister the sigbus handler.
> + *
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_sigbus_handler_unregister(void);
> +
>  #endif /* _EAL_PRIVATE_H_ */
> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
> index b80a805..ff580a0 100644
> --- a/lib/librte_eal/common/include/rte_dev.h
> +++ b/lib/librte_eal/common/include/rte_dev.h
> @@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
>  int __rte_experimental
>  rte_dev_event_monitor_stop(void);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enable hotplug handling for devices.
> + *
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_hotplug_handle_enable(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Disable hotplug handling for devices.
> + *
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int __rte_experimental
> +rte_dev_hotplug_handle_disable(void);
> +
>  #endif /* _RTE_DEV_H_ */
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 1cf6aeb..14b18d8 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
> 
>  #include <string.h>
>  #include <unistd.h>
> +#include <fcntl.h>
> +#include <signal.h>
>  #include <sys/socket.h>
>  #include <linux/netlink.h>
> 
> @@ -14,15 +16,32 @@
>  #include <rte_malloc.h>
>  #include <rte_interrupts.h>
>  #include <rte_alarm.h>
> +#include <rte_bus.h>
> +#include <rte_eal.h>
> +#include <rte_spinlock.h>
> +#include <rte_errno.h>
> 
>  #include "eal_private.h"
> 
>  static struct rte_intr_handle intr_handle = {.fd = -1 };
>  static bool monitor_started;
> +static bool hotplug_handle;
> 
>  #define EAL_UEV_MSG_LEN 4096
>  #define EAL_UEV_MSG_ELEM_LEN 128
> 
> +/*
> + * Spinlock for device hot-unplug failure handling. Any code that accesses
> + * a bus or device, such as handling sigbus on a bus or handling a memory
> + * failure for a device, must take this lock. It protects the bus and the
> + * device against race conditions.
> + */
> +static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
> +
> +static struct sigaction sigbus_action_old;
> +
> +static int sigbus_need_recover;
> +
>  static void dev_uev_handler(__rte_unused void *param);
> 
>  /* identify the system layer which reports this event. */
> @@ -33,6 +52,49 @@ enum eal_dev_event_subsystem {
>  	EAL_DEV_EVENT_SUBSYSTEM_MAX
>  };
> 
> +static void
> +sigbus_action_recover(void)
> +{
> +	if (sigbus_need_recover) {
> +		sigaction(SIGBUS, &sigbus_action_old, NULL);
> +		sigbus_need_recover = 0;
> +	}
> +}
> +
> +static void sigbus_handler(int signum, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> +		(int)pthread_self(), info->si_addr);
> +
> +	rte_spinlock_lock(&failure_handle_lock);
> +	ret = rte_bus_sigbus_handler(info->si_addr);
> +	rte_spinlock_unlock(&failure_handle_lock);
> +	if (ret == -1) {
> +		rte_exit(EXIT_FAILURE,
> +			 "Failed to handle SIGBUS for hot-unplug, "
> +			 "(rte_errno: %s)!", strerror(rte_errno));
> +	} else if (ret == 1) {
> +		if (sigbus_action_old.sa_handler)
> +			(*(sigbus_action_old.sa_handler))(signum);
> +		else
> +			rte_exit(EXIT_FAILURE,
> +				 "Failed to handle generic SIGBUS!");
> +	}
> +
> +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
> +}
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
>  static int
>  dev_uev_socket_fd_create(void)
>  {
> @@ -147,6 +209,9 @@ dev_uev_handler(__rte_unused void *param)
>  	struct rte_dev_event uevent;
>  	int ret;
>  	char buf[EAL_UEV_MSG_LEN];
> +	struct rte_bus *bus;
> +	struct rte_device *dev;
> +	const char *busname = "";
> 
>  	memset(&uevent, 0, sizeof(struct rte_dev_event));
>  	memset(buf, 0, EAL_UEV_MSG_LEN);
> @@ -171,8 +236,43 @@ dev_uev_handler(__rte_unused void *param)
>  	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>  		uevent.devname, uevent.type, uevent.subsystem);
> 
> -	if (uevent.devname)
> +	switch (uevent.subsystem) {
> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
> +		busname = "pci";
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (uevent.devname) {
> +		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
> +			rte_spinlock_lock(&failure_handle_lock);
> +			bus = rte_bus_find_by_name(busname);
> +			if (bus == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
> +					busname);
> +				return;
> +			}
> +
> +			dev = bus->find_device(NULL, cmp_dev_name,
> +					       uevent.devname);
> +			if (dev == NULL) {
> +				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
> +					"bus (%s)\n", uevent.devname, busname);
> +				return;
> +			}
> +
> +			ret = bus->hot_unplug_handler(dev);
> +			rte_spinlock_unlock(&failure_handle_lock);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
> +					"for device (%s)\n", dev->name);
> +				return;
> +			}
> +		}
>  		dev_callback_process(uevent.devname, uevent.type);
> +	}
>  }
> 
>  int __rte_experimental
> @@ -220,5 +320,65 @@ rte_dev_event_monitor_stop(void)
>  	close(intr_handle.fd);
>  	intr_handle.fd = -1;
>  	monitor_started = false;
> +
>  	return 0;
>  }
> +
> +int __rte_experimental
> +rte_dev_sigbus_handler_register(void)
> +{
> +	sigset_t mask;
> +	struct sigaction action;
> +
> +	rte_errno = 0;
> +

Shouldn't you call sigaction only if sigbus_need_recover == 0?

> +	sigemptyset(&mask);
> +	sigaddset(&mask, SIGBUS);
> +	action.sa_flags = SA_SIGINFO;
> +	action.sa_mask = mask;
> +	action.sa_sigaction = sigbus_handler;
> +	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
> +
> +	return rte_errno;
> +}
> +
> +int __rte_experimental
> +rte_dev_sigbus_handler_unregister(void)
> +{
> +	rte_errno = 0;
> +	sigbus_need_recover = 1;

Hmm, why set sigbus_need_recover to 1 here?
If the user called rte_dev_sigbus_handler_register() before and it succeeded, it would already be 1.
In other cases, you probably don't have to do anything.
Konstantin

> +
> +	sigbus_action_recover();
> +
> +	return rte_errno;
> +}
> +
> +int __rte_experimental
> +rte_dev_hotplug_handle_enable(void)
> +{
> +	int ret = 0;
> +
> +	ret = rte_dev_sigbus_handler_register();
> +	if (ret < 0)
> +		RTE_LOG(ERR, EAL, "fail to register sigbus handler for "
> +			"devices.\n");
> +
> +	hotplug_handle = true;
> +
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_dev_hotplug_handle_disable(void)
> +{
> +	int ret = 0;
> +
> +	ret = rte_dev_sigbus_handler_unregister();
> +	if (ret < 0)
> +		RTE_LOG(ERR, EAL, "fail to unregister sigbus handler for "
> +			"devices.\n");
> +
> +	hotplug_handle = false;
> +
> +	return ret;
> +}
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 73282bb..a3255aa 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -281,6 +281,8 @@ EXPERIMENTAL {
>  	rte_dev_event_callback_unregister;
>  	rte_dev_event_monitor_start;
>  	rte_dev_event_monitor_stop;
> +	rte_dev_hotplug_handle_enable;
> +	rte_dev_hotplug_handle_disable;
>  	rte_dev_iterator_init;
>  	rte_dev_iterator_next;
>  	rte_devargs_add;
> --
> 2.7.4
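
The two review comments above (call sigaction() only when the handler is not
yet installed, and do not force the flag in the unregister path) can be
sketched as follows. This is a minimal sketch, not the DPDK implementation:
sigbus_register(), sigbus_unregister(), sigbus_is_ours() and on_sigbus() are
hypothetical names. It also assumes a plain 0 / -1 return value taken from
sigaction() itself, because as far as I can tell nothing in the posted
functions sets rte_errno after it is cleared, so they would always return 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static struct sigaction sigbus_old; /* action in place before we registered */
static int sigbus_registered;       /* 1 while our handler is installed */

/* Placeholder handler; a real one would inspect info->si_addr. */
static void
on_sigbus(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
}

/* Install the handler once; return 0 on success, -1 on failure. */
static int
sigbus_register(void)
{
	struct sigaction action;

	if (sigbus_registered)
		return 0; /* keep the action saved by the first call */

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	sigaddset(&action.sa_mask, SIGBUS);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = on_sigbus;
	if (sigaction(SIGBUS, &action, &sigbus_old) != 0)
		return -1;
	sigbus_registered = 1;
	return 0;
}

/* Restore the saved action, but only if we installed ours. */
static int
sigbus_unregister(void)
{
	if (!sigbus_registered)
		return 0;
	if (sigaction(SIGBUS, &sigbus_old, NULL) != 0)
		return -1;
	sigbus_registered = 0;
	return 0;
}

/* Query the current disposition; used here only to check the sketch. */
static int
sigbus_is_ours(void)
{
	struct sigaction cur;

	if (sigaction(SIGBUS, NULL, &cur) != 0)
		return 0;
	return (cur.sa_flags & SA_SIGINFO) && cur.sa_sigaction == on_sigbus;
}
```

Guarding the register path matters because a second unguarded call would
overwrite the saved action with our own handler, and the original
disposition could then no longer be restored.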


* Re: [PATCH v11 0/7] hot-unplug failure handle mechanism
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
                           ` (6 preceding siblings ...)
  2018-09-30 11:30         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
@ 2018-10-01  9:00         ` Stephen Hemminger
  2018-10-01  9:55           ` Jerin Jacob
  2018-10-02  9:57           ` Jeff Guo
  7 siblings, 2 replies; 494+ messages in thread
From: Stephen Hemminger @ 2018-10-01  9:00 UTC (permalink / raw)
  To: Jeff Guo
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu,
	anatoly.burakov, jblunck, shreyansh.jain, dev, helin.zhang

On Sun, 30 Sep 2018 19:29:56 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> Hotplug is an important feature for use-cases like datacenter device
> fail-safe and SRIOV Live Migration in SDN/NFV. It could bring higher
> flexibility and continuity to networking services in multiple use-cases
> in the industry. So let's see how DPDK can help users implement hotplug
> solutions.
> 
> We already have a general device-event monitor mechanism, a failsafe
> driver, and a hot plug/unplug API in DPDK. We already have the
> “ethdev event + kernel PMD hotplug handler + failsafe” solution, but we do
> not yet have an “eal event + hotplug handler for pci PMD + failsafe”
> implementation, and we need to consider two different solutions for uio
> pci and vfio pci.
> 
> In the case of hotplug for igb_uio, when a hardware device is removed
> physically or disabled in software, the application needs to be notified,
> detach the device from the bus, and then invalidate the device.
> The problem is that the removal of the device is not instantaneous in
> software. If the application data path tries to read/write to the device
> while removal is still in progress, it will cause an MMIO error and the
> application will crash.
> 
> In this patch set, we propose a PCIe bus failure handler mechanism for
> hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
> the application will not crash.
> 
> The mechanism should work as below:
> 
> First, the application enables the device event monitor, registers the
> hotplug event’s callback and enables hotplug handling before running the
> data path. Once the hot-unplug occurs, the mechanism will detect the
> removal event and then do the failure handling accordingly. In order to
> do that, the below functionality will be required:
>  - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
>  - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
>    it will use the failure address to find the unplugged device and remap
>    its memory. For vfio pci, it could be implemented separately, case by
>    case.
> 
> For the data path or other unexpected behaviors from the control path
> when a hot unplug occurs:
>  - Add a new bus ops “sigbus_handler”, that is responsible for handling
>    the sigbus error which is either an original memory error, or a specific
>    memory error that is caused by a hot unplug. When a sigbus error is
>    captured, it will call this function to handle sigbus error.
>  - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over all
>    devices on the PCI bus to find which device encountered the failure.
>  - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a bus
>    to handle the failure.
>  - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
>    “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
>    It will monitor the sigbus error by a handler which is per-process.
>    Based on the signal event principle, the control path thread and the
>    data path thread will randomly receive the sigbus error, but will call the
>    common sigbus process. When a sigbus is captured, it will call the above API
>    to find a bus to handle it.
> 
> The mechanism could be used by app or PMDs. For example, the whole process
> of hotplug in testpmd is:
>  - Enable device event monitor->Enable hotplug handle->Register event callback
>    ->attach port->start port->start forwarding->Device unplug->failure handle
>    ->stop forwarding->stop port->close port->detach port.  
> 
> This patch set does not cover hotplug insertion and binding, and it only
> implements the igb_uio failure handler; the vfio hotplug failure handler
> will come in a following patch set.
> 
> patchset history:
> v11->v10:
> change the ops name, since both uio and vfio will use the hot-unplug ops.
> add experimental tag.
> since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
> RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
> move the igb_uio fixing part, since it is a random issue and should be considered
> a kernel driver defect rather than part of this failure handler mechanism.
> 
> v10->v9:
> modify the api name and exposure out for public use.
> add hotplug handle enable/disable APIs
> refine commit log
> 
> v9->v8:
> refine commit log to be more readable.
> 
> v8->v7:
> refine errno process in sigbus handler.
> refine igb uio release process
> 
> v7->v6:
> delete some unused part
> 
> v6->v5:
> refine some description about bus ops
> refine commit log
> add some entry check.
> 
> v5->v4:
> split patches to focus on the failure handle, remove the event usage
> by testpmd to another patch.
> change the hotplug failure handler name.
> refine the sigbus handle logic.
> add lock for udev state in igb uio driver.
> 
> v4->v3:
> split patches to be small and clear.
> change to use new parameter "--hotplug-mode" in testpmd to identify
> the eal hotplug and ethdev hotplug.
> 
> v3->v2:
> change bus ops name to bus_hotplug_handler.
> add new API and bus ops of bus_signal_handler to distinguish handling of
> generic sigbus and hotplug sigbus.
> 
> v2->v1(v21):
> refine some doc and commit log.
> fix igb uio kernel issue for control path failure.
> rebase testpmd code.
> 
> Since the hot plug solution has been discussed several rounds in public,
> the scope has changed and the patch set has been split many times. Coming
> to the recent RFC and feature design, this patch set just focuses on the hot unplug
> failure handler. So, in order to make this topic clearer and more
> focused, the previous patch set history is summarized as “v1(v21)”, and the v2 here
> goes ahead for further tracking.
> 
> "v1(21)" == v21 as below:
> v21->v20:
> split function in hot unplug ops.
> sync failure handle to fix multiple process issue.
> fix attach port issue for multiple devices case.
> combine rmv callback functions into only one.
> 
> v20->v19:
> clean the code.
> refine the remap logic for multiple device.
> remove the auto binding.
> 
> v19->18:
> note for limitation of multiple hotplug, fix some typos, squash patches.
> 
> v18->v15:
> add document, add signal bus handler, refine the code to be more clear.
> 
> the prior patch history please check the patch set "add device event monitor framework".
> 
> Jeff Guo (7):
>   bus: add hot-unplug handler
>   bus/pci: implement hot-unplug handler ops
>   bus: add sigbus handler
>   bus/pci: implement sigbus handler ops
>   bus: add helper to handle sigbus
>   eal: add failure handle mechanism for hot-unplug
>   testpmd: use hot-unplug failure handle mechanism
> 
>  app/test-pmd/testpmd.c                  |  39 ++++++--
>  doc/guides/rel_notes/release_18_08.rst  |   5 +
>  drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
>  drivers/bus/pci/pci_common_uio.c        |  33 +++++++
>  drivers/bus/pci/private.h               |  12 +++
>  lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>  lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
>  lib/librte_eal/common/eal_private.h     |  39 ++++++++
>  lib/librte_eal/common/include/rte_bus.h |  34 +++++++
>  lib/librte_eal/common/include/rte_dev.h |  26 +++++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   2 +
>  12 files changed, 481 insertions(+), 9 deletions(-)
> 

I am glad to see this; hotplug is needed. But I have a somewhat controversial
point of view. The DPDK project needs to do more to force users to go to
more modern kernels and APIs; there has been too much effort already to
support new DPDK on older kernels and distributions. This leads to a higher
testing burden, technical debt and multiple APIs.

To take the extreme point of view:
	* igb_uio should be deprecated and all new work should use only vfio and vfio-noiommu
	* kni should be deprecated and replaced by virtio

When there are N ways of doing things against X kernel versions,
Y distributions, and multiple device vendors, the combinatorial explosion of cases means
that interfaces don't get the depth of testing they deserve.

So why not support hotplug on VFIO only?

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v11 0/7] hot-unplug failure handle mechanism
  2018-10-01  9:00         ` [PATCH v11 0/7] " Stephen Hemminger
@ 2018-10-01  9:55           ` Jerin Jacob
  2018-10-02 10:08             ` Jeff Guo
  2018-10-02  9:57           ` Jeff Guo
  1 sibling, 1 reply; 494+ messages in thread
From: Jerin Jacob @ 2018-10-01  9:55 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jeff Guo, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jblunck, shreyansh.jain,
	dev, helin.zhang

-----Original Message-----
> Date: Mon, 1 Oct 2018 11:00:12 +0200
> From: Stephen Hemminger <stephen@networkplumber.org>
> To: Jeff Guo <jia.guo@intel.com>
> Cc: bruce.richardson@intel.com, ferruh.yigit@intel.com,
>  konstantin.ananyev@intel.com, gaetan.rivet@6wind.com,
>  jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com,
>  matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com,
>  shaopeng.he@intel.com, bernard.iremonger@intel.com,
>  arybchenko@solarflare.com, wenzhuo.lu@intel.com,
>  anatoly.burakov@intel.com, jblunck@infradead.org, shreyansh.jain@nxp.com,
>  dev@dpdk.org, helin.zhang@intel.com
> Subject: Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism
> 
> 
> On Sun, 30 Sep 2018 19:29:56 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
> 
> > [...]
> 
> I am glad to see this; hotplug is needed. But I have a somewhat controversial
> point of view. The DPDK project needs to do more to force users to go to
> more modern kernels and APIs; there has been too much effort already to
> support new DPDK on older kernels and distributions. This leads to a higher
> testing burden, technical debt and multiple APIs.
> 
> To take the extreme point of view:
>         * igb_uio should be deprecated and all new work should use only vfio and vfio-noiommu
>         * kni should be deprecated and replaced by virtio

+1

I think the only feature missing in the upstream kernel for DPDK may be
enabling SRIOV on PF VFIO devices controlled by a DPDK PMD.
Binding a PF device to DPDK along with its VFs (a VF can be bound to netdev or to DPDK
PMDs, though binding a VF to netdev is considered a security breach) will be useful for:
a) rte_flow actions like redirecting traffic to the PF or a VF on a given pattern
b) Some NICs can support promiscuous mode only on the PF
c) Enabling switch representation devices
https://doc.dpdk.org/guides/prog_guide/switch_representation.html

I think igb_uio is mainly used as the backdoor for this use case.

I think there was some work in this area, but it was not upstreamed for
various reasons.
https://lkml.org/lkml/2018/3/8/1122

> 
> When there are N ways of doing things against X kernel versions,
> Y distributions, and multiple device vendors, the combinatorial explosion of cases means
> that interfaces don't get the depth of testing they deserve.
> 
> So why not support hotplug on VFIO only?
> 

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug
  2018-09-30 19:46           ` Ananyev, Konstantin
@ 2018-10-02  4:01             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02  4:01 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard, arybchenko,
	Lu, Wenzhuo, Burakov, Anatoly
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi, Konstantin

On 10/1/2018 3:46 AM, Ananyev, Konstantin wrote:
> Hi Jeff,
>
>> The mechanism can initially register the sigbus handler after the device
>> event monitor is enabled. When a sigbus event is captured, it will check
>> the failure address and accordingly handle the memory failure of the
>> corresponding device by invoke the hot-unplug handler. It could prevent
>> the application from crashing when a device is hot-unplugged.
>>
>> By this patch, users could call below new added APIs to enable/disable
>> the device hotplug handle mechanism. Note that it just implement the
>> hot-unplug handler in these functions, the other handler of hotplug, such
>> as handler for hotplug binding, could be add in the future if need:
>>    - rte_dev_hotplug_handle_enable
>>    - rte_dev_hotplug_handle_disable
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>> ---
>> v11->v10:
>> change some words and change the invoked func name.
>> add experimental tag
>> ---
>>   doc/guides/rel_notes/release_18_08.rst  |   5 +
>>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>>   lib/librte_eal/common/eal_private.h     |  26 +++++
>>   lib/librte_eal/common/include/rte_dev.h |  26 +++++
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
>>   lib/librte_eal/rte_eal_version.map      |   2 +
>>   6 files changed, 234 insertions(+), 1 deletion(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
>> index 321fa84..fe0e60f 100644
>> --- a/doc/guides/rel_notes/release_18_08.rst
>> +++ b/doc/guides/rel_notes/release_18_08.rst
>> @@ -117,6 +117,11 @@ New Features
>>
>>     Added support for chained mbufs (input and output).
>>
>> +* **Added hot-unplug handle mechanism.**
>> +
>> +  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
>> +  for enabling or disabling hotplug handle mechanism.
>> +
>>
>>   API Changes
>>   -----------
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> index 1c6c51b..255d611 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
>> @@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
>>   	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>>   	return -1;
>>   }
>> +
>> +int __rte_experimental
>> +rte_dev_hotplug_handle_enable(void)
>> +{
>> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>> +	return -1;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_hotplug_handle_disable(void)
>> +{
>> +	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
>> +	return -1;
>> +}
>> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
>> index a2d1528..637f20d 100644
>> --- a/lib/librte_eal/common/eal_private.h
>> +++ b/lib/librte_eal/common/eal_private.h
>> @@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
>>    */
>>   int rte_bus_sigbus_handler(const void *failure_addr);
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Register the sigbus handler.
>> + *
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_sigbus_handler_register(void);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Unregister the sigbus handler.
>> + *
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int
>> +rte_dev_sigbus_handler_unregister(void);
>> +
>>   #endif /* _EAL_PRIVATE_H_ */
>> diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
>> index b80a805..ff580a0 100644
>> --- a/lib/librte_eal/common/include/rte_dev.h
>> +++ b/lib/librte_eal/common/include/rte_dev.h
>> @@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
>>   int __rte_experimental
>>   rte_dev_event_monitor_stop(void);
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Enable hotplug handling for devices.
>> + *
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_hotplug_handle_enable(void);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Disable hotplug handling for devices.
>> + *
>> + * @return
>> + *   - On success, zero.
>> + *   - On failure, a negative value.
>> + */
>> +int __rte_experimental
>> +rte_dev_hotplug_handle_disable(void);
>> +
>>   #endif /* _RTE_DEV_H_ */
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> index 1cf6aeb..14b18d8 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
>> @@ -4,6 +4,8 @@
>>
>>   #include <string.h>
>>   #include <unistd.h>
>> +#include <fcntl.h>
>> +#include <signal.h>
>>   #include <sys/socket.h>
>>   #include <linux/netlink.h>
>>
>> @@ -14,15 +16,32 @@
>>   #include <rte_malloc.h>
>>   #include <rte_interrupts.h>
>>   #include <rte_alarm.h>
>> +#include <rte_bus.h>
>> +#include <rte_eal.h>
>> +#include <rte_spinlock.h>
>> +#include <rte_errno.h>
>>
>>   #include "eal_private.h"
>>
>>   static struct rte_intr_handle intr_handle = {.fd = -1 };
>>   static bool monitor_started;
>> +static bool hotplug_handle;
>>
>>   #define EAL_UEV_MSG_LEN 4096
>>   #define EAL_UEV_MSG_ELEM_LEN 128
>>
>> +/*
>> + * spinlock for device hot-unplug failure handling. If it try to access bus or
>> + * device, such as handle sigbus on bus or handle memory failure for device
>> + * just need to use this lock. It could protect the bus and the device to avoid
>> + * race condition.
>> + */
>> +static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
>> +
>> +static struct sigaction sigbus_action_old;
>> +
>> +static int sigbus_need_recover;
>> +
>>   static void dev_uev_handler(__rte_unused void *param);
>>
>>   /* identify the system layer which reports this event. */
>> @@ -33,6 +52,49 @@ enum eal_dev_event_subsystem {
>>   	EAL_DEV_EVENT_SUBSYSTEM_MAX
>>   };
>>
>> +static void
>> +sigbus_action_recover(void)
>> +{
>> +	if (sigbus_need_recover) {
>> +		sigaction(SIGBUS, &sigbus_action_old, NULL);
>> +		sigbus_need_recover = 0;
>> +	}
>> +}
>> +
>> +static void sigbus_handler(int signum, siginfo_t *info,
>> +				void *ctx __rte_unused)
>> +{
>> +	int ret;
>> +
>> +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
>> +		(int)pthread_self(), info->si_addr);
>> +
>> +	rte_spinlock_lock(&failure_handle_lock);
>> +	ret = rte_bus_sigbus_handler(info->si_addr);
>> +	rte_spinlock_unlock(&failure_handle_lock);
>> +	if (ret == -1) {
>> +		rte_exit(EXIT_FAILURE,
>> +			 "Failed to handle SIGBUS for hot-unplug, "
>> +			 "(rte_errno: %s)!", strerror(rte_errno));
>> +	} else if (ret == 1) {
>> +		if (sigbus_action_old.sa_handler)
>> +			(*(sigbus_action_old.sa_handler))(signum);
>> +		else
>> +			rte_exit(EXIT_FAILURE,
>> +				 "Failed to handle generic SIGBUS!");
>> +	}
>> +
>> +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
>> +}
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +	const void *_name)
>> +{
>> +	const char *name = _name;
>> +
>> +	return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>>   dev_uev_socket_fd_create(void)
>>   {
>> @@ -147,6 +209,9 @@ dev_uev_handler(__rte_unused void *param)
>>   	struct rte_dev_event uevent;
>>   	int ret;
>>   	char buf[EAL_UEV_MSG_LEN];
>> +	struct rte_bus *bus;
>> +	struct rte_device *dev;
>> +	const char *busname = "";
>>
>>   	memset(&uevent, 0, sizeof(struct rte_dev_event));
>>   	memset(buf, 0, EAL_UEV_MSG_LEN);
>> @@ -171,8 +236,43 @@ dev_uev_handler(__rte_unused void *param)
>>   	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
>>   		uevent.devname, uevent.type, uevent.subsystem);
>>
>> -	if (uevent.devname)
>> +	switch (uevent.subsystem) {
>> +	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
>> +	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
>> +		busname = "pci";
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	if (uevent.devname) {
>> +		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
>> +			rte_spinlock_lock(&failure_handle_lock);
>> +			bus = rte_bus_find_by_name(busname);
>> +			if (bus == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
>> +					busname);
>> +				return;
>> +			}
>> +
>> +			dev = bus->find_device(NULL, cmp_dev_name,
>> +					       uevent.devname);
>> +			if (dev == NULL) {
>> +				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
>> +					"bus (%s)\n", uevent.devname, busname);
>> +				return;
>> +			}
>> +
>> +			ret = bus->hot_unplug_handler(dev);
>> +			rte_spinlock_unlock(&failure_handle_lock);
>> +			if (ret) {
>> +				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
>> +					"for device (%s)\n", dev->name);
>> +				return;
>> +			}
>> +		}
>>   		dev_callback_process(uevent.devname, uevent.type);
>> +	}
>>   }
>>
>>   int __rte_experimental
>> @@ -220,5 +320,65 @@ rte_dev_event_monitor_stop(void)
>>   	close(intr_handle.fd);
>>   	intr_handle.fd = -1;
>>   	monitor_started = false;
>> +
>>   	return 0;
>>   }
>> +
>> +int __rte_experimental
>> +rte_dev_sigbus_handler_register(void)
>> +{
>> +	sigset_t mask;
>> +	struct sigaction action;
>> +
>> +	rte_errno = 0;
>> +
> Shouldn't you call sigaction only if sigbus_need_recover == 0?


I guess you mean that sigbus_need_recover needs to be checked before
calling sigaction, because the register function could be called many times.
If so, I think it makes sense to check it every time.


>> +	sigemptyset(&mask);
>> +	sigaddset(&mask, SIGBUS);
>> +	action.sa_flags = SA_SIGINFO;
>> +	action.sa_mask = mask;
>> +	action.sa_sigaction = sigbus_handler;
>> +	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
>> +
>> +	return rte_errno;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_sigbus_handler_unregister(void)
>> +{
>> +	rte_errno = 0;
>> +	sigbus_need_recover = 1;
> Hmm, why to set sugbus_need_recover to 1 here?
> If user called rte_dev_sigbus_handler_register() before, and it was successful, it already would be 1.
> In other cases, you probably don't have to do anything.
> Konstantin


You are right. Only the sigaction calls should manage this flag, and
no other place. Thanks.
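
As a minimal sketch of that agreement (not the patch's code): the flag is
set only after a successful sigaction() install and cleared only after a
successful restore, so repeated register or unregister calls become no-ops.
The *_sketch names are hypothetical and the handler body is a placeholder.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; this is not the code from the patch. */
static struct sigaction sigbus_action_old;
static bool sigbus_need_recover;

/* Placeholder handler; the real one would dispatch on info->si_addr. */
void
sigbus_handler_sketch(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
}

/* Install the handler once. A second call must not overwrite the saved action. */
int
sigbus_handler_register_sketch(void)
{
	struct sigaction action;

	if (sigbus_need_recover)
		return 0;	/* already installed, nothing to do */

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	sigaddset(&action.sa_mask, SIGBUS);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = sigbus_handler_sketch;

	if (sigaction(SIGBUS, &action, &sigbus_action_old) != 0)
		return -1;

	sigbus_need_recover = true;	/* set only after a successful install */
	return 0;
}

/* Restore the saved action only if we installed ours before. */
int
sigbus_handler_unregister_sketch(void)
{
	if (!sigbus_need_recover)
		return 0;	/* never registered, nothing to restore */

	if (sigaction(SIGBUS, &sigbus_action_old, NULL) != 0)
		return -1;

	sigbus_need_recover = false;	/* cleared only after a successful restore */
	return 0;
}
```

With the guard in register, a second call cannot save our own handler into
sigbus_action_old, so unregister always restores the original disposition.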


>> +
>> +	sigbus_action_recover();
>> +
>> +	return rte_errno;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_hotplug_handle_enable(void)
>> +{
>> +	int ret = 0;
>> +
>> +	ret = rte_dev_sigbus_handler_register();
>> +	if (ret < 0)
>> +		RTE_LOG(ERR, EAL, "fail to register sigbus handler for "
>> +			"devices.\n");
>> +
>> +	hotplug_handle = true;
>> +
>> +	return ret;
>> +}
>> +
>> +int __rte_experimental
>> +rte_dev_hotplug_handle_disable(void)
>> +{
>> +	int ret = 0;
>> +
>> +	ret = rte_dev_sigbus_handler_unregister();
>> +	if (ret < 0)
>> +		RTE_LOG(ERR, EAL, "fail to unregister sigbus handler for "
>> +			"devices.\n");
>> +
>> +	hotplug_handle = false;
>> +
>> +	return ret;
>> +}
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index 73282bb..a3255aa 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -281,6 +281,8 @@ EXPERIMENTAL {
>>   	rte_dev_event_callback_unregister;
>>   	rte_dev_event_monitor_start;
>>   	rte_dev_event_monitor_stop;
>> +	rte_dev_hotplug_handle_enable;
>> +	rte_dev_hotplug_handle_disable;
>>   	rte_dev_iterator_init;
>>   	rte_dev_iterator_next;
>>   	rte_devargs_add;
>> --
>> 2.7.4
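
The commit log at the top of this patch says the handler checks the failure
address and hands the fault to the matching device. A minimal, self-contained
sketch of that lookup follows, assuming each device exposes one mapped region.
struct mapped_region and the *_sketch functions are hypothetical and are not
DPDK API. The 0 / 1 / -1 return convention mirrors how the quoted
sigbus_handler() interprets the result of rte_bus_sigbus_handler().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one device's mapped memory; not a DPDK type. */
struct mapped_region {
	const char *dev_name;
	uintptr_t base;
	size_t len;
	int unplugged;
};

/* Return the region whose [base, base + len) range contains addr, or NULL. */
struct mapped_region *
find_region_by_addr(struct mapped_region *regions, size_t n, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	size_t i;

	for (i = 0; i < n; i++) {
		if (a >= regions[i].base && a - regions[i].base < regions[i].len)
			return &regions[i];
	}
	return NULL;
}

/* Placeholder for the per-device failure handling (e.g. a remap). */
int
mark_unplugged_sketch(struct mapped_region *r)
{
	r->unplugged = 1;
	return 0;
}

/*
 * Dispatch a fault address:
 *    0  the fault belongs to a known device and was handled,
 *    1  no device owns the address (generic SIGBUS, not hot-unplug),
 *   -1  a device owns the address but handling failed.
 */
int
sigbus_dispatch_sketch(struct mapped_region *regions, size_t n,
		       const void *addr,
		       int (*handle)(struct mapped_region *))
{
	struct mapped_region *r = find_region_by_addr(regions, n, addr);

	if (r == NULL)
		return 1;
	return handle(r) == 0 ? 0 : -1;
}
```

Returning 1 for an unowned address lets the caller chain to the previously
installed SIGBUS action instead of treating every fault as a hot-unplug.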

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v11 0/7] hot-unplug failure handle mechanism
  2018-10-01  9:00         ` [PATCH v11 0/7] " Stephen Hemminger
  2018-10-01  9:55           ` Jerin Jacob
@ 2018-10-02  9:57           ` Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02  9:57 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu,
	anatoly.burakov, jblunck, shreyansh.jain, dev, helin.zhang

Hi, Stephen

Thanks for your review; my answers are below.

On 10/1/2018 5:00 PM, Stephen Hemminger wrote:
> On Sun, 30 Sep 2018 19:29:56 +0800
> Jeff Guo <jia.guo@intel.com> wrote:
>
>> Hotplug is an important feature for use-cases like the datacenter device's
>> fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
>> flexibility and continuality to networking services in multiple use-cases
>> in the industry. So let's see how DPDK can help users implement hotplug
>> solutions.
>>
>> We already have a general device-event monitor mechanism, failsafe driver,
>> and hot plug/unplug API in DPDK. We already have the solution of
>> “ethdev event + kernel PMD hotplug handler + failsafe”, but we still lack
>> an “eal event + hotplug handler for pci PMD + failsafe” implementation, and
>> we need to consider two different solutions for uio pci and vfio pci.
>>
>> In the case of hotplug for igb_uio, when a hardware device is removed
>> physically or disabled in software, the application needs to be notified
>> and detach the device from the bus, and then invalidate the device.
>> The problem is that the removal of the device is not instantaneous in
>> software. If the application data path tries to read/write to the device
>> while removal is still in progress, it will cause an MMIO error and the
>> application will crash.
>>
>> In this patch set, we propose a PCIe bus failure handler mechanism for
>> hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
>> the application will not crash.
>>
>> The mechanism should work as below:
>>
>> First, the application enables the device event monitor, registers the
>> hotplug event’s callback and enables hotplug handling before running the
>> data path. Once the hot-unplug occurs, the mechanism will detect the
>> removal event and then do the failure handling accordingly. In order to
>> do that, the below functionality will be required:
>>   - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
>>   - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
>>     it will remap memory, based on the failure address, for the corresponding
>>     device that was unplugged. For vfio pci, it could be implemented separately, case by case.
>>
>> For the data path or other unexpected behaviors from the control path
>> when a hot unplug occurs:
>>   - Add a new bus ops “sigbus_handler”, that is responsible for handling
>>     the sigbus error which is either an original memory error, or a specific
>>     memory error that is caused by a hot unplug. When a sigbus error is
>>     captured, it will call this function to handle sigbus error.
>>   - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate all
>>     device on PCI bus to find which device encounter the failure.
>>   - Implement a "rte_bus_sigbus_handler" to iterate all buses to find a bus
>>     to handle the failure.
>>   - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
>>     “rte_dev_hotplug_handle_diable” to enable/disable hotplug handling.
>>     It will monitor the sigbus error by a handler which is per-process.
>>     Based on the signal event principle, the control path thread and the
>>     data path thread will randomly receive the sigbus error, but will call the
>>     common sigbus process. When sigbus be captured, it will call the above API
>>     to find bus to handle it.
>>
>> The mechanism could be used by app or PMDs. For example, the whole process
>> of hotplug in testpmd is:
>>   - Enable device event monitor->Enable hotplug handle->Register event callback
>>     ->attach port->start port->start forwarding->Device unplug->failure handle
>>     ->stop forwarding->stop port->close port->detach port.
>>
>> This patch set would not cover hotplug insert and binding, and it is only
>> implement the igb_uio failure handler, the vfio hotplug failure handler
>> will be in next coming patch set.
>>
>> patchset history:
>> v11->v10:
>> change the ops name, since both uio and vfio will use the hot-unplug ops.
>> add experimental tag.
>> since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
>> RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
>> move the igb_uio fixing part, since it is random issue and should be considarate
>> as kernel driver defect but not include as this failure handler mechanism.
>>
>> v10->v9:
>> modify the api name and exposure out for public use.
>> add hotplug handle enable/disable APIs
>> refine commit log
>>
>> v9->v8:
>> refine commit log to be more readable.
>>
>> v8->v7:
>> refine errno process in sigbus handler.
>> refine igb uio release process
>>
>> v7->v6:
>> delete some unused part
>>
>> v6->v5:
>> refine some description about bus ops
>> refine commit log
>> add some entry check.
>>
>> v5->v4:
>> split patches to focus on the failure handle, remove the event usage
>> by testpmd to another patch.
>> change the hotplug failure handler name.
>> refine the sigbus handle logic.
>> add lock for udev state in igb uio driver.
>>
>> v4->v3:
>> split patches to be small and clear.
>> change to use new parameter "--hotplug-mode" in testpmd to identify
>> the eal hotplug and ethdev hotplug.
>>
>> v3->v2:
>> change bus ops name to bus_hotplug_handler.
>> add new API and bus ops of bus_signal_handler distingush handle generic.
>> sigbus and hotplug sigbus.
>>
>> v2->v1(v21):
>> refine some doc and commit log.
>> fix igb uio kernel issue for control path failure rebase testpmd code.
>>
>> Since the hot plug solution be discussed serval around in the public,
>> the scope be changed and the patch set be split into many times. Coming
>> to the recently RFC and feature design, it just focus on the hot unplug
>> failure handler at this patch set, so in order let this topic more clear
>> and focus, summarize privours patch set in history “v1(v21)”, the v2 here
>> go ahead for further track.
>>
>> "v1(21)" == v21 as below:
>> v21->v20:
>> split function in hot unplug ops.
>> sync failure hanlde to fix multiple process issue fix attach port issue for multiple devices case.
>> combind rmv callback function to be only one.
>>
>> v20->v19:
>> clean the code.
>> refine the remap logic for multiple device.
>> remove the auto binding.
>>
>> v19->18:
>> note for limitation of multiple hotplug, fix some typo, sqeeze patch.
>>
>> v18->v15:
>> add document, add signal bus handler, refine the code to be more clear.
>>
>> the prior patch history please check the patch set "add device event monitor framework".
>>
>> Jeff Guo (7):
>>    bus: add hot-unplug handler
>>    bus/pci: implement hot-unplug handler ops
>>    bus: add sigbus handler
>>    bus/pci: implement sigbus handler ops
>>    bus: add helper to handle sigbus
>>    eal: add failure handle mechanism for hot-unplug
>>    testpmd: use hot-unplug failure handle mechanism
>>
>>   app/test-pmd/testpmd.c                  |  39 ++++++--
>>   doc/guides/rel_notes/release_18_08.rst  |   5 +
>>   drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
>>   drivers/bus/pci/pci_common_uio.c        |  33 +++++++
>>   drivers/bus/pci/private.h               |  12 +++
>>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>>   lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
>>   lib/librte_eal/common/eal_private.h     |  39 ++++++++
>>   lib/librte_eal/common/include/rte_bus.h |  34 +++++++
>>   lib/librte_eal/common/include/rte_dev.h |  26 +++++
>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
>>   lib/librte_eal/rte_eal_version.map      |   2 +
>>   12 files changed, 481 insertions(+), 9 deletions(-)
>>
> I am glad to see this, hotplug is needed. But have a somewhat controversial
> point of view. The DPDK project needs to do more to force users to go to
> more modern kernels and API's; there has been too much effort already to
> support new DPDK on older kernels and distributions. This leads to higher
> testing burden, technical debt and multiple API's.
>
> To take the extreme point of view.
> 	* igb_uio should be deprecated and all new work only use vfio and vfio-ionommu only
> 	* kni should be deprecated and replaced by virtio
>
> When there are N ways of doing things against X kernel versions,
> and Y distributions, and multiple device vendors; the combinational explosion of cases means
> that interfaces don't get the depth of testing they deserve.
>
> That means why not support hotplug on VFIO only?


I think that is a very constructive suggestion, but I am not 100% sure
about it; there are at least a few things we need to consider and
discuss here.

1) Has it been announced to all DPDK users that igb_uio will be
deprecated, and is there a planned time frame for it?

2) Next, we should consider the cost of the transition. During the
transition, which is the better way: to mark the feature experimental,
implement it and test it for the benefit of igb_uio users and customers,
or to leave this part vacant and avoid any unnecessary combinatorial
cost of implementation and testing?

I think we can deliver hotplug for our DPDK users and customers if we
can find a good way to handle 1) and 2).

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v11 0/7] hot-unplug failure handle mechanism
  2018-10-01  9:55           ` Jerin Jacob
@ 2018-10-02 10:08             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 10:08 UTC (permalink / raw)
  To: Jerin Jacob, Stephen Hemminger
  Cc: bruce.richardson, ferruh.yigit, konstantin.ananyev, gaetan.rivet,
	jingjing.wu, thomas, motih, matan, harry.van.haaren, qi.z.zhang,
	shaopeng.he, bernard.iremonger, arybchenko, wenzhuo.lu,
	anatoly.burakov, jblunck, shreyansh.jain, dev, helin.zhang

Hi Jerin,

Thanks for your comments; my reply is below.

On 10/1/2018 5:55 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Mon, 1 Oct 2018 11:00:12 +0200
>> From: Stephen Hemminger <stephen@networkplumber.org>
>> To: Jeff Guo <jia.guo@intel.com>
>> Cc: bruce.richardson@intel.com, ferruh.yigit@intel.com,
>>   konstantin.ananyev@intel.com, gaetan.rivet@6wind.com,
>>   jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com,
>>   matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com,
>>   shaopeng.he@intel.com, bernard.iremonger@intel.com,
>>   arybchenko@solarflare.com, wenzhuo.lu@intel.com,
>>   anatoly.burakov@intel.com, jblunck@infradead.org, shreyansh.jain@nxp.com,
>>   dev@dpdk.org, helin.zhang@intel.com
>> Subject: Re: [dpdk-dev] [PATCH v11 0/7] hot-unplug failure handle mechanism
>>
>>
>> On Sun, 30 Sep 2018 19:29:56 +0800
>> Jeff Guo <jia.guo@intel.com> wrote:
>>
>>> Hotplug is an important feature for use-cases like the datacenter device's
>>> fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
>>> flexibility and continuality to networking services in multiple use-cases
>>> in the industry. So let's see how DPDK can help users implement hotplug
>>> solutions.
>>>
>>> We already have a general device-event monitor mechanism, failsafe driver,
>>> and hot plug/unplug API in DPDK. We have already got the solution of
>>> “ethdev event + kernel PMD hotplug handler + failsafe”, but we still not
>>> got “eal event + hotplug handler for pci PMD + failsafe” implement, and we
>>> need to considerate 2 different solutions between uio pci and vfio pci.
>>>
>>> In the case of hotplug for igb_uio, when a hardware device be removed
>>> physically or disabled in software, the application needs to be notified
>>> and detach the device out of the bus, and then make the device invalidate.
>>> The problem is that, the removal of the device is not instantaneous in
>>> software. If the application data path tries to read/write to the device
>>> when removal is still in process, it will cause an MMIO error and
>>> application will crash.
>>>
>>> In this patch set, we propose a PCIe bus failure handler mechanism for
>>> hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
>>> the application will not crash.
>>>
>>> The mechanism should work as below:
>>>
>>> First, the application enables the device event monitor, registers the
>>> hotplug event’s callback and enable hotplug handling before running the
>>> data path. Once the hot-unplug occurs, the mechanism will detect the
>>> removal event and then accordingly do the failure handling. In order to
>>> do that, the below functionality will be required:
>>>   - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
>>>   - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
>>>     it will be based on the failure address to remap memory for the corresponding
>>>     device that unplugged. For vfio pci, could seperate implement case by case.
>>>
>>> For the data path or other unexpected behaviors from the control path
>>> when a hot unplug occurs:
>>>   - Add a new bus ops “sigbus_handler”, that is responsible for handling
>>>     the sigbus error which is either an original memory error, or a specific
>>>     memory error that is caused by a hot unplug. When a sigbus error is
>>>     captured, it will call this function to handle sigbus error.
>>>   - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate all
>>>     device on PCI bus to find which device encounter the failure.
>>>   - Implement a "rte_bus_sigbus_handler" to iterate all buses to find a bus
>>>     to handle the failure.
>>>   - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
>>>     “rte_dev_hotplug_handle_diable” to enable/disable hotplug handling.
>>>     It will monitor the sigbus error by a handler which is per-process.
>>>     Based on the signal event principle, the control path thread and the
>>>     data path thread will randomly receive the sigbus error, but will call the
>>>     common sigbus process. When sigbus be captured, it will call the above API
>>>     to find bus to handle it.
>>>
>>> The mechanism could be used by app or PMDs. For example, the whole process
>>> of hotplug in testpmd is:
>>>   - Enable device event monitor->Enable hotplug handle->Register event callback
>>>     ->attach port->start port->start forwarding->Device unplug->failure handle
>>>     ->stop forwarding->stop port->close port->detach port.
>>>
>>> This patch set would not cover hotplug insert and binding, and it is only
>>> implement the igb_uio failure handler, the vfio hotplug failure handler
>>> will be in next coming patch set.
>>>
>>> patchset history:
>>> v11->v10:
>>> change the ops name, since both uio and vfio will use the hot-unplug ops.
>>> add experimental tag.
>>> since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
>>> RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
>>> move the igb_uio fixing part, since it is random issue and should be considarate
>>> as kernel driver defect but not include as this failure handler mechanism.
>>>
>>> v10->v9:
>>> modify the api name and exposure out for public use.
>>> add hotplug handle enable/disable APIs
>>> refine commit log
>>>
>>> v9->v8:
>>> refine commit log to be more readable.
>>>
>>> v8->v7:
>>> refine errno process in sigbus handler.
>>> refine igb uio release process
>>>
>>> v7->v6:
>>> delete some unused part
>>>
>>> v6->v5:
>>> refine some description about bus ops
>>> refine commit log
>>> add some entry check.
>>>
>>> v5->v4:
>>> split patches to focus on the failure handle, remove the event usage
>>> by testpmd to another patch.
>>> change the hotplug failure handler name.
>>> refine the sigbus handle logic.
>>> add lock for udev state in igb uio driver.
>>>
>>> v4->v3:
>>> split patches to be small and clear.
>>> change to use new parameter "--hotplug-mode" in testpmd to identify
>>> the eal hotplug and ethdev hotplug.
>>>
>>> v3->v2:
>>> change bus ops name to bus_hotplug_handler.
>>> add new API and bus ops of bus_signal_handler distingush handle generic.
>>> sigbus and hotplug sigbus.
>>>
>>> v2->v1(v21):
>>> refine some doc and commit log.
>>> fix igb uio kernel issue for control path failure rebase testpmd code.
>>>
>>> Since the hot plug solution be discussed serval around in the public,
>>> the scope be changed and the patch set be split into many times. Coming
>>> to the recently RFC and feature design, it just focus on the hot unplug
>>> failure handler at this patch set, so in order let this topic more clear
>>> and focus, summarize privours patch set in history “v1(v21)”, the v2 here
>>> go ahead for further track.
>>>
>>> "v1(21)" == v21 as below:
>>> v21->v20:
>>> split function in hot unplug ops.
>>> sync failure hanlde to fix multiple process issue fix attach port issue for multiple devices case.
>>> combind rmv callback function to be only one.
>>>
>>> v20->v19:
>>> clean the code.
>>> refine the remap logic for multiple device.
>>> remove the auto binding.
>>>
>>> v19->18:
>>> note for limitation of multiple hotplug, fix some typo, sqeeze patch.
>>>
>>> v18->v15:
>>> add document, add signal bus handler, refine the code to be more clear.
>>>
>>> the prior patch history please check the patch set "add device event monitor framework".
>>>
>>> Jeff Guo (7):
>>>    bus: add hot-unplug handler
>>>    bus/pci: implement hot-unplug handler ops
>>>    bus: add sigbus handler
>>>    bus/pci: implement sigbus handler ops
>>>    bus: add helper to handle sigbus
>>>    eal: add failure handle mechanism for hot-unplug
>>>    testpmd: use hot-unplug failure handle mechanism
>>>
>>>   app/test-pmd/testpmd.c                  |  39 ++++++--
>>>   doc/guides/rel_notes/release_18_08.rst  |   5 +
>>>   drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
>>>   drivers/bus/pci/pci_common_uio.c        |  33 +++++++
>>>   drivers/bus/pci/private.h               |  12 +++
>>>   lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>>>   lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
>>>   lib/librte_eal/common/eal_private.h     |  39 ++++++++
>>>   lib/librte_eal/common/include/rte_bus.h |  34 +++++++
>>>   lib/librte_eal/common/include/rte_dev.h |  26 +++++
>>>   lib/librte_eal/linuxapp/eal/eal_dev.c   | 162 +++++++++++++++++++++++++++++++-
>>>   lib/librte_eal/rte_eal_version.map      |   2 +
>>>   12 files changed, 481 insertions(+), 9 deletions(-)
>>>
>> I am glad to see this, hotplug is needed. But have a somewhat controversial
>> point of view. The DPDK project needs to do more to force users to go to
>> more modern kernels and API's; there has been too much effort already to
>> support new DPDK on older kernels and distributions. This leads to higher
>> testing burden, technical debt and multiple API's.
>>
>> To take the extreme point of view.
>>          * igb_uio should be deprecated and all new work only use vfio and vfio-ionommu only
>>          * kni should be deprecated and replaced by virtio
> +1
>
> I think, The only feature missing in upstream kernel for DPDK may be to
> enable SRIOV on PF VFIO devices controlled by DPDK PMD.
> I think, Binding a PF device to DPDK along with VFs(VF can be bound to netdev or DPDK
> PMDs, Though binding VF to netdev considered as security breach) will be useful for
> a) rte_flow actions like redirecting the traffic to PF or VF on the given pattern
> b) Some NICs can support promiscuous mode only on PF
> c) Enable Switch representation devices
> https://doc.dpdk.org/guides/prog_guide/switch_representation.html
>
> I think, igb_uio mainly used as the backdoor for this use case.
>
> I think, there was some work in this area but it is not upstreamed due
> to various reasons.
> https://lkml.org/lkml/2018/3/8/1122


I think you definitely raise a meaningful point about hotplug for SRIOV.
igb_uio is still in use today, so its usage needs to be highlighted and
discussed.


>> When there are N ways of doing things against X kernel versions,
>> and Y distributions, and multiple device vendors; the combinational explosion of cases means
>> that interfaces don't get the depth of testing they deserve.
>>
>> That means why not support hotplug on VFIO only?
>>

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v12 0/7] hot-unplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (18 preceding siblings ...)
  2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
@ 2018-10-02 12:32       ` Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
                           ` (6 more replies)
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
                         ` (3 subsequent siblings)
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use cases such as device fail-safe in
the datacenter and SRIOV live migration in SDN/NFV. It can bring greater
flexibility and continuity to networking services across many use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, the failsafe
driver, and a hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an implementation of “eal event + hotplug handler for pci PMD +
failsafe”, and we need to consider two different solutions, one for uio pci
and one for vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read from or write to the
device while removal is still in progress, it will cause an MMIO error and
the application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once a hot-unplug occurs, the mechanism detects the
removal event and then does the failure handling accordingly. In order to
do that, the following functionality is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement the PCI bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it remaps memory for the corresponding unplugged device, based on the
   failure address. For vfio pci, a separate implementation can be added
   case by case.
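
To illustrate the uio remap step above: the idea is to overlay the
now-invalid BAR mapping with zero-filled anonymous pages at the same virtual
address, so that late reads and writes no longer fault. The following is only
a minimal standalone sketch of that idea, not the code of this series;
`remap_region_anonymous` is a hypothetical name, and a real handler would
apply it to every mapped BAR of the unplugged device.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not a DPDK API): overlay [addr, addr + len) with
 * zero-filled anonymous pages. MAP_FIXED replaces whatever was mapped at
 * that range before, so the virtual address stays usable by the data path.
 */
static int
remap_region_anonymous(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL && len > 0);
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}
```

After the call, reads from the region return zeros until something is
written, and writes land in private dummy pages instead of the device. That
should be enough to keep the data path alive until the port is stopped and
detached.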

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, which is responsible for handling
   a sigbus error that is either a generic memory error or a specific
   memory error caused by a hot unplug. When a sigbus error is
   captured, this function is called to handle it.
 - Implement the PCI bus specific ops “pci_sigbus_handler”. It iterates over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
   bus to handle the failure.
 - Add a couple of APIs, “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable”, to enable/disable hotplug handling.
   They monitor the sigbus error with a per-process handler.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both call the
   common sigbus handling. When a sigbus is captured, it calls the above API
   to find the bus to handle it.
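
The sigbus path described above can be pictured roughly as follows. This is a
simplified sketch under stated assumptions, not the EAL code of this series:
`bar_region`, `region_register`, `region_find` and `install_sigbus_handler`
are hypothetical names, the per-bus and per-device iteration is collapsed into
one lookup table, and the recovery is reduced to an anonymous remap. It relies
on `sigaction()` with `SA_SIGINFO`, which reports the faulting address in
`si_addr`. POSIX does not list `mmap()` as async-signal-safe, so calling it
from the handler assumes Linux behaviour.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for one mapped BAR of a device. */
struct bar_region {
	void *addr;
	size_t len;
};

#define MAX_REGIONS 8
static struct bar_region regions[MAX_REGIONS];
static int nb_regions;

/* Record a mapping so the handler can recognise faults inside it. */
static int
region_register(void *addr, size_t len)
{
	assert(nb_regions >= 0 && nb_regions <= MAX_REGIONS);
	if (nb_regions == MAX_REGIONS || addr == NULL || len == 0)
		return -1;
	regions[nb_regions].addr = addr;
	regions[nb_regions].len = len;
	return nb_regions++;
}

/* Return the index of the region containing addr, or -1 if none does. */
static int
region_find(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	int i;

	for (i = 0; i < nb_regions; i++) {
		uintptr_t start = (uintptr_t)regions[i].addr;

		if (a >= start && a - start < regions[i].len)
			return i;
	}
	return -1;
}

/* Per-process SIGBUS handler: recover hot-unplug faults, pass on the rest. */
static void
sigbus_action(int signum, siginfo_t *info, void *ctx)
{
	int idx = region_find(info->si_addr);

	(void)ctx;
	if (idx >= 0 &&
	    mmap(regions[idx].addr, regions[idx].len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != MAP_FAILED)
		return; /* the faulting access is retried on the dummy pages */

	/* Generic sigbus: restore the default action and re-raise it. */
	signal(signum, SIG_DFL);
	raise(signum);
}

/* Install the handler; returns 0 on success, -1 on error. */
static int
install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_action;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

Because the signal disposition is shared by all threads in the process, it
does not matter whether the fault hits a control path thread or a data path
thread: the same handler runs and the same lookup decides whether the fault
belongs to an unplugged device.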

The mechanism can be used by applications or PMDs. For example, the whole
hotplug process in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insertion and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a later patch set.

patchset history:
v12->v11:
add and delete some checking about sigbus recover.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
remove the igb_uio fix, since it is a random issue and should be considered
a kernel driver defect rather than part of this failure handler mechanism.

v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish between
handling a generic sigbus and a hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed over several rounds in public,
the scope has changed and the patch set has been split several times. Following
the recent RFC and feature design, this patch set focuses only on the hot-unplug
failure handler. To keep the topic clear and focused, the previous patch set
history is summarized as “v1(v21)”, and v2 here continues from it for
further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handling to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->18:
note the limitation of multiple hotplug, fix some typos, squeeze patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  testpmd: use hot-unplug failure handle mechanism

 app/test-pmd/testpmd.c                  |  39 ++++++--
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 483 insertions(+), 9 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v12 1/7] bus: add hot-unplug handler
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
@ 2018-10-02 12:32         ` Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot-unplug failure and an application crash can occur when a device is
hot-unplugged but the application still tries to access the device
by reading from or writing to the BARs, which are already invalid but
have not yet been unmapped or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handling the failure when a device is hot-unplugged. When a hot-unplug
+ * event is detected, this function can be called to handle
+ * the hot-unplug failure and avoid an application crash.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_hot_unplug_handler_t hot_unplug_handler;
+				/**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
@ 2018-10-02 12:32         ` Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by creating new dummy
memory and remapping the failed memory region onto it. For VFIO
or other kernel drivers, a specific function can be implemented to handle
hot-unplug case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..d286234 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* BARs resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hot_unplug_handler = pci_hot_unplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

* [PATCH v12 3/7] bus: add sigbus handler
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
@ 2018-10-02 12:32         ` Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 4/7] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads from or writes to the device. A handler is required here to
capture the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..201454a 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * a sigbus error that is either a generic memory error or a memory error
+ * caused by a device being hot-unplugged. When a sigbus error is captured,
+ * this function can be called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus error was handled successfully.
+ *	1 if no bus handled the sigbus error.
+ *	-1 if handling the sigbus error failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_hot_unplug_handler_t hot_unplug_handler;
 				/**< handle hot-unplug failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

* [PATCH v12 4/7] bus/pci: implement sigbus handler ops
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
                           ` (2 preceding siblings ...)
  2018-10-02 12:32         ` [PATCH v12 3/7] bus: add sigbus handler Jeff Guo
@ 2018-10-02 12:32         ` Jeff Guo
  2018-10-02 12:32         ` [PATCH v12 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
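
For illustration only (not part of the patch), the address lookup described
in the commit log can be sketched in plain C without DPDK. The types
bar_range and toy_pci_dev and the function find_dev_by_addr are hypothetical
stand-ins for rte_mem_resource, rte_pci_device and pci_find_device_by_addr;
only the half-open range test is meant to mirror the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BARS 6 /* assumption: stands in for PCI_MAX_RESOURCE */

/* Hypothetical stand-in for one mapped BAR (struct rte_mem_resource). */
struct bar_range {
	void *addr;   /* virtual address the BAR is mapped at */
	uint64_t len; /* length of the mapping in bytes, 0 if unused */
};

/* Hypothetical stand-in for struct rte_pci_device. */
struct toy_pci_dev {
	const char *name;
	struct bar_range bars[TOY_MAX_BARS];
};

/*
 * Return the first device with a BAR mapping that contains failure_addr,
 * or NULL when the address belongs to no device (a generic sigbus).
 */
static struct toy_pci_dev *
find_dev_by_addr(struct toy_pci_dev *devs, size_t ndevs,
		 const void *failure_addr)
{
	uint64_t fault = (uint64_t)(uintptr_t)failure_addr;
	size_t d, i;

	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < TOY_MAX_BARS; i++) {
			uint64_t start =
				(uint64_t)(uintptr_t)devs[d].bars[i].addr;
			uint64_t len = devs[d].bars[i].len;

			/* skip BARs that are not mapped */
			if (len == 0)
				continue;
			/* half-open range [start, start + len) */
			if (fault >= start && fault - start < len)
				return &devs[d];
		}
	}
	return NULL;
}
```

A NULL result corresponds to the "generic sigbus" case, for which the bus
handler in this series returns 1 so that another handler can take over.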
---
 drivers/bus/pci/pci_common.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..f313fe9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/**
+ * Find the device that encountered the failure by iterating over all
+ * devices on the PCI bus and checking whether the failure address lies
+ * within the range of one of the device's BARs.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

* [PATCH v12 5/7] bus: add helper to handle sigbus
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
                           ` (3 preceding siblings ...)
  2018-10-02 12:32         ` [PATCH v12 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-10-02 12:32         ` Jeff Guo
  2018-10-02 12:32         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
  2018-10-02 12:32         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
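
For illustration only (not part of the patch), the bus iteration and its
return convention can be sketched in plain C. The names toy_bus,
toy_bus_sigbus_handler, toy_pci_sigbus and toy_broken_sigbus are
hypothetical. The patch itself walks the bus list through rte_bus_find()
with a comparison callback and uses rte_errno to tell "handled" from
"failed"; this sketch uses a direct loop instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Per-bus handler type, following the convention of this series:
 *   0  the bus owns the address and handled the error,
 *   1  the address does not belong to this bus,
 *  -1  the bus owns the address but handling failed.
 */
typedef int (*toy_sigbus_handler_t)(const void *failure_addr);

/* Hypothetical stand-in for struct rte_bus. */
struct toy_bus {
	const char *name;
	toy_sigbus_handler_t sigbus_handler; /* may be NULL */
};

/* Pretend BAR mapping owned by the example "pci" bus below. */
static char toy_pci_window[16];

/* Example handler: claims only addresses inside toy_pci_window. */
static int
toy_pci_sigbus(const void *failure_addr)
{
	uintptr_t fault = (uintptr_t)failure_addr;
	uintptr_t start = (uintptr_t)toy_pci_window;

	if (fault < start || fault - start >= sizeof(toy_pci_window))
		return 1;
	return 0;
}

/* Example handler: claims every address but always fails. */
static int
toy_broken_sigbus(const void *failure_addr)
{
	(void)failure_addr;
	return -1;
}

/*
 * Offer the fault address to each bus in turn and stop at the first bus
 * that claims it. Returns 0, 1 or -1 with the same meaning as above.
 */
static int
toy_bus_sigbus_handler(const struct toy_bus *buses, size_t nbuses,
		       const void *failure_addr)
{
	size_t i;

	for (i = 0; i < nbuses; i++) {
		int ret;

		/* a bus without the op cannot claim the address */
		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret == 1)
			continue; /* not this bus, keep looking */
		return ret < 0 ? -1 : 0;
	}
	return 1; /* no bus claimed the address: generic sigbus */
}
```

The three-way result is what lets the signal handler in a later patch decide
between continuing, exiting, and chaining to a previously installed SIGBUS
handler.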
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 13 ++++++++++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* bus found but handling failed; make sure errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* cannot find a bus. */
+	if (!bus)
+		return 1;
+	/* bus found but handling failed; pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus error was handled successfully.
+ *	-1 if handling the sigbus error failed.
+ *	 1 if no bus can handle the sigbus error.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

* [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
                           ` (4 preceding siblings ...)
  2018-10-02 12:32         ` [PATCH v12 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-10-02 12:32         ` Jeff Guo
  2018-10-02 12:32         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism registers the sigbus handler after the device event monitor
is enabled. When a sigbus event is captured, it checks the failure address
and handles the memory failure of the corresponding device by invoking the
hot-unplug handler. This prevents the application from crashing when a
device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable or
disable the device hotplug handling mechanism. Note that these functions
only implement the hot-unplug handler; other hotplug handlers, such as a
handler for hotplug binding, could be added in the future if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
add and remove some checks for sigbus recovery.
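
For illustration only (not part of the patch), the register/restore pattern
for the SIGBUS action can be sketched with plain POSIX calls. All toy_*
names are hypothetical. Unlike the patch, the handler here only records
that it ran; it does not consult any bus, exit, or chain to the old handler.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for sigaction() and siginfo_t */
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

static struct sigaction toy_sigbus_old; /* action in place before ours */
static int toy_sigbus_installed;        /* nonzero while ours is active */
static volatile sig_atomic_t toy_sigbus_seen;

/* SA_SIGINFO-style handler: info->si_addr carries the fault address. */
static void
toy_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
	toy_sigbus_seen = 1;
}

/* Install the handler once, remembering the previous action. */
static int
toy_sigbus_register(void)
{
	struct sigaction action;

	if (toy_sigbus_installed)
		return 0;

	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	sigaddset(&action.sa_mask, SIGBUS);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = toy_sigbus_handler;
	if (sigaction(SIGBUS, &action, &toy_sigbus_old) != 0)
		return -1;
	toy_sigbus_installed = 1;
	return 0;
}

/* Put the previous action back if ours is currently installed. */
static int
toy_sigbus_unregister(void)
{
	if (!toy_sigbus_installed)
		return 0;
	if (sigaction(SIGBUS, &toy_sigbus_old, NULL) != 0)
		return -1;
	toy_sigbus_installed = 0;
	return 0;
}
```

Reporting the sigaction() result directly, as this sketch does, is one way
to let the caller notice a failed registration.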
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 +++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..72fc033 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,32 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * Spinlock for device hot-unplug failure handling. Any access to a bus or
+ * device, such as handling sigbus on a bus or handling a memory failure for
+ * a device, must take this lock. It protects the bus and the device against
+ * race conditions.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +52,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] caught SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hot-unplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Successfully handled SIGBUS for hot-unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +209,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +236,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hot_unplug_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
+					"for device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	if (sigbus_need_recover)
+		return 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to register sigbus handler for devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to unregister sigbus handler for devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..a3255aa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_enable;
+	rte_dev_hotplug_handle_disable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4

* [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
                           ` (5 preceding siblings ...)
  2018-10-02 12:32         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-10-02 12:32         ` Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:32 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how an app can smoothly
handle failure when a device is hot-unplugged. Besides enabling the device
event monitor and registering the hotplug event’s callback, the app also
needs to enable the hotplug handle mechanism before running. Once the app
detects the removal event, the hot-unplug callback is called. It first
stops packet forwarding, then stops the port, closes the port, and finally
detaches the port to clean up the device and release the resources.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
no change.
---
 app/test-pmd/testpmd.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..bfef483 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2093,14 +2093,22 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
 		ret = eth_dev_event_callback_unregister();
 		if (ret)
+			return;
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.\n");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2244,6 +2252,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2793,23 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.\n");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.\n");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = eth_dev_event_callback_register();
+		if (ret)
+			return -1;
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4

* [PATCH v12 0/7] hot-unplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (19 preceding siblings ...)
  2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
@ 2018-10-02 12:35       ` Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
                           ` (6 more replies)
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
                         ` (2 subsequent siblings)
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use cases such as fail-safe handling of
datacenter devices and SR-IOV live migration in SDN/NFV. It can bring
greater flexibility and continuity to networking services across multiple
industry use cases. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, the failsafe
driver, and a hot plug/unplug API in DPDK. We already have the solution
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an implementation of “eal event + hotplug handler for pci PMD +
failsafe”, and we need to consider two different solutions for uio pci and
vfio pci.

In the case of hotplug for igb_uio, when a hardware device is physically
removed or disabled in software, the application needs to be notified,
detach the device from the bus, and then invalidate the device. The
problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read from or write to the
device while removal is still in progress, it will cause an MMIO error and
the application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then do the failure handling accordingly. In order to
do that, the functionality below is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it will remap the memory of the unplugged device, which is found based
   on the failure address. For vfio pci, this could be implemented
   separately, case by case.
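
For illustration only, the uio remap step can be sketched as a single
mmap() call with MAP_FIXED, which replaces the stale BAR mapping with
anonymous memory at the same virtual address so that later loads and
stores no longer fault. The function name toy_remap_anonymous is
hypothetical; the flags are the ones used by pci_uio_remap_resource in
patch 2/7.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with anonymous,
 * zero-filled memory at the same virtual address. addr must be page
 * aligned. Returns 0 on success and -1 on failure.
 */
static int
toy_remap_anonymous(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == MAP_FAILED ? -1 : 0;
}
```

After the remap, reads return zeros and writes land in ordinary memory, so
the data path keeps running until the application detaches the port.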

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, that is responsible for handling
   the sigbus error which is either an original memory error, or a specific
   memory error that is caused by a hot unplug. When a sigbus error is
   captured, it will call this function to handle sigbus error.
 - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
   bus to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   They monitor the sigbus error with a per-process handler. Because of
   how signals are delivered, either the control path thread or the data
   path thread may receive the sigbus error, but both will go through the
   common sigbus handling. When a sigbus is captured, it will call the
   above API to find a bus to handle it.

The mechanism can be used by applications or PMDs. For example, the whole
process of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insertion and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a later patch set.

patchset history:
v12->v11:
add and remove some checks for sigbus recovery.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, and modify the hotplug event and callback usage.
move out the igb_uio fix, since it addresses a random issue that should be
considered a kernel driver defect and not part of this failure handler
mechanism.

v10->v9:
modify the API name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish between
handling generic sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed several times in public,
the scope has changed and the patch set has been split many times. Per
the recent RFC and feature design, this patch set focuses only on the
hot-unplug failure handler. To keep this topic clear and focused, the
previous patch set history is summarized as “v1(v21)”, and v2 here
continues the tracking.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->v18:
note the limitation of multiple hotplug, fix some typos, squash patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  testpmd: use hot-unplug failure handle mechanism

 app/test-pmd/testpmd.c                  |  39 ++++++--
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  81 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 +++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 483 insertions(+), 9 deletions(-)

-- 
2.7.4

* [PATCH v12 1/7] bus: add hot-unplug handler
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
@ 2018-10-02 12:35         ` Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot-unplug failure and an application crash can occur when a device is
hot-unplugged but the application still tries to access the device by
reading or writing the BARs, which are already invalid but have not yet
been unmapped or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handle the failure when device be hot-unplugged. When the event of
+ * hot-unplug be detected, it could call this function to handle
+ * the hot-unplug failure and avoid app crash.
+ * @param dev
+ *	Pointer of the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_hot_unplug_handler_t hot_unplug_handler;
+				/**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4
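
For readers following the thread, the optional bus op added here can be
sketched outside DPDK roughly as below. This is only an illustration: the
demo_* types and functions are hypothetical stand-ins for struct rte_bus
and struct rte_device, and the dispatch helper is an assumption about how a
caller might treat a bus that leaves the op unset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_device. */
struct demo_device {
	const char *name;
	int remapped; /* set by the handler in this sketch */
};

/* Same shape as the op in the patch: 0 on success, non-zero on error. */
typedef int (*demo_hot_unplug_handler_t)(struct demo_device *dev);

/* Hypothetical stand-in for struct rte_bus. */
struct demo_bus {
	const char *name;
	demo_hot_unplug_handler_t hot_unplug_handler; /* may be NULL */
};

/* Example bus-specific handler: pretend the BARs were made safe. */
static int
demo_pci_hot_unplug_handler(struct demo_device *dev)
{
	if (dev == NULL)
		return -1;
	dev->remapped = 1;
	return 0;
}

/* Dispatch through the op; a bus without the op cannot handle the failure. */
static int
demo_bus_handle_hot_unplug(const struct demo_bus *bus, struct demo_device *dev)
{
	assert(bus != NULL); /* passing no bus is a programming error */

	if (bus->hot_unplug_handler == NULL)
		return -1;
	return bus->hot_unplug_handler(dev);
}
```

Checking the op for NULL before calling it lets buses that do not
implement the handler coexist with those that do.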

* [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
@ 2018-10-02 12:35         ` Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by creating new dummy
memory and remapping it over the memory where the failure occurred. For
VFIO or other kernel drivers, a specific function can be implemented to
handle hot-unplug case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..d286234 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* BARs resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hot_unplug_handler = pci_hot_unplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4
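
The remap idea used by pci_uio_remap_resource() can be tried in isolation
with the sketch below. It is a minimal standalone example, assuming a
POSIX system with MAP_ANONYMOUS; remap_as_anonymous() is a hypothetical
helper name and works on any existing mapping rather than on real BARs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc with strict -std */
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with zero-filled
 * anonymous memory. MAP_FIXED discards the previous mapping at that
 * address, so later loads and stores hit ordinary memory instead of the
 * stale device mapping.
 * Returns 0 on success, -1 on failure.
 */
static int
remap_as_anonymous(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL && len > 0);

	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	return 0;
}
```

After the call the virtual address stays valid, but its contents no longer
reflect the device: reads return zeros until something is written.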

* [PATCH v12 3/7] bus: add sigbus handler
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
@ 2018-10-02 12:35         ` Jeff Guo
  2018-10-02 14:32           ` Burakov, Anatoly
  2018-10-02 12:35         ` [PATCH v12 4/7] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads from or writes to the device. A handler is required here to
capture the sigbus signal and handle it appropriately.

This patch introduces a bus op to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..201454a 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * the sigbus error which is either original memory error, or specific memory
+ * error that caused of device be hot-unplugged. When sigbus error be captured,
+ * it could call this function to handle sigbus error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	0 for success handle the sigbus.
+ *	1 for no bus handle the sigbus.
+ *	-1 for failed to handle the sigbus
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_hot_unplug_handler_t hot_unplug_handler;
 				/**< handle hot-unplug failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

* [PATCH v12 4/7] bus/pci: implement sigbus handler ops
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
                           ` (2 preceding siblings ...)
  2018-10-02 12:35         ` [PATCH v12 3/7] bus: add sigbus handler Jeff Guo
@ 2018-10-02 12:35         ` Jeff Guo
  2018-10-02 14:39           ` Burakov, Anatoly
  2018-10-02 12:35         ` [PATCH v12 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
---
 drivers/bus/pci/pci_common.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..f313fe9 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/**
+ * find the device which encounter the failure, by iterate over all device on
+ * PCI bus to check if the memory failure address is located in the range
+ * of the BARs of the device.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int i;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			if ((uint64_t)(uintptr_t)failure_addr >=
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+			    (uint64_t)(uintptr_t)failure_addr <
+			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+			    pdev->mem_resource[i].len) {
+				RTE_LOG(INFO, EAL, "Failure address "
+					"%16.16"PRIx64" belongs to "
+					"device %s!\n",
+					(uint64_t)(uintptr_t)failure_addr,
+					pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +463,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused of hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+				"device %s", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4
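
The address lookup done by pci_find_device_by_addr() reduces to a range
check per mapped region. The sketch below shows that check on its own; the
demo_region type and find_region_by_addr() are hypothetical names, and
comparing offsets instead of end addresses is a choice made here, not
something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a device's mem_resource[]. */
struct demo_region {
	void *addr;   /* virtual address the region is mapped at */
	uint64_t len; /* length in bytes, 0 if unused */
};

/* Return the index of the region containing fault_addr, or -1 if none. */
static int
find_region_by_addr(const struct demo_region *regions, size_t n,
		    const void *fault_addr)
{
	uintptr_t fault = (uintptr_t)fault_addr;
	size_t i;

	assert(regions != NULL || n == 0);

	for (i = 0; i < n; i++) {
		uintptr_t start = (uintptr_t)regions[i].addr;

		/* skip unmapped or empty regions */
		if (regions[i].addr == NULL || regions[i].len == 0)
			continue;
		/* compare offsets so that start + len cannot overflow */
		if (fault >= start && fault - start < regions[i].len)
			return (int)i;
	}
	return -1;
}
```

A fault address that matches no region is what lets the caller classify
the SIGBUS as generic rather than caused by hot-unplug.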

* [PATCH v12 5/7] bus: add helper to handle sigbus
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
                           ` (3 preceding siblings ...)
  2018-10-02 12:35         ` [PATCH v12 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-10-02 12:35         ` Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
  2018-10-02 12:35         ` [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v12->v11:
no change.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 13 ++++++++++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* find bus but handle failed, keep the errno be set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* can not find bus. */
+	if (!bus)
+		return 1;
+	/* find bus but handle failed, pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 success to handle the sigbus.
+ *	-1 failed to handle the sigbus
+ *	 1 no bus can handler the sigbus
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4
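
The 0 / 1 / -1 contract of the helper can be illustrated with a plain loop
over a bus table, as sketched below. This is a simplification under stated
assumptions: the demo_* names are hypothetical, the bus list is an array
rather than the EAL bus list, and the rte_errno bookkeeping of the real
helper is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0 = handled, 1 = address not owned by this bus, -1 = owned but failed */
typedef int (*demo_sigbus_handler_t)(const void *failure_addr);

struct demo_bus {
	const char *name;
	demo_sigbus_handler_t sigbus_handler; /* may be NULL */
};

/* A pretend BAR owned by the example "pci" bus below. */
static char demo_bar[16];

static int
demo_pci_sigbus_handler(const void *failure_addr)
{
	uintptr_t off = (uintptr_t)failure_addr - (uintptr_t)demo_bar;

	return off < sizeof(demo_bar) ? 0 : 1;
}

/* A bus that claims every address but always fails to recover. */
static int
demo_failing_sigbus_handler(const void *failure_addr)
{
	(void)failure_addr;
	return -1;
}

/* Ask each bus in turn; stop at the first one that owns the address. */
static int
demo_buses_handle_sigbus(const struct demo_bus *buses, size_t n,
			 const void *failure_addr)
{
	size_t i;

	assert(buses != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret <= 0)
			return ret < 0 ? -1 : 0;
	}
	return 1; /* generic SIGBUS: no bus claimed the address */
}
```

Keeping "no owner" (1) distinct from "owner failed" (-1) is what allows
the signal handler to forward generic faults to a previously installed
handler instead of treating them as hot-unplug failures.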

* [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
                           ` (4 preceding siblings ...)
  2018-10-02 12:35         ` [PATCH v12 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-10-02 12:35         ` Jeff Guo
  2018-10-02 13:34           ` Ananyev, Konstantin
  2018-10-02 15:53           ` Burakov, Anatoly
  2018-10-02 12:35         ` [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
  6 siblings, 2 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism can initially register the sigbus handler after the device
event monitor is enabled. When a sigbus event is captured, it checks
the failure address and handles the memory failure of the corresponding
device by invoking the hot-unplug handler. This can prevent the
application from crashing when a device is hot-unplugged.

With this patch, users can call the newly added APIs below to
enable/disable the device hotplug handle mechanism. Note that these
functions only implement the hot-unplug handler; other hotplug handlers,
such as a handler for hotplug binding, could be added in the future if
needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
add and delete some checking about sigbus recover.
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 +++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 164 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..72fc033 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,32 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device hot-unplug failure handling. If it try to access bus or
+ * device, such as handle sigbus on bus or handle memory failure for device
+ * just need to use this lock. It could protect the bus and the device to avoid
+ * race condition.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +52,49 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hot-unplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_handler)
+			(*(sigbus_action_old.sa_handler))(signum);
+		else
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+	}
+
+	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +209,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +236,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hot_unplug_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
+					"for device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	if (sigbus_need_recover)
+		return 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to register sigbus handler for devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to unregister sigbus handler for devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..a3255aa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_enable;
+	rte_dev_hotplug_handle_disable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4
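
The register/unregister pairing behind the enable/disable APIs can be
sketched with plain POSIX calls as below. This is an illustration, not the
EAL code: the demo_* names are hypothetical, the signal number is a
parameter so the sketch can be exercised with a harmless signal, and
reporting failure with -1 from sigaction() is a choice made for the sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for sigaction() and siginfo_t */
#endif

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Action that was installed before ours, restored on unregister. */
static struct sigaction demo_saved_action;
static int demo_installed;

/* Placeholder handler; a real one would inspect info->si_addr. */
static void
demo_fault_handler(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
}

/* Install the handler once, remembering the previous action. */
static int
demo_handler_register(int signum)
{
	struct sigaction action;

	assert(signum > 0);

	if (demo_installed)
		return 0; /* already registered, nothing to do */

	memset(&action, 0, sizeof(action));
	action.sa_flags = SA_SIGINFO;
	sigemptyset(&action.sa_mask);
	action.sa_sigaction = demo_fault_handler;
	if (sigaction(signum, &action, &demo_saved_action) != 0)
		return -1;

	demo_installed = 1;
	return 0;
}

/* Put the previous action back if ours is currently installed. */
static int
demo_handler_unregister(int signum)
{
	assert(signum > 0);

	if (!demo_installed)
		return 0;
	if (sigaction(signum, &demo_saved_action, NULL) != 0)
		return -1;

	demo_installed = 0;
	return 0;
}
```

Saving the old action only on the first successful registration keeps a
repeated enable call from overwriting it with the sketch's own handler.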

* [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
                           ` (5 preceding siblings ...)
  2018-10-02 12:35         ` [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-10-02 12:35         ` Jeff Guo
  2018-10-02 15:21           ` Iremonger, Bernard
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-02 12:35 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how an app can smoothly
handle failure when a device is hot-unplugged. Besides enabling the device
event monitor and registering the hotplug event’s callback, the app also
needs to enable the hotplug handle mechanism before running. Once the app
detects the removal event, the hot-unplug callback is called. It first
stops the packet forwarding, then stops the port, closes the port, and
finally detaches the port to clean the device and release the resources.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v12->v11:
no change.
---
 app/test-pmd/testpmd.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..bfef483 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2093,14 +2093,22 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
 		ret = eth_dev_event_callback_unregister();
 		if (ret)
+			return;
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2244,6 +2252,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			printf("can not get port by device %s!\n", device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2793,23 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = eth_dev_event_callback_register();
+		if (ret)
+			return -1;
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-02 12:35         ` [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-10-02 13:34           ` Ananyev, Konstantin
  2018-10-04  2:31             ` Jeff Guo
  2018-10-02 15:53           ` Burakov, Anatoly
  1 sibling, 1 reply; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-10-02 13:34 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard, arybchenko,
	Lu, Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

Looks ok to me in general, just one thing I missed before:

> +static void sigbus_handler(int signum, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> +		(int)pthread_self(), info->si_addr);
> +
> +	rte_spinlock_lock(&failure_handle_lock);
> +	ret = rte_bus_sigbus_handler(info->si_addr);
> +	rte_spinlock_unlock(&failure_handle_lock);
> +	if (ret == -1) {
> +		rte_exit(EXIT_FAILURE,
> +			 "Failed to handle SIGBUS for hot-unplug, "
> +			 "(rte_errno: %s)!", strerror(rte_errno));
> +	} else if (ret == 1) {
> +		if (sigbus_action_old.sa_handler)
> +			(*(sigbus_action_old.sa_handler))(signum);

Shouldn't we check sigbus_action_old.sa_flags here, and based on that
invoke either sa_handler() or sa_sigaction()?
Konstantin

> +		else
> +			rte_exit(EXIT_FAILURE,
> +				 "Failed to handle generic SIGBUS!");
> +	}
> +
> +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
> +}
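
A minimal sketch of the flags check asked about above, not the code that was
eventually merged. It assumes the previous action was saved in
sigbus_action_old as in the patch; the helper name sigbus_chain_old_handler
and the two toy handlers are hypothetical and exist only for illustration.
POSIX allows sa_handler and sa_sigaction to share storage, so SA_SIGINFO in
sa_flags is what decides which member is valid.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Previously installed SIGBUS action (same role as in the patch). */
static struct sigaction sigbus_action_old;

/*
 * Hypothetical helper: forward a SIGBUS that no bus claimed to the handler
 * that was installed before ours.
 * Returns 0 if an old handler was called, -1 if there was nothing callable.
 */
static int
sigbus_chain_old_handler(int signum, siginfo_t *info, void *ctx)
{
	if (sigbus_action_old.sa_flags & SA_SIGINFO) {
		/* Old handler was registered in the 3-argument form. */
		if (sigbus_action_old.sa_sigaction == NULL)
			return -1;
		sigbus_action_old.sa_sigaction(signum, info, ctx);
		return 0;
	}

	/* 1-argument form; SIG_DFL and SIG_IGN are markers, not functions. */
	if (sigbus_action_old.sa_handler == SIG_DFL ||
	    sigbus_action_old.sa_handler == SIG_IGN)
		return -1;
	sigbus_action_old.sa_handler(signum);
	return 0;
}

/* Toy "old" handlers, used only to exercise the helper. */
static int old_calls;

static void
old_plain(int signum)
{
	(void)signum;
	old_calls += 1;
}

static void
old_siginfo(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
	old_calls += 10;
}
```

With a helper of this shape, the "ret == 1" branch of sigbus_handler() could
call it and fall back to rte_exit() only when it returns -1.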


* Re: [PATCH v12 3/7] bus: add sigbus handler
  2018-10-02 12:35         ` [PATCH v12 3/7] bus: add sigbus handler Jeff Guo
@ 2018-10-02 14:32           ` Burakov, Anatoly
  2018-10-04  3:14             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Burakov, Anatoly @ 2018-10-02 14:32 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 02-Oct-18 1:35 PM, Jeff Guo wrote:
> When a device is hot-unplugged, a sigbus error will occur of the datapath
> can still read/write to the device. A handler is required here to capture
> the sigbus signal and handle it appropriately.
> 
> This patch introduces a bus ops to handle sigbus errors. Each bus can
> implement its own case-dependent logic to handle the sigbus errors.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> ---
> v12->v11:
> no change.
> ---
>   lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
> index 1bb53dc..201454a 100644
> --- a/lib/librte_eal/common/include/rte_bus.h
> +++ b/lib/librte_eal/common/include/rte_bus.h
> @@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
>   typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
>   
>   /**
> + * Implement a specific sigbus handler, which is responsible for handling
> + * the sigbus error which is either original memory error, or specific memory
> + * error that caused of device be hot-unplugged. When sigbus error be captured,
> + * it could call this function to handle sigbus error.
> + * @param failure_addr
> + *	Pointer of the fault address of the sigbus error.
> + *
> + * @return
> + *	0 for success handle the sigbus.
> + *	1 for no bus handle the sigbus.

I think the comment here should be reworded. I can't parse "no bus 
handle the sigbus" - what does that mean, and how is it different from 
an error?
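
One possible reading of the three return values, inferred only from how
sigbus_handler() in patch 6/7 consumes them: -1 makes it exit, 1 makes it
fall through to the old handler. The enum names, the callback type and the
toy buses below are hypothetical; the patch itself uses bare integers.

```c
#include <stddef.h>

/* Hypothetical names for the three outcomes. */
enum sigbus_result {
	SIGBUS_FAILED = -1,   /* a bus owns the address but handling failed */
	SIGBUS_HANDLED = 0,   /* a bus owns the address and handled the fault */
	SIGBUS_UNCLAIMED = 1, /* no bus owns the address: a generic SIGBUS */
};

/* Hypothetical per-bus callback with the same tri-state return. */
typedef int (*bus_sigbus_fn)(const void *failure_addr);

/* Ask each bus in turn; stop at the first one that claims the address. */
static int
dispatch_sigbus(const bus_sigbus_fn *buses, size_t n, const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = buses[i](failure_addr);

		/* Both "handled" and "failed" mean the bus claimed it. */
		if (ret != SIGBUS_UNCLAIMED)
			return ret;
	}
	return SIGBUS_UNCLAIMED;
}

/* Toy buses for illustration: each one claims a single address. */
static char region_a, region_b;

static int
toy_bus_ok(const void *addr)
{
	return addr == (const void *)&region_a ?
		SIGBUS_HANDLED : SIGBUS_UNCLAIMED;
}

static int
toy_bus_bad(const void *addr)
{
	return addr == (const void *)&region_b ?
		SIGBUS_FAILED : SIGBUS_UNCLAIMED;
}
```

Under this reading, 1 is not an error: the fault simply did not come from a
device any bus knows about, so the caller should treat it as an ordinary
SIGBUS.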

-- 
Thanks,
Anatoly


* Re: [PATCH v12 4/7] bus/pci: implement sigbus handler ops
  2018-10-02 12:35         ` [PATCH v12 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-10-02 14:39           ` Burakov, Anatoly
  2018-10-04  3:58             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Burakov, Anatoly @ 2018-10-02 14:39 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 02-Oct-18 1:35 PM, Jeff Guo wrote:
> This patch implements the ops for the PCI bus sigbus handler. It finds the
> PCI device that is being hot-unplugged and calls the relevant ops of the
> hot-unplug handler to handle the hot-unplug failure of the device.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> ---
> v12->v11:
> no change.
> ---
>   drivers/bus/pci/pci_common.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 53 insertions(+)
> 
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
> index d286234..f313fe9 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
>   	return NULL;
>   }
>   
> +/**
> + * find the device which encounter the failure, by iterate over all device on
> + * PCI bus to check if the memory failure address is located in the range
> + * of the BARs of the device.
> + */
> +static struct rte_pci_device *
> +pci_find_device_by_addr(const void *failure_addr)
> +{
> +	struct rte_pci_device *pdev = NULL;
> +	int i;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(pdev) {
> +		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
> +			if ((uint64_t)(uintptr_t)failure_addr >=
> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
> +			    (uint64_t)(uintptr_t)failure_addr <
> +			    (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
> +			    pdev->mem_resource[i].len) {

You must *really* dislike local variables :) Suggested rewriting:

const void *start, *end;
size_t len;

start = pdev->mem_resource[i].addr;
len = pdev->mem_resource[i].len;
end = RTE_PTR_ADD(start, len);

if (failure_addr >= start && failure_addr < end) {
	...
}

I think this is way clearer.
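
A standalone sketch of the same check, buildable without DPDK. The struct
mem_resource_sketch type and the addr_in_resource() name are hypothetical
stand-ins for one pdev->mem_resource[] entry, and the comparison is done on
uintptr_t values rather than through RTE_PTR_ADD, since relational comparison
of pointers into unrelated objects is not defined by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one mem_resource entry; only the fields used here. */
struct mem_resource_sketch {
	void *addr;   /* start of the mapped BAR */
	uint64_t len; /* length of the mapping in bytes */
};

/* Return 1 if addr lies in [res->addr, res->addr + res->len), else 0. */
static int
addr_in_resource(const void *addr, const struct mem_resource_sketch *res)
{
	uintptr_t start = (uintptr_t)res->addr;
	uintptr_t end = start + (uintptr_t)res->len;
	uintptr_t fault = (uintptr_t)addr;

	/* Skip entries that describe no mapping at all. */
	if (res->addr == NULL || res->len == 0)
		return 0;

	return fault >= start && fault < end;
}
```

Inside the FOREACH_DEVICE_ON_PCIBUS loop, the condition would then reduce to
a single call per mem_resource entry.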

> +				RTE_LOG(INFO, EAL, "Failure address "
> +					"%16.16"PRIx64" belongs to "
> +					"device %s!\n",
> +					(uint64_t)(uintptr_t)failure_addr,
> +					pdev->device.name);

I feel like this should have DEBUG level, rather than INFO. 
Alternatively, if you really feel like this should be at level INFO, the 
message should be reworded because the word "failure" might give the 
wrong impression :)

(But really, I think this is info useful for debugging purposes but not 
interesting generally, so it should be under DEBUG IMO.)

-- 
Thanks,
Anatoly


* Re: [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism
  2018-10-02 12:35         ` [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
@ 2018-10-02 15:21           ` Iremonger, Bernard
  2018-10-04  2:56             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Iremonger, Bernard @ 2018-10-02 15:21 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng, arybchenko, Lu,
	Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

<snip>

> Subject: [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism

./devtools/check-git-log.sh -1
Wrong headline label:
        testpmd: use hot-unplug failure handle mechanism
 
> This patch use testpmd for example, to show how an app smoothly handle
> failure when device be hot-unplug. Except app should enabled the device event
> monitor and register the hotplug event’s callback, it also need enable hotplug
> handle mechanism before running. Once app detect the removal event, the hot-
> unplug callback would be called. It will first stop the packet forwarding, then
> stop the port, close the port, and finally detach the port to clean the device and
> release the resources.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v12->v11:
> no change.
> ---
>  app/test-pmd/testpmd.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 001f0e5..bfef483 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2093,14 +2093,22 @@ pmd_test_exit(void)
> 
>  	if (hot_plug) {
>  		ret = rte_dev_event_monitor_stop();
> -		if (ret)
> +		if (ret) {
>  			RTE_LOG(ERR, EAL,
>  				"fail to stop device event monitor.");
> +			return;
> +		}
> 
>  		ret = eth_dev_event_callback_unregister();
>  		if (ret)

Should there be an RTE_LOG() call here?

> +			return;
> +
> +		ret = rte_dev_hotplug_handle_disable();
> +		if (ret) {
>  			RTE_LOG(ERR, EAL,
> -				"fail to unregister all event callbacks.");
> +				"fail to disable hotplug handling.");
> +			return;
> +		}
>  	}
> 
>  	printf("\nBye...\n");
> @@ -2244,6 +2252,9 @@ static void
>  eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  			     __rte_unused void *arg)
>  {
> +	uint16_t port_id;
> +	int ret;
> +
>  	if (type >= RTE_DEV_EVENT_MAX) {
>  		fprintf(stderr, "%s called upon invalid event %d\n",
>  			__func__, type);
> @@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum
> rte_dev_event_type type,
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> +		if (ret) {
> +			printf("can not get port by device %s!\n",
> device_name);

It would be better to use an RTE_LOG() call here instead of printf().

> +			return;
> +		}
> +		rmv_event_callback((void *)(intptr_t)port_id);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
> 2779,14 +2793,23 @@ main(int argc, char** argv)
>  	init_config();
> 
>  	if (hot_plug) {
> -		/* enable hot plug monitoring */
> +		ret = rte_dev_hotplug_handle_enable();
> +		if (ret) {
> +			RTE_LOG(ERR, EAL,
> +				"fail to enable hotplug handling.");
> +			return -1;
> +		}
> +
>  		ret = rte_dev_event_monitor_start();
>  		if (ret) {
> -			rte_errno = EINVAL;
> +			RTE_LOG(ERR, EAL,
> +				"fail to start device event monitoring.");
>  			return -1;
>  		}
> -		eth_dev_event_callback_register();
> 
> +		ret = eth_dev_event_callback_register();
> +		if (ret)

Should there be an RTE_LOG() call here?

> +			return -1;
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4

Regards,

Bernard.
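
A minimal sketch of the setup order from the patch with a log line on every
failure path, as the comments above suggest. It assumes each step returns 0
on success. The struct hotplug_steps indirection, hotplug_setup() and the
step_ok()/step_fail() stubs are hypothetical and only let the sketch build
without DPDK; fprintf(stderr, ...) stands in for RTE_LOG().

```c
#include <stdio.h>

/* Hypothetical stand-ins for the three calls made in main(). */
struct hotplug_steps {
	int (*handle_enable)(void);     /* rte_dev_hotplug_handle_enable */
	int (*monitor_start)(void);     /* rte_dev_event_monitor_start */
	int (*callback_register)(void); /* eth_dev_event_callback_register */
};

/* Run the steps in order; log and return -1 at the first failure. */
static int
hotplug_setup(const struct hotplug_steps *s)
{
	if (s->handle_enable() != 0) {
		fprintf(stderr, "fail to enable hotplug handling.\n");
		return -1;
	}
	if (s->monitor_start() != 0) {
		fprintf(stderr, "fail to start device event monitoring.\n");
		return -1;
	}
	if (s->callback_register() != 0) {
		/* The path that had no log call in the patch. */
		fprintf(stderr, "fail to register device event callback.\n");
		return -1;
	}
	return 0;
}

/* Stub steps used only to exercise the sketch. */
static int step_ok(void) { return 0; }
static int step_fail(void) { return -1; }
```

If the registration helper already logs its own failure, the last message
would be redundant and could be dropped.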


* Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-02 12:35         ` [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
  2018-10-02 13:34           ` Ananyev, Konstantin
@ 2018-10-02 15:53           ` Burakov, Anatoly
  2018-10-02 16:00             ` Ananyev, Konstantin
  2018-10-04  3:12             ` Jeff Guo
  1 sibling, 2 replies; 494+ messages in thread
From: Burakov, Anatoly @ 2018-10-02 15:53 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 02-Oct-18 1:35 PM, Jeff Guo wrote:
> The mechanism can initially register the sigbus handler after the device
> event monitor is enabled. When a sigbus event is captured, it will check
> the failure address and accordingly handle the memory failure of the
> corresponding device by invoke the hot-unplug handler. It could prevent
> the application from crashing when a device is hot-unplugged.
> 
> By this patch, users could call below new added APIs to enable/disable
> the device hotplug handle mechanism. Note that it just implement the
> hot-unplug handler in these functions, the other handler of hotplug, such
> as handler for hotplug binding, could be add in the future if need:
>    - rte_dev_hotplug_handle_enable
>    - rte_dev_hotplug_handle_disable
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---

<snip>

> +static void sigbus_handler(int signum, siginfo_t *info,
> +				void *ctx __rte_unused)
> +{
> +	int ret;
> +
> +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> +		(int)pthread_self(), info->si_addr);
> +
> +	rte_spinlock_lock(&failure_handle_lock);
> +	ret = rte_bus_sigbus_handler(info->si_addr);
> +	rte_spinlock_unlock(&failure_handle_lock);
> +	if (ret == -1) {
> +		rte_exit(EXIT_FAILURE,
> +			 "Failed to handle SIGBUS for hot-unplug, "
> +			 "(rte_errno: %s)!", strerror(rte_errno));

Do we really want to exit the application on sigbus handle failure?

> +	} else if (ret == 1) {
> +		if (sigbus_action_old.sa_handler)
> +			(*(sigbus_action_old.sa_handler))(signum);
> +		else
> +			rte_exit(EXIT_FAILURE,
> +				 "Failed to handle generic SIGBUS!");
> +	}
> +
> +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");

Again, does this all need to be with INFO log level? IMO it should be DEBUG.

> +}
> +
> +static int cmp_dev_name(const struct rte_device *dev,
> +	const void *_name)
> +{
> +	const char *name = _name;
> +
> +	return strcmp(dev->name, name);
> +}
> +
>   static int

<snip>

>   
>   int __rte_experimental
> @@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
>   	close(intr_handle.fd);
>   	intr_handle.fd = -1;
>   	monitor_started = false;
> +
>   	return 0;

This looks like unintended change.

>   }
> +
> +int __rte_experimental
> +rte_dev_sigbus_handler_register(void)
> +{
> +	sigset_t mask;
> +	struct sigaction action;
> +

<snip>

> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -281,6 +281,8 @@ EXPERIMENTAL {
>   	rte_dev_event_callback_unregister;
>   	rte_dev_event_monitor_start;
>   	rte_dev_event_monitor_stop;
> +	rte_dev_hotplug_handle_enable;
> +	rte_dev_hotplug_handle_disable;

Nitpicking - disable should be above enable, as E follows D in the alphabet :)

>   	rte_dev_iterator_init;
>   	rte_dev_iterator_next;
>   	rte_devargs_add;
> 


-- 
Thanks,
Anatoly


* Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-02 15:53           ` Burakov, Anatoly
@ 2018-10-02 16:00             ` Ananyev, Konstantin
  2018-10-04  3:12             ` Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-10-02 16:00 UTC (permalink / raw)
  To: Burakov, Anatoly, Guo, Jia, stephen, Richardson, Bruce, Yigit,
	Ferruh, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard,
	arybchenko, Lu, Wenzhuo, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, October 2, 2018 4:54 PM
> To: Guo, Jia <jia.guo@intel.com>; stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing
> <jingjing.wu@intel.com>; thomas@monjalon.net; motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>; arybchenko@solarflare.com; Lu, Wenzhuo <wenzhuo.lu@intel.com>; jerin.jacob@caviumnetworks.com
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
> 
> On 02-Oct-18 1:35 PM, Jeff Guo wrote:
> > The mechanism can initially register the sigbus handler after the device
> > event monitor is enabled. When a sigbus event is captured, it will check
> > the failure address and accordingly handle the memory failure of the
> > corresponding device by invoke the hot-unplug handler. It could prevent
> > the application from crashing when a device is hot-unplugged.
> >
> > By this patch, users could call below new added APIs to enable/disable
> > the device hotplug handle mechanism. Note that it just implement the
> > hot-unplug handler in these functions, the other handler of hotplug, such
> > as handler for hotplug binding, could be add in the future if need:
> >    - rte_dev_hotplug_handle_enable
> >    - rte_dev_hotplug_handle_disable
> >
> > Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > ---
> 
> <snip>
> 
> > +static void sigbus_handler(int signum, siginfo_t *info,
> > +				void *ctx __rte_unused)
> > +{
> > +	int ret;
> > +
> > +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
> > +		(int)pthread_self(), info->si_addr);
> > +
> > +	rte_spinlock_lock(&failure_handle_lock);
> > +	ret = rte_bus_sigbus_handler(info->si_addr);
> > +	rte_spinlock_unlock(&failure_handle_lock);
> > +	if (ret == -1) {
> > +		rte_exit(EXIT_FAILURE,
> > +			 "Failed to handle SIGBUS for hot-unplug, "
> > +			 "(rte_errno: %s)!", strerror(rte_errno));
> 
> Do we really want to exit the application on sigbus handle failure?

I'd say yes :)
What else can we do in such a situation, except die gracefully?
Konstantin

> 
> > +	} else if (ret == 1) {
> > +		if (sigbus_action_old.sa_handler)
> > +			(*(sigbus_action_old.sa_handler))(signum);
> > +		else
> > +			rte_exit(EXIT_FAILURE,
> > +				 "Failed to handle generic SIGBUS!");
> > +	}
> > +
> > +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
> 
> Again, does this all need to be with INFO log level? IMO it should be DEBUG.
> 
> > +}
> > +
> > +static int cmp_dev_name(const struct rte_device *dev,
> > +	const void *_name)
> > +{
> > +	const char *name = _name;
> > +
> > +	return strcmp(dev->name, name);
> > +}
> > +
> >   static int
> 
> <snip>
> 
> >
> >   int __rte_experimental
> > @@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
> >   	close(intr_handle.fd);
> >   	intr_handle.fd = -1;
> >   	monitor_started = false;
> > +
> >   	return 0;
> 
> This looks like unintended change.
> 
> >   }
> > +
> > +int __rte_experimental
> > +rte_dev_sigbus_handler_register(void)
> > +{
> > +	sigset_t mask;
> > +	struct sigaction action;
> > +
> 
> <snip>
> 
> > --- a/lib/librte_eal/rte_eal_version.map
> > +++ b/lib/librte_eal/rte_eal_version.map
> > @@ -281,6 +281,8 @@ EXPERIMENTAL {
> >   	rte_dev_event_callback_unregister;
> >   	rte_dev_event_monitor_start;
> >   	rte_dev_event_monitor_stop;
> > +	rte_dev_hotplug_handle_enable;
> > +	rte_dev_hotplug_handle_disable;
> 
> Nitpicking - disable should be above enable, as E follows D in alphabet :)
> 
> >   	rte_dev_iterator_init;
> >   	rte_dev_iterator_next;
> >   	rte_devargs_add;
> >
> 
> 
> --
> Thanks,
> Anatoly


* Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-02 13:34           ` Ananyev, Konstantin
@ 2018-10-04  2:31             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  2:31 UTC (permalink / raw)
  To: Ananyev, Konstantin, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard, arybchenko,
	Lu, Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin


On 10/2/2018 9:34 PM, Ananyev, Konstantin wrote:
> Hi Jeff,
>
> Looks ok to me in general, just one thing I missed before:
>
>> +static void sigbus_handler(int signum, siginfo_t *info,
>> +				void *ctx __rte_unused)
>> +{
>> +	int ret;
>> +
>> +	RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
>> +		(int)pthread_self(), info->si_addr);
>> +
>> +	rte_spinlock_lock(&failure_handle_lock);
>> +	ret = rte_bus_sigbus_handler(info->si_addr);
>> +	rte_spinlock_unlock(&failure_handle_lock);
>> +	if (ret == -1) {
>> +		rte_exit(EXIT_FAILURE,
>> +			 "Failed to handle SIGBUS for hot-unplug, "
>> +			 "(rte_errno: %s)!", strerror(rte_errno));
>> +	} else if (ret == 1) {
>> +		if (sigbus_action_old.sa_handler)
>> +			(*(sigbus_action_old.sa_handler))(signum);
> Shouldn't we check sigbus_action_old.sa_flags here,and based on that
> invoke either sa_handler() or sa_sigaction()?
> Konstantin


You are right here, Konstantin.

We should not assume that the old action is always sa_handler. A check 
of the flags is missing here. Thanks.


>> +		else
>> +			rte_exit(EXIT_FAILURE,
>> +				 "Failed to handle generic SIGBUS!");
>> +	}
>> +
>> +	RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
>> +}


* Re: [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism
  2018-10-02 15:21           ` Iremonger, Bernard
@ 2018-10-04  2:56             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  2:56 UTC (permalink / raw)
  To: Iremonger, Bernard, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	matan, Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng, arybchenko,
	Lu, Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Bernard,

Thanks for your review; my comments are below.


On 10/2/2018 11:21 PM, Iremonger, Bernard wrote:
> Hi Jeff,
>
> <snip>
>
>> Subject: [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism
> ./devtools/check-git-log.sh -1
> Wrong headline label:
>          testpmd: use hot-unplug failure handle mechanism
>   


OK, let me check it.


>> This patch use testpmd for example, to show how an app smoothly handle
>> failure when device be hot-unplug. Except app should enabled the device event
>> monitor and register the hotplug event’s callback, it also need enable hotplug
>> handle mechanism before running. Once app detect the removal event, the hot-
>> unplug callback would be called. It will first stop the packet forwarding, then
>> stop the port, close the port, and finally detach the port to clean the device and
>> release the resources.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v12->v11:
>> no change.
>> ---
>>   app/test-pmd/testpmd.c | 39 +++++++++++++++++++++++++++++++--------
>>   1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 001f0e5..bfef483 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -2093,14 +2093,22 @@ pmd_test_exit(void)
>>
>>   	if (hot_plug) {
>>   		ret = rte_dev_event_monitor_stop();
>> -		if (ret)
>> +		if (ret) {
>>   			RTE_LOG(ERR, EAL,
>>   				"fail to stop device event monitor.");
>> +			return;
>> +		}
>>
>>   		ret = eth_dev_event_callback_unregister();
>>   		if (ret)
> Should there be an RTE_LOG() call here?


OK. To keep the code clean, I don't think anything needs to be added here.


>> +			return;
>> +
>> +		ret = rte_dev_hotplug_handle_disable();
>> +		if (ret) {
>>   			RTE_LOG(ERR, EAL,
>> -				"fail to unregister all event callbacks.");
>> +				"fail to disable hotplug handling.");
>> +			return;
>> +		}
>>   	}
>>
>>   	printf("\nBye...\n");
>> @@ -2244,6 +2252,9 @@ static void
>>   eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   			     __rte_unused void *arg)
>>   {
>> +	uint16_t port_id;
>> +	int ret;
>> +
>>   	if (type >= RTE_DEV_EVENT_MAX) {
>>   		fprintf(stderr, "%s called upon invalid event %d\n",
>>   			__func__, type);
>> @@ -2254,9 +2265,12 @@ eth_dev_event_callback(char *device_name, enum
>> rte_dev_event_type type,
>>   	case RTE_DEV_EVENT_REMOVE:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>   			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
>> +		if (ret) {
>> +			printf("can not get port by device %s!\n",
>> device_name);
> It would be better to use an RTE_LOG() call here instead of printf().


ok.


>> +			return;
>> +		}
>> +		rmv_event_callback((void *)(intptr_t)port_id);
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
>> 2779,14 +2793,23 @@ main(int argc, char** argv)
>>   	init_config();
>>
>>   	if (hot_plug) {
>> -		/* enable hot plug monitoring */
>> +		ret = rte_dev_hotplug_handle_enable();
>> +		if (ret) {
>> +			RTE_LOG(ERR, EAL,
>> +				"fail to enable hotplug handling.");
>> +			return -1;
>> +		}
>> +
>>   		ret = rte_dev_event_monitor_start();
>>   		if (ret) {
>> -			rte_errno = EINVAL;
>> +			RTE_LOG(ERR, EAL,
>> +				"fail to start device event monitoring.");
>>   			return -1;
>>   		}
>> -		eth_dev_event_callback_register();
>>
>> +		ret = eth_dev_event_callback_register();
>> +		if (ret)
> Should there be an RTE_LOG() call here?


Please see my answer above.


>> +			return -1;
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> --
>> 2.7.4
> Regards,
>
> Bernard.


* Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-02 15:53           ` Burakov, Anatoly
  2018-10-02 16:00             ` Ananyev, Konstantin
@ 2018-10-04  3:12             ` Jeff Guo
  1 sibling, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  3:12 UTC (permalink / raw)
  To: Burakov, Anatoly, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, helin.zhang


On 10/2/2018 11:53 PM, Burakov, Anatoly wrote:
> On 02-Oct-18 1:35 PM, Jeff Guo wrote:
>> The mechanism can initially register the sigbus handler after the device
>> event monitor is enabled. When a sigbus event is captured, it will check
>> the failure address and accordingly handle the memory failure of the
>> corresponding device by invoke the hot-unplug handler. It could prevent
>> the application from crashing when a device is hot-unplugged.
>>
>> By this patch, users could call below new added APIs to enable/disable
>> the device hotplug handle mechanism. Note that it just implement the
>> hot-unplug handler in these functions, the other handler of hotplug, 
>> such
>> as handler for hotplug binding, could be add in the future if need:
>>    - rte_dev_hotplug_handle_enable
>>    - rte_dev_hotplug_handle_disable
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>
> <snip>
>
>> +static void sigbus_handler(int signum, siginfo_t *info,
>> +                void *ctx __rte_unused)
>> +{
>> +    int ret;
>> +
>> +    RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
>> +        (int)pthread_self(), info->si_addr);
>> +
>> +    rte_spinlock_lock(&failure_handle_lock);
>> +    ret = rte_bus_sigbus_handler(info->si_addr);
>> +    rte_spinlock_unlock(&failure_handle_lock);
>> +    if (ret == -1) {
>> +        rte_exit(EXIT_FAILURE,
>> +             "Failed to handle SIGBUS for hot-unplug, "
>> +             "(rte_errno: %s)!", strerror(rte_errno));
>
> Do we really want to exit the application on sigbus handle failure?
>

Yes, we definitely do, since it is a failure of the process. I agree with 
Konstantin's reply in the other mail.


>> +    } else if (ret == 1) {
>> +        if (sigbus_action_old.sa_handler)
>> +            (*(sigbus_action_old.sa_handler))(signum);
>> +        else
>> +            rte_exit(EXIT_FAILURE,
>> +                 "Failed to handle generic SIGBUS!");
>> +    }
>> +
>> +    RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
>
> Again, does this all need to be with INFO log level? IMO it should be 
> DEBUG.
>

I am fine with that.


>> +}
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +    const void *_name)
>> +{
>> +    const char *name = _name;
>> +
>> +    return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>
> <snip>
>
>>     int __rte_experimental
>> @@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
>>       close(intr_handle.fd);
>>       intr_handle.fd = -1;
>>       monitor_started = false;
>> +
>>       return 0;
>
> This looks like unintended change.
>

No, the change is intentional; it makes this consistent with the formatting elsewhere.


>>   }
>> +
>> +int __rte_experimental
>> +rte_dev_sigbus_handler_register(void)
>> +{
>> +    sigset_t mask;
>> +    struct sigaction action;
>> +
>
> <snip>
>
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -281,6 +281,8 @@ EXPERIMENTAL {
>>       rte_dev_event_callback_unregister;
>>       rte_dev_event_monitor_start;
>>       rte_dev_event_monitor_stop;
>> +    rte_dev_hotplug_handle_enable;
>> +    rte_dev_hotplug_handle_disable;
>
> Nitpicking - disable should be above enable, as E follows D in 
> alphabet :)
>

Yes, after rechecking the alphabet, it is definitely as you said. :)


>>       rte_dev_iterator_init;
>>       rte_dev_iterator_next;
>>       rte_devargs_add;
>>
>
>


* Re: [PATCH v12 3/7] bus: add sigbus handler
  2018-10-02 14:32           ` Burakov, Anatoly
@ 2018-10-04  3:14             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  3:14 UTC (permalink / raw)
  To: Burakov, Anatoly, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, helin.zhang


On 10/2/2018 10:32 PM, Burakov, Anatoly wrote:
> On 02-Oct-18 1:35 PM, Jeff Guo wrote:
>> When a device is hot-unplugged, a sigbus error will occur of the 
>> datapath
>> can still read/write to the device. A handler is required here to 
>> capture
>> the sigbus signal and handle it appropriately.
>>
>> This patch introduces a bus ops to handle sigbus errors. Each bus can
>> implement its own case-dependent logic to handle the sigbus errors.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>> ---
>> v12->v11:
>> no change.
>> ---
>>   lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/rte_bus.h 
>> b/lib/librte_eal/common/include/rte_bus.h
>> index 1bb53dc..201454a 100644
>> --- a/lib/librte_eal/common/include/rte_bus.h
>> +++ b/lib/librte_eal/common/include/rte_bus.h
>> @@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, 
>> void *addr);
>>   typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
>>     /**
>> + * Implement a specific sigbus handler, which is responsible for 
>> handling
>> + * the sigbus error which is either original memory error, or 
>> specific memory
>> + * error that caused of device be hot-unplugged. When sigbus error 
>> be captured,
>> + * it could call this function to handle sigbus error.
>> + * @param failure_addr
>> + *    Pointer of the fault address of the sigbus error.
>> + *
>> + * @return
>> + *    0 for success handle the sigbus.
>> + *    1 for no bus handle the sigbus.
>
> I think the comment here should be reworded. I can't parse "no bus 
> handle the sigbus" - what does that mean, and how is it different from 
> an error?
>

OK, let me add more detail.


* Re: [PATCH v12 4/7] bus/pci: implement sigbus handler ops
  2018-10-02 14:39           ` Burakov, Anatoly
@ 2018-10-04  3:58             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  3:58 UTC (permalink / raw)
  To: Burakov, Anatoly, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko, wenzhuo.lu, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, helin.zhang


On 10/2/2018 10:39 PM, Burakov, Anatoly wrote:
> On 02-Oct-18 1:35 PM, Jeff Guo wrote:
>> This patch implements the ops for the PCI bus sigbus handler. It 
>> finds the
>> PCI device that is being hot-unplugged and calls the relevant ops of the
>> hot-unplug handler to handle the hot-unplug failure of the device.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
>> ---
>> v12->v11:
>> no change.
>> ---
>>   drivers/bus/pci/pci_common.c | 53 
>> ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 53 insertions(+)
>>
>> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
>> index d286234..f313fe9 100644
>> --- a/drivers/bus/pci/pci_common.c
>> +++ b/drivers/bus/pci/pci_common.c
>> @@ -405,6 +405,36 @@ pci_find_device(const struct rte_device *start, 
>> rte_dev_cmp_t cmp,
>>       return NULL;
>>   }
>>   +/**
>> + * find the device which encounter the failure, by iterate over all 
>> device on
>> + * PCI bus to check if the memory failure address is located in the 
>> range
>> + * of the BARs of the device.
>> + */
>> +static struct rte_pci_device *
>> +pci_find_device_by_addr(const void *failure_addr)
>> +{
>> +    struct rte_pci_device *pdev = NULL;
>> +    int i;
>> +
>> +    FOREACH_DEVICE_ON_PCIBUS(pdev) {
>> +        for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
>> +            if ((uint64_t)(uintptr_t)failure_addr >=
>> + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
>> +                (uint64_t)(uintptr_t)failure_addr <
>> + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
>> +                pdev->mem_resource[i].len) {
>
> You must *really* dislike local variables :) Suggested rewriting:
>
> const void *start, *end;
> size_t len;
>
> start = pdev->mem_resource[i].addr;
> len = pdev->mem_resource[i].len;
> end = RTE_PTR_ADD(start, len);
>
> if (failure_addr >= start && failure_addr < end) {
>     ...
> }
>
> I think this is way clearer.
>

Good point, local variables would make this more readable. Thanks.


>> +                RTE_LOG(INFO, EAL, "Failure address "
>> +                    "%16.16"PRIx64" belongs to "
>> +                    "device %s!\n",
>> +                    (uint64_t)(uintptr_t)failure_addr,
>> +                    pdev->device.name);
>
> I feel like this should have DEBUG level, rather than INFO. 
> Alternatively, if you really feel like this should be at level INFO, 
> the message should be reworded because the word "failure" might give 
> the wrong impression :)
>
> (but really, i think this is info useful for debugging purposes but 
> not interesting generally, so it should be under DEBUG IMO)
>

OK, I accept it.


* [PATCH v13 0/7] hot-unplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (20 preceding siblings ...)
  2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
@ 2018-10-04  6:30       ` Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 1/7] bus: add hot-unplug handler Jeff Guo
                           ` (7 more replies)
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
  23 siblings, 8 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use cases such as datacenter device
fail-safe and SR-IOV live migration in SDN/NFV. It can bring greater
flexibility and continuity to networking services across multiple use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, a failsafe driver,
and a hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
and we need to consider two different solutions, one for uio pci and one
for vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified
and must detach the device from the bus and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once a hot-unplug occurs, the mechanism will detect the
removal event and then do the failure handling accordingly. In order to
do that, the functionality below is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
   it remaps memory for the unplugged device based on the failure address.
   For vfio pci, it can be implemented separately, case by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, that is responsible for handling
   the sigbus error which is either an original memory error, or a specific
   memory error that is caused by a hot unplug. When a sigbus error is
   captured, it will call this function to handle sigbus error.
 - Implement PCI bus specific ops “pci_sigbus_handler”. It iterates over
   all devices on the PCI bus to find which device encountered the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find
   a bus to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   It monitors the sigbus error with a handler which is per-process.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both will call
   the common sigbus handling. When a sigbus is captured, it will call the
   above API to find a bus to handle it.
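
As a rough illustration of the per-process sigbus handler described in the
list above, here is a minimal POSIX sketch. It is not the code from this
patch set: every sketch_* name is hypothetical, and the handler only
records that a SIGBUS arrived, where a real one would pass info->si_addr to
the bus layer and fall back to the previously installed action when no bus
claims the address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical state for this sketch; not taken from the patch set. */
static struct sigaction sketch_old_action;
static int sketch_registered;
static volatile sig_atomic_t sketch_sigbus_seen;

/* Runs in whichever thread the kernel delivers SIGBUS to. */
static void
sketch_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)ctx;
	/* For a real bus error, info->si_addr is the faulting address. */
	(void)info;
	sketch_sigbus_seen = 1;
}

/* Install the process-wide handler and remember the previous action. */
static int
sketch_sigbus_register(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = sketch_sigbus_handler;
	sa.sa_flags = SA_SIGINFO; /* needed to receive siginfo_t */
	if (sigaction(SIGBUS, &sa, &sketch_old_action) != 0)
		return -1;
	sketch_registered = 1;
	return 0;
}

/* Restore whatever action was in place before registration. */
static int
sketch_sigbus_unregister(void)
{
	assert(sketch_registered);
	sketch_registered = 0;
	return sigaction(SIGBUS, &sketch_old_action, NULL);
}
```

Saving the old action at registration time is what lets a generic sigbus
(one that no bus recognises) still reach the handler the application had
installed before.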

The mechanism could be used by app or PMDs. For example, the whole process
of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insert and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a following patch set.

patchset history:
v13->v12:
use local variable to rewrite the func to be more readable.
add sa_flag check when invoke generic sigbus handler
fix some typos
delete needless helper in app

v12->v11:
add and delete some checking about sigbus recover.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move out the igb_uio fixing part, since it is a random issue and should be
considered a kernel driver defect, not part of this failure handler mechanism.

v10->v9:
modify the api name and exposure out for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish handling of
generic sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed for several rounds in public,
the scope has changed and the patch set has been split many times. Following
the recent RFC and feature design, this patch set focuses only on the hot
unplug failure handler. To keep the topic clear and focused, the previous
patch set history is summarized as “v1(v21)”, and v2 here continues from it
for further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback function to be only one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->v18:
note for limitation of multiple hotplug, fix some typos, squash patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  app/testpmd: use hotplug failure handler

 app/test-pmd/testpmd.c                  |  86 ++++++++--------
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  82 +++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 ++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 170 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 499 insertions(+), 47 deletions(-)

-- 
2.7.4


* [PATCH v13 1/7] bus: add hot-unplug handler
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
@ 2018-10-04  6:30         ` Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot-unplug failure and an app crash can occur when a device is
hot-unplugged but the application still tries to access the device
by reading or writing the BARs, which are already invalid but have
not yet been unmapped or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v13->v12:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handling the failure when a device is hot-unplugged. When a
+ * hot-unplug event is detected, this function can be called to handle
+ * the hot-unplug failure and avoid an app crash.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_hot_unplug_handler_t hot_unplug_handler;
+				/**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4


* [PATCH v13 2/7] bus/pci: implement hot-unplug handler ops
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 1/7] bus: add hot-unplug handler Jeff Guo
@ 2018-10-04  6:30         ` Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 3/7] bus: add sigbus handler Jeff Guo
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by creating new dummy
memory and remapping it over the memory where the failure is. For VFIO
or other kernel drivers, a specific function can be implemented to
handle hot-unplug case by case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v13->v12:
no change.
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 7736b3f..d286234 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -406,6 +406,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* BARs resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -435,6 +462,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.unplug = pci_unplug,
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
+		.hot_unplug_handler = pci_hot_unplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4
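
The remap step in pci_uio_remap_resource() relies on mmap() with MAP_FIXED
replacing an existing mapping in place. A minimal standalone sketch of that
technique follows, assuming a Linux/glibc environment; the helper name
sketch_remap_anonymous is hypothetical and it operates on a plain address
range rather than on struct rte_pci_device.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace the mapping at [addr, addr + len) with zero-filled anonymous
 * memory at the same virtual address. addr must be page-aligned.
 * Returns 0 on success, -1 on failure.
 */
static int
sketch_remap_anonymous(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL && len > 0);
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	/* MAP_FIXED either fails or maps at exactly the requested address. */
	return p == addr ? 0 : -1;
}
```

After the call, pointers the data path already holds stay valid but now
refer to dummy memory, so a late read or write no longer faults; the values
read are of course meaningless for the removed device.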


* [PATCH v13 3/7] bus: add sigbus handler
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 1/7] bus: add hot-unplug handler Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
@ 2018-10-04  6:30         ` Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 4/7] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads from or writes to the device. A handler is required here to
capture the sigbus signal and handle it appropriately.

This patch introduces a bus op to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v13->v12:
reword the ops comment.
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..6be4b5c 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * the sigbus error which is either an original memory error, or a specific
+ * memory error caused by a device being hot-unplugged. When a sigbus error
+ * is captured, this function can be called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 on success handling the sigbus for hot-unplug.
+ *	1 if not processed, because it is a generic sigbus error.
+ *	-1 on failure to handle the sigbus for hot-unplug.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_hot_unplug_handler_t hot_unplug_handler;
 				/**< handle hot-unplug failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4


* [PATCH v13 4/7] bus/pci: implement sigbus handler ops
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
                           ` (2 preceding siblings ...)
  2018-10-04  6:30         ` [PATCH v13 3/7] bus: add sigbus handler Jeff Guo
@ 2018-10-04  6:30         ` Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v13->v12:
use local variable to rewrite the func to be more readable.
---
 drivers/bus/pci/pci_common.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d286234..b71f63a 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -405,6 +405,37 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/*
+ * Find the device that encountered the failure by iterating over all
+ * devices on the PCI bus and checking whether the failing memory address
+ * lies within the range of one of the device's BARs.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	uint64_t check_point, start, end, len;
+	int i;
+
+	check_point = (uint64_t)(uintptr_t)failure_addr;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			start = (uint64_t)(uintptr_t)pdev->mem_resource[i].addr;
+			len = pdev->mem_resource[i].len;
+			end = (uint64_t)(uintptr_t)RTE_PTR_ADD(start, len);
+
+			if (check_point >= start && check_point < end) {
+				RTE_LOG(DEBUG, EAL,
+					"Failure address %16.16"PRIx64" belongs to device %s!\n",
+					check_point, pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -433,6 +464,28 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Failed to handle hot-unplug for "
+				"device %s\n", pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -463,6 +516,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4
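
The address lookup can be sketched outside of DPDK as a plain range check
over each device's BAR table. The following is only an illustration under
assumptions: struct sketch_bar, struct sketch_dev and
sketch_find_dev_by_addr are hypothetical stand-ins for struct
rte_pci_device and its mem_resource array, and an array of devices replaces
the bus device list. The point it shows is that the start and end bounds
have to be computed per BAR, inside the inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BARS 6

/* Hypothetical stand-ins for the mapped BAR bookkeeping of a device. */
struct sketch_bar {
	void *addr;	/* start of the mapping, NULL if the BAR is unused */
	size_t len;	/* length of the mapping in bytes */
};

struct sketch_dev {
	const char *name;
	struct sketch_bar bar[SKETCH_MAX_BARS];
};

/* Return the device owning fault_addr, or NULL if none of them does. */
static const struct sketch_dev *
sketch_find_dev_by_addr(const struct sketch_dev *devs, size_t ndevs,
			const void *fault_addr)
{
	uintptr_t fault = (uintptr_t)fault_addr;
	size_t d, i;

	assert(devs != NULL || ndevs == 0);
	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < SKETCH_MAX_BARS; i++) {
			/* Bounds are recomputed for every BAR index. */
			uintptr_t start = (uintptr_t)devs[d].bar[i].addr;
			uintptr_t end = start + devs[d].bar[i].len;

			if (devs[d].bar[i].len == 0)
				continue; /* unused BAR */
			if (fault >= start && fault < end)
				return &devs[d];
		}
	}
	return NULL;
}
```

The range is half-open, so an address one byte past the end of a BAR is
not attributed to that device.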


* [PATCH v13 5/7] bus: add helper to handle sigbus
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
                           ` (3 preceding siblings ...)
  2018-10-04  6:30         ` [PATCH v13 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-10-04  6:30         ` Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v13->v12:
no change.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 13 ++++++++++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* Bus found but handling failed: make sure rte_errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* No bus could handle the sigbus. */
+	if (!bus)
+		return 1;
+	/* A bus was found but handling failed, pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 on success handling the sigbus.
+ *	-1 on failure to handle the sigbus.
+ *	 1 if no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4
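
The three-way return convention (0 handled, -1 failed, 1 nobody claimed it)
can be illustrated with a simplified dispatcher. This is a sketch under
assumptions, not the helper from this patch: it walks a plain array of
hypothetical callbacks instead of using rte_bus_find(), and it returns the
failure directly instead of signalling it through rte_errno. The sketch_cb_*
functions exist only to exercise it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bus callback: 0 handled, 1 not mine, -1 failed. */
typedef int (*sketch_sigbus_cb)(const void *fault_addr);

/*
 * Walk the callbacks in order and stop at the first bus that claims the
 * address. Returns 0 if a bus handled it, -1 if a bus claimed it but
 * failed, and 1 if no bus recognised the address.
 */
static int
sketch_dispatch_sigbus(const sketch_sigbus_cb *cbs, size_t n,
		       const void *fault_addr)
{
	size_t i;

	assert(cbs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret;

		if (cbs[i] == NULL)
			continue; /* bus without a sigbus handler */
		ret = cbs[i](fault_addr);
		if (ret == 0)
			return 0;
		if (ret < 0)
			return -1;
		/* ret > 0: generic error for this bus, try the next one */
	}
	return 1;
}

/* Example callbacks used only to exercise the dispatcher. */
static int sketch_cb_not_mine(const void *a) { (void)a; return 1; }
static int sketch_cb_handled(const void *a) { (void)a; return 0; }
static int sketch_cb_failed(const void *a) { (void)a; return -1; }
```

A caller would treat a return of 1 as a generic sigbus and pass it on to
the previously installed signal action.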


* [PATCH v13 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
                           ` (4 preceding siblings ...)
  2018-10-04  6:30         ` [PATCH v13 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-10-04  6:30         ` Jeff Guo
  2018-10-04  6:30         ` [PATCH v13 7/7] app/testpmd: use hotplug failure handler Jeff Guo
  2018-10-04 12:02         ` [PATCH v13 0/7] hot-unplug failure handle mechanism Ananyev, Konstantin
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism registers the sigbus handler after the device event monitor
is enabled. When a sigbus event is captured, it checks the failure address
and handles the memory failure of the corresponding device by invoking the
hot-unplug handler. This can prevent the application from crashing when a
device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable/disable
the device hotplug handling mechanism. Note that these functions only
implement the hot-unplug handler; other hotplug handlers, such as a handler
for hotplug binding, could be added in the future if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v13->v12:
add sa_flags check when invoke generic sigbus handler
fix some typos
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 +++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 170 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 242 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 321fa84..fe0e60f 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable``
+  enable or disable the hotplug handling mechanism.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..4695fcb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,32 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device hot-unplug failure handling. If it try to access bus or
+ * device, such as handle sigbus on bus or handle memory failure for device
+ * just need to use this lock. It could protect the bus and the device to avoid
+ * race condition.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +52,55 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hot-unplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if (sigbus_action_old.sa_flags == SA_SIGINFO
+		    && sigbus_action_old.sa_sigaction) {
+			(*(sigbus_action_old.sa_sigaction))(signum,
+							    info, ctx);
+		} else if (sigbus_action_old.sa_flags != SA_SIGINFO
+			   && sigbus_action_old.sa_handler) {
+			(*(sigbus_action_old.sa_handler))(signum);
+		} else {
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+		}
+	}
+
+	RTE_LOG(DEBUG, EAL, "Success to handle SIGBUS for hot-unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +215,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +242,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hot_unplug_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
+					"for device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +326,67 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	if (sigbus_need_recover)
+		return 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to register sigbus handler for devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to unregister sigbus handler for devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..b167b8f 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_disable;
+	rte_dev_hotplug_handle_enable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4
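
The sigbus handler in this patch falls back to the previously installed action
when the fault is not a hot-unplug one. A minimal POSIX-only sketch of that
fallback follows. It is not the patch's code: all sketch_* names are
hypothetical, and it assumes the previous action may carry flags other than
SA_SIGINFO (for example SA_RESTART), so the flag is tested with a bitwise AND.
It also assumes SIG_DFL and SIG_IGN must never be called as functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static struct sigaction sketch_prev;             /* action we replaced */
static volatile sig_atomic_t sketch_prev_calls;  /* times old handler ran */
static volatile sig_atomic_t sketch_chained;     /* times chaining worked */

/* Stand-in for whatever handler the application had installed earlier. */
static void
sketch_prev_handler(int signum, siginfo_t *info, void *ctx)
{
	(void)signum;
	(void)info;
	(void)ctx;
	sketch_prev_calls++;
}

/* Forward the signal to the saved action; -1 if there is nothing callable. */
static int
sketch_chain(int signum, siginfo_t *info, void *ctx)
{
	if (sketch_prev.sa_flags & SA_SIGINFO) {
		if (sketch_prev.sa_sigaction == NULL)
			return -1;
		sketch_prev.sa_sigaction(signum, info, ctx);
		return 0;
	}
	if (sketch_prev.sa_handler == SIG_DFL ||
	    sketch_prev.sa_handler == SIG_IGN)
		return -1;
	sketch_prev.sa_handler(signum);
	return 0;
}

/* New handler: in this sketch every SIGBUS is treated as a generic one. */
static void
sketch_new_handler(int signum, siginfo_t *info, void *ctx)
{
	if (sketch_chain(signum, info, ctx) == 0)
		sketch_chained++;
}

/* Install the "old" handler first, then replace it and remember it. */
static int
sketch_install(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	sigemptyset(&act.sa_mask);
	act.sa_flags = SA_SIGINFO | SA_RESTART;
	act.sa_sigaction = sketch_prev_handler;
	if (sigaction(SIGBUS, &act, NULL) != 0)
		return -1;

	act.sa_flags = SA_SIGINFO;
	act.sa_sigaction = sketch_new_handler;
	return sigaction(SIGBUS, &act, &sketch_prev);
}
```

With an equality test on sa_flags, a previous action registered with
SA_SIGINFO | SA_RESTART would take the non-SA_SIGINFO branch; the bitwise test
in the sketch avoids that case.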

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v13 7/7] app/testpmd: use hotplug failure handler
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
                           ` (5 preceding siblings ...)
  2018-10-04  6:30         ` [PATCH v13 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-10-04  6:30         ` Jeff Guo
  2018-10-04 10:31           ` Iremonger, Bernard
  2018-10-04 12:02         ` [PATCH v13 0/7] hot-unplug failure handle mechanism Ananyev, Konstantin
  7 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-04  6:30 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how an application can
smoothly handle failures when a device is hot-unplugged. Besides enabling
the device event monitor and registering the hotplug event’s callback, the
application also needs to enable the hotplug handling mechanism before
running. Once the application detects the removal event, the hot-unplug
callback is called. It first stops packet forwarding, then stops the port,
closes the port, and finally detaches the port to clean up the device and
release its resources.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
v13->v12:
delete needless helper in app.
---
 app/test-pmd/testpmd.c | 86 +++++++++++++++++++++++---------------------------
 1 file changed, 40 insertions(+), 46 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..f3f8e44 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -434,9 +434,6 @@ static int eth_event_callback(portid_t port_id,
 static void eth_dev_event_callback(char *device_name,
 				enum rte_dev_event_type type,
 				void *param);
-static int eth_dev_event_callback_register(void);
-static int eth_dev_event_callback_unregister(void);
-
 
 /*
  * Check if all the ports are started.
@@ -1954,39 +1951,6 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
-static int
-eth_dev_event_callback_register(void)
-{
-	int ret;
-
-	/* register the device event callback */
-	ret = rte_dev_event_callback_register(NULL,
-		eth_dev_event_callback, NULL);
-	if (ret) {
-		printf("Failed to register device event callback\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-
-static int
-eth_dev_event_callback_unregister(void)
-{
-	int ret;
-
-	/* unregister the device event callback */
-	ret = rte_dev_event_callback_unregister(NULL,
-		eth_dev_event_callback, NULL);
-	if (ret < 0) {
-		printf("Failed to unregister device event callback\n");
-		return -1;
-	}
-
-	return 0;
-}
-
 void
 attach_port(char *identifier)
 {
@@ -2093,14 +2057,25 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
-		ret = eth_dev_event_callback_unregister();
-		if (ret)
+		ret = rte_dev_event_callback_unregister(NULL,
+			eth_dev_event_callback, NULL);
+		if (ret < 0) {
+			printf("fail to unregister device event callback.\n");
+			return;
+		}
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.\n");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2244,6 +2219,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2232,13 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "can not get port by device %s!\n",
+				device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2761,26 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = rte_dev_event_callback_register(NULL,
+			eth_dev_event_callback, NULL);
+		if (ret) {
+			printf("faile to register device event callback\n");
+			return -1;
+		}
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4
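
The removal branch above hands the port id to rmv_event_callback through a
void * argument. A small stand-alone sketch of that round trip is shown here;
sketch_rmv_callback and sketch_notify_removed are hypothetical names, not
testpmd functions. It assumes the value is a 16-bit port id, which always fits
in intptr_t, so casting through intptr_t in both directions preserves it.

```c
#include <assert.h>
#include <stdint.h>

/* Last port id seen by the callback; only used to observe the result. */
static uint16_t sketch_last_port;

/* Callback with a generic void * argument, as alarm-style callbacks use. */
static void
sketch_rmv_callback(void *arg)
{
	/* Recover the integer that was packed into the pointer value. */
	sketch_last_port = (uint16_t)(intptr_t)arg;
}

/* Pack the port id into the pointer argument and invoke the callback. */
static void
sketch_notify_removed(uint16_t port_id)
{
	sketch_rmv_callback((void *)(intptr_t)port_id);
}
```

The pointer is never dereferenced; it is only a carrier for the integer, which
is why no storage needs to outlive the call.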

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v13 7/7] app/testpmd: use hotplug failure handler
  2018-10-04  6:30         ` [PATCH v13 7/7] app/testpmd: use hotplug failure handler Jeff Guo
@ 2018-10-04 10:31           ` Iremonger, Bernard
  2018-10-04 13:53             ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Iremonger, Bernard @ 2018-10-04 10:31 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng, arybchenko, Lu,
	Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Hi Jeff,

> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, October 4, 2018 7:31 AM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu,
> Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He,
> Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>; arybchenko@solarflare.com; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> jerin.jacob@caviumnetworks.com
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [PATCH v13 7/7] app/testpmd: use hotplug failure handler
> 
> This patch uses testpmd as an example to show how an application can smoothly
> handle failures when a device is hot-unplugged. Besides enabling the device
> event monitor and registering the hotplug event’s callback, the application
> also needs to enable the hotplug handling mechanism before running. Once the
> application detects the removal event, the hot-unplug callback is called. It
> first stops packet forwarding, then stops the port, closes the port, and
> finally detaches the port to clean up the device and release its resources.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> v13->v12:
> delete needless helper in app.
> ---
>  app/test-pmd/testpmd.c | 86 +++++++++++++++++++++++--------------------------
> -
>  1 file changed, 40 insertions(+), 46 deletions(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 001f0e5..f3f8e44 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -434,9 +434,6 @@ static int eth_event_callback(portid_t port_id,  static
> void eth_dev_event_callback(char *device_name,
>  				enum rte_dev_event_type type,
>  				void *param);
> -static int eth_dev_event_callback_register(void);
> -static int eth_dev_event_callback_unregister(void);
> -
> 
>  /*
>   * Check if all the ports are started.
> @@ -1954,39 +1951,6 @@ reset_port(portid_t pid)
>  	printf("Done\n");
>  }
> 
> -static int
> -eth_dev_event_callback_register(void)
> -{
> -	int ret;
> -
> -	/* register the device event callback */
> -	ret = rte_dev_event_callback_register(NULL,
> -		eth_dev_event_callback, NULL);
> -	if (ret) {
> -		printf("Failed to register device event callback\n");
> -		return -1;
> -	}
> -
> -	return 0;
> -}
> -
> -
> -static int
> -eth_dev_event_callback_unregister(void)
> -{
> -	int ret;
> -
> -	/* unregister the device event callback */
> -	ret = rte_dev_event_callback_unregister(NULL,
> -		eth_dev_event_callback, NULL);
> -	if (ret < 0) {
> -		printf("Failed to unregister device event callback\n");
> -		return -1;
> -	}
> -
> -	return 0;
> -}
> -
>  void
>  attach_port(char *identifier)
>  {
> @@ -2093,14 +2057,25 @@ pmd_test_exit(void)
> 
>  	if (hot_plug) {
>  		ret = rte_dev_event_monitor_stop();
> -		if (ret)
> +		if (ret) {
>  			RTE_LOG(ERR, EAL,
>  				"fail to stop device event monitor.");
> +			return;
> +		}
> 
> -		ret = eth_dev_event_callback_unregister();
> -		if (ret)
> +		ret = rte_dev_event_callback_unregister(NULL,
> +			eth_dev_event_callback, NULL);
> +		if (ret < 0) {
> +			printf("fail to unregister device event callback.\n");

Better to use RTE_LOG()  instead of printf().

> +			return;
> +		}
> +
> +		ret = rte_dev_hotplug_handle_disable();
> +		if (ret) {
>  			RTE_LOG(ERR, EAL,
> -				"fail to unregister all event callbacks.");
> +				"fail to disable hotplug handling.\n");
> +			return;
> +		}
>  	}
> 
>  	printf("\nBye...\n");
> @@ -2244,6 +2219,9 @@ static void
>  eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>  			     __rte_unused void *arg)
>  {
> +	uint16_t port_id;
> +	int ret;
> +
>  	if (type >= RTE_DEV_EVENT_MAX) {
>  		fprintf(stderr, "%s called upon invalid event %d\n",
>  			__func__, type);
> @@ -2254,9 +2232,13 @@ eth_dev_event_callback(char *device_name, enum
> rte_dev_event_type type,
>  	case RTE_DEV_EVENT_REMOVE:
>  		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>  			device_name);
> -		/* TODO: After finish failure handle, begin to stop
> -		 * packet forward, stop port, close port, detach port.
> -		 */
> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
> +		if (ret) {
> +			RTE_LOG(ERR, EAL, "can not get port by device %s!\n",
> +				device_name);
> +			return;
> +		}
> +		rmv_event_callback((void *)(intptr_t)port_id);
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n", @@ -
> 2779,14 +2761,26 @@ main(int argc, char** argv)
>  	init_config();
> 
>  	if (hot_plug) {
> -		/* enable hot plug monitoring */
> +		ret = rte_dev_hotplug_handle_enable();
> +		if (ret) {
> +			RTE_LOG(ERR, EAL,
> +				"fail to enable hotplug handling.");
> +			return -1;
> +		}
> +
>  		ret = rte_dev_event_monitor_start();
>  		if (ret) {
> -			rte_errno = EINVAL;
> +			RTE_LOG(ERR, EAL,
> +				"fail to start device event monitoring.");
>  			return -1;
>  		}
> -		eth_dev_event_callback_register();
> 
> +		ret = rte_dev_event_callback_register(NULL,
> +			eth_dev_event_callback, NULL);
> +		if (ret) {
> +			printf("faile to register device event callback\n");

Better to use RTE_LOG() instead of printf(). Note the typo in the message: "faile" should be "failed".

> +			return -1;
> +		}
>  	}
> 
>  	if (start_port(RTE_PORT_ALL) != 0)
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v13 0/7] hot-unplug failure handle mechanism
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
                           ` (6 preceding siblings ...)
  2018-10-04  6:30         ` [PATCH v13 7/7] app/testpmd: use hotplug failure handler Jeff Guo
@ 2018-10-04 12:02         ` Ananyev, Konstantin
  7 siblings, 0 replies; 494+ messages in thread
From: Ananyev, Konstantin @ 2018-10-04 12:02 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh,
	gaetan.rivet, Wu, Jingjing, thomas, motih, matan, Van Haaren,
	Harry, Zhang, Qi Z, He, Shaopeng, Iremonger, Bernard, arybchenko,
	Lu, Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, October 4, 2018 7:31 AM
> To: stephen@networkplumber.org; Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu, Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He,
> Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>; arybchenko@solarflare.com; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; jerin.jacob@caviumnetworks.com
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [PATCH v13 0/7] hot-unplug failure handle mechanism
> 
> Hotplug is an important feature for use-cases like the datacenter device's
> fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
> flexibility and continuity to networking services in multiple use-cases
> in the industry. So let's see how DPDK can help users implement hotplug
> solutions.
> 
> We already have a general device-event monitor mechanism, failsafe driver,
> and hot plug/unplug API in DPDK. We already have the solution of
> “ethdev event + kernel PMD hotplug handler + failsafe”, but we still do not
> have an “eal event + hotplug handler for pci PMD + failsafe” implementation,
> and we need to consider 2 different solutions for uio pci and vfio pci.
> 
> In the case of hotplug for igb_uio, when a hardware device is removed
> physically or disabled in software, the application needs to be notified
> and detach the device from the bus, and then invalidate the device.
> The problem is that the removal of the device is not instantaneous in
> software. If the application data path tries to read/write to the device
> while removal is still in progress, it will cause an MMIO error and the
> application will crash.
> 
> In this patch set, we propose a PCIe bus failure handler mechanism for
> hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
> the application will not crash.
> 
> The mechanism should work as below:
> 
> First, the application enables the device event monitor, registers the
> hotplug event’s callback and enables hotplug handling before running the
> data path. Once the hot-unplug occurs, the mechanism will detect the
> removal event and then do the failure handling accordingly. In order to
> do that, the below functionality will be required:
>  - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
>  - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio pci,
>    it will remap memory, based on the failure address, for the corresponding
>    device that was unplugged. For vfio pci, it could be implemented
>    separately, case by case.
> 
> For the data path or other unexpected behaviors from the control path
> when a hot unplug occurs:
>  - Add a new bus ops “sigbus_handler”, that is responsible for handling
>    the sigbus error which is either an original memory error, or a specific
>    memory error that is caused by a hot unplug. When a sigbus error is
>    captured, it will call this function to handle sigbus error.
>  - Implement PCI bus specific ops “pci_sigbus_handler”. It will iterate over
>    all devices on the PCI bus to find which device encountered the failure.
>  - Implement a "rte_bus_sigbus_handler" to iterate all buses to find a bus
>    to handle the failure.
>  - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
>    “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
>    It will monitor the sigbus error by a handler which is per-process.
>    Based on the signal event principle, either the control path thread or the
>    data path thread may receive the sigbus error, but both will call the
>    common sigbus process. When a sigbus is captured, it will call the above
>    API to find a bus to handle it.
> 
> The mechanism could be used by app or PMDs. For example, the whole process
> of hotplug in testpmd is:
>  - Enable device event monitor->Enable hotplug handle->Register event callback
>    ->attach port->start port->start forwarding->Device unplug->failure handle
>    ->stop forwarding->stop port->close port->detach port.
> 
> This patch set does not cover hotplug insert and binding, and it only
> implements the igb_uio failure handler; the vfio hotplug failure handler
> will be in an upcoming patch set.
> 
> patchset history:
> v13->v12:
> use local variable to rewrite the func to be more readable.
> add sa_flags check when invoking the generic sigbus handler
> fix some typos
> delete needless helper in app
> 
> v12->v11:
> add and delete some checking about sigbus recover.
> 
> v11->v10:
> change the ops name, since both uio and vfio will use the hot-unplug ops.
> since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
> RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
> move the igb_uio fixing part, since it is a random issue and should be considered
> a kernel driver defect rather than part of this failure handler mechanism.
> 
> v10->v9:
> modify the api name and exposure out for public use.
> add hotplug handle enable/disable APIs
> refine commit log
> 
> v9->v8:
> refine commit log to be more readable.
> 
> v8->v7:
> refine errno process in sigbus handler.
> refine igb uio release process
> 
> v7->v6:
> delete some unused part
> 
> v6->v5:
> refine some description about bus ops
> refine commit log
> add some entry check.
> 
> v5->v4:
> split patches to focus on the failure handle, remove the event usage
> by testpmd to another patch.
> change the hotplug failure handler name.
> refine the sigbus handle logic.
> add lock for udev state in igb uio driver.
> 
> v4->v3:
> split patches to be small and clear.
> change to use new parameter "--hotplug-mode" in testpmd to identify
> the eal hotplug and ethdev hotplug.
> 
> v3->v2:
> change bus ops name to bus_hotplug_handler.
> add new API and bus ops of bus_signal_handler to distinguish between handling
> generic sigbus and hotplug sigbus.
> 
> v2->v1(v21):
> refine some doc and commit log.
> fix igb uio kernel issue for control path failure rebase testpmd code.
> 
> Since the hot plug solution has been discussed several times in public,
> the scope has changed and the patch set has been split many times. Coming
> to the recent RFC and feature design, this patch set focuses only on the
> hot unplug failure handler, so in order to make this topic clearer and more
> focused, the previous patch set history is summarized as “v1(v21)”, and the
> v2 here goes ahead for further tracking.
> 
> "v1(21)" == v21 as below:
> v21->v20:
> split function in hot unplug ops.
> sync failure handle to fix multiple process issue, fix attach port issue for multiple devices case.
> combine rmv callback functions into only one.
> 
> v20->v19:
> clean the code.
> refine the remap logic for multiple device.
> remove the auto binding.
> 
> v19->18:
> note for limitation of multiple hotplug, fix some typos, squeeze patch.
> 
> v18->v15:
> add document, add signal bus handler, refine the code to be more clear.
> 
> for the prior patch history, please check the patch set "add device event monitor framework".
> 
> Jeff Guo (7):
>   bus: add hot-unplug handler
>   bus/pci: implement hot-unplug handler ops
>   bus: add sigbus handler
>   bus/pci: implement sigbus handler ops
>   bus: add helper to handle sigbus
>   eal: add failure handle mechanism for hot-unplug
>   app/testpmd: use hotplug failure handler
> 
>  app/test-pmd/testpmd.c                  |  86 ++++++++--------
>  doc/guides/rel_notes/release_18_08.rst  |   5 +
>  drivers/bus/pci/pci_common.c            |  82 +++++++++++++++
>  drivers/bus/pci/pci_common_uio.c        |  33 +++++++
>  drivers/bus/pci/private.h               |  12 +++
>  lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
>  lib/librte_eal/common/eal_common_bus.c  |  43 ++++++++
>  lib/librte_eal/common/eal_private.h     |  39 ++++++++
>  lib/librte_eal/common/include/rte_bus.h |  34 +++++++
>  lib/librte_eal/common/include/rte_dev.h |  26 +++++
>  lib/librte_eal/linuxapp/eal/eal_dev.c   | 170 +++++++++++++++++++++++++++++++-
>  lib/librte_eal/rte_eal_version.map      |   2 +
>  12 files changed, 499 insertions(+), 47 deletions(-)
> 
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 2.7.4
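
The cover letter describes a helper that iterates all buses to find one that
can handle a sigbus failure address. The idea can be sketched in plain C as
below. This is only an illustration under stated assumptions: struct
sketch_bus, its single mapped range per bus, and both function names are
hypothetical and do not reflect the real rte_bus layout. The return convention
(0 handled, 1 not a device fault, -1 error) follows how the EAL sigbus handler
in patch 6/7 interprets the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus record: one mapped address range and a hit counter. */
struct sketch_bus {
	const char *name;
	uintptr_t map_start;
	size_t map_len;
	int handled;
};

/*
 * Per-bus handler: 0 if the failure address belongs to this bus and was
 * handled, 1 if the address is not owned by this bus.
 */
static int
sketch_bus_sigbus(struct sketch_bus *bus, const void *failure_addr)
{
	uintptr_t addr = (uintptr_t)failure_addr;

	if (addr < bus->map_start || addr - bus->map_start >= bus->map_len)
		return 1;

	bus->handled++;
	return 0;
}

/*
 * Walk all buses and stop at the first one that either handles the
 * address (0) or reports an error (-1). Return 1 if no bus claims it,
 * meaning the caller should treat it as a generic SIGBUS.
 */
static int
sketch_sigbus_dispatch(struct sketch_bus *buses, size_t nb_buses,
		       const void *failure_addr)
{
	size_t i;
	int ret;

	assert(buses != NULL || nb_buses == 0);

	for (i = 0; i < nb_buses; i++) {
		ret = sketch_bus_sigbus(&buses[i], failure_addr);
		if (ret != 1)
			return ret;
	}
	return 1;
}
```

Returning 1 rather than an error when no bus owns the address is what lets the
caller fall back to the application's original sigbus handling.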


^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v13 7/7] app/testpmd: use hotplug failure handler
  2018-10-04 10:31           ` Iremonger, Bernard
@ 2018-10-04 13:53             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 13:53 UTC (permalink / raw)
  To: Iremonger, Bernard, stephen, Richardson, Bruce, Yigit, Ferruh,
	Ananyev, Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih,
	matan, Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng, arybchenko,
	Lu, Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin

Thanks for your kind review.

On 10/4/2018 6:31 PM, Iremonger, Bernard wrote:
> Hi Jeff,
>
>> -----Original Message-----
>> From: Guo, Jia
>> Sent: Thursday, October 4, 2018 7:31 AM
>> To: stephen@networkplumber.org; Richardson, Bruce
>> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
>> Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu,
>> Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
>> motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
>> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He,
>> Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
>> <bernard.iremonger@intel.com>; arybchenko@solarflare.com; Lu, Wenzhuo
>> <wenzhuo.lu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>;
>> jerin.jacob@caviumnetworks.com
>> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
>> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
>> Subject: [PATCH v13 7/7] app/testpmd: use hotplug failure handler
>>
>> This patch uses testpmd as an example to show how an application can smoothly
>> handle failures when a device is hot-unplugged. Besides enabling the device
>> event monitor and registering the hotplug event’s callback, the application
>> also needs to enable the hotplug handling mechanism before running. Once the
>> application detects the removal event, the hot-unplug callback is called. It
>> first stops packet forwarding, then stops the port, closes the port, and
>> finally detaches the port to clean up the device and release its resources.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> v13->v12:
>> delete needless helper in app.
>> ---
>>   app/test-pmd/testpmd.c | 86 +++++++++++++++++++++++--------------------------
>> -
>>   1 file changed, 40 insertions(+), 46 deletions(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 001f0e5..f3f8e44 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -434,9 +434,6 @@ static int eth_event_callback(portid_t port_id,  static
>> void eth_dev_event_callback(char *device_name,
>>   				enum rte_dev_event_type type,
>>   				void *param);
>> -static int eth_dev_event_callback_register(void);
>> -static int eth_dev_event_callback_unregister(void);
>> -
>>
>>   /*
>>    * Check if all the ports are started.
>> @@ -1954,39 +1951,6 @@ reset_port(portid_t pid)
>>   	printf("Done\n");
>>   }
>>
>> -static int
>> -eth_dev_event_callback_register(void)
>> -{
>> -	int ret;
>> -
>> -	/* register the device event callback */
>> -	ret = rte_dev_event_callback_register(NULL,
>> -		eth_dev_event_callback, NULL);
>> -	if (ret) {
>> -		printf("Failed to register device event callback\n");
>> -		return -1;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>> -
>> -static int
>> -eth_dev_event_callback_unregister(void)
>> -{
>> -	int ret;
>> -
>> -	/* unregister the device event callback */
>> -	ret = rte_dev_event_callback_unregister(NULL,
>> -		eth_dev_event_callback, NULL);
>> -	if (ret < 0) {
>> -		printf("Failed to unregister device event callback\n");
>> -		return -1;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>>   void
>>   attach_port(char *identifier)
>>   {
>> @@ -2093,14 +2057,25 @@ pmd_test_exit(void)
>>
>>   	if (hot_plug) {
>>   		ret = rte_dev_event_monitor_stop();
>> -		if (ret)
>> +		if (ret) {
>>   			RTE_LOG(ERR, EAL,
>>   				"fail to stop device event monitor.");
>> +			return;
>> +		}
>>
>> -		ret = eth_dev_event_callback_unregister();
>> -		if (ret)
>> +		ret = rte_dev_event_callback_unregister(NULL,
>> +			eth_dev_event_callback, NULL);
>> +		if (ret < 0) {
>> +			printf("fail to unregister device event callback.\n");
> Better to use RTE_LOG()  instead of printf().
>
>> +			return;
>> +		}
>> +
>> +		ret = rte_dev_hotplug_handle_disable();
>> +		if (ret) {
>>   			RTE_LOG(ERR, EAL,
>> -				"fail to unregister all event callbacks.");
>> +				"fail to disable hotplug handling.\n");
>> +			return;
>> +		}
>>   	}
>>
>>   	printf("\nBye...\n");
>> @@ -2244,6 +2219,9 @@ static void
>>   eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   			     __rte_unused void *arg)
>>   {
>> +	uint16_t port_id;
>> +	int ret;
>> +
>>   	if (type >= RTE_DEV_EVENT_MAX) {
>>   		fprintf(stderr, "%s called upon invalid event %d\n",
>>   			__func__, type);
>> @@ -2254,9 +2232,13 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
>>   	case RTE_DEV_EVENT_REMOVE:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
>>   			device_name);
>> -		/* TODO: After finish failure handle, begin to stop
>> -		 * packet forward, stop port, close port, detach port.
>> -		 */
>> +		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
>> +		if (ret) {
>> +			RTE_LOG(ERR, EAL, "can not get port by device %s!\n",
>> +				device_name);
>> +			return;
>> +		}
>> +		rmv_event_callback((void *)(intptr_t)port_id);
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>> @@ -2779,14 +2761,26 @@ main(int argc, char** argv)
>>   	init_config();
>>
>>   	if (hot_plug) {
>> -		/* enable hot plug monitoring */
>> +		ret = rte_dev_hotplug_handle_enable();
>> +		if (ret) {
>> +			RTE_LOG(ERR, EAL,
>> +				"fail to enable hotplug handling.");
>> +			return -1;
>> +		}
>> +
>>   		ret = rte_dev_event_monitor_start();
>>   		if (ret) {
>> -			rte_errno = EINVAL;
>> +			RTE_LOG(ERR, EAL,
>> +				"fail to start device event monitoring.");
>>   			return -1;
>>   		}
>> -		eth_dev_event_callback_register();
>>
>> +		ret = rte_dev_event_callback_register(NULL,
>> +			eth_dev_event_callback, NULL);
>> +		if (ret) {
>> +			printf("faile to register device event callback\n");
> Better to use RTE_LOG() instead of printf(). Note the typo in the message: "faile" should be "failed".
>
>> +			return -1;
>> +		}
>>   	}
>>
>>   	if (start_port(RTE_PORT_ALL) != 0)
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v14 0/7] hot-unplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (21 preceding siblings ...)
  2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
@ 2018-10-04 14:46       ` Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 1/7] bus: add hot-unplug handler Jeff Guo
                           ` (6 more replies)
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
  23 siblings, 7 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use-cases like the datacenter device's
fail-safe and for SRIOV Live Migration in SDN/NFV. It could bring higher
flexibility and continuity to networking services in multiple use-cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device-event monitor mechanism, failsafe driver,
and hot plug/unplug API in DPDK. We already have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we still
do not have an “eal event + hotplug handler for pci PMD + failsafe”
implementation, and we need to consider 2 different solutions for uio pci
and vfio pci.

In the case of hotplug for igb_uio, when a hardware device is removed
physically or disabled in software, the application needs to be notified
and detach the device from the bus, and then invalidate the device.
The problem is that the removal of the device is not instantaneous in
software. If the application data path tries to read/write to the device
while removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once the hot-unplug occurs, the mechanism will detect the
removal event and then accordingly do the failure handling. In order to
do that, the below functionality will be required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement pci bus specific ops “pci_hot_unplug_handler”. For uio
   pci, it remaps memory for the unplugged device based on the failure
   address. For vfio pci, a separate implementation can be added case
   by case.

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, which is responsible for handling
   the sigbus error, which is either an original memory error or a specific
   memory error caused by a hot unplug. When a sigbus error is
   captured, this function is called to handle it.
 - Implement PCI bus specific ops “pci_sigbus_handler”. It
   iterates over all devices on the PCI bus to find which device encountered
   the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find a
   bus to handle the failure.
 - Add a couple of APIs “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable” to enable/disable hotplug handling.
   They monitor the sigbus error with a per-process handler.
   Because of how signals are delivered, either the control path thread or
   the data path thread may receive the sigbus error, but both call
   the common sigbus handling. When a sigbus is captured, it calls the
   above API to find the bus to handle it.

The mechanism could be used by app or PMDs. For example, the whole process
of hotplug in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.
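
The start-up part of that flow can be sketched as below. This is only an
illustration of the ordering, not the testpmd code: the demo_* functions are
hypothetical stand-ins for the EAL calls named in this series (they just flip
flags), and unwinding the earlier steps on failure is an assumption of the
sketch rather than something the patch does.

```c
#include <stdio.h>

/*
 * Hypothetical stand-ins for the EAL calls (event monitor start/stop,
 * hotplug handle enable/disable, event callback register). They only
 * flip flags so that the ordering can be exercised without DPDK.
 */
static int demo_fail_step;	/* 0: nothing fails, 1..3: that step fails */
static int demo_monitoring, demo_handling, demo_registered;

static int
demo_event_monitor_start(void)
{
	if (demo_fail_step == 1)
		return -1;
	demo_monitoring = 1;
	return 0;
}

static void
demo_event_monitor_stop(void)
{
	demo_monitoring = 0;
}

static int
demo_hotplug_handle_enable(void)
{
	if (demo_fail_step == 2)
		return -1;
	demo_handling = 1;
	return 0;
}

static void
demo_hotplug_handle_disable(void)
{
	demo_handling = 0;
}

static int
demo_event_callback_register(void)
{
	if (demo_fail_step == 3)
		return -1;
	demo_registered = 1;
	return 0;
}

/* Run the three start-up steps in order, undoing earlier ones on failure. */
static int
demo_hotplug_setup(void)
{
	if (demo_event_monitor_start() != 0) {
		fprintf(stderr, "failed to start device event monitor\n");
		return -1;
	}
	if (demo_hotplug_handle_enable() != 0) {
		fprintf(stderr, "failed to enable hotplug handling\n");
		goto err_stop;
	}
	if (demo_event_callback_register() != 0) {
		fprintf(stderr, "failed to register event callback\n");
		goto err_disable;
	}
	return 0;

err_disable:
	demo_hotplug_handle_disable();
err_stop:
	demo_event_monitor_stop();
	return -1;
}
```

Only after demo_hotplug_setup() returns 0 would the application go on to
attach and start ports.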

This patch set does not cover hotplug insertion and binding, and it only
implements the igb_uio failure handler; the vfio hotplug failure handler
will come in a following patch set.

patchset history:
v14->v13:
rebase code to fix apply issue.
fix some typos and checkpatch warnings.

v13->v12:
use local variable to rewrite the func to be more readable.
add sa_flag check when invoke generic sigbus handler
modify some typo
delete needless helper in app

v12->v11:
add and delete some checking about sigbus recover.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, so modify the hotplug event and callback usage.
move out the igb_uio fixing part, since it is a random issue and should be
considered a kernel driver defect rather than part of this failure handler
mechanism.

v10->v9:
modify the api name and exposure out for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops of bus_signal_handler to distinguish between generic
sigbus and hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure rebase testpmd code.

Since the hot plug solution has been discussed several times in public,
the scope has changed and the patch set has been split many times. Following
the recent RFC and feature design, this patch set focuses only on the hot-unplug
failure handler. To keep this topic clear and focused, the previous patch sets
are summarized in the history as “v1(v21)”, and the v2 here continues
for further tracking.

"v1(21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handle to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback functions into only one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->v18:
note for limitation of multiple hotplug, fix some typos, squash patches.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  app/testpmd: use hotplug failure handler

 app/test-pmd/testpmd.c                  |  88 ++++++++---------
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  83 ++++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 ++++++++
 lib/librte_eal/common/eal_private.h     |  39 ++++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 170 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 502 insertions(+), 47 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 494+ messages in thread

* [PATCH v14 1/7] bus: add hot-unplug handler
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
@ 2018-10-04 14:46         ` Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot-unplug failure and app crash can occur when a device is
hot-unplugged but the application still tries to access the device
by reading or writing the BARs, which are already invalid but
have not yet been unmapped or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.
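
As a rough illustration of the idea (a per-bus optional callback that a
generic layer invokes), here is a minimal standalone sketch. The demo_* types
and functions are hypothetical stand-ins for struct rte_device and
struct rte_bus, and the handler body just sets a flag instead of doing real
recovery.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rte_device. */
struct demo_device {
	const char *name;
	int made_safe;	/* set once the handler has dealt with the device */
};

/* Same shape as the ops added by this patch: 0 on success, !0 on error. */
typedef int (*demo_hot_unplug_handler_t)(struct demo_device *dev);

/* Hypothetical stand-in for struct rte_bus. */
struct demo_bus {
	const char *name;
	demo_hot_unplug_handler_t hot_unplug_handler;	/* may be NULL */
};

/* A bus-specific handler; a real one would make the device safe to touch. */
static int
demo_pci_hot_unplug_handler(struct demo_device *dev)
{
	if (dev == NULL)
		return -1;
	dev->made_safe = 1;
	return 0;
}

/* Generic caller: fails if the bus did not implement the ops. */
static int
demo_bus_handle_hot_unplug(const struct demo_bus *bus,
			   struct demo_device *dev)
{
	if (bus == NULL || bus->hot_unplug_handler == NULL) {
		fprintf(stderr, "bus has no hot-unplug handler\n");
		return -1;
	}
	return bus->hot_unplug_handler(dev);
}
```

Buses that leave the pointer NULL simply report that they cannot handle the
failure, which is how a bus opts out in this sketch.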

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v14->v13:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handling the failure when a device is hot-unplugged. When the
+ * hot-unplug event is detected, this function can be called to handle
+ * the hot-unplug failure and avoid an app crash.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_hot_unplug_handler_t hot_unplug_handler;
+				/**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v14 2/7] bus/pci: implement hot-unplug handler ops
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 1/7] bus: add hot-unplug handler Jeff Guo
@ 2018-10-04 14:46         ` Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 3/7] bus: add sigbus handler Jeff Guo
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by creating new dummy
memory and remapping it over the memory where the failure is. For VFIO
or other kernel drivers, specific functions can be implemented to handle
hot-unplug case by case.
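
The remap trick can be tried outside DPDK with plain mmap(). The sketch below
is an illustration only: demo_remap_anonymous is a hypothetical helper, and
the region it is applied to is whatever the caller passes in, not a real BAR.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Replace whatever is mapped at [addr, addr + len) with zero-filled
 * anonymous memory at the same virtual address, so that pointers the
 * application already holds stay usable. Returns 0 on success, -1 on
 * failure. addr must be page aligned, as required by MAP_FIXED.
 */
static int
demo_remap_anonymous(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return (p == MAP_FAILED || p != addr) ? -1 : 0;
}
```

After the call, reads and writes through the old pointer land in ordinary
memory. The values read back are of course meaningless for the device, which
is why the application still has to stop and detach the port afterwards.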

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v14->v13:
no change.
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index c7695d1..be7cc1f 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -404,6 +404,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* BARs resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -434,6 +461,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.dev_iterate = rte_pci_dev_iterate,
+		.hot_unplug_handler = pci_hot_unplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 0e689fa..0883d82 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -125,6 +125,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v14 3/7] bus: add sigbus handler
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 1/7] bus: add hot-unplug handler Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
@ 2018-10-04 14:46         ` Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 4/7] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, a sigbus error will occur if the datapath
still reads/writes to the device. A handler is required here to capture
the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v14->v13:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..6be4b5c 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a specific sigbus handler, which is responsible for handling
+ * the sigbus error, which is either an original memory error or a specific
+ * memory error caused by a device being hot-unplugged. When a sigbus error is
+ * captured, this function can be called to handle the sigbus error.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus for hot-unplug was handled successfully.
+ *	1 if it was not processed, because it is a generic sigbus error.
+ *	-1 if handling the sigbus for hot-unplug failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_hot_unplug_handler_t hot_unplug_handler;
 				/**< handle hot-unplug failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v14 4/7] bus/pci: implement sigbus handler ops
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
                           ` (2 preceding siblings ...)
  2018-10-04 14:46         ` [PATCH v14 3/7] bus: add sigbus handler Jeff Guo
@ 2018-10-04 14:46         ` Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.
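
The lookup itself is a range check of the fault address against every mapped
BAR of every device. A standalone sketch, with hypothetical demo_* types in
place of struct rte_pci_device and a plain array in place of the bus device
list:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_BARS 6

struct demo_bar {
	void *addr;	/* virtual address where the BAR is mapped */
	uint64_t len;	/* mapping length in bytes, 0 if unused */
};

/* Hypothetical stand-in for struct rte_pci_device. */
struct demo_pci_device {
	const char *name;
	struct demo_bar bar[DEMO_MAX_BARS];
};

/* Return the device whose BAR mapping contains failure_addr, or NULL. */
static struct demo_pci_device *
demo_find_device_by_addr(struct demo_pci_device *devs, size_t ndevs,
			 const void *failure_addr)
{
	uint64_t point = (uint64_t)(uintptr_t)failure_addr;
	size_t d;
	int i;

	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < DEMO_MAX_BARS; i++) {
			/* The bounds are recomputed for every BAR. */
			uint64_t start =
				(uint64_t)(uintptr_t)devs[d].bar[i].addr;
			uint64_t end = start + devs[d].bar[i].len;

			if (point >= start && point < end)
				return &devs[d];
		}
	}
	return NULL;
}
```

An unused BAR has a zero length, so its empty range never matches and needs no
special case.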

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v14->v13:
fix some checkpatch issue.
---
 drivers/bus/pci/pci_common.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index be7cc1f..0efd930 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -403,6 +403,37 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/*
+ * Find the device which encountered the failure, by iterating over all
+ * devices on the PCI bus to check if the memory failure address is located
+ * in the range of the BARs of the device.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	uint64_t check_point, start, end, len;
+	int i;
+
+	check_point = (uint64_t)(uintptr_t)failure_addr;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			start = (uint64_t)(uintptr_t)pdev->mem_resource[i].addr;
+			len = pdev->mem_resource[i].len;
+			end = start + len;
+
+			if (check_point >= start && check_point < end) {
+				RTE_LOG(DEBUG, EAL, "Failure address %16.16"
+					PRIx64" belongs to device %s!\n",
+					check_point, pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -431,6 +462,29 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"Failed to handle hot-unplug for device %s\n",
+				pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -462,6 +516,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.dev_iterate = rte_pci_dev_iterate,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v14 5/7] bus: add helper to handle sigbus
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
                           ` (3 preceding siblings ...)
  2018-10-04 14:46         ` [PATCH v14 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-10-04 14:46         ` Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
  2018-10-04 14:46         ` [PATCH v14 7/7] app/testpmd: use hotplug failure handler Jeff Guo
  6 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.
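
The three-way result (handled, failed, not ours) can be shown with a small
standalone sketch. It walks a plain array instead of going through
rte_bus_find(), and every demo_* name, including the three sample handlers,
is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-bus handler: 0 handled, 1 address not owned by this bus, -1 failed. */
typedef int (*demo_sigbus_handler_t)(const void *failure_addr);

struct demo_sig_bus {
	const char *name;
	demo_sigbus_handler_t sigbus_handler;	/* may be NULL */
};

static char demo_region[16];	/* stands in for a mapped BAR */

/* Claims addresses inside demo_region and reports success. */
static int
demo_handler_claims(const void *addr)
{
	uintptr_t p = (uintptr_t)addr;
	uintptr_t s = (uintptr_t)demo_region;

	return (p >= s && p < s + sizeof(demo_region)) ? 0 : 1;
}

/* Never owns the address. */
static int
demo_handler_declines(const void *addr)
{
	(void)addr;
	return 1;
}

/* Owns the address but cannot recover. */
static int
demo_handler_fails(const void *addr)
{
	(void)addr;
	return -1;
}

/*
 * Ask each bus in turn. Returns 0 if a bus handled the fault, -1 if a bus
 * owned the address but failed, 1 if no bus owned it (generic sigbus).
 */
static int
demo_bus_sigbus_dispatch(const struct demo_sig_bus *buses, size_t nbuses,
			 const void *failure_addr)
{
	size_t i;

	for (i = 0; i < nbuses; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret <= 0)
			return ret < 0 ? -1 : 0;
	}
	return 1;
}
```

A return of 1 is what lets the caller fall back to the previously installed
sigbus behaviour for faults that have nothing to do with hot-unplug.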

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v14->v13:
no change.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 13 ++++++++++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* find bus but handle failed, keep the errno be set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* can not find bus. */
+	if (!bus)
+		return 1;
+	/* find bus but handle failed, pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus was handled successfully.
+ *	-1 if handling the sigbus failed.
+ *	 1 if no bus can handle the sigbus.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v14 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
                           ` (4 preceding siblings ...)
  2018-10-04 14:46         ` [PATCH v14 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-10-04 14:46         ` Jeff Guo
  2018-10-15 10:43           ` Thomas Monjalon
  2018-10-04 14:46         ` [PATCH v14 7/7] app/testpmd: use hotplug failure handler Jeff Guo
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism can initially register the sigbus handler after the device
event monitor is enabled. When a sigbus event is captured, it checks
the failure address and accordingly handles the memory failure of the
corresponding device by invoking the hot-unplug handler. It can prevent
the application from crashing when a device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable/disable
the device hotplug handling mechanism. Note that only the hot-unplug handler
is implemented in these functions; other hotplug handlers, such as a handler
for hotplug binding, could be added in the future if needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable
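
For readers who want to see the whole recovery path in isolation, below is a
Linux-only sketch that does not use DPDK at all. A shared file mapping that is
truncated underneath the process stands in for an unplugged BAR (an assumption
of the demo, chosen because it raises SIGBUS on access), and all demo_* names
are hypothetical. Calling mmap() from a signal handler is not guaranteed
async-signal-safe by POSIX; the sketch relies on it working on Linux.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *demo_map_addr;			/* monitored mapping */
static size_t demo_map_len;			/* its length in bytes */
static volatile sig_atomic_t demo_faults;	/* faults recovered so far */

/* SIGBUS action: remap our mapping if the fault address lies inside it. */
static void
demo_sigbus_action(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t start = (uintptr_t)demo_map_addr;

	(void)sig;
	(void)ctx;
	if (fault < start || fault >= start + demo_map_len) {
		/* Generic sigbus: restore the default action and retry. */
		signal(SIGBUS, SIG_DFL);
		return;
	}
	if (mmap(demo_map_addr, demo_map_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(2);
	demo_faults++;
}

/* Install the process-wide SIGBUS action. Returns 0 on success. */
static int
demo_hotplug_handle_enable(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = demo_sigbus_action;
	sa.sa_flags = SA_SIGINFO;
	return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Map one page of a temporary file, then truncate the file to zero so
 * that the next access to the mapping raises SIGBUS ("unplug").
 */
static volatile uint32_t *
demo_map_then_unplug(void)
{
	char path[] = "/tmp/hotunplug_demo_XXXXXX";
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	void *p;

	if (fd < 0)
		return NULL;
	unlink(path);
	if (ftruncate(fd, (off_t)page) != 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED || ftruncate(fd, 0) != 0) {
		close(fd);
		return NULL;
	}
	close(fd);
	demo_map_addr = p;
	demo_map_len = page;
	return p;
}
```

With the handler installed, the first store through the returned pointer
faults, the handler swaps in anonymous memory, and the same store is then
retried and completes instead of killing the process.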

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v14->v13:
no change.
---
 doc/guides/rel_notes/release_18_08.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  26 +++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 170 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 242 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 8a09dee..4eca59a 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -117,6 +117,11 @@ New Features
 
   Added support for chained mbufs (input and output).
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
+  for enabling or disabling hotplug handle mechanism.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..637f20d 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,30 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_register(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+rte_dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..4695fcb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,32 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * Spinlock for device hot-unplug failure handling. Any code that accesses a
+ * bus or device during failure handling, such as handling sigbus on a bus or
+ * a memory failure for a device, must take this lock. It protects the bus and
+ * the device against race conditions.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +52,55 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hot-unplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if ((sigbus_action_old.sa_flags & SA_SIGINFO)
+		    && sigbus_action_old.sa_sigaction) {
+			(*(sigbus_action_old.sa_sigaction))(signum,
+							    info, ctx);
+		} else if (!(sigbus_action_old.sa_flags & SA_SIGINFO)
+			   && sigbus_action_old.sa_handler) {
+			(*(sigbus_action_old.sa_handler))(signum);
+		} else {
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+		}
+	}
+
+	RTE_LOG(DEBUG, EAL, "Success to handle SIGBUS for hot-unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +215,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +242,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hot_unplug_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
+					"for device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +326,67 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int __rte_experimental
+rte_dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	if (sigbus_need_recover)
+		return 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to register sigbus handler for devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = rte_dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to unregister sigbus handler for devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..b167b8f 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_disable;
+	rte_dev_hotplug_handle_enable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4


* [PATCH v14 7/7] app/testpmd: use hotplug failure handler
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
                           ` (5 preceding siblings ...)
  2018-10-04 14:46         ` [PATCH v14 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-10-04 14:46         ` Jeff Guo
  2018-10-05 12:26           ` Iremonger, Bernard
  6 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-04 14:46 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how an application can
smoothly handle failures when a device is hot-unplugged. In addition to
enabling the device event monitor and registering the hotplug event’s
callback, the application also needs to enable the hotplug handling
mechanism before running. Once the application detects the removal event,
the hot-unplug callback is called. It first stops packet forwarding, then
stops the port, closes the port, and finally detaches the port to clean up
the device and release its resources.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v14->v13:
rebase code to fix an apply issue.
fix some typos.
---
 app/test-pmd/testpmd.c | 88 ++++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 46 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..db87c63 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -434,9 +434,6 @@ static int eth_event_callback(portid_t port_id,
 static void eth_dev_event_callback(char *device_name,
 				enum rte_dev_event_type type,
 				void *param);
-static int eth_dev_event_callback_register(void);
-static int eth_dev_event_callback_unregister(void);
-
 
 /*
  * Check if all the ports are started.
@@ -1954,39 +1951,6 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
-static int
-eth_dev_event_callback_register(void)
-{
-	int ret;
-
-	/* register the device event callback */
-	ret = rte_dev_event_callback_register(NULL,
-		eth_dev_event_callback, NULL);
-	if (ret) {
-		printf("Failed to register device event callback\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-
-static int
-eth_dev_event_callback_unregister(void)
-{
-	int ret;
-
-	/* unregister the device event callback */
-	ret = rte_dev_event_callback_unregister(NULL,
-		eth_dev_event_callback, NULL);
-	if (ret < 0) {
-		printf("Failed to unregister device event callback\n");
-		return -1;
-	}
-
-	return 0;
-}
-
 void
 attach_port(char *identifier)
 {
@@ -2093,14 +2057,26 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
-		ret = eth_dev_event_callback_unregister();
-		if (ret)
+		ret = rte_dev_event_callback_unregister(NULL,
+			eth_dev_event_callback, NULL);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL,
+				"fail to unregister device event callback.\n");
+			return;
+		}
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.\n");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2244,6 +2220,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2233,13 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "can not get port by device %s!\n",
+				device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2762,27 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.\n");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.\n");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = rte_dev_event_callback_register(NULL,
+			eth_dev_event_callback, NULL);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to register device event callback\n");
+			return -1;
+		}
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4


* Re: [PATCH v14 7/7] app/testpmd: use hotplug failure handler
  2018-10-04 14:46         ` [PATCH v14 7/7] app/testpmd: use hotplug failure handler Jeff Guo
@ 2018-10-05 12:26           ` Iremonger, Bernard
  0 siblings, 0 replies; 494+ messages in thread
From: Iremonger, Bernard @ 2018-10-05 12:26 UTC (permalink / raw)
  To: Guo, Jia, stephen, Richardson, Bruce, Yigit, Ferruh, Ananyev,
	Konstantin, gaetan.rivet, Wu, Jingjing, thomas, motih, matan,
	Van Haaren, Harry, Zhang, Qi Z, He, Shaopeng, arybchenko, Lu,
	Wenzhuo, Burakov, Anatoly, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, Zhang, Helin



> -----Original Message-----
> From: Guo, Jia
> Sent: Thursday, October 4, 2018 3:46 PM
> To: stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>; gaetan.rivet@6wind.com; Wu,
> Jingjing <jingjing.wu@intel.com>; thomas@monjalon.net;
> motih@mellanox.com; matan@mellanox.com; Van Haaren, Harry
> <harry.van.haaren@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; He,
> Shaopeng <shaopeng.he@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>; arybchenko@solarflare.com; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>;
> jerin.jacob@caviumnetworks.com
> Cc: jblunck@infradead.org; shreyansh.jain@nxp.com; dev@dpdk.org; Guo, Jia
> <jia.guo@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: [PATCH v14 7/7] app/testpmd: use hotplug failure handler
> 
> This patch uses testpmd as an example to show how an application can smoothly
> handle failures when a device is hot-unplugged. In addition to enabling the
> device event monitor and registering the hotplug event’s callback, the
> application also needs to enable the hotplug handling mechanism before running.
> Once the application detects the removal event, the hot-unplug callback is
> called. It first stops packet forwarding, then stops the port, closes the port,
> and finally detaches the port to clean up the device and release its resources.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>



* Re: [PATCH v14 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-04 14:46         ` [PATCH v14 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-10-15 10:43           ` Thomas Monjalon
  0 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-10-15 10:43 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, arybchenko,
	wenzhuo.lu, anatoly.burakov, jerin.jacob, jblunck,
	shreyansh.jain, helin.zhang

04/10/2018 16:46, Jeff Guo:
> --- a/doc/guides/rel_notes/release_18_08.rst
> +++ b/doc/guides/rel_notes/release_18_08.rst
> @@ -117,6 +117,11 @@ New Features
>  
>    Added support for chained mbufs (input and output).
>  
> +* **Added hot-unplug handle mechanism.**
> +
> +  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
> +  for enabling or disabling hotplug handle mechanism.

EAL features should be inserted before mbuf features.
And more importantly, it should be inserted in 18.11 release notes ;)

> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice

No need of experimental mark for internal functions.

> + *
> + * Register the sigbus handler.
> + *
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_sigbus_handler_register(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Unregister the sigbus handler.
> + *
> + * @return
> + *   - On success, zero.
> + *   - On failure, a negative value.
> + */
> +int
> +rte_dev_sigbus_handler_unregister(void);

Why are you using rte_ prefix for private functions?


* [PATCH v15 0/7] hot-unplug failure handle mechanism
  2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
                         ` (22 preceding siblings ...)
  2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
@ 2018-10-15 11:27       ` Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 1/7] bus: add hot-unplug handler Jeff Guo
                           ` (7 more replies)
  23 siblings, 8 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

Hotplug is an important feature for use cases such as datacenter device
fail-safe and SR-IOV live migration in SDN/NFV. It can bring greater
flexibility and continuity to networking services across many use cases
in the industry. So let's see how DPDK can help users implement hotplug
solutions.

We already have a general device event monitor mechanism, a failsafe
driver, and a hot plug/unplug API in DPDK. We also have the solution of
“ethdev event + kernel PMD hotplug handler + failsafe”, but we do not yet
have an implementation of “eal event + hotplug handler for pci PMD +
failsafe”, and we need to consider two different solutions, one for uio
pci and one for vfio pci.

In the case of hotplug for igb_uio, when a hardware device is physically
removed or disabled in software, the application needs to be notified so
that it can detach the device from the bus and invalidate it. The problem
is that the removal of the device is not instantaneous in software. If
the application data path tries to read from or write to the device while
removal is still in progress, it will cause an MMIO error and the
application will crash.

In this patch set, we propose a PCIe bus failure handler mechanism for
hot-unplug in igb_uio. It aims to guarantee that, when a hot-unplug occurs,
the application will not crash.

The mechanism should work as below:

First, the application enables the device event monitor, registers the
hotplug event’s callback and enables hotplug handling before running the
data path. Once a hot-unplug occurs, the mechanism detects the removal
event and then does the failure handling accordingly. In order to do
that, the below functionality is required:
 - Add a new bus ops “hot_unplug_handler” to handle hot-unplug failure.
 - Implement the pci bus specific ops “pci_hot_unplug_handler”. For uio
   pci, it remaps the memory of the corresponding unplugged device,
   based on the failure address. For vfio pci, a separate implementation
   can be added case by case.
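
The uio remap step above can be sketched with plain POSIX calls. This is
only a minimal illustration under stated assumptions, not the patch code:
remap_to_anonymous() and remap_demo() are hypothetical names, and an
anonymous mapping stands in for a real BAR. The point it shows is that
mmap() with MAP_FIXED replaces the pages at an existing virtual address,
so stale pointers held by the data path keep working.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a DPDK API): replace whatever is mapped at
 * [addr, addr + len) with zero-filled anonymous memory at the same address.
 */
static int
remap_to_anonymous(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return (p == MAP_FAILED || p != addr) ? -1 : 0;
}

/* Self-check: returns 0 if a stale pointer is still usable after remap. */
int
remap_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *bar;
	volatile unsigned char *v;
	int ok;

	/* An anonymous page stands in for a mapped device BAR. */
	bar = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bar == MAP_FAILED)
		return -1;
	memset(bar, 0xff, len);

	if (remap_to_anonymous(bar, len) != 0) {
		munmap(bar, len);
		return -1;
	}

	/* Same address, but now backed by fresh zero-filled pages. */
	v = bar;
	ok = (v[0] == 0 && v[len - 1] == 0);
	v[0] = 1; /* writes through the old pointer succeed too */
	ok = ok && (v[0] == 1);

	munmap(bar, len);
	return ok ? 0 : -1;
}
```

Calling remap_demo() from a test main() is expected to return 0 on Linux.
Anything the old mapping contained is lost after the remap, which is
acceptable here because the device behind it is already gone.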

For the data path or other unexpected behaviors from the control path
when a hot unplug occurs:
 - Add a new bus ops “sigbus_handler”, which is responsible for handling
   the sigbus error, which is either a generic memory error or a specific
   memory error caused by a hot unplug. When a sigbus error is captured,
   this function is called to handle it.
 - Implement the PCI bus specific ops “pci_sigbus_handler”. It iterates
   over all devices on the PCI bus to find which device encountered
   the failure.
 - Implement a "rte_bus_sigbus_handler" to iterate over all buses to find
   a bus to handle the failure.
 - Add a couple of APIs, “rte_dev_hotplug_handle_enable” and
   “rte_dev_hotplug_handle_disable”, to enable/disable hotplug handling.
   They monitor the sigbus error through a per-process handler. Because
   of how signals are delivered, either the control path thread or the
   data path thread may receive the sigbus error, but both call the
   common sigbus handling. When a sigbus is captured, it calls
   "rte_bus_sigbus_handler" to find the bus to handle it.
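
The sigbus flow above can be tried outside DPDK with a small POSIX-only
sketch. Everything named here (region_base, on_sigbus,
sigbus_recover_demo) is hypothetical and not part of the patch set. As
an assumption for the demo, a file-backed page that gets truncated away
plays the role of the unplugged BAR, since touching it raises SIGBUS on
Linux. The handler remaps the region if the fault address falls inside
it, and otherwise chains to the previously installed action.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for one device BAR mapping tracked by a bus. */
static volatile uint8_t *region_base;
static size_t region_len;
static struct sigaction old_action;
static volatile sig_atomic_t recovered;

static void
on_sigbus(int signum, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)region_base;

	if (base != 0 && fault >= base && fault < base + region_len) {
		/* Hot-unplug case: back the region with anonymous memory. */
		if (mmap((void *)base, region_len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
			 -1, 0) == MAP_FAILED)
			_exit(EXIT_FAILURE);
		recovered = 1;
		return; /* the faulting access is retried after return */
	}

	/* Generic SIGBUS: chain to whatever was installed before us. */
	if (old_action.sa_flags & SA_SIGINFO) {
		if (old_action.sa_sigaction != NULL) {
			old_action.sa_sigaction(signum, info, ctx);
			return;
		}
	} else if (old_action.sa_handler != SIG_DFL &&
		   old_action.sa_handler != SIG_IGN) {
		old_action.sa_handler(signum);
		return;
	}
	_exit(EXIT_FAILURE);
}

/* Self-check: returns 0 if a SIGBUS inside the region was recovered. */
int
sigbus_recover_demo(void)
{
	char path[] = "/tmp/sigbus_demoXXXXXX";
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	struct sigaction act;
	void *map;
	int fd, ok;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	region_base = map;
	region_len = len;
	recovered = 0;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = on_sigbus;
	act.sa_flags = SA_SIGINFO;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGBUS, &act, &old_action) != 0) {
		munmap(map, len);
		close(fd);
		return -1;
	}

	/* Simulate removal: the mapped page loses its backing store. */
	ok = (ftruncate(fd, 0) == 0);
	if (ok) {
		region_base[0] = 0x5a; /* raises SIGBUS, then is retried */
		ok = recovered && region_base[0] == 0x5a;
	}

	sigaction(SIGBUS, &old_action, NULL);
	region_base = NULL;
	munmap(map, len);
	close(fd);
	return ok ? 0 : -1;
}
```

Note that the previous action is tested with a bit mask on sa_flags,
because other flags such as SA_RESTART may be set together with
SA_SIGINFO. Also, mmap() is not on the POSIX list of async-signal-safe
functions, so calling it from a handler is a pragmatic choice that relies
on Linux behavior, not a portable guarantee.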

The mechanism can be used by applications or PMDs. For example, the whole
hotplug process in testpmd is:
 - Enable device event monitor->Enable hotplug handle->Register event callback
   ->attach port->start port->start forwarding->Device unplug->failure handle
   ->stop forwarding->stop port->close port->detach port.

This patch set does not cover hotplug insertion and binding, and it only
implements the igb_uio failure handler. The vfio hotplug failure handler
will come in a follow-up patch set.

patchset history:
v15->v14:
fix compiling and documentation issues

v14->v13:
rebase code to fix an apply issue.
fix some typos and checkpatch warnings.

v13->v12:
use local variable to rewrite the func to be more readable.
add sa_flags check when invoking the generic sigbus handler
fix some typos
delete needless helper in app

v12->v11:
add and delete some checking about sigbus recover.

v11->v10:
change the ops name, since both uio and vfio will use the hot-unplug ops.
since we plan to abandon RTE_ETH_EVENT_INTR_RMV, change to use
RTE_DEV_EVENT_REMOVE, and modify the hotplug event and callback usage.
remove the igb_uio fix, since it is a random issue that should be treated
as a kernel driver defect and not be included in this failure handler mechanism.

v10->v9:
modify the api name and expose it for public use.
add hotplug handle enable/disable APIs
refine commit log

v9->v8:
refine commit log to be more readable.

v8->v7:
refine errno process in sigbus handler.
refine igb uio release process

v7->v6:
delete some unused part

v6->v5:
refine some description about bus ops
refine commit log
add some entry check.

v5->v4:
split patches to focus on the failure handle, remove the event usage
by testpmd to another patch.
change the hotplug failure handler name.
refine the sigbus handle logic.
add lock for udev state in igb uio driver.

v4->v3:
split patches to be small and clear.
change to use new parameter "--hotplug-mode" in testpmd to identify
the eal hotplug and ethdev hotplug.

v3->v2:
change bus ops name to bus_hotplug_handler.
add new API and bus ops bus_signal_handler to distinguish handling of
generic sigbus from hotplug sigbus.

v2->v1(v21):
refine some doc and commit log.
fix igb uio kernel issue for control path failure.
rebase testpmd code.

Since the hot plug solution has been discussed for several rounds in
public, the scope has changed and the patch set has been split many times.
In line with the recent RFC and feature design, this patch set focuses
only on the hot-unplug failure handler. So, to keep this topic clear and
focused, the previous patch sets are summarized in the history as
“v1(v21)”, and tracking continues from v2 here.

"v1(v21)" == v21 as below:
v21->v20:
split function in hot unplug ops.
sync failure handling to fix multiple process issue.
fix attach port issue for multiple devices case.
combine rmv callback functions into one.

v20->v19:
clean the code.
refine the remap logic for multiple device.
remove the auto binding.

v19->v18:
note the limitation of multiple hotplug, fix some typos, squeeze patch.

v18->v15:
add document, add signal bus handler, refine the code to be more clear.

For the prior patch history, please check the patch set "add device event monitor framework".

Jeff Guo (7):
  bus: add hot-unplug handler
  bus/pci: implement hot-unplug handler ops
  bus: add sigbus handler
  bus/pci: implement sigbus handler ops
  bus: add helper to handle sigbus
  eal: add failure handle mechanism for hot-unplug
  app/testpmd: use hotplug failure handler

 app/test-pmd/testpmd.c                  |  88 ++++++++---------
 doc/guides/rel_notes/release_18_11.rst  |   5 +
 drivers/bus/pci/pci_common.c            |  82 +++++++++++++++
 drivers/bus/pci/pci_common_uio.c        |  33 +++++++
 drivers/bus/pci/private.h               |  12 +++
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_common_bus.c  |  43 ++++++++
 lib/librte_eal/common/eal_private.h     |  35 +++++++
 lib/librte_eal/common/include/rte_bus.h |  34 +++++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 170 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 12 files changed, 497 insertions(+), 47 deletions(-)

-- 
2.7.4


* [PATCH v15 1/7] bus: add hot-unplug handler
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
@ 2018-10-15 11:27         ` Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

A hot-unplug failure and application crash can occur when a device is
hot-unplugged while the application still tries to access the device
by reading or writing its BARs, which are already invalid but have
not yet been unmapped or released.

This patch introduces bus ops to handle hot-unplug failures. Each
bus can implement its own case-dependent logic to handle the failures.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v15->v14:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index b7b5b08..1bb53dc 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implement a specific hot-unplug handler, which is responsible for
+ * handling the failure when a device is hot-unplugged. When a hot-unplug
+ * event is detected, this function can be called to handle the
+ * hot-unplug failure and avoid an application crash.
+ * @param dev
+ *	Pointer to the device structure.
+ *
+ * @return
+ *	0 on success.
+ *	!0 on error.
+ */
+typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -212,6 +226,8 @@ struct rte_bus {
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
+	rte_bus_hot_unplug_handler_t hot_unplug_handler;
+				/**< handle hot-unplug failure on the bus */
 };
 
 /**
-- 
2.7.4


* [PATCH v15 2/7] bus/pci: implement hot-unplug handler ops
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 1/7] bus: add hot-unplug handler Jeff Guo
@ 2018-10-15 11:27         ` Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 3/7] bus: add sigbus handler Jeff Guo
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops to handle hot-unplug on the PCI bus.
For UIO PCI, it avoids BAR read/write errors by remapping the failing
memory region to new dummy memory. For VFIO or other kernel drivers,
a specific function can be implemented to handle hot-unplug case by
case.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v15->v14:
no change.
---
 drivers/bus/pci/pci_common.c     | 28 ++++++++++++++++++++++++++++
 drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++
 drivers/bus/pci/private.h        | 12 ++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index c7695d1..be7cc1f 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -404,6 +404,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hot_unplug_handler(struct rte_device *dev)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = RTE_DEV_TO_PCI(dev);
+	if (!pdev)
+		return -1;
+
+	switch (pdev->kdrv) {
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+	case RTE_KDRV_NIC_UIO:
+		/* BARs resource is invalid, remap it to be safe. */
+		ret = pci_uio_remap_resource(pdev);
+		break;
+	default:
+		RTE_LOG(DEBUG, EAL,
+			"Not managed by a supported kernel driver, skipped\n");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -434,6 +461,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.parse = pci_parse,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.dev_iterate = rte_pci_dev_iterate,
+		.hot_unplug_handler = pci_hot_unplug_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
 	}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+	int i;
+	void *map_address;
+
+	if (dev == NULL)
+		return -1;
+
+	/* Remap all BARs */
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		/* skip empty BAR */
+		if (dev->mem_resource[i].phys_addr == 0)
+			continue;
+		map_address = mmap(dev->mem_resource[i].addr,
+				(size_t)dev->mem_resource[i].len,
+				PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		if (map_address == MAP_FAILED) {
+			RTE_LOG(ERR, EAL,
+				"Cannot remap resource for device %s\n",
+				dev->name);
+			return -1;
+		}
+		RTE_LOG(INFO, EAL,
+			"Successful remap resource for device %s\n",
+			dev->name);
+	}
+
+	return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 0e689fa..0883d82 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -125,6 +125,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
 		struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Pointer to the struct rte_pci_device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v15 3/7] bus: add sigbus handler
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 1/7] bus: add hot-unplug handler Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
@ 2018-10-15 11:27         ` Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 4/7] bus/pci: implement sigbus handler ops Jeff Guo
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

When a device is hot-unplugged, a sigbus error will occur if the datapath
can still read/write to the device. A handler is required here to capture
the sigbus signal and handle it appropriately.

This patch introduces a bus ops to handle sigbus errors. Each bus can
implement its own case-dependent logic to handle the sigbus errors.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v15->v14:
no change.
---
 lib/librte_eal/common/include/rte_bus.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 1bb53dc..6be4b5c 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
 
 /**
+ * Implement a bus-specific sigbus handler. A sigbus error is either a
+ * generic memory error or a memory error caused by a device being
+ * hot-unplugged. When a sigbus error is captured, this function can be
+ * called to handle it.
+ * @param failure_addr
+ *	Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ *	0 if the sigbus error was handled as a hot-unplug.
+ *	1 if it was not processed because it is a generic sigbus error.
+ *	-1 if handling the sigbus error for hot-unplug failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -228,6 +243,9 @@ struct rte_bus {
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
 	rte_bus_hot_unplug_handler_t hot_unplug_handler;
 				/**< handle hot-unplug failure on the bus */
+	rte_bus_sigbus_handler_t sigbus_handler;
+					/**< handle sigbus error on the bus */
+
 };
 
 /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v15 4/7] bus/pci: implement sigbus handler ops
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
                           ` (2 preceding siblings ...)
  2018-10-15 11:27         ` [PATCH v15 3/7] bus: add sigbus handler Jeff Guo
@ 2018-10-15 11:27         ` Jeff Guo
  2018-10-15 13:41           ` Thomas Monjalon
  2018-10-15 11:27         ` [PATCH v15 5/7] bus: add helper to handle sigbus Jeff Guo
                           ` (3 subsequent siblings)
  7 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch implements the ops for the PCI bus sigbus handler. It finds the
PCI device that is being hot-unplugged and calls the relevant ops of the
hot-unplug handler to handle the hot-unplug failure of the device.
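
The address lookup described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this patch: `toy_bar`, `toy_pci_dev` and `toy_find_dev_by_addr` are hypothetical stand-ins for the real rte_pci_device structures, and the device list is a plain array instead of the bus TAILQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BARS 6

/* Hypothetical stand-in for one mapped BAR of a PCI device. */
struct toy_bar {
	void *addr;   /* virtual address of the mapping, NULL if unused */
	uint64_t len; /* length of the mapping in bytes, 0 if unused */
};

/* Hypothetical stand-in for struct rte_pci_device. */
struct toy_pci_dev {
	const char *name;
	struct toy_bar bar[TOY_MAX_BARS];
};

/* Return the device whose mapped BAR contains fault_addr, or NULL. */
static const struct toy_pci_dev *
toy_find_dev_by_addr(const struct toy_pci_dev *devs, size_t ndevs,
		     const void *fault_addr)
{
	uint64_t point = (uint64_t)(uintptr_t)fault_addr;
	size_t d, i;

	assert(devs != NULL || ndevs == 0);

	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < TOY_MAX_BARS; i++) {
			uint64_t start =
				(uint64_t)(uintptr_t)devs[d].bar[i].addr;
			uint64_t len = devs[d].bar[i].len;

			if (len == 0)
				continue; /* unused BAR */
			/*
			 * Half-open range [start, start + len), checked with
			 * plain integer arithmetic so no integer is cast
			 * back to a pointer.
			 */
			if (point >= start && point - start < len)
				return &devs[d];
		}
	}
	return NULL;
}
```

Comparing `point - start` against `len` avoids an overflow in `start + len` for mappings near the top of the address space.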

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v15->v14:
fix compiling issue.
---
 drivers/bus/pci/pci_common.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index be7cc1f..4325b0e 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -403,6 +403,36 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 	return NULL;
 }
 
+/*
+ * Find the device that encountered the failure by iterating over all devices
+ * on the PCI bus and checking whether the failure address lies within the
+ * range of one of the device's BARs.
+ */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	uint64_t check_point, start, end, len;
+	int i;
+
+	check_point = (uint64_t)(uintptr_t)failure_addr;
+
+	FOREACH_DEVICE_ON_PCIBUS(pdev) {
+		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+			start = (uint64_t)(uintptr_t)pdev->mem_resource[i].addr;
+			len = pdev->mem_resource[i].len;
+			end = (uint64_t)(uintptr_t)RTE_PTR_ADD(start, len);
+			if (check_point >= start && check_point < end) {
+				RTE_LOG(DEBUG, EAL, "Failure address %16.16"
+					PRIx64" belongs to device %s!\n",
+					check_point, pdev->device.name);
+				return pdev;
+			}
+		}
+	}
+	return NULL;
+}
+
 static int
 pci_hot_unplug_handler(struct rte_device *dev)
 {
@@ -431,6 +461,29 @@ pci_hot_unplug_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+	struct rte_pci_device *pdev = NULL;
+	int ret = 0;
+
+	pdev = pci_find_device_by_addr(failure_addr);
+	if (!pdev) {
+		/* It is a generic sigbus error, no bus would handle it. */
+		ret = 1;
+	} else {
+		/* The sigbus error is caused by hot-unplug. */
+		ret = pci_hot_unplug_handler(&pdev->device);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"Failed to handle hot-unplug for device %s\n",
+				pdev->name);
+			ret = -1;
+		}
+	}
+	return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
 	return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -462,6 +515,7 @@ struct rte_pci_bus rte_pci_bus = {
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.dev_iterate = rte_pci_dev_iterate,
 		.hot_unplug_handler = pci_hot_unplug_handler,
+		.sigbus_handler = pci_sigbus_handler,
 	},
 	.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
 	.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v15 5/7] bus: add helper to handle sigbus
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
                           ` (3 preceding siblings ...)
  2018-10-15 11:27         ` [PATCH v15 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-10-15 11:27         ` Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch aims to add a helper to iterate over all buses to find the
relevant bus to handle the sigbus error.
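
A rough sketch of such a helper, under simplifying assumptions: `toy_bus`, `toy_bus_sigbus_dispatch` and the two sample handlers are hypothetical names, the buses live in a plain array, and the loop is written out directly instead of going through rte_bus_find(). The return values follow the convention of the sigbus handler op (0 handled, 1 not claimed, -1 failed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a bus with a sigbus handler op. */
typedef int (*toy_sigbus_handler_t)(const void *failure_addr);

struct toy_bus {
	const char *name;
	toy_sigbus_handler_t sigbus_handler; /* NULL if not implemented */
};

/* Address windows owned by the two sample buses below. */
static char toy_pci_window[16];
static char toy_broken_window[16];

static int
toy_in_window(const void *addr, const char *win, size_t len)
{
	uintptr_t p = (uintptr_t)addr, s = (uintptr_t)win;

	return p >= s && p - s < len;
}

/* Sample handler: claims its window and handles the fault. */
static int
toy_pci_sigbus(const void *addr)
{
	return toy_in_window(addr, toy_pci_window,
			     sizeof(toy_pci_window)) ? 0 : 1;
}

/* Sample handler: claims its window but fails to handle the fault. */
static int
toy_broken_sigbus(const void *addr)
{
	return toy_in_window(addr, toy_broken_window,
			     sizeof(toy_broken_window)) ? -1 : 1;
}

/*
 * Ask each bus in turn.
 * Returns 0 if a bus handled the fault, -1 if a bus claimed the
 * address but failed, and 1 if no bus claimed the address.
 */
static int
toy_bus_sigbus_dispatch(const struct toy_bus *buses, size_t nbuses,
			const void *failure_addr)
{
	size_t i;

	assert(buses != NULL || nbuses == 0);

	for (i = 0; i < nbuses; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue; /* bus does not implement the op */
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret == 0)
			return 0;
		if (ret < 0)
			return -1;
		/* ret > 0: not this bus's address, keep looking */
	}
	return 1;
}
```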

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v15->v14:
no change.
---
 lib/librte_eal/common/eal_common_bus.c | 43 ++++++++++++++++++++++++++++++++++
 lib/librte_eal/common/eal_private.h    | 13 ++++++++++
 2 files changed, 56 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..62b7318 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include <rte_bus.h>
 #include <rte_debug.h>
 #include <rte_string_fns.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
@@ -242,3 +243,45 @@ rte_bus_get_iommu_class(void)
 	}
 	return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+			const void *failure_addr)
+{
+	int ret;
+
+	if (!bus->sigbus_handler)
+		return -1;
+
+	ret = bus->sigbus_handler(failure_addr);
+
+	/* bus found but handling failed; make sure errno is set. */
+	if (ret < 0 && rte_errno == 0)
+		rte_errno = ENOTSUP;
+
+	return ret > 0;
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+	struct rte_bus *bus;
+
+	int ret = 0;
+	int old_errno = rte_errno;
+
+	rte_errno = 0;
+
+	bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+	/* can not find bus. */
+	if (!bus)
+		return 1;
+	/* bus found but handling failed; pass on the new errno. */
+	else if (rte_errno != 0)
+		return -1;
+
+	/* restore the old errno. */
+	rte_errno = old_errno;
+
+	return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a8..a2d1528 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,17 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/**
+ * Iterate over all buses to find the corresponding bus to handle the sigbus
+ * error.
+ * @param failure_addr
+ *	Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *	 0 if the sigbus error was handled.
+ *	-1 if handling the sigbus error failed.
+ *	 1 if no bus can handle the sigbus error.
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v15 6/7] eal: add failure handle mechanism for hot-unplug
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
                           ` (4 preceding siblings ...)
  2018-10-15 11:27         ` [PATCH v15 5/7] bus: add helper to handle sigbus Jeff Guo
@ 2018-10-15 11:27         ` Jeff Guo
  2018-10-15 11:27         ` [PATCH v15 7/7] app/testpmd: use hotplug failure handler Jeff Guo
  2018-10-15 20:19         ` [PATCH v15 0/7] hot-unplug failure handle mechanism Thomas Monjalon
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

The mechanism registers the sigbus handler after the device event monitor
is enabled. When a sigbus event is captured, it checks the failure address
and handles the memory failure of the corresponding device by invoking the
hot-unplug handler. This can prevent the application from crashing when a
device is hot-unplugged.

With this patch, users can call the newly added APIs below to enable or
disable the device hotplug handling mechanism. Note that only the
hot-unplug handler is implemented in these functions; other hotplug
handlers, such as one for hotplug binding, could be added in the future if
needed:
  - rte_dev_hotplug_handle_enable
  - rte_dev_hotplug_handle_disable
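
What enabling and disabling amount to at the signal level can be sketched with plain POSIX calls. This is only an illustration under stated assumptions, not the EAL code: all `toy_*` names are hypothetical, `toy_handle_fault` is a stub standing in for the bus lookup, and unlike the patch this sketch reports a sigaction() failure to the caller and tests SA_SIGINFO with a bitwise AND.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static struct sigaction toy_old_action; /* disposition in place before ours */
static int toy_registered;

/*
 * Hypothetical stand-in for the bus lookup:
 * 0 handled, 1 not a device address, -1 handling failed.
 */
static int
toy_handle_fault(const void *addr)
{
	(void)addr;
	return 1;
}

static void
toy_sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
	int ret = toy_handle_fault(info->si_addr);

	if (ret == 0)
		return; /* hot-unplug fault handled, resume execution */
	if (ret == 1) {
		/* Generic SIGBUS: chain to the previous handler, if any. */
		if (toy_old_action.sa_flags & SA_SIGINFO) {
			if (toy_old_action.sa_sigaction != NULL) {
				toy_old_action.sa_sigaction(signum, info, ctx);
				return;
			}
		} else if (toy_old_action.sa_handler != SIG_DFL &&
			   toy_old_action.sa_handler != SIG_IGN) {
			toy_old_action.sa_handler(signum);
			return;
		}
	}
	abort(); /* nothing could handle the fault */
}

/* Install the handler and remember the previous disposition. */
static int
toy_sigbus_register(void)
{
	struct sigaction action;

	if (toy_registered)
		return 0;
	memset(&action, 0, sizeof(action));
	sigemptyset(&action.sa_mask);
	action.sa_flags = SA_SIGINFO;
	action.sa_sigaction = toy_sigbus_handler;
	if (sigaction(SIGBUS, &action, &toy_old_action) != 0)
		return -1;
	toy_registered = 1;
	return 0;
}

/* Restore whatever disposition was in place before registration. */
static int
toy_sigbus_unregister(void)
{
	if (!toy_registered)
		return 0;
	if (sigaction(SIGBUS, &toy_old_action, NULL) != 0)
		return -1;
	toy_registered = 0;
	return 0;
}
```

Saving the old disposition lets an application that already installed its own SIGBUS handler keep receiving generic faults.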

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v15->v14:
fix document and comment issue
---
 doc/guides/rel_notes/release_18_11.rst  |   5 +
 lib/librte_eal/bsdapp/eal/eal_dev.c     |  14 +++
 lib/librte_eal/common/eal_private.h     |  22 +++++
 lib/librte_eal/common/include/rte_dev.h |  26 +++++
 lib/librte_eal/linuxapp/eal/eal_dev.c   | 170 +++++++++++++++++++++++++++++++-
 lib/librte_eal/rte_eal_version.map      |   2 +
 6 files changed, 238 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..5eaf926 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,11 @@ New Features
   the specified port. The port must be stopped before the command call in order
   to reconfigure queues.
 
+* **Added hot-unplug handle mechanism.**
+
+  ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable``
+  enable and disable the hotplug handling mechanism.
+
 
 API Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/eal_dev.c b/lib/librte_eal/bsdapp/eal/eal_dev.c
index 1c6c51b..255d611 100644
--- a/lib/librte_eal/bsdapp/eal/eal_dev.c
+++ b/lib/librte_eal/bsdapp/eal/eal_dev.c
@@ -19,3 +19,17 @@ rte_dev_event_monitor_stop(void)
 	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
 	return -1;
 }
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	RTE_LOG(ERR, EAL, "Device event is not supported for FreeBSD\n");
+	return -1;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index a2d1528..4174d33 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -317,4 +317,26 @@ rte_devargs_layers_parse(struct rte_devargs *devargs,
  */
 int rte_bus_sigbus_handler(const void *failure_addr);
 
+/**
+ * @internal
+ * Register the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+dev_sigbus_handler_register(void);
+
+/**
+ * @internal
+ * Unregister the sigbus handler.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+dev_sigbus_handler_unregister(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a805..ff580a0 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -460,4 +460,30 @@ rte_dev_event_monitor_start(void);
 int __rte_experimental
 rte_dev_event_monitor_stop(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Disable hotplug handling for devices.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..fe662ef 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include <string.h>
 #include <unistd.h>
+#include <fcntl.h>
+#include <signal.h>
 #include <sys/socket.h>
 #include <linux/netlink.h>
 
@@ -14,15 +16,32 @@
 #include <rte_malloc.h>
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
+#include <rte_bus.h>
+#include <rte_eal.h>
+#include <rte_spinlock.h>
+#include <rte_errno.h>
 
 #include "eal_private.h"
 
 static struct rte_intr_handle intr_handle = {.fd = -1 };
 static bool monitor_started;
+static bool hotplug_handle;
 
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * Spinlock for device hot-unplug failure handling. Any code that accesses a
+ * bus or device, for example to handle sigbus on a bus or a memory failure on
+ * a device, must take this lock. It protects the bus and the device against
+ * race conditions.
+ */
+static rte_spinlock_t failure_handle_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +52,55 @@ enum eal_dev_event_subsystem {
 	EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+	if (sigbus_need_recover) {
+		sigaction(SIGBUS, &sigbus_action_old, NULL);
+		sigbus_need_recover = 0;
+	}
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+				void *ctx __rte_unused)
+{
+	int ret;
+
+	RTE_LOG(DEBUG, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+		(int)pthread_self(), info->si_addr);
+
+	rte_spinlock_lock(&failure_handle_lock);
+	ret = rte_bus_sigbus_handler(info->si_addr);
+	rte_spinlock_unlock(&failure_handle_lock);
+	if (ret == -1) {
+		rte_exit(EXIT_FAILURE,
+			 "Failed to handle SIGBUS for hot-unplug, "
+			 "(rte_errno: %s)!", strerror(rte_errno));
+	} else if (ret == 1) {
+		if ((sigbus_action_old.sa_flags & SA_SIGINFO)
+		    && sigbus_action_old.sa_sigaction) {
+			(*(sigbus_action_old.sa_sigaction))(signum,
+							    info, ctx);
+		} else if (!(sigbus_action_old.sa_flags & SA_SIGINFO)
+			   && sigbus_action_old.sa_handler) {
+			(*(sigbus_action_old.sa_handler))(signum);
+		} else {
+			rte_exit(EXIT_FAILURE,
+				 "Failed to handle generic SIGBUS!");
+		}
+	}
+
+	RTE_LOG(DEBUG, EAL, "Success to handle SIGBUS for hot-unplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+	const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +215,9 @@ dev_uev_handler(__rte_unused void *param)
 	struct rte_dev_event uevent;
 	int ret;
 	char buf[EAL_UEV_MSG_LEN];
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	const char *busname = "";
 
 	memset(&uevent, 0, sizeof(struct rte_dev_event));
 	memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,8 +242,43 @@ dev_uev_handler(__rte_unused void *param)
 	RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
 		uevent.devname, uevent.type, uevent.subsystem);
 
-	if (uevent.devname)
+	switch (uevent.subsystem) {
+	case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+	case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+		busname = "pci";
+		break;
+	default:
+		break;
+	}
+
+	if (uevent.devname) {
+		if (uevent.type == RTE_DEV_EVENT_REMOVE && hotplug_handle) {
+			rte_spinlock_lock(&failure_handle_lock);
+			bus = rte_bus_find_by_name(busname);
+			if (bus == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+					busname);
+				return;
+			}
+
+			dev = bus->find_device(NULL, cmp_dev_name,
+					       uevent.devname);
+			if (dev == NULL) {
+				RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+					"bus (%s)\n", uevent.devname, busname);
+				return;
+			}
+
+			ret = bus->hot_unplug_handler(dev);
+			rte_spinlock_unlock(&failure_handle_lock);
+			if (ret) {
+				RTE_LOG(ERR, EAL, "Can not handle hot-unplug "
+					"for device (%s)\n", dev->name);
+				return;
+			}
+		}
 		dev_callback_process(uevent.devname, uevent.type);
+	}
 }
 
 int __rte_experimental
@@ -220,5 +326,67 @@ rte_dev_event_monitor_stop(void)
 	close(intr_handle.fd);
 	intr_handle.fd = -1;
 	monitor_started = false;
+
 	return 0;
 }
+
+int
+dev_sigbus_handler_register(void)
+{
+	sigset_t mask;
+	struct sigaction action;
+
+	rte_errno = 0;
+
+	if (sigbus_need_recover)
+		return 0;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGBUS);
+	action.sa_flags = SA_SIGINFO;
+	action.sa_mask = mask;
+	action.sa_sigaction = sigbus_handler;
+	sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old);
+
+	return rte_errno;
+}
+
+int
+dev_sigbus_handler_unregister(void)
+{
+	rte_errno = 0;
+
+	sigbus_action_recover();
+
+	return rte_errno;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_enable(void)
+{
+	int ret = 0;
+
+	ret = dev_sigbus_handler_register();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to register sigbus handler for devices.\n");
+
+	hotplug_handle = true;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_hotplug_handle_disable(void)
+{
+	int ret = 0;
+
+	ret = dev_sigbus_handler_unregister();
+	if (ret < 0)
+		RTE_LOG(ERR, EAL,
+			"fail to unregister sigbus handler for devices.\n");
+
+	hotplug_handle = false;
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..b167b8f 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -281,6 +281,8 @@ EXPERIMENTAL {
 	rte_dev_event_callback_unregister;
 	rte_dev_event_monitor_start;
 	rte_dev_event_monitor_stop;
+	rte_dev_hotplug_handle_disable;
+	rte_dev_hotplug_handle_enable;
 	rte_dev_iterator_init;
 	rte_dev_iterator_next;
 	rte_devargs_add;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* [PATCH v15 7/7] app/testpmd: use hotplug failure handler
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
                           ` (5 preceding siblings ...)
  2018-10-15 11:27         ` [PATCH v15 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
@ 2018-10-15 11:27         ` Jeff Guo
  2018-10-15 20:19         ` [PATCH v15 0/7] hot-unplug failure handle mechanism Thomas Monjalon
  7 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-15 11:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu, anatoly.burakov, jerin.jacob
  Cc: jblunck, shreyansh.jain, dev, jia.guo, helin.zhang

This patch uses testpmd as an example to show how an app can smoothly
handle failure when a device is hot-unplugged. Besides enabling the
device event monitor and registering the hotplug event callback, the app
also needs to enable the hotplug handling mechanism before running. Once the
app detects the removal event, the hot-unplug callback is called. It first
stops the packet forwarding, then stops the port, closes the port, and
finally detaches the port to clean up the device and release the resources.
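
The ordering of the teardown steps can be modelled as a small state machine. This is a toy sketch only: `toy_port`, its states and `toy_on_remove` are hypothetical and do not correspond to testpmd functions; the sketch merely shows each step running only when the port is in the state that step expects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port life cycle, from fully active to fully released. */
enum toy_port_state {
	TOY_FORWARDING, /* port started and forwarding packets */
	TOY_STARTED,    /* port started, forwarding stopped */
	TOY_STOPPED,    /* port stopped */
	TOY_CLOSED,     /* port closed */
	TOY_DETACHED    /* device detached, resources released */
};

struct toy_port {
	enum toy_port_state state;
	int steps; /* number of teardown steps actually performed */
};

/*
 * Removal callback: walk the port down to TOY_DETACHED, skipping the
 * steps that are not needed for its current state.
 * Returns 0 on success, -1 if the port was already detached.
 */
static int
toy_on_remove(struct toy_port *p)
{
	assert(p != NULL);

	if (p->state == TOY_DETACHED)
		return -1; /* duplicate removal event, nothing to do */

	if (p->state == TOY_FORWARDING) { /* 1. stop packet forwarding */
		p->state = TOY_STARTED;
		p->steps++;
	}
	if (p->state == TOY_STARTED) {    /* 2. stop the port */
		p->state = TOY_STOPPED;
		p->steps++;
	}
	if (p->state == TOY_STOPPED) {    /* 3. close the port */
		p->state = TOY_CLOSED;
		p->steps++;
	}
	/* 4. detach the port and release its resources */
	p->state = TOY_DETACHED;
	p->steps++;
	return 0;
}
```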

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v15->v14:
no change.
---
 app/test-pmd/testpmd.c | 88 ++++++++++++++++++++++++--------------------------
 1 file changed, 42 insertions(+), 46 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e5..db87c63 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -434,9 +434,6 @@ static int eth_event_callback(portid_t port_id,
 static void eth_dev_event_callback(char *device_name,
 				enum rte_dev_event_type type,
 				void *param);
-static int eth_dev_event_callback_register(void);
-static int eth_dev_event_callback_unregister(void);
-
 
 /*
  * Check if all the ports are started.
@@ -1954,39 +1951,6 @@ reset_port(portid_t pid)
 	printf("Done\n");
 }
 
-static int
-eth_dev_event_callback_register(void)
-{
-	int ret;
-
-	/* register the device event callback */
-	ret = rte_dev_event_callback_register(NULL,
-		eth_dev_event_callback, NULL);
-	if (ret) {
-		printf("Failed to register device event callback\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-
-static int
-eth_dev_event_callback_unregister(void)
-{
-	int ret;
-
-	/* unregister the device event callback */
-	ret = rte_dev_event_callback_unregister(NULL,
-		eth_dev_event_callback, NULL);
-	if (ret < 0) {
-		printf("Failed to unregister device event callback\n");
-		return -1;
-	}
-
-	return 0;
-}
-
 void
 attach_port(char *identifier)
 {
@@ -2093,14 +2057,26 @@ pmd_test_exit(void)
 
 	if (hot_plug) {
 		ret = rte_dev_event_monitor_stop();
-		if (ret)
+		if (ret) {
 			RTE_LOG(ERR, EAL,
 				"fail to stop device event monitor.");
+			return;
+		}
 
-		ret = eth_dev_event_callback_unregister();
-		if (ret)
+		ret = rte_dev_event_callback_unregister(NULL,
+			eth_dev_event_callback, NULL);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL,
+				"fail to unregister device event callback.\n");
+			return;
+		}
+
+		ret = rte_dev_hotplug_handle_disable();
+		if (ret) {
 			RTE_LOG(ERR, EAL,
-				"fail to unregister all event callbacks.");
+				"fail to disable hotplug handling.\n");
+			return;
+		}
 	}
 
 	printf("\nBye...\n");
@@ -2244,6 +2220,9 @@ static void
 eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 			     __rte_unused void *arg)
 {
+	uint16_t port_id;
+	int ret;
+
 	if (type >= RTE_DEV_EVENT_MAX) {
 		fprintf(stderr, "%s called upon invalid event %d\n",
 			__func__, type);
@@ -2254,9 +2233,13 @@ eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
 	case RTE_DEV_EVENT_REMOVE:
 		RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
 			device_name);
-		/* TODO: After finish failure handle, begin to stop
-		 * packet forward, stop port, close port, detach port.
-		 */
+		ret = rte_eth_dev_get_port_by_name(device_name, &port_id);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "can not get port by device %s!\n",
+				device_name);
+			return;
+		}
+		rmv_event_callback((void *)(intptr_t)port_id);
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
@@ -2779,14 +2762,27 @@ main(int argc, char** argv)
 	init_config();
 
 	if (hot_plug) {
-		/* enable hot plug monitoring */
+		ret = rte_dev_hotplug_handle_enable();
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to enable hotplug handling.\n");
+			return -1;
+		}
+
 		ret = rte_dev_event_monitor_start();
 		if (ret) {
-			rte_errno = EINVAL;
+			RTE_LOG(ERR, EAL,
+				"fail to start device event monitoring.\n");
 			return -1;
 		}
-		eth_dev_event_callback_register();
 
+		ret = rte_dev_event_callback_register(NULL,
+			eth_dev_event_callback, NULL);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"fail to register device event callback\n");
+			return -1;
+		}
 	}
 
 	if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 494+ messages in thread

* Re: [PATCH v15 4/7] bus/pci: implement sigbus handler ops
  2018-10-15 11:27         ` [PATCH v15 4/7] bus/pci: implement sigbus handler ops Jeff Guo
@ 2018-10-15 13:41           ` Thomas Monjalon
  2018-10-15 14:16             ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Thomas Monjalon @ 2018-10-15 13:41 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, arybchenko,
	wenzhuo.lu, anatoly.burakov, jerin.jacob, jblunck,
	shreyansh.jain, helin.zhang

15/10/2018 13:27, Jeff Guo:
> This patch implements the ops for the PCI bus sigbus handler. It finds the
> PCI device that is being hot-unplugged and calls the relevant ops of the
> hot-unplug handler to handle the hot-unplug failure of the device.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> v15->v14:
> fix compiling issue.
> ---
> +static struct rte_pci_device *
> +pci_find_device_by_addr(const void *failure_addr)
> +{
> +	struct rte_pci_device *pdev = NULL;
> +	uint64_t check_point, start, end, len;
> +	int i;
> +
> +	check_point = (uint64_t)(uintptr_t)failure_addr;
> +
> +	FOREACH_DEVICE_ON_PCIBUS(pdev) {
> +		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
> +			start = (uint64_t)(uintptr_t)pdev->mem_resource[i].addr;
> +			len = pdev->mem_resource[i].len;
> +			end = (uint64_t)(uintptr_t)RTE_PTR_ADD(start, len);

When compiling for 32-bit, there is an error:
cast to pointer from integer of different size

start is not a pointer.
I think it must be replaced by a simple addition.
	end = start + len;

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v15 4/7] bus/pci: implement sigbus handler ops
  2018-10-15 13:41           ` Thomas Monjalon
@ 2018-10-15 14:16             ` Thomas Monjalon
  0 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-10-15 14:16 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, arybchenko,
	wenzhuo.lu, anatoly.burakov, jerin.jacob, jblunck,
	shreyansh.jain, helin.zhang

15/10/2018 15:41, Thomas Monjalon:
> 15/10/2018 13:27, Jeff Guo:
> > This patch implements the ops for the PCI bus sigbus handler. It finds the
> > PCI device that is being hot-unplugged and calls the relevant ops of the
> > hot-unplug handler to handle the hot-unplug failure of the device.
> > 
> > Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > Acked-by: Shaopeng He <shaopeng.he@intel.com>
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> > v15->v14:
> > fix compiling issue.
> > ---
> > +static struct rte_pci_device *
> > +pci_find_device_by_addr(const void *failure_addr)
> > +{
> > +	struct rte_pci_device *pdev = NULL;
> > +	uint64_t check_point, start, end, len;
> > +	int i;
> > +
> > +	check_point = (uint64_t)(uintptr_t)failure_addr;
> > +
> > +	FOREACH_DEVICE_ON_PCIBUS(pdev) {
> > +		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
> > +			start = (uint64_t)(uintptr_t)pdev->mem_resource[i].addr;
> > +			len = pdev->mem_resource[i].len;
> > +			end = (uint64_t)(uintptr_t)RTE_PTR_ADD(start, len);
> 
> When compiling for 32-bit, there is an error:
> cast to pointer from integer of different size
> 
> start is not a pointer.
> I think it must be replaced by a simple addition.
> 	end = start + len;

I will fix it on apply.

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v15 0/7] hot-unplug failure handle mechanism
  2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
                           ` (6 preceding siblings ...)
  2018-10-15 11:27         ` [PATCH v15 7/7] app/testpmd: use hotplug failure handler Jeff Guo
@ 2018-10-15 20:19         ` Thomas Monjalon
  7 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-10-15 20:19 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, motih, matan, harry.van.haaren,
	qi.z.zhang, shaopeng.he, bernard.iremonger, arybchenko,
	wenzhuo.lu, anatoly.burakov, jerin.jacob, jblunck,
	shreyansh.jain, helin.zhang

> Jeff Guo (7):
>   bus: add hot-unplug handler
>   bus/pci: implement hot-unplug handler ops
>   bus: add sigbus handler
>   bus/pci: implement sigbus handler ops
>   bus: add helper to handle sigbus
>   eal: add failure handle mechanism for hot-unplug
>   app/testpmd: use hotplug failure handler

Applied, thanks!

^ permalink raw reply	[flat|nested] 494+ messages in thread

* Re: [PATCH v10 7/8] igb_uio: fix unexpected remove issue for hotplug
  2018-09-27 15:07           ` Ferruh Yigit
@ 2018-10-18  5:51             ` Jeff Guo
  0 siblings, 0 replies; 494+ messages in thread
From: Jeff Guo @ 2018-10-18  5:51 UTC (permalink / raw)
  To: Ferruh Yigit, stephen, bruce.richardson, konstantin.ananyev,
	gaetan.rivet, jingjing.wu, thomas, motih, matan,
	harry.van.haaren, qi.z.zhang, shaopeng.he, bernard.iremonger,
	arybchenko, wenzhuo.lu
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

Hi Ferruh,

On 9/27/2018 11:07 PM, Ferruh Yigit wrote:
> On 8/17/2018 11:48 AM, Jeff Guo wrote:
>> When a device is hotplugged out, the PCI resource is released in the
>> kernel, the UIO file descriptor will disappear and the irq will be
>> released. After this, a kernel crash will be caused if the igb uio driver
>> tries to access or release these resources.
>>
>> Moreover, uio_remove is called unexpectedly before uio_release
>> when the device is hotplugged out, and the uio_remove procedure
>> frees resources that are required by uio_release. This later affects the
>> usage of the interrupt, as there is no way to disable the interrupt, which
>> is done in uio_release.
>>
>> To prevent this, the hotplug removal needs to be identified and processed
>> accordingly in igb uio driver.
>>
>> This patch proposes the addition of enum rte_udev_state in the
>> rte_uio_pci_dev struct. This will store the state of the uio device as one
>> of the following: probed/opened/released/removed.
>>
>> This patch also checks the kobject's remove_uevent_sent state to detect if
>> the removal status is hotplug-out. Once a hotplug-out is detected, it will
>> call uio_release and set the uio status to "removed". After that, uio will
>> check the status in the uio_release function. If uio has already been
>> removed, it will only free the dirty uio resource.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> <...>
>
>> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
>>   
>>   	/* enable interrupts */
>>   	err = igbuio_pci_enable_interrupts(udev);
>> -	mutex_unlock(&udev->lock);
>>   	if (err) {
>>   		dev_err(&dev->dev, "Enable interrupt fails\n");
>> +		pci_clear_master(dev);
> Why is pci_clear_master required here?


Because pci_set_master() is called before interrupts are enabled; if
enabling interrupts fails, I think bus mastering should be cleared before
returning.

Anyway, it is outside the scope of this patch and could be split into a
separate one.
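
To illustrate the error-path ordering described above, here is a minimal
userspace sketch. It is not the igb_uio code: struct model_dev,
model_enable_interrupts() and model_open_dev() are hypothetical stand-ins, and
the two flags only model the effect of pci_set_master()/pci_clear_master() and
of enabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the device state touched by the open path. */
struct model_dev {
	bool bus_master;   /* stands in for the effect of pci_set_master() */
	bool irq_enabled;  /* stands in for the interrupt being enabled */
};

/* Models interrupt enabling; 'fail' forces the error path. */
static int model_enable_interrupts(struct model_dev *d, bool fail)
{
	if (fail)
		return -1;
	d->irq_enabled = true;
	return 0;
}

/* Models open: each step that succeeded is undone if a later step fails. */
static int model_open_dev(struct model_dev *d, bool irq_fails)
{
	int err;

	assert(d != NULL);
	d->bus_master = true;              /* set master comes first */
	err = model_enable_interrupts(d, irq_fails);
	if (err) {
		d->bus_master = false;     /* unwind: clear master on failure */
		return err;
	}
	return 0;
}
```

With this ordering, a failed open leaves the modelled device exactly as it was
before the call, which is the property the added pci_clear_master() is meant to
provide.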


> btw, some part of this patch conflicts with [1], which removes the mutex and
> uses atomic refcnt operations, but introducing state seems to need a mutex.
>
> [1]
> igb_uio: fix refcount if open returns error
> https://patches.dpdk.org/patch/44732/


Yes, I see; I will rework for that if needed.


>> +		mutex_unlock(&udev->lock);
>>   		return err;
>>   	}
>> +	udev->state = RTE_UDEV_OPENNED;
>> +	mutex_unlock(&udev->lock);
>>   	return 0;
>>   }
>>   
>> +/**
>> + * This gets called while closing uio device file.
>> + */
>>   static int
>>   igbuio_pci_release(struct uio_info *info, struct inode *inode)
>>   {
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>   
>> +	if (udev->state == RTE_UDEV_REMOVED) {
>> +		mutex_destroy(&udev->lock);
>> +		igbuio_pci_release_iomem(&udev->info);
>> +		pci_disable_device(dev);
>> +		pci_set_drvdata(dev, NULL);
>> +		kfree(udev);
>> +		return 0;
> This branch is taken when pci_remove is called before pci_release.
> - At this stage, is "dev" still valid, since pci_remove() has been called?
> - In this path uio_unregister_device() is missing; who unregisters uio?
> - sysfs_remove_group() is also missing; it is not clear whether it was forgotten
> or left out. What do you think about moving the common part of pci_remove into a
> new function and calling it both in pci_remove and here?


It is not forgotten but deliberately left out: if uio remove happens
before uio release, unregistering the uio device here causes a kernel
error, namely a double free of the already-freed IRQ.


> And as a logic, can we make pci_remove clear everything, instead of doing some
> cleanup here. Like:
> pci_remove:
> - calls pci_release
> - instead of return keeps doing pci_remove work
> - set state to REMOVED
>
> pci_release:
> - if state is REMOVED, return without doing anything


I think the logic you describe makes sense; it lets release and remove
each focus on their own work.
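
The proposed split can be sketched as a small userspace state model. This is
only an illustration under assumptions: enum model_state, struct model_udev
and the model_* functions are hypothetical names, not the igb_uio API, and the
irq_frees counter merely stands in for freeing the IRQ. The release guard here
is slightly stricter than the proposal (it acts only when the device is
opened).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states, loosely following probed/opened/released/removed. */
enum model_state { MODEL_PROBED, MODEL_OPENED, MODEL_RELEASED, MODEL_REMOVED };

struct model_udev {
	enum model_state state;
	int irq_frees;  /* counts how often the modelled IRQ was freed */
};

static void model_open(struct model_udev *u)
{
	assert(u != NULL);
	u->state = MODEL_OPENED;
}

/* Models release: cleans up once, and is a no-op in any other state. */
static bool model_release(struct model_udev *u)
{
	assert(u != NULL);
	if (u->state != MODEL_OPENED)
		return false;      /* already released or removed */
	u->irq_frees++;            /* stands in for disabling/freeing the IRQ */
	u->state = MODEL_RELEASED;
	return true;
}

/* Models remove: runs release first, then the rest of the teardown. */
static void model_remove(struct model_udev *u)
{
	assert(u != NULL);
	model_release(u);
	/* the remaining remove work would go here */
	u->state = MODEL_REMOVED;
}
```

In this model a release that arrives after remove finds the REMOVED state and
returns without touching the IRQ again, so the IRQ is freed exactly once
whichever of the two is called first.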


>
> btw, even after uio_unregister_device(), how is pci_release called?


The sequence on igb_uio removal is: igb_uio remove is called first, then
igb_uio release is called when the device is detached, and then pci remove
is called if the app quits.


>
> It can help to share crash backtrace in commit log, to describe problem in more
> detail.


I will do that. I think two things need fixing: one is the double free of
the IRQ, the other is giving the interrupt a chance to be disabled when
uio remove is called before uio release. I checked again and found that
simply calling release before remove fixes both issues, so please review
my upcoming updated patch. Thanks.


* [PATCH v1] igb_uio: fix unexpected removal for hot-unplug
  2018-08-17 10:48         ` [PATCH v10 7/8] igb_uio: fix unexpected remove issue " Jeff Guo
  2018-09-27 15:07           ` Ferruh Yigit
@ 2018-10-18  6:27           ` Jeff Guo
  2018-10-18 16:06             ` Ferruh Yigit
  1 sibling, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-18  6:27 UTC (permalink / raw)
  To: stephen, bruce.richardson, ferruh.yigit, konstantin.ananyev,
	thomas, shaopeng.he
  Cc: dev, jia.guo, helin.zhang

When a device is hot-unplugged, pci_remove is invoked unexpectedly before
pci_release. This causes a kernel hang with the error message
"Trying to free already-free IRQ XXX". In addition, if pci_remove runs
before pci_release, the interrupt never gets a chance to be disabled.
This patch fixes the issue by calling pci_release from pci_remove, which
guarantees that all pci cleanup is done before pci removal.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 kernel/linux/igb_uio/igb_uio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index fede66c..3cf394b 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -570,6 +570,8 @@ igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
 
+	igbuio_pci_release(&udev->info, NULL);
+
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	uio_unregister_device(&udev->info);
 	igbuio_pci_release_iomem(&udev->info);
-- 
2.7.4


* Re: [PATCH v1] igb_uio: fix unexpected removal for hot-unplug
  2018-10-18  6:27           ` [PATCH v1] igb_uio: fix unexpected removal for hot-unplug Jeff Guo
@ 2018-10-18 16:06             ` Ferruh Yigit
  2018-10-19  8:35               ` Jeff Guo
  0 siblings, 1 reply; 494+ messages in thread
From: Ferruh Yigit @ 2018-10-18 16:06 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, konstantin.ananyev, thomas,
	shaopeng.he
  Cc: dev, helin.zhang

On 10/18/2018 7:27 AM, Jeff Guo wrote:
> When a device is hot-unplugged, pci_remove is invoked unexpectedly before
> pci_release. This causes a kernel hang with the error message
> "Trying to free already-free IRQ XXX". In addition, if pci_remove runs
> before pci_release, the interrupt never gets a chance to be disabled.
> This patch fixes the issue by calling pci_release from pci_remove, which
> guarantees that all pci cleanup is done before pci removal.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>  kernel/linux/igb_uio/igb_uio.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
> index fede66c..3cf394b 100644
> --- a/kernel/linux/igb_uio/igb_uio.c
> +++ b/kernel/linux/igb_uio/igb_uio.c
> @@ -570,6 +570,8 @@ igbuio_pci_remove(struct pci_dev *dev)
>  {
>  	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
>  
> +	igbuio_pci_release(&udev->info, NULL);
> +

Hi Jeff,

This is a simpler approach compared to the previous version.

Do you know whether igbuio_pci_release() can still be called after
igbuio_pci_remove()? That would also cause a crash, and indeed it would
crash in uio too.

The flow as far as I can see:
when uioN device opened by application, igbuio_pci_open() is called.

If device removed, I expect driver remove() function called, which has a call
stack like below:

igbuio_pci_remove()
  uio_unregister_device()
    uio_device_release()
      kfree(struct uio_device)

After this point udev is freed and igbuio_pci_release() shouldn't be called, so
I assume the uioN device is closed before this point, but I couldn't find where.
If it is not closed, closing it later will crash.

I can't test the hotplug case; can you please confirm that the above patch fixes
the crashes you observed for your use cases?

For the regular use case this change shouldn't cause any problem, so at worst it
may not fix all hotplug issues, which looks safe to take.


* Re: [PATCH v1] igb_uio: fix unexpected removal for hot-unplug
  2018-10-18 16:06             ` Ferruh Yigit
@ 2018-10-19  8:35               ` Jeff Guo
  2018-10-22 11:13                 ` Ferruh Yigit
  0 siblings, 1 reply; 494+ messages in thread
From: Jeff Guo @ 2018-10-19  8:35 UTC (permalink / raw)
  To: Ferruh Yigit, stephen, bruce.richardson, konstantin.ananyev,
	thomas, shaopeng.he
  Cc: dev, helin.zhang


On 10/19/2018 12:06 AM, Ferruh Yigit wrote:
> On 10/18/2018 7:27 AM, Jeff Guo wrote:
>> When a device is hot-unplugged, pci_remove is invoked unexpectedly before
>> pci_release. This causes a kernel hang with the error message
>> "Trying to free already-free IRQ XXX". In addition, if pci_remove runs
>> before pci_release, the interrupt never gets a chance to be disabled.
>> This patch fixes the issue by calling pci_release from pci_remove, which
>> guarantees that all pci cleanup is done before pci removal.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>>   kernel/linux/igb_uio/igb_uio.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
>> index fede66c..3cf394b 100644
>> --- a/kernel/linux/igb_uio/igb_uio.c
>> +++ b/kernel/linux/igb_uio/igb_uio.c
>> @@ -570,6 +570,8 @@ igbuio_pci_remove(struct pci_dev *dev)
>>   {
>>   	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
>>   
>> +	igbuio_pci_release(&udev->info, NULL);
>> +
> Hi Jeff,
>
> This is a simpler approach compared to the previous version.
>
> Do you know whether igbuio_pci_release() can still be called after
> igbuio_pci_remove()? That would also cause a crash, and indeed it would
> crash in uio too.
>
> The flow as far as I can see:
> when uioN device opened by application, igbuio_pci_open() is called.
>
> If device removed, I expect driver remove() function called, which has a call
> stack like below:
>
> igbuio_pci_remove()
>    uio_unregister_device()
>      uio_device_release()
>        kfree(struct uio_device)
>
> After this point udev is freed and igbuio_pci_release() shouldn't be called, so
> I assume the uioN device is closed before this point, but I couldn't find where.
> If it is not closed, closing it later will crash.


What I saw is that after igb_uio remove, detaching the device causes pci
release to be called, so igbuio_pci_release would be called again.


> I can't test the hotplug case; can you please confirm that the above patch fixes
> the crashes you observed for your use cases?


Yes, it fixes the crashes I have observed so far.


>
> For the regular use case this change shouldn't cause any problem, so at worst it
> may not fix all hotplug issues, which looks safe to take.


I think it fixes the hang caused by the double free of the IRQ and should
not have any side effects.


* Re: [PATCH v1] igb_uio: fix unexpected removal for hot-unplug
  2018-10-19  8:35               ` Jeff Guo
@ 2018-10-22 11:13                 ` Ferruh Yigit
  2018-10-24 23:14                   ` Thomas Monjalon
  0 siblings, 1 reply; 494+ messages in thread
From: Ferruh Yigit @ 2018-10-22 11:13 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, konstantin.ananyev, thomas,
	shaopeng.he
  Cc: dev, helin.zhang

On 10/19/2018 9:35 AM, Jeff Guo wrote:
> 
> On 10/19/2018 12:06 AM, Ferruh Yigit wrote:
>> On 10/18/2018 7:27 AM, Jeff Guo wrote:
>>> When a device is hot-unplugged, pci_remove is invoked unexpectedly before
>>> pci_release. This causes a kernel hang with the error message
>>> "Trying to free already-free IRQ XXX". In addition, if pci_remove runs
>>> before pci_release, the interrupt never gets a chance to be disabled.
>>> This patch fixes the issue by calling pci_release from pci_remove, which
>>> guarantees that all pci cleanup is done before pci removal.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>


* Re: [PATCH v1] igb_uio: fix unexpected removal for hot-unplug
  2018-10-22 11:13                 ` Ferruh Yigit
@ 2018-10-24 23:14                   ` Thomas Monjalon
  0 siblings, 0 replies; 494+ messages in thread
From: Thomas Monjalon @ 2018-10-24 23:14 UTC (permalink / raw)
  To: Jeff Guo
  Cc: dev, Ferruh Yigit, stephen, bruce.richardson, konstantin.ananyev,
	shaopeng.he, helin.zhang

22/10/2018 13:13, Ferruh Yigit:
> On 10/19/2018 9:35 AM, Jeff Guo wrote:
> > 
> > On 10/19/2018 12:06 AM, Ferruh Yigit wrote:
> >> On 10/18/2018 7:27 AM, Jeff Guo wrote:
> >>> When a device is hot-unplugged, pci_remove is invoked unexpectedly before
> >>> pci_release. This causes a kernel hang with the error message
> >>> "Trying to free already-free IRQ XXX". In addition, if pci_remove runs
> >>> before pci_release, the interrupt never gets a chance to be disabled.
> >>> This patch fixes the issue by calling pci_release from pci_remove, which
> >>> guarantees that all pci cleanup is done before pci removal.
> >>>
> >>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied, thanks


end of thread, other threads:[~2018-10-24 23:14 UTC | newest]

Thread overview: 494+ messages
2017-05-28 15:44 [RFC] Add hot plug event in rte eal interrupt and inplement it in i40e driver Jeff Guo
2017-05-30  7:14 ` Gaëtan Rivet
2017-06-07  7:40   ` Wu, Jingjing
2017-06-15 21:22     ` Gaëtan Rivet
2017-06-21  2:50       ` Guo, Jia
2017-06-29 17:27     ` Stephen Hemminger
2017-06-30  3:36       ` Wu, Jingjing
2017-06-07  7:27 ` Wu, Jingjing
2017-06-28 11:07 ` [PATCH v2 1/2] eal: add uevent api for hot plug Jeff Guo
2017-06-28 11:07   ` [PATCH v2 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
2017-06-29  1:41     ` Wu, Jingjing
2017-06-29  4:31       ` Guo, Jia
2017-06-29  3:34     ` Stephen Hemminger
2017-06-29  4:48       ` Wu, Jingjing
2017-06-29  7:47         ` Guo, Jia
2017-06-29  4:37     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
2017-06-29  4:37       ` [PATCH v3 1/2] eal: " Jeff Guo
2017-06-30  3:38         ` Wu, Jingjing
2017-06-29  4:37       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
2017-06-30  3:38         ` Wu, Jingjing
2018-04-13  8:30       ` [PATCH V22 0/4] add device event monitor framework Jeff Guo
2018-04-13  8:30         ` [PATCH V22 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-04-13  8:30         ` [PATCH V22 2/4] eal: add device event monitor framework Jeff Guo
2018-04-13  8:30         ` [PATCH V22 3/4] eal/linux: uevent parse and process Jeff Guo
2018-04-13  8:30         ` [PATCH V22 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-04-13 10:03         ` [PATCH V22 0/4] add device event monitor framework Thomas Monjalon
2018-04-18 13:38       ` [PATCH V20 0/4] add hot plug recovery mechanism Jeff Guo
2018-04-18 13:38         ` [PATCH V20 1/4] bus/pci: introduce device hot unplug handle Jeff Guo
2018-04-20 10:32           ` Ananyev, Konstantin
2018-05-03  3:05             ` Guo, Jia
2018-04-18 13:38         ` [PATCH V20 2/4] eal: add failure handler mechanism for hot plug Jeff Guo
2018-04-19  1:30           ` Zhang, Qi Z
2018-04-20 11:14           ` Ananyev, Konstantin
2018-05-03  3:13             ` Guo, Jia
2018-04-20 16:16           ` Ananyev, Konstantin
2018-05-03  3:17             ` Guo, Jia
2018-04-18 13:38         ` [PATCH V20 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-04-18 13:38         ` [PATCH V20 4/4] app/testpmd: show example to handler " Jeff Guo
2018-05-03  7:25           ` Matan Azrad
2018-05-03  9:35             ` Guo, Jia
2018-05-03 11:27               ` Matan Azrad
2018-05-03  8:57       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
2018-05-03  8:57         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
2018-05-03  8:57         ` [PATCH V21 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
2018-05-03  8:57         ` [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-05-03  8:57         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
2018-05-16 14:30           ` Iremonger, Bernard
2018-05-03 10:48       ` [PATCH V21 0/4] hot plug recovery mechanism Jeff Guo
2018-05-03 10:48         ` [PATCH V21 1/4] bus/pci: handle device hot unplug Jeff Guo
2018-05-03 10:48         ` [PATCH V21 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
2018-05-04 15:56           ` Ananyev, Konstantin
2018-05-08 14:57             ` Guo, Jia
2018-05-08 15:19               ` Ananyev, Konstantin
2018-05-03 10:48         ` [PATCH V21 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-05-03 10:48         ` [PATCH V21 4/4] app/testpmd: show example to handle " Jeff Guo
2018-06-14 12:59           ` Iremonger, Bernard
2018-06-15  8:32             ` Guo, Jia
2018-06-22 11:51       ` [PATCH v2 0/4] hot plug failure handle mechanism Jeff Guo
2018-06-22 11:51         ` [PATCH v2 1/4] bus/pci: handle device hot unplug Jeff Guo
2018-06-22 12:59           ` Gaëtan Rivet
2018-06-26 15:30             ` Guo, Jia
2018-06-22 11:51         ` [PATCH v2 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
2018-06-22 11:51         ` [PATCH v2 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-06-22 11:51         ` [PATCH v2 4/4] app/testpmd: show example to handle " Jeff Guo
2018-06-26 10:06           ` Iremonger, Bernard
2018-06-26 11:58           ` Matan Azrad
2018-06-26 15:33             ` Guo, Jia
2018-06-26 15:36       ` [PATCH V3 1/4] bus/pci: handle device " Jeff Guo
2018-06-26 15:36         ` [PATCH V3 2/4] eal: add failure handle mechanism for hot plug Jeff Guo
2018-06-26 15:36         ` [PATCH V3 3/4] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-06-26 15:36         ` [PATCH V3 4/4] app/testpmd: show example to handle " Jeff Guo
2018-06-26 17:07           ` Matan Azrad
2018-06-27  3:56             ` Guo, Jia
2018-06-27  6:05               ` Matan Azrad
2018-06-29 10:26                 ` Guo, Jia
2018-06-29 10:30       ` [PATCH V4 0/9] hot plug failure handle mechanism Jeff Guo
2018-06-29 10:30         ` [PATCH V4 1/9] bus: introduce hotplug failure handler Jeff Guo
2018-07-03 22:21           ` Thomas Monjalon
2018-07-04  7:16             ` Guo, Jia
2018-07-04  7:55               ` Thomas Monjalon
2018-07-05  6:23                 ` Guo, Jia
2018-07-05  8:30                   ` Thomas Monjalon
2018-06-29 10:30         ` [PATCH V4 2/9] bus/pci: implement hotplug handler operation Jeff Guo
2018-06-29 10:30         ` [PATCH V4 3/9] bus: introduce sigbus handler Jeff Guo
2018-07-10 21:55           ` Stephen Hemminger
2018-07-11  2:15             ` Jeff Guo
2018-06-29 10:30         ` [PATCH V4 4/9] bus/pci: implement sigbus handler operation Jeff Guo
2018-06-29 10:30         ` [PATCH V4 5/9] bus: add helper to handle sigbus Jeff Guo
2018-06-29 10:51           ` Ananyev, Konstantin
2018-06-29 11:23             ` Guo, Jia
2018-06-29 12:21               ` Ananyev, Konstantin
2018-06-29 12:52                 ` Gaëtan Rivet
2018-07-03 11:24                   ` Guo, Jia
2018-06-29 10:30         ` [PATCH V4 6/9] eal: add failure handle mechanism for hot plug Jeff Guo
2018-06-29 10:49           ` Ananyev, Konstantin
2018-06-29 11:15             ` Guo, Jia
2018-06-29 12:06               ` Ananyev, Konstantin
2018-06-29 10:30         ` [PATCH V4 7/9] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-07-03 12:12           ` Ferruh Yigit
2018-06-29 10:30         ` [PATCH V4 8/9] app/testpmd: show example to handle " Jeff Guo
2018-07-01  7:46           ` Matan Azrad
2018-07-03  9:35             ` Guo, Jia
2018-07-03 22:44               ` Thomas Monjalon
2018-07-04  3:48                 ` Guo, Jia
2018-07-04  7:06                 ` Matan Azrad
2018-07-05  7:54                   ` Guo, Jia
2018-06-29 10:30         ` [PATCH V4 9/9] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-07-05  7:38       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
2018-07-05  7:38         ` [PATCH V5 1/7] bus: add hotplug failure handler Jeff Guo
2018-07-06 15:17           ` He, Shaopeng
2018-07-05  7:38         ` [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
2018-07-06 15:17           ` He, Shaopeng
2018-07-09  5:29             ` Jeff Guo
2018-07-05  7:38         ` [PATCH V5 3/7] bus: add sigbus handler Jeff Guo
2018-07-06 15:17           ` He, Shaopeng
2018-07-05  7:38         ` [PATCH V5 4/7] bus/pci: implement sigbus handler operation Jeff Guo
2018-07-06 15:18           ` He, Shaopeng
2018-07-05  7:38         ` [PATCH V5 5/7] bus: add helper to handle sigbus Jeff Guo
2018-07-06 15:22           ` He, Shaopeng
2018-07-09  5:31             ` Jeff Guo
2018-07-08 13:30           ` Andrew Rybchenko
2018-07-09  5:33             ` Jeff Guo
2018-07-05  7:38         ` [PATCH V5 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
2018-07-06 15:22           ` He, Shaopeng
2018-07-08 13:46           ` Andrew Rybchenko
2018-07-09  5:40             ` Jeff Guo
2018-07-05  8:21       ` [PATCH V5 0/7] hot plug failure handle mechanism Jeff Guo
2018-07-05  8:21         ` [PATCH V5 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-07-09  6:51       ` [PATCH v6 0/7] hotplug failure handle mechanism Jeff Guo
2018-07-09  6:51         ` [PATCH v6 1/7] bus: add hotplug failure handler Jeff Guo
2018-07-09  6:51         ` [PATCH v6 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
2018-07-09  6:51         ` [PATCH v6 3/7] bus: add sigbus handler Jeff Guo
2018-07-09  6:51         ` [PATCH v6 4/7] bus/pci: implement sigbus handler operation Jeff Guo
2018-07-09  6:51         ` [PATCH v6 5/7] bus: add helper to handle sigbus Jeff Guo
2018-07-09  6:51         ` [PATCH v6 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
2018-07-09  7:42           ` Gaëtan Rivet
2018-07-09  8:12             ` Jeff Guo
2018-07-09  6:51         ` [PATCH v6 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-07-09 11:56       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
2018-07-09 11:56         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
2018-07-09 11:56         ` [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
2018-07-09 11:56         ` [PATCH v7 3/7] bus: add sigbus handler Jeff Guo
2018-07-09 11:56         ` [PATCH v7 4/7] bus/pci: implement sigbus handler operation Jeff Guo
2018-07-09 11:56         ` [PATCH v7 5/7] bus: add helper to handle sigbus Jeff Guo
2018-07-09 11:56         ` [PATCH v7 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
2018-07-09 11:56         ` [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-07-09 12:00       ` [PATCH v7 0/7] hotplug failure handle mechanism Jeff Guo
2018-07-09 12:01         ` [PATCH v7 1/7] bus: add hotplug failure handler Jeff Guo
2018-07-09 12:01         ` [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
2018-07-09 12:01         ` [PATCH v7 3/7] bus: add sigbus handler Jeff Guo
2018-07-09 12:01         ` [PATCH v7 4/7] bus/pci: implement sigbus handler operation Jeff Guo
2018-07-09 12:01         ` [PATCH v7 5/7] bus: add helper to handle sigbus Jeff Guo
2018-07-09 13:48           ` Andrew Rybchenko
2018-07-10  8:22             ` Jeff Guo
2018-07-10  8:40               ` Gaëtan Rivet
2018-07-10 10:07                 ` Jeff Guo
2018-07-09 12:01         ` [PATCH v7 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
2018-07-09 13:50           ` Andrew Rybchenko
2018-07-10  8:23             ` Jeff Guo
2018-07-09 12:01         ` [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug Jeff Guo
2018-07-09 22:44           ` Stephen Hemminger
2018-07-10  8:28             ` Jeff Guo
2018-07-10 11:03       ` [PATCH v8 0/7] hotplug failure handle mechanism Jeff Guo
2018-07-10 11:03         ` [PATCH v8 1/7] bus: add hotplug failure handler Jeff Guo
2018-07-10 11:03         ` [PATCH v8 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
2018-07-10 11:03         ` [PATCH v8 3/7] bus: add sigbus handler Jeff Guo
2018-07-10 11:03         ` [PATCH v8 4/7] bus/pci: implement sigbus handler operation Jeff Guo
2018-07-10 11:03         ` [PATCH v8 5/7] bus: add helper to handle sigbus Jeff Guo
2018-07-10 11:03         ` [PATCH v8 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
2018-07-10 11:03         ` [PATCH v8 7/7] igb_uio: fix uio release issue " Jeff Guo
2018-07-10 21:48           ` Stephen Hemminger
2018-07-11  3:10             ` Jeff Guo
2018-07-10 21:52           ` Stephen Hemminger
2018-07-11  2:46             ` Jeff Guo
2018-07-11 10:01               ` Jeff Guo
2018-07-11 10:41       ` [PATCH v9 0/7] hotplug failure handle mechanism Jeff Guo
2018-07-11 10:41         ` [PATCH v9 1/7] bus: add hotplug failure handler Jeff Guo
2018-07-11 10:41         ` [PATCH v9 2/7] bus/pci: implement hotplug failure handler ops Jeff Guo
2018-07-11 10:41         ` [PATCH v9 3/7] bus: add sigbus handler Jeff Guo
2018-07-11 10:41         ` [PATCH v9 4/7] bus/pci: implement sigbus handler operation Jeff Guo
2018-07-11 10:41         ` [PATCH v9 5/7] bus: add helper to handle sigbus Jeff Guo
2018-07-11 10:41         ` [PATCH v9 6/7] eal: add failure handle mechanism for hotplug Jeff Guo
2018-07-11 10:41         ` [PATCH v9 7/7] igb_uio: fix unexpected remove issue " Jeff Guo
2018-07-12  1:57           ` He, Shaopeng
2018-07-11 15:46         ` [PATCH v9 0/7] hotplug failure handle mechanism Stephen Hemminger
2018-07-12  3:14           ` Jeff Guo
2018-08-17 10:48       ` [PATCH v10 0/8] " Jeff Guo
2018-08-17 10:48         ` [PATCH v10 1/8] bus: add memory failure handler Jeff Guo
2018-08-17 10:48         ` [PATCH v10 2/8] bus/pci: implement memory failure handler ops Jeff Guo
2018-08-17 10:48         ` [PATCH v10 3/8] bus: add sigbus handler Jeff Guo
2018-08-17 10:48         ` [PATCH v10 4/8] bus/pci: implement sigbus handler ops Jeff Guo
2018-08-17 10:48         ` [PATCH v10 5/8] bus: add helper to handle sigbus Jeff Guo
2018-08-17 10:48         ` Jeff Guo
2018-08-17 10:48         ` [PATCH v10 6/8] eal: add failure handle mechanism for hotplug Jeff Guo
2018-08-17 10:48         ` [PATCH v10 7/8] igb_uio: fix unexpected remove issue " Jeff Guo
2018-09-27 15:07           ` Ferruh Yigit
2018-10-18  5:51             ` Jeff Guo
2018-10-18  6:27           ` [PATCH v1] igb_uio: fix unexpected removal for hot-unplug Jeff Guo
2018-10-18 16:06             ` Ferruh Yigit
2018-10-19  8:35               ` Jeff Guo
2018-10-22 11:13                 ` Ferruh Yigit
2018-10-24 23:14                   ` Thomas Monjalon
2018-08-17 10:48         ` [PATCH v10 8/8] testpmd: use hotplug failure handle mechanism Jeff Guo
2018-09-30 10:24       ` [PATCH v11 0/7] hot-unplug " Jeff Guo
2018-09-30 10:24         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
2018-09-30 10:24         ` [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
2018-09-30 10:24         ` [PATCH v11 3/7] bus: add sigbus handler Jeff Guo
2018-09-30 10:24         ` [PATCH v11 4/7] bus/pci: implement sigbus handler ops Jeff Guo
2018-09-30 10:24         ` [PATCH v11 5/7] bus: add helper to handle sigbus Jeff Guo
2018-09-30 10:24         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
2018-09-30 10:24         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
2018-09-30 11:29       ` [PATCH v11 0/7] " Jeff Guo
2018-09-30 11:29         ` [PATCH v11 1/7] bus: add hot-unplug handler Jeff Guo
2018-09-30 11:29         ` [PATCH v11 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
2018-09-30 11:29         ` [PATCH v11 3/7] bus: add sigbus handler Jeff Guo
2018-09-30 11:30         ` [PATCH v11 4/7] bus/pci: implement sigbus handler ops Jeff Guo
2018-09-30 11:30         ` [PATCH v11 5/7] bus: add helper to handle sigbus Jeff Guo
2018-09-30 11:30         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
2018-09-30 19:46           ` Ananyev, Konstantin
2018-10-02  4:01             ` Jeff Guo
2018-09-30 11:30         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
2018-10-01  9:00         ` [PATCH v11 0/7] " Stephen Hemminger
2018-10-01  9:55           ` Jerin Jacob
2018-10-02 10:08             ` Jeff Guo
2018-10-02  9:57           ` Jeff Guo
2018-10-02 12:32       ` [PATCH v12 " Jeff Guo
2018-10-02 12:32         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
2018-10-02 12:32         ` [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
2018-10-02 12:32         ` [PATCH v12 3/7] bus: add sigbus handler Jeff Guo
2018-10-02 12:32         ` [PATCH v12 4/7] bus/pci: implement sigbus handler ops Jeff Guo
2018-10-02 12:32         ` [PATCH v12 5/7] bus: add helper to handle sigbus Jeff Guo
2018-10-02 12:32         ` [PATCH v11 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
2018-10-02 12:32         ` [PATCH v11 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
2018-10-02 12:35       ` [PATCH v12 0/7] " Jeff Guo
2018-10-02 12:35         ` [PATCH v12 1/7] bus: add hot-unplug handler Jeff Guo
2018-10-02 12:35         ` [PATCH v12 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
2018-10-02 12:35         ` [PATCH v12 3/7] bus: add sigbus handler Jeff Guo
2018-10-02 14:32           ` Burakov, Anatoly
2018-10-04  3:14             ` Jeff Guo
2018-10-02 12:35         ` [PATCH v12 4/7] bus/pci: implement sigbus handler ops Jeff Guo
2018-10-02 14:39           ` Burakov, Anatoly
2018-10-04  3:58             ` Jeff Guo
2018-10-02 12:35         ` [PATCH v12 5/7] bus: add helper to handle sigbus Jeff Guo
2018-10-02 12:35         ` [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
2018-10-02 13:34           ` Ananyev, Konstantin
2018-10-04  2:31             ` Jeff Guo
2018-10-02 15:53           ` Burakov, Anatoly
2018-10-02 16:00             ` Ananyev, Konstantin
2018-10-04  3:12             ` Jeff Guo
2018-10-02 12:35         ` [PATCH v12 7/7] testpmd: use hot-unplug failure handle mechanism Jeff Guo
2018-10-02 15:21           ` Iremonger, Bernard
2018-10-04  2:56             ` Jeff Guo
2018-10-04  6:30       ` [PATCH v13 0/7] " Jeff Guo
2018-10-04  6:30         ` [PATCH v13 1/7] bus: add hot-unplug handler Jeff Guo
2018-10-04  6:30         ` [PATCH v13 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
2018-10-04  6:30         ` [PATCH v13 3/7] bus: add sigbus handler Jeff Guo
2018-10-04  6:30         ` [PATCH v13 4/7] bus/pci: implement sigbus handler ops Jeff Guo
2018-10-04  6:30         ` [PATCH v13 5/7] bus: add helper to handle sigbus Jeff Guo
2018-10-04  6:30         ` [PATCH v13 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
2018-10-04  6:30         ` [PATCH v13 7/7] app/testpmd: use hotplug failure handler Jeff Guo
2018-10-04 10:31           ` Iremonger, Bernard
2018-10-04 13:53             ` Jeff Guo
2018-10-04 12:02         ` [PATCH v13 0/7] hot-unplug failure handle mechanism Ananyev, Konstantin
2018-10-04 14:46       ` [PATCH v14 " Jeff Guo
2018-10-04 14:46         ` [PATCH v14 1/7] bus: add hot-unplug handler Jeff Guo
2018-10-04 14:46         ` [PATCH v14 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
2018-10-04 14:46         ` [PATCH v14 3/7] bus: add sigbus handler Jeff Guo
2018-10-04 14:46         ` [PATCH v14 4/7] bus/pci: implement sigbus handler ops Jeff Guo
2018-10-04 14:46         ` [PATCH v14 5/7] bus: add helper to handle sigbus Jeff Guo
2018-10-04 14:46         ` [PATCH v14 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
2018-10-15 10:43           ` Thomas Monjalon
2018-10-04 14:46         ` [PATCH v14 7/7] app/testpmd: use hotplug failure handler Jeff Guo
2018-10-05 12:26           ` Iremonger, Bernard
2018-10-15 11:27       ` [PATCH v15 0/7] hot-unplug failure handle mechanism Jeff Guo
2018-10-15 11:27         ` [PATCH v15 1/7] bus: add hot-unplug handler Jeff Guo
2018-10-15 11:27         ` [PATCH v15 2/7] bus/pci: implement hot-unplug handler ops Jeff Guo
2018-10-15 11:27         ` [PATCH v15 3/7] bus: add sigbus handler Jeff Guo
2018-10-15 11:27         ` [PATCH v15 4/7] bus/pci: implement sigbus handler ops Jeff Guo
2018-10-15 13:41           ` Thomas Monjalon
2018-10-15 14:16             ` Thomas Monjalon
2018-10-15 11:27         ` [PATCH v15 5/7] bus: add helper to handle sigbus Jeff Guo
2018-10-15 11:27         ` [PATCH v15 6/7] eal: add failure handle mechanism for hot-unplug Jeff Guo
2018-10-15 11:27         ` [PATCH v15 7/7] app/testpmd: use hotplug failure handler Jeff Guo
2018-10-15 20:19         ` [PATCH v15 0/7] hot-unplug failure handle mechanism Thomas Monjalon
2017-06-29  5:01     ` [PATCH v3 0/2] add uevent api for hot plug Jeff Guo
2017-06-29  5:01       ` [PATCH v3 1/2] eal: " Jeff Guo
2017-07-04  7:15         ` Wu, Jingjing
2017-09-03 15:49         ` [PATCH v4 0/2] add uevent monitor " Jeff Guo
2017-09-03 15:49           ` [PATCH v4 1/2] eal: " Jeff Guo
2017-09-03 16:10             ` Stephen Hemminger
2017-09-03 16:12             ` Stephen Hemminger
2017-09-05  5:28               ` Guo, Jia
2017-09-03 16:14             ` Stephen Hemminger
2017-09-03 16:16             ` Stephen Hemminger
2017-09-03 15:49           ` [PATCH v4 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
2017-09-20  4:12             ` [PATCH v5 0/2] add uevent monitor for hot plug Jeff Guo
2017-09-19 18:44               ` Jan Blunck
2017-09-20  6:51                 ` Guo, Jia
2017-09-20  4:12               ` [PATCH v5 1/2] eal: " Jeff Guo
2017-09-20  4:12               ` [PATCH v5 2/2] app/testpmd: use uevent to monitor hot removal Jeff Guo
2017-11-01 20:16                 ` [PATCH v6 0/2] add uevent monitor for hot plug Jeff Guo
2017-11-01 20:16                   ` [PATCH v6 1/2] eal: " Jeff Guo
2017-11-01 21:36                     ` Stephen Hemminger
2017-11-01 21:41                     ` Stephen Hemminger
2017-11-08  5:39                       ` Guo, Jia
2017-12-25  8:30                       ` Guo, Jia
2017-12-25 18:06                     ` Stephen Hemminger
2018-01-02  9:40                       ` Guo, Jia
2017-11-01 20:16                   ` [PATCH v6 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-03  1:42                     ` [PATCH v7 0/2] add uevent monitor for hot plug Jeff Guo
2018-01-03  1:42                       ` [PATCH v7 1/2] eal: " Jeff Guo
2018-01-02 17:02                         ` Matan Azrad
2018-01-08  5:26                           ` Guo, Jia
2018-01-08  8:14                             ` Matan Azrad
2018-01-08  6:05                           ` Guo, Jia
2018-01-09  0:39                         ` Thomas Monjalon
2018-01-09  8:25                           ` Guo, Jia
2018-01-09 10:31                             ` Mordechay Haimovsky
2018-01-09 10:47                               ` Thomas Monjalon
2018-01-09 11:39                                 ` Guo, Jia
2018-01-09 11:44                                   ` Thomas Monjalon
2018-01-09 12:08                                     ` Guo, Jia
2018-01-09 12:42                                       ` Gaëtan Rivet
2018-01-10  9:29                                         ` Guo, Jia
2018-01-09 13:44                                       ` Thomas Monjalon
2018-01-10  9:32                                         ` Guo, Jia
2018-01-09 11:45                               ` Guo, Jia
2018-01-09 11:38                             ` Thomas Monjalon
2018-01-09 11:58                               ` Guo, Jia
2018-01-09 13:40                                 ` Thomas Monjalon
2018-01-03  1:42                       ` [PATCH v7 2/2] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-10  3:30                         ` [PATCH V8 0/3] add uevent mechanism in eal framework Jeff Guo
2018-01-10  3:30                           ` [PATCH V8 1/3] eal: add uevent monitor for hot plug Jeff Guo
2018-01-10  3:30                           ` [PATCH V8 2/3] igb_uio: fix device removal issuse for hotplug Jeff Guo
2018-01-10  3:30                           ` [PATCH V8 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-10  9:12                             ` [PATCH V9 0/5] add uevent mechanism in eal framework Jeff Guo
2018-01-10  9:12                               ` [PATCH V9 1/5] eal: add uevent monitor api and callback func Jeff Guo
2018-01-10 16:34                                 ` Stephen Hemminger
2018-01-11  1:43                                 ` Thomas Monjalon
2018-01-11 14:24                                   ` Guo, Jia
2018-01-10  9:12                               ` [PATCH V9 2/5] eal: add uevent pass and process function Jeff Guo
2018-01-11 14:05                                 ` [PATCH V10 1/2] eal: add uevent monitor api and callback func Jeff Guo
2018-01-11 14:05                                   ` [PATCH V10 2/2] eal: add uevent pass and process function Jeff Guo
2018-01-14 23:24                                     ` Thomas Monjalon
2018-01-15 10:52                                       ` Guo, Jia
2018-01-15 11:29                                         ` Thomas Monjalon
2018-01-15 15:33                                           ` Guo, Jia
2018-01-15 10:48                                     ` [PATCH V11 1/3] eal: add uevent monitor api and callback func Jeff Guo
2018-01-15 10:48                                       ` [PATCH V11 2/3] eal: add uevent pass and process function Jeff Guo
2018-01-17 22:00                                         ` Thomas Monjalon
2018-01-18  4:17                                           ` Guo, Jia
2018-01-15 10:48                                       ` [PATCH V11 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-18  4:12                                         ` [PATCH V12 1/3] eal: add uevent monitor api and callback func Jeff Guo
2018-01-18  4:12                                           ` [PATCH V12 2/3] eal: add uevent pass and process function Jeff Guo
2018-01-24 15:00                                             ` Wu, Jingjing
2018-01-18  4:12                                           ` [PATCH V12 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-24 15:21                                             ` Wu, Jingjing
2018-01-25 14:58                                               ` Guo, Jia
2018-01-25 14:46                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
2018-01-25 14:46                                               ` [PATCH V13 2/3] eal: add uevent pass and process function Jeff Guo
2018-01-25 14:46                                               ` [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-26  3:49                                             ` [PATCH V13 1/3] eal: add uevent monitor api and callback func Jeff Guo
2018-01-26  3:49                                               ` [PATCH V13 2/3] eal: add uevent pass and process function Jeff Guo
2018-01-26  3:49                                               ` [PATCH V13 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-30 12:20                                                 ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Jeff Guo
2018-01-30 12:20                                                   ` [PATCH V14 2/3] eal: add uevent pass and process function Jeff Guo
2018-01-30 12:21                                                   ` [PATCH V14 3/3] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-31  5:21                                                     ` Wu, Jingjing
2018-03-21  5:27                                                     ` [PATCH V15 1/5] eal: add uevent monitor api and callback func Jeff Guo
2018-03-21  5:27                                                       ` [PATCH V15 2/5] eal: add uevent pass and process function Jeff Guo
2018-03-21 14:20                                                         ` Tan, Jianfeng
2018-03-22  8:20                                                           ` Guo, Jia
2018-03-21  5:27                                                       ` [PATCH V15 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-03-26 10:55                                                         ` [PATCH V16 0/3] add device event monitor framework Jeff Guo
2018-03-26 10:55                                                           ` [PATCH V16 1/3] eal: add device event handle in interrupt thread Jeff Guo
2018-03-26 10:55                                                           ` [PATCH V16 2/3] eal: add device event monitor framework Jeff Guo
2018-03-26 10:55                                                           ` [PATCH V16 3/3] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-03-26 11:20                                                         ` [PATCH V16 0/4] add device event monitor framework Jeff Guo
2018-03-26 11:20                                                           ` [PATCH V16 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-03-27  9:26                                                             ` Tan, Jianfeng
2018-03-28  8:14                                                               ` Guo, Jia
2018-03-26 11:20                                                           ` [PATCH V16 2/4] eal: add device event monitor framework Jeff Guo
2018-03-28  3:39                                                             ` Tan, Jianfeng
2018-03-28  8:12                                                               ` Guo, Jia
2018-03-26 11:20                                                           ` [PATCH V16 3/4] eal/linux: uevent parse and process Jeff Guo
2018-03-28 16:15                                                             ` Tan, Jianfeng
2018-03-29 13:32                                                               ` Van Haaren, Harry
2018-03-29 15:03                                                                 ` Guo, Jia
2018-03-29 15:08                                                               ` Guo, Jia
2018-03-26 11:20                                                           ` [PATCH V16 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-03-28 16:41                                                             ` Tan, Jianfeng
2018-03-29 16:00                                                             ` [PATCH V17 0/4] add device event monitor framework Jeff Guo
2018-03-29 16:00                                                               ` [PATCH V17 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-03-29 16:00                                                               ` [PATCH V17 2/4] eal: add device event monitor framework Jeff Guo
2018-03-29 16:00                                                               ` [PATCH V17 3/4] eal/linux: uevent parse and process Jeff Guo
2018-03-29 16:59                                                                 ` Stephen Hemminger
2018-04-02  4:20                                                                   ` Guo, Jia
2018-03-29 17:00                                                                 ` Stephen Hemminger
2018-04-02  4:19                                                                   ` Guo, Jia
2018-03-29 16:00                                                               ` [PATCH V17 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-03-29 17:00                                                                 ` Stephen Hemminger
2018-04-02  4:18                                                                   ` Guo, Jia
2018-04-02  5:49                                                                 ` Wu, Jingjing
2018-04-02 11:31                                                                   ` Guo, Jia
2018-04-03 10:33                                                                 ` [PATCH V18 0/4] add device event monitor framework Jeff Guo
2018-04-03 10:33                                                                   ` [PATCH V18 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-04-04  1:47                                                                     ` Tan, Jianfeng
2018-04-04  4:00                                                                       ` Guo, Jia
2018-04-03 10:33                                                                   ` [PATCH V18 2/4] eal: add device event monitor framework Jeff Guo
2018-04-04  2:53                                                                     ` Tan, Jianfeng
2018-04-05  3:44                                                                       ` Guo, Jia
2018-04-03 10:33                                                                   ` [PATCH V18 3/4] eal/linux: uevent parse and process Jeff Guo
2018-04-04  3:15                                                                     ` Tan, Jianfeng
2018-04-05  6:09                                                                       ` Guo, Jia
2018-04-03 10:33                                                                   ` [PATCH V18 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-04-04  3:22                                                                     ` Tan, Jianfeng
2018-04-04 16:31                                                                       ` Matan Azrad
2018-04-05  8:40                                                                         ` Guo, Jia
2018-04-05  9:03                                                                         ` Tan, Jianfeng
2018-04-05  8:32                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
2018-04-05  8:32                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-04-05  8:32                                                                       ` [PATCH V19 2/4] eal: add device event monitor framework Jeff Guo
2018-04-05 10:15                                                                         ` Tan, Jianfeng
2018-04-05  8:32                                                                       ` [PATCH V19 3/4] eal/linux: uevent parse and process Jeff Guo
2018-04-05  8:32                                                                       ` [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-04-05  9:02                                                                     ` [PATCH V19 0/4] add device event monitor framework Jeff Guo
2018-04-05  9:02                                                                       ` [PATCH V19 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-04-05  9:02                                                                       ` [PATCH V19 2/4] eal: add device event monitor framework Jeff Guo
2018-04-05  9:02                                                                       ` [PATCH V19 3/4] eal/linux: uevent parse and process Jeff Guo
2018-04-05 11:05                                                                         ` Tan, Jianfeng
2018-04-11 11:40                                                                           ` Guo, Jia
2018-04-05  9:02                                                                       ` [PATCH V19 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-04-05 16:10                                                                         ` [PATCH V20 0/4] add device event monitor framework Jeff Guo
2018-04-05 16:10                                                                           ` [PATCH V20 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-04-05 16:10                                                                           ` [PATCH V20 2/4] eal: add device event monitor framework Jeff Guo
2018-04-05 21:54                                                                             ` Thomas Monjalon
2018-04-06  3:51                                                                               ` Guo, Jia
2018-04-05 16:10                                                                           ` [PATCH V20 3/4] eal/linux: uevent parse and process Jeff Guo
2018-04-05 16:22                                                                             ` Tan, Jianfeng
2018-04-06  3:47                                                                               ` Guo, Jia
2018-04-05 21:58                                                                             ` Thomas Monjalon
2018-04-06  3:52                                                                               ` Guo, Jia
2018-04-05 16:10                                                                           ` [PATCH V20 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-04-05 21:48                                                                             ` Thomas Monjalon
2018-04-06  3:51                                                                               ` Guo, Jia
2018-04-06  3:54                                                                             ` [PATCH V21 0/4] add device event monitor framework Jeff Guo
2018-04-06  3:54                                                                               ` [PATCH V21 1/4] eal: add device event handle in interrupt thread Jeff Guo
2018-04-06  3:55                                                                               ` [PATCH V21 2/4] eal: add device event monitor framework Jeff Guo
2018-04-12  8:36                                                                                 ` Thomas Monjalon
2018-04-06  3:55                                                                               ` [PATCH V21 3/4] eal/linux: uevent parse and process Jeff Guo
2018-04-06  3:55                                                                               ` [PATCH V21 4/4] app/testpmd: enable device hotplug monitoring Jeff Guo
2018-01-31  0:44                                                   ` [PATCH V14 1/3] eal: add uevent monitor api and callback func Stephen Hemminger
2018-02-02 10:45                                                     ` Guo, Jia
2018-01-26 16:53                                               ` [PATCH V13 " Bruce Richardson
2018-01-27  3:48                                                 ` Guo, Jia
2018-01-30  0:14                                                   ` Thomas Monjalon
2018-01-30 12:20                                                     ` Guo, Jia
2018-01-19  1:13                                           ` [PATCH V12 " Thomas Monjalon
2018-01-19  2:51                                             ` Guo, Jia
2018-01-24 14:52                                           ` Wu, Jingjing
2018-01-25 14:57                                             ` Guo, Jia
2018-01-17 21:59                                       ` [PATCH V11 " Thomas Monjalon
2018-01-18  4:23                                         ` Guo, Jia
2018-01-19  1:10                                           ` Thomas Monjalon
2018-01-14 23:16                                   ` [PATCH V10 1/2] " Thomas Monjalon
2018-01-15 10:55                                     ` Guo, Jia
2018-01-15 11:32                                       ` Thomas Monjalon
2018-01-15 15:29                                         ` Guo, Jia
2018-01-10  9:12                               ` [PATCH V9 3/5] app/testpmd: use uevent to monitor hotplug Jeff Guo
2018-01-10  9:12                               ` [PATCH V9 4/5] pci_uio: add uevent hotplug failure handler in pci Jeff Guo
2018-01-10  9:12                               ` [PATCH V9 5/5] pci: add driver auto bind for hot insertion Jeff Guo
2018-03-21  6:11                                 ` [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio Jeff Guo
2018-03-21  6:11                                   ` [PATCH V15 2/2] pci: add driver auto bind for hot insertion Jeff Guo
2018-03-30  3:35                                   ` [PATCH V15 1/2] pci_uio: add uevent hotplug failure handler in uio Tan, Jianfeng
2017-12-14  9:48                   ` [PATCH v6 0/2] add uevent monitor for hot plug Mordechay Haimovsky
2017-12-14 10:21                     ` Gaëtan Rivet
2017-12-22  0:16                       ` Guo, Jia
2017-12-24 15:12                         ` Mordechay Haimovsky
2018-01-02  9:43                           ` Guo, Jia
2017-06-29  5:01       ` [PATCH v3 2/2] net/i40e: add hot plug monitor in i40e Jeff Guo
2017-07-04  7:15         ` Wu, Jingjing
2017-07-07  7:56         ` Thomas Monjalon
2017-07-07 10:17           ` Thomas Monjalon
2017-07-07 14:08             ` Guo, Jia
2017-07-09 22:35               ` Thomas Monjalon
2017-07-12  7:36                 ` Guo, Jia
2017-06-29  2:25   ` [PATCH v2 1/2] eal: add uevent api for hot plug Wu, Jingjing
2017-06-29  4:29     ` Guo, Jia
2017-07-04 23:45   ` Thomas Monjalon
2017-07-05  3:02     ` Guo, Jia
2017-07-05  7:32       ` Thomas Monjalon
2017-07-05  9:04         ` Guo, Jia
2017-08-22 14:56           ` Guo, Jia
2017-08-28 15:50             ` Gaëtan Rivet
