From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Guo
Subject: Re: [PATCH v12 6/7] eal: add failure handle mechanism for hot-unplug
Date: Thu, 4 Oct 2018 11:12:22 +0800
Message-ID: <66283e63-8bee-e195-a57b-45b9d0341c7f@intel.com>
References: <1498711073-42917-1-git-send-email-jia.guo@intel.com>
 <1538483726-96411-1-git-send-email-jia.guo@intel.com>
 <1538483726-96411-7-git-send-email-jia.guo@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, helin.zhang@intel.com
To: "Burakov, Anatoly" , stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com, arybchenko@solarflare.com, wenzhuo.lu@intel.com, jerin.jacob@caviumnetworks.com
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 4B1765F36 for ; Thu, 4 Oct 2018 05:12:28 +0200 (CEST)
Content-Language: en-US
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 10/2/2018 11:53 PM, Burakov, Anatoly wrote:
> On 02-Oct-18 1:35 PM, Jeff Guo wrote:
>> The mechanism can initially register the sigbus handler after the device
>> event monitor is enabled. When a sigbus event is captured, it will check
>> the failure address and accordingly handle the memory failure of the
>> corresponding device by invoke the hot-unplug handler. It could prevent
>> the application from crashing when a device is hot-unplugged.
>>
>> By this patch, users could call below new added APIs to enable/disable
>> the device hotplug handle mechanism.
>> Note that it just implement the hot-unplug handler in these functions,
>> the other handler of hotplug, such as handler for hotplug binding, could
>> be add in the future if need:
>>    - rte_dev_hotplug_handle_enable
>>    - rte_dev_hotplug_handle_disable
>>
>> Signed-off-by: Jeff Guo
>> ---
>
>
>
>> +static void sigbus_handler(int signum, siginfo_t *info,
>> +                void *ctx __rte_unused)
>> +{
>> +    int ret;
>> +
>> +    RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
>> +        (int)pthread_self(), info->si_addr);
>> +
>> +    rte_spinlock_lock(&failure_handle_lock);
>> +    ret = rte_bus_sigbus_handler(info->si_addr);
>> +    rte_spinlock_unlock(&failure_handle_lock);
>> +    if (ret == -1) {
>> +        rte_exit(EXIT_FAILURE,
>> +             "Failed to handle SIGBUS for hot-unplug, "
>> +             "(rte_errno: %s)!", strerror(rte_errno));
>
> Do we really want to exit the application on sigbus handle failure?
>

Definitely yes, we do, since it is a failure of the process. I agree with
Konstantin's reply in the other mail.

>> +    } else if (ret == 1) {
>> +        if (sigbus_action_old.sa_handler)
>> +            (*(sigbus_action_old.sa_handler))(signum);
>> +        else
>> +            rte_exit(EXIT_FAILURE,
>> +                 "Failed to handle generic SIGBUS!");
>> +    }
>> +
>> +    RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hot-unplug!\n");
>
> Again, does this all need to be with INFO log level? IMO it should be
> DEBUG.
>

I am fine with that.

>> +}
>> +
>> +static int cmp_dev_name(const struct rte_device *dev,
>> +    const void *_name)
>> +{
>> +    const char *name = _name;
>> +
>> +    return strcmp(dev->name, name);
>> +}
>> +
>>   static int
>
>
>
>>     int __rte_experimental
>> @@ -220,5 +320,67 @@ rte_dev_event_monitor_stop(void)
>>       close(intr_handle.fd);
>>       intr_handle.fd = -1;
>>       monitor_started = false;
>> +
>>       return 0;
>
> This looks like unintended change.
>

No, I intended to change it to be consistent with the other format.

>>   }
>> +
>> +int __rte_experimental
>> +rte_dev_sigbus_handler_register(void)
>> +{
>> +    sigset_t mask;
>> +    struct sigaction action;
>> +
>
>
>
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -281,6 +281,8 @@ EXPERIMENTAL {
>>       rte_dev_event_callback_unregister;
>>       rte_dev_event_monitor_start;
>>       rte_dev_event_monitor_stop;
>> +    rte_dev_hotplug_handle_enable;
>> +    rte_dev_hotplug_handle_disable;
>
> Nitpicking - disable should be above enable, as E follows D in
> alphabet :)
>

Yes, after rechecking the alphabet, it is definitely as you said. :)

>>       rte_dev_iterator_init;
>>       rte_dev_iterator_next;
>>       rte_devargs_add;
>>
>
>
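
For anyone following the thread, the handler flow discussed above (look at the
fault address, handle it if it belongs to a device, otherwise fall back to the
previously installed SIGBUS action) could be sketched in plain POSIX C roughly
as below. This is not the code from the patch: dev_region, region_add,
classify_fault and the fixed-size table are hypothetical names made up for
illustration, and the -1 case here only stands for a NULL fault address.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for one mapped device memory region. */
struct dev_region {
    uintptr_t start;
    size_t len;
};

static struct dev_region regions[4];
static size_t nb_regions;
static struct sigaction sigbus_action_old;

/* Hypothetical helper: remember a region so faults in it can be attributed. */
static int region_add(void *start, size_t len)
{
    if (nb_regions == sizeof(regions) / sizeof(regions[0]))
        return -1;
    regions[nb_regions].start = (uintptr_t)start;
    regions[nb_regions].len = len;
    nb_regions++;
    return 0;
}

/*
 * Three-way result in the spirit of the patch:
 * 0 = address belongs to a known device region,
 * 1 = not ours (generic SIGBUS), -1 = unusable address (assumption).
 */
static int classify_fault(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    size_t i;

    if (addr == NULL)
        return -1;
    for (i = 0; i < nb_regions; i++) {
        if (a >= regions[i].start && a - regions[i].start < regions[i].len)
            return 0;
    }
    return 1;
}

static void sigbus_handler(int signum, siginfo_t *info, void *ctx)
{
    int ret = classify_fault(info->si_addr);

    if (ret == 0) {
        /* A real handler must make the address valid again (e.g. remap it)
         * before returning, or the faulting access will simply fault again. */
        return;
    }
    if (ret == 1) {
        /* Chain to whichever style of handler was installed before us. */
        if (sigbus_action_old.sa_flags & SA_SIGINFO) {
            if (sigbus_action_old.sa_sigaction != NULL) {
                sigbus_action_old.sa_sigaction(signum, info, ctx);
                return;
            }
        } else if (sigbus_action_old.sa_handler != SIG_DFL &&
                   sigbus_action_old.sa_handler != SIG_IGN) {
            sigbus_action_old.sa_handler(signum);
            return;
        }
    }
    _exit(1); /* nothing could handle the fault */
}

/* Install the handler and keep the previous action for chaining. */
static int sigbus_handler_register(void)
{
    struct sigaction action;

    memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = sigbus_handler;
    action.sa_flags = SA_SIGINFO;
    return sigaction(SIGBUS, &action, &sigbus_action_old);
}
```

A caller would add each mapped region with region_add() and then call
sigbus_handler_register() once. The sketch checks SA_SIGINFO before chaining
because the previous action may have been installed with either handler style;
that is a choice made for this example only.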