Date: Fri, 1 Jul 2016 09:59:59 -0700
From: Viresh Kumar
To: Tejun Heo
Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, vlevenetz@mm-sol.com,
    vaibhav.hiremath@linaro.org, alex.elder@linaro.org, johan@kernel.org
Subject: [Query] Preemption (hogging) of the work handler
Message-ID: <20160701165959.GR12473@ubuntu>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Tejun,

We are stuck with a typical issue on our octa-core ARM platform and
wanted to make sure that we aren't abusing the workqueue API by using
it for the wrong use case.

Setup:

The system watchdog uses a delayed work (1 second) for petting the
watchdog (resetting its counter), and if the work doesn't reset the
counter in time (within another 1 second), the watchdog resets the
system.

Petting-time: 1 second
Watchdog Reset-time: 2 seconds

The wq is allocated with:

        wdog_wq = alloc_workqueue("wdog", WQ_HIGHPRI, 0);

The watchdog's work-handler looks like this:

        static void pet_watchdog_work(struct work_struct *work)
        {
                ...
                pet_watchdog(); //Reset its counters

                /* CONFIG_HZ=300, queuing for 1 second */
                queue_delayed_work(wdog_wq, &wdog_dwork, 300);
        }

kernel: 3.10 (Yeah, you can rant at me for that, but it's not something
I can decide on :)

Symptoms:

- The watchdog reboots the system sometimes.
  It is more reproducible in cases where an (out-of-tree) bus
  enumerated over USB is suddenly disconnected, which leads to the
  removal of lots of kernel devices on that bus and a lot of print
  messages as well, due to failures to send any more data to those
  devices.

Observations:

I tried to dig deeper into it and found this:

- The timer used by the delayed work fires at the time it was
  programmed for (checked timer->expires against the value of jiffies),
  and the work-handler gets a chance to run and reset the counters
  pretty quickly after that.

- But somehow, the timer isn't programmed for the right time.

- Something is happening between the time the work-handler starts
  running and the time we read jiffies in the add_timer() function,
  which gets called from within queue_delayed_work().

- For example, if the value of jiffies in the pet_watchdog_work()
  handler (before calling queue_delayed_work()) is, say, 1000000, then
  the value of jiffies after the call to queue_delayed_work() has
  returned becomes 1000310, i.e. it sometimes increases by over 300,
  which is 1 second in our setup. I have seen this delta vary from 50
  to 350. If it crosses 300, the watchdog resets the system (as it was
  programmed for 2 seconds).

So, we aren't able to queue the next timer in time, and that causes all
these problems. I haven't yet figured out why that is so.

Questions:

- I hope that the wq handler can be preempted, but can it be this bad?

- Is it fine to use the wq-handler for petting the watchdog? Or should
  that only be done with the help of interrupt handlers?

- Any other clues you can give which can help us figure out what's
  going on?

Thanks in advance and sorry to bother you :)

--
viresh
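
For reference, the timing arithmetic from the observations can be
sketched in plain user-space C. This is only an illustration, not kernel
code: the helper names and macros below are hypothetical, the numbers
(300 jiffies per pet, 600 jiffies until reset) come from the setup
described above, and it assumes the handler runs as soon as the timer
fires. Treating a stall of exactly 300 jiffies as "still in time" is
also an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* From the report: CONFIG_HZ=300, pet every 1 second, reset after 2. */
#define PET_DELAY_JIFFIES 300L  /* delay passed to queue_delayed_work() */
#define RESET_JIFFIES     600L  /* watchdog timeout, 2 seconds */

/*
 * Hypothetical helper: jiffies value at which the next pet happens,
 * given the jiffies value of the current pet and the stall (in jiffies)
 * between that pet and the moment the timer is actually armed.
 */
static long next_pet_jiffies(long pet_jiffies, long stall)
{
        assert(stall >= 0);
        return pet_jiffies + stall + PET_DELAY_JIFFIES;
}

/*
 * Hypothetical helper: true if the gap between two pets exceeds the
 * watchdog timeout, i.e. the stall alone is larger than 300 jiffies.
 */
static bool watchdog_would_reset(long stall)
{
        long pet = 1000000L;  /* example jiffies value from the report */

        return next_pet_jiffies(pet, stall) - pet > RESET_JIFFIES;
}
```

With the delta from the example (310 jiffies), the next pet lands at
1000610, which is 610 jiffies after the previous one, so
watchdog_would_reset(310) is true; with a stall of 50 jiffies it is
false.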