Date: Wed, 6 May 2015 22:57:27 +0530
Subject: [KERNEL BUG] do_timer/tick_handover_do_timer 3.10.17
From: pawandeep oza
To: linux-kernel@vger.kernel.org, malayasen rout, oza@broadcom.com

Hi,

Linux version: 3.10.17

Problem statement:
Timekeeping (do_timer) appears to have stopped, and the core that hit the
BUG (core0 in this case) is stuck in a loop that relies on jiffies.

Root cause:
We run a tickless kernel, so a CPU that goes into a deep idle state stops
its sched tick via tick_nohz_stop_sched_tick. tick_sched_do_timer should
then let whichever CPU is still running take over the job of incrementing
jiffies.

However, when core0 raises a BUG, ipi_cpu_stop makes the other CPUs stop.
clockevents_notify/tick_notify/hrtimer_notify appear to be connected
through cpu_chain, but that code belongs to the hotplug path: when cpu_down
happens, it can successfully call tick_handover_do_timer, which takes the
duty from the dying CPU and assigns it to one that is online.

	static void tick_handover_do_timer(int *cpup)
	{
		if (*cpup == tick_do_timer_cpu) {
			int cpu = cpumask_first(cpu_online_mask);

			tick_do_timer_cpu = (cpu < nr_cpu_ids) ? cpu :
				TICK_DO_TIMER_NONE;
		}
	}

Since cpu_down is not called in this case, the handover does not happen,
and tick_do_timer_cpu is left pointing at a dead CPU (1, 2 or 3).
Core0 then waits forever in any code that relies on jiffies being
incremented.

What is the right way to approach this problem? At first glance, it looks
like the kernel should hand the jiffies job over to another online core
independently of hotplug.

Regards,
Oza.
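
P.S. To make the scenario above easier to reason about, here is a minimal
userspace sketch of it. It is only a model, not kernel code: cpu_online[],
model_cpu_down(), model_ipi_cpu_stop(), model_tick() and run_scenario() are
hypothetical stand-ins, and the choice of CPU2 as the previous owner of the
do_timer duty is an assumption made purely for illustration. Only
model_handover_do_timer() follows the tick_handover_do_timer() logic quoted
above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS            4
#define TICK_DO_TIMER_NONE (-1)

/* Userspace stand-ins for kernel state; not the kernel's own definitions. */
static bool cpu_online[NR_CPUS];
static int tick_do_timer_cpu;
static unsigned long jiffies;

/* Returns NR_CPUS when no CPU is online, like cpumask_first() on an empty mask. */
static int first_online_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			return cpu;
	return NR_CPUS;
}

/* Same decision as the tick_handover_do_timer() quoted in the mail. */
static void model_handover_do_timer(int dying_cpu)
{
	if (dying_cpu == tick_do_timer_cpu) {
		int cpu = first_online_cpu();

		tick_do_timer_cpu = (cpu < NR_CPUS) ? cpu : TICK_DO_TIMER_NONE;
	}
}

/* Hotplug path (assumed shape): CPU goes offline, then the duty is handed over. */
static void model_cpu_down(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_online[cpu] = false;
	model_handover_do_timer(cpu);
}

/* Stop-IPI path (assumed shape): CPU is parked and no handover takes place. */
static void model_ipi_cpu_stop(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_online[cpu] = false;
}

/* One tick on a running CPU: claim the duty if it is free; only the owner
 * of the duty advances jiffies. */
static void model_tick(int cpu)
{
	if (tick_do_timer_cpu == TICK_DO_TIMER_NONE)
		tick_do_timer_cpu = cpu;
	if (tick_do_timer_cpu == cpu)
		jiffies++;
}

/* Stops CPUs 1..3 by the chosen path, then ticks CPU0 five times and
 * returns how far jiffies advanced. */
unsigned long run_scenario(bool use_hotplug_path)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_online[cpu] = true;
	tick_do_timer_cpu = 2;	/* assumption: CPU2 held the duty before the BUG */
	jiffies = 0;

	for (int cpu = 1; cpu < NR_CPUS; cpu++) {
		if (use_hotplug_path)
			model_cpu_down(cpu);
		else
			model_ipi_cpu_stop(cpu);
	}

	for (int i = 0; i < 5; i++)
		model_tick(0);
	return jiffies;
}
```

In this model, run_scenario(false) leaves the duty with the stopped CPU2, so
the ticks on CPU0 never advance jiffies, while run_scenario(true) hands the
duty to CPU0 and jiffies moves again. That is the difference between the
two paths I am asking about.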