From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758193AbbA2VOc (ORCPT ); Thu, 29 Jan 2015 16:14:32 -0500
Received: from aserp1040.oracle.com ([141.146.126.69]:17013 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753011AbbA2VO3 (ORCPT );
	Thu, 29 Jan 2015 16:14:29 -0500
Message-ID: <54CAA2A9.50008@oracle.com>
Date: Thu, 29 Jan 2015 13:14:17 -0800
From: santosh shilimkar
Organization: Oracle Corporation
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: Viresh Kumar , Ethan Zhao
CC: "Rafael J. Wysocki" , "linux-pm@vger.kernel.org" , Linux Kernel Mailing List , Ethan Zhao
Subject: Re: [PATCH] cpufreq: fix another race between PPC notification and vcpu_hotplug()
References: <1422513761-8230-1-git-send-email-ethan.zhao@oracle.com>
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: acsinet21.oracle.com [141.146.126.237]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/29/2015 12:38 AM, Viresh Kumar wrote:
> Looks like you just save my time here, Santosh has also reported a
> similar race in a personal mail..
>
> On 29 January 2015 at 12:12, Ethan Zhao wrote:
>> There is race observed between PPC changed notification handler worker thread
>> and vcpu_hotplug() called within xenbus_thread() context.
>> It is shown as following WARNING:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>> kobject_get+0x41/0x50()
>> Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>> lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>> ...
>> [ 14.003548] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted
>> ...
>> [ 14.003553] Workqueue: kacpi_notify acpi_os_execute_deferred
>> [ 14.003554] 0000000000000000 000000008c76682c ffff88094c793af8
>> ffffffff81661b14
>> [ 14.003556] 0000000000000000 0000000000000000 ffff88094c793b38
>> ffffffff81072b61
>> [ 14.003558] ffff88094c793bd8 ffff8812491f8800 0000000000000292
>> 0000000000000000
>> [ 14.003560] Call Trace:
>> [ 14.003567] [] dump_stack+0x46/0x58
>> [ 14.003571] [] warn_slowpath_common+0x81/0xa0
>> [ 14.003572] [] warn_slowpath_null+0x1a/0x20
>> [ 14.003574] [] kobject_get+0x41/0x50
>> [ 14.003579] [] cpufreq_cpu_get+0x75/0xc0
>> [ 14.003581] [] cpufreq_update_policy+0x2e/0x1f0
>> [ 14.003586] [] ? up+0x32/0x50
>> [ 14.003589] [] ? acpi_ns_get_node+0xcb/0xf2
>> [ 14.003591] [] ? acpi_evaluate_object+0x22c/0x252
>> [ 14.003593] [] ? acpi_get_handle+0x95/0xc0
>> [ 14.003596] [] ? acpi_has_method+0x25/0x40
>> [ 14.003601] [] acpi_processor_ppc_has_changed+0x77/0x82
>> [ 14.003604] [] ? move_linked_works+0x66/0x90
>> [ 14.003606] [] acpi_processor_notify+0x58/0xe7
>> [ 14.003609] [] acpi_ev_notify_dispatch+0x44/0x5c
>> [ 14.003611] [] acpi_os_execute_deferred+0x15/0x22
>> [ 14.003614] [] process_one_work+0x160/0x410
>> [ 14.003616] [] worker_thread+0x11b/0x520
>> [ 14.003617] [] ? rescuer_thread+0x380/0x380
>> [ 14.003621] [] kthread+0xe1/0x100
>> [ 14.003623] [] ? kthread_create_on_node+0x1b0/0x1b0
>> [ 14.003628] [] ret_from_fork+0x7c/0xb0
>> [ 14.003630] [] ? kthread_create_on_node+0x1b0/0x1b0
>> [ 14.003631] ---[ end trace 89e66eb9795efdf7 ]---
>>
>> Thread A: Workqueue: kacpi_notify
>>
>> acpi_processor_notify()
>> acpi_processor_ppc_has_changed()
>> cpufreq_update_policy()
>> cpufreq_cpu_get()
>> kobject_get()
>>
>> Thread B: xenbus_thread()
>>
>> xenbus_thread()
>> msg->u.watch.handle->callback()
>> handle_vcpu_hotplug_event()
>> vcpu_hotplug()
>> cpu_down()
>> __cpu_notify(CPU_DOWN_PREPARE..)
>> cpufreq_cpu_callback()
>> __cpufreq_remove_dev_prepare()
>> update_policy_cpu()
>> kobject_move()
>
> Where is the race ? How do you say this is racy ?
>
> I am not sure if the problem is with kobject_move(), to me it looked like
> the problem is with cpufreq_policy_put_kobj() and we tried to do kobject_get()
> after the kobject has been freed..
>
> I don't agree to the solution you gave, but lets first make sure what the
> problem is, and then take any action against it.
>
> Please try this patch and let us know if it fixes it for you:
>
Yes it does fix the issue. Thanks Viresh

Regards,
Santosh