From: Xunlei Pang
Reply-To: xlpang@redhat.com
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Juri Lelli, Ingo Molnar, Steven Rostedt
Subject: Re: [PATCH v2 1/2] sched/rtmutex/deadline: Fix a PI crash for deadline tasks
Date: Fri, 8 Apr 2016 16:04:07 +0800
Message-ID: <570765F7.7070406@redhat.com>
In-Reply-To: <20160406181433.GT3448@twins.programming.kicks-ass.net>
References: <1459947556-12332-1-git-send-email-xlpang@redhat.com> <20160406181433.GT3448@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/04/07 at 02:14, Peter Zijlstra wrote:
> On Wed, Apr 06, 2016 at 08:59:15PM +0800, Xunlei Pang wrote:
>> A crash happened while I'm playing with deadline PI rtmutex.
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>> IP: [] rt_mutex_get_top_task+0x1f/0x30
>> PGD 232a75067 PUD 230947067 PMD 0
>> Oops: 0000 [#1] SMP
>> CPU: 1 PID: 10994 Comm: a.out Not tainted
>>
>> Call Trace:
>> [] ? enqueue_task_dl+0x2a/0x320
>> [] enqueue_task+0x2c/0x80
>> [] activate_task+0x23/0x30
>> [] pull_dl_task+0x1d5/0x260
>> [] pre_schedule_dl+0x16/0x20
>> [] __schedule+0xd3/0x900
>> [] schedule+0x29/0x70
>> [] __rt_mutex_slowlock+0x4b/0xc0
>> [] rt_mutex_slowlock+0xd1/0x190
>> [] rt_mutex_timed_lock+0x53/0x60
>> [] futex_lock_pi.isra.18+0x28c/0x390
>> [] ? enqueue_task_dl+0x195/0x320
>> [] ? prio_changed_dl+0x6c/0x90
>> [] do_futex+0x190/0x5b0
>> [] SyS_futex+0x80/0x180
>> [] system_call_fastpath+0x16/0x1b
>> RIP [] rt_mutex_get_top_task+0x1f/0x30
>>
>> This is because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi()
>> are only protected by pi_lock when operating pi waiters, while
>> rt_mutex_get_top_task() will access them with rq lock held but
>> not holding pi_lock.
>>
>> In order to tackle it, we introduce a new pointer "pi_top_task"
>> in task_struct, and update it to be the top waiter task(this waiter
>> is updated under pi_lock) in rt_mutex_setprio() which is under
>> both pi_lock and rq lock, then ensure all its accessers be under
>> rq lock (or pi_lock), this can safely fix the crash.
>>
>> This patch is originated from "Peter Zijlstra", with several
>> tweaks and improvements by me.
> I would suggest doing the rt_mutex_postunlock() thing as a separate
> patch, it has some merit outside of these changes and reduces the total
> amount of complexity in this patch.

I think the code change is necessary, as it avoids the invalid
task_struct access issue introduced by PATCH1.

Do you mean just moving the refactoring that uses rt_mutex_postunlock()
into a separate patch? Or am I missing something?

>
> Also, I would very much like Thomas to ack this patch before I take it,
> but since its conference season this might take a little while. Esp. the
> change marked with XXX is something that I'm not sure about.

I'm OK with this change, waiting for Thomas.

Regards,
Xunlei