Subject: Re: [BUG] Deadlock due due to interactions of block, RCU, and cpu offline
To: paulmck@linux.vnet.ibm.com
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
 pprakash@codeaurora.org, Josh Triplett, Steven Rostedt,
 Mathieu Desnoyers, Lai Jiangshan, Jens Axboe,
 Sebastian Andrzej Siewior, Thomas Gleixner, Richard Cochran,
 Boris Ostrovsky, Richard Weinberger
References: <20170326232843.GA3637@linux.vnet.ibm.com>
 <20170327181711.GF3637@linux.vnet.ibm.com>
 <20170620234623.GA16200@linux.vnet.ibm.com>
 <20170621161853.GB3721@linux.vnet.ibm.com>
 <20170623033456.GA15959@linux.vnet.ibm.com>
From: Jeffrey Hugo
Date: Tue, 27 Jun 2017 16:32:09 -0600
In-Reply-To: <20170623033456.GA15959@linux.vnet.ibm.com>

On 6/22/2017 9:34 PM, Paul E. McKenney wrote:
> On Wed, Jun 21, 2017 at 09:18:53AM -0700, Paul E. McKenney wrote:
>> No worries, and I am very much looking forward to seeing the results of
>> your testing.
>
> And please see below for an updated patch based on LKML review and
> more intensive testing.
>

I spent some time on this today. It didn't go as I expected.

I validated that the issue is reproducible as before on 4.11 and on
4.12 rc1 through rc4. However, the version of stress-ng that I was
using ran into constant errors starting with rc5, making it nearly
impossible to make progress toward reproduction. Upgrading stress-ng
to tip fixes those errors, but I've still been unable to repro the
issue. It's my unfounded suspicion that something went in between rc4
and rc5 which changed the timing, and didn't actually fix the issue.
I will run the test overnight for 5 hours to try to repro.

The patch you sent appears to be based on linux-next, and appears to
have a number of dependencies which prevent it from cleanly applying
on anything current that I'm able to repro on at this time. Do you
want to provide a rebased version of the patch which applies to, say,
4.11? I could easily test that and report back.

-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
Linux Foundation Collaborative Project.