From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
From: Dexuan Cui
To: Hannes Reinecke, Bart Van Assche, "hare@suse.de", "axboe@kernel.dk"
CC: "hch@lst.de", "linux-kernel@vger.kernel.org", "linux-block@vger.kernel.org", "jth@kernel.org"
Subject: RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements
Date: Tue, 7 Feb 2017 02:23:06 +0000
Message-ID:
References: <1484732896-22941-1-git-send-email-hare@suse.de> <1485822639.2669.16.camel@sandisk.com> <532c55c4-15da-d2f9-401c-36bc4343756b@suse.com>
In-Reply-To:
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
List-ID:

> From: linux-block-owner@vger.kernel.org [mailto:linux-block-owner@vger.kernel.org] On Behalf Of Dexuan Cui
> Sent: Friday, February 3, 2017 20:23
> To: Hannes Reinecke; Bart Van Assche; hare@suse.de; axboe@kernel.dk
> Cc: hch@lst.de; linux-kernel@vger.kernel.org; linux-block@vger.kernel.org;
> jth@kernel.org
> Subject: RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements
>
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Hannes Reinecke
> > Sent: Wednesday, February 1, 2017 00:15
> > To: Bart Van Assche; hare@suse.de; axboe@kernel.dk
> > Cc: hch@lst.de; linux-kernel@vger.kernel.org; linux-block@vger.kernel.org;
> > jth@kernel.org
> > Subject: Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements
> >
> > On 01/31/2017 01:31 AM, Bart Van Assche wrote:
> > > On Wed, 2017-01-18 at 10:48 +0100, Hannes Reinecke wrote:
> > >> @@ -1488,26 +1487,13 @@ static unsigned long disk_events_poll_jiffies(struct gendisk *disk)
> > >>  void disk_block_events(struct gendisk *disk)
> > >>  {
> > >>          struct disk_events *ev = disk->ev;
> > >> -        unsigned long flags;
> > >> -        bool cancel;
> > >>
> > >>          if (!ev)
> > >>                  return;
> > >>
> > >> -        /*
> > >> -         * Outer mutex ensures that the first blocker completes canceling
> > >> -         * the event work before further blockers are allowed to finish.
> > >> -         */
> > >> -        mutex_lock(&ev->block_mutex);
> > >> -
> > >> -        spin_lock_irqsave(&ev->lock, flags);
> > >> -        cancel = !ev->block++;
> > >> -        spin_unlock_irqrestore(&ev->lock, flags);
> > >> -
> > >> -        if (cancel)
> > >> +        if (atomic_inc_return(&ev->block) == 1)
> > >>                  cancel_delayed_work_sync(&disk->ev->dwork);
> > >>
> > >> -        mutex_unlock(&ev->block_mutex);
> > >>  }
> > >
> > > Hello Hannes,
> > >
> > > I have already encountered a few times a deadlock that was caused by the
> > > event checking code so I agree with you that it would be a big step forward
> > > if such deadlocks wouldn't occur anymore. However, this patch realizes a
> > > change that has not been described in the patch description, namely that
> > > disk_block_events() calls are no longer serialized. Are you sure it is safe
> > > to drop the serialization of disk_block_events() calls?
> > >
> > Well, this whole synchronization stuff it a bit weird; I so totally fail
> > to see the rationale for it.
> > But anyway, once we've converted ev->block to atomics I _think_ the
> > mutex_lock can remain; will be checking.
> >
> > Cheers,
> >
> > Hannes
> > --
>
> Hi, I think I got the same calltrace with today's linux-next (next-20170203).
>
> The issue happened every time when my Linux virtual machine booted and
> Hannes's patch could NOT help.
>
> The calltrace is pasted below.
>
> -- Dexuan

Any news on this thread?

The issue is still blocking Linux from booting up normally in my test. :-(

Have we identified the faulty patch?
If so, at least I can try to revert it to boot up.
Thanks,
-- Dexuan
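
To make the serialization question quoted above concrete, here is a minimal userspace sketch in C. It is not kernel code and not the patch itself: `struct model_events`, `model_block_begin()` and `model_cancel_sync()` are hypothetical stand-ins for `struct disk_events`, the `atomic_inc_return(&ev->block) == 1` check, and `cancel_delayed_work_sync()`. It replays one fixed interleaving in a single thread, so it only illustrates the ordering that the removed comment ("the first blocker completes canceling the event work before further blockers are allowed to finish") was guarding. It does not show that the lock-free variant is unsafe in practice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct disk_events (assumption). */
struct model_events {
    atomic_int block;    /* number of active blockers */
    bool work_running;   /* true while the event work may still execute */
};

/* Models "atomic_inc_return(&ev->block) == 1": true for the first blocker. */
static bool model_block_begin(struct model_events *ev)
{
    return atomic_fetch_add(&ev->block, 1) + 1 == 1;
}

/* Models cancel_delayed_work_sync(): afterwards the work is not running. */
static void model_cancel_sync(struct model_events *ev)
{
    ev->work_running = false;
}

/*
 * Lock-free variant, one possible interleaving: blocker A increments,
 * blocker B increments and returns, and only then does A finish cancelling.
 * Returns whether the work could still be running when B returned.
 */
static bool lockfree_second_blocker_sees_running_work(void)
{
    struct model_events ev = { .work_running = true };
    atomic_init(&ev.block, 0);

    bool a_first = model_block_begin(&ev);  /* A: 0 -> 1, must cancel */
    bool b_first = model_block_begin(&ev);  /* B: 1 -> 2, nothing to do */
    assert(a_first && !b_first);

    bool seen_by_b = ev.work_running;       /* B returns at this point */
    model_cancel_sync(&ev);                 /* A completes the cancel later */
    return seen_by_b;
}

/*
 * Serialized variant: the outer mutex is modelled by running A's increment
 * and cancel to completion before B is allowed to start.
 */
static bool serialized_second_blocker_sees_running_work(void)
{
    struct model_events ev = { .work_running = true };
    atomic_init(&ev.block, 0);

    if (model_block_begin(&ev))             /* A holds the (modelled) mutex */
        model_cancel_sync(&ev);

    bool b_first = model_block_begin(&ev);  /* B runs only after A is done */
    return !b_first && ev.work_running;
}
```

In this model the lock-free interleaving lets the second blocker return while the work may still be running, whereas the serialized ordering does not. Whether any caller of disk_block_events() actually depends on that guarantee is the open question in the thread.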