From: Naoya Horiguchi
To: Mel Gorman
CC: Vlastimil Babka, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Naoya Horiguchi
Subject: Re: [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page
Date: Wed, 18 May 2016 10:17:09 +0000
Message-ID: <20160518101709.GA25087@hori1.linux.bs1.fc.nec.co.jp>
References: <1463470975-29972-1-git-send-email-n-horiguchi@ah.jp.nec.com> <20160518092100.GB2527@techsingularity.net> <573C365B.6020807@suse.cz> <20160518095251.GD2527@techsingularity.net>
In-Reply-To: <20160518095251.GD2527@techsingularity.net>

On Wed, May 18, 2016 at 10:52:51AM +0100, Mel Gorman wrote:
> On Wed, May 18, 2016 at 11:31:07AM +0200, Vlastimil Babka wrote:
> > On 05/18/2016 11:21 AM, Mel Gorman wrote:
> > >On Tue, May 17, 2016 at 04:42:55PM +0900, Naoya Horiguchi wrote:
> > >>There's a race window between checking page->flags and unpoisoning, which
> > >>taints kernel with "BUG: Bad page state". That's overkill. It's safer to
> > >>use bad_flags to detect hwpoisoned page.
> > >>
> > >
> > >I'm not quite getting this one. Minimally, instead of = __PG_HWPOISON, it
> > >should have been (bad_flags & __PG_POISON). As Vlastimil already pointed
> > >out, __PG_HWPOISON can be 0. What I'm not getting is why this fixes the
> > >race. The current race is
> > >
> > >1. Check poison, set bad_flags
> > >2. poison clears in parallel
> > >3. Check page->flag state in bad_page and trigger warning
> > >
> > >The code changes it to
> > >
> > >1. Check poison, set bad_flags
> > >2. poison clears in parallel
> > >3. Check bad_flags and trigger warning
> >
> > I think you got step 3 here wrong. It's "skip the warning since we have set
> > bad_flags to hwpoison and bad_flags didn't change due to parallel unpoison".
> >
>
> I think the benefit is marginal. The race means that the patch will trigger
> a warning that might have been missed before due to a parallel unpoison
> but that's not necessary a Good Thing. It's inherently race-prone.
>
> Naoya, if you fix the check to (bad_flags & __PG_POISON) then I'll add my
> ack but I'm not convinced it's a real problem.

As you pointed out, v1 used the wrong operator. I posted v2 a while ago,
which does not have this problem and is hopefully a better fix.

Thanks,
Naoya Horiguchi
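
For readers following the thread, here is a rough userspace model of the two points discussed above: the three-step race, and why an equality test against the hwpoison bit goes wrong when that bit is defined as 0. This is only a sketch and not the kernel code. All names (MODEL_PG_HWPOISON, struct model_page, the model_* and warns_* helpers) are hypothetical, and the bit position is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's hwpoison flag bit. The bit
 * position is an arbitrary assumption for this model. */
#define MODEL_PG_HWPOISON (1UL << 3)

/* Hypothetical, minimal stand-in for struct page. */
struct model_page {
	unsigned long flags;
};

/* Step 1: check poison and record it in bad_flags. */
static unsigned long model_check_page(const struct model_page *page)
{
	unsigned long bad_flags = 0;

	if (page->flags & MODEL_PG_HWPOISON)
		bad_flags = MODEL_PG_HWPOISON;
	return bad_flags;
}

/* Step 2: a parallel unpoison clears the bit on the live page. */
static void model_unpoison(struct model_page *page)
{
	assert(page != NULL);
	page->flags &= ~MODEL_PG_HWPOISON;
}

/* Step 3, re-reading the live flags: warns if the poison bit is gone. */
static bool warns_checking_live_flags(const struct model_page *page)
{
	return !(page->flags & MODEL_PG_HWPOISON);
}

/* Step 3, testing the recorded bad_flags with a bitwise AND. */
static bool warns_checking_bad_flags(unsigned long bad_flags,
				     unsigned long hwpoison_bit)
{
	return !(bad_flags & hwpoison_bit);
}

/* Step 3, testing the recorded bad_flags with equality. */
static bool warns_checking_bad_flags_eq(unsigned long bad_flags,
					unsigned long hwpoison_bit)
{
	return bad_flags != hwpoison_bit;
}
```

In this model, if a page is poisoned at step 1 and unpoisoned at step 2, warns_checking_live_flags() reports a warning while warns_checking_bad_flags() does not, which matches Vlastimil's reading of step 3. Passing 0 as hwpoison_bit (the "__PG_HWPOISON can be 0" case) with bad_flags == 0 makes the equality variant skip the warning, while the bitwise variant still warns.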