From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754022AbdH2QNc (ORCPT ); Tue, 29 Aug 2017 12:13:32 -0400
Received: from mga11.intel.com ([192.55.52.93]:63093 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752419AbdH2QNb (ORCPT ); Tue, 29 Aug 2017 12:13:31 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.41,445,1498546800"; d="scan'208";a="1008900648"
Subject: Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit
To: Linus Torvalds, "Liang, Kan"
Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar, Andi Kleen, Andrew Morton, Johannes Weiner, Jan Kara, Christopher Lameter, "Eric W . Biederman", Davidlohr Bueso, linux-mm, Linux Kernel Mailing List
References: <83f675ad385d67760da4b99cd95ee912ca7c0b44.1503677178.git.tim.c.chen@linux.intel.com> <37D7C6CF3E00A74B8858931C1DB2F077537A07E9@SHSMSX103.ccr.corp.intel.com> <37D7C6CF3E00A74B8858931C1DB2F077537A1C19@SHSMSX103.ccr.corp.intel.com>
From: Tim Chen
Message-ID:
Date: Tue, 29 Aug 2017 09:13:29 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/29/2017 09:01 AM, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 5:57 AM, Liang, Kan wrote:
>>>
>>> Attached is an ALMOST COMPLETELY UNTESTED forward-port of those two
>>> patches, now without that nasty WQ_FLAG_ARRIVALS logic, because we now
>>> always put the new entries at the end of the waitqueue.
>>
>> The patches fix the long wait issue.
>>
>> Tested-by: Kan Liang
>
> Ok. I'm not 100% comfortable applying them at rc7, so let me think
> about it.
> There's only one known load triggering this, and by "known"
> I mean "not really known" since we don't even know what the heck it
> does outside of intel and whoever your customer is.
>
> So I suspect I'll apply the patches next merge window, and we can
> maybe mark them for stable if this actually ends up mattering.
>
> Can you tell if the problem is actually hitting _production_ use or
> was some kind of benchmark stress-test?
>

It is not affecting production use, but rather the customer's acceptance
test for their systems. So I suspect it is a stress test.

Tim