From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753517AbdH2QB1 (ORCPT ); Tue, 29 Aug 2017 12:01:27 -0400
Received: from mail-io0-f180.google.com ([209.85.223.180]:36561 "EHLO mail-io0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751867AbdH2QBM (ORCPT ); Tue, 29 Aug 2017 12:01:12 -0400
MIME-Version: 1.0
In-Reply-To: <37D7C6CF3E00A74B8858931C1DB2F077537A1C19@SHSMSX103.ccr.corp.intel.com>
References: <83f675ad385d67760da4b99cd95ee912ca7c0b44.1503677178.git.tim.c.chen@linux.intel.com> <37D7C6CF3E00A74B8858931C1DB2F077537A07E9@SHSMSX103.ccr.corp.intel.com> <37D7C6CF3E00A74B8858931C1DB2F077537A1C19@SHSMSX103.ccr.corp.intel.com>
From: Linus Torvalds
Date: Tue, 29 Aug 2017 09:01:11 -0700
X-Google-Sender-Auth: Bz03PHA3IQNp29n1jggz8Rczyes
Message-ID:
Subject: Re: [PATCH 2/2 v2] sched/wait: Introduce lock breaker in wake_up_page_bit
To: "Liang, Kan"
Cc: Tim Chen , Mel Gorman , Peter Zijlstra , Ingo Molnar , Andi Kleen , Andrew Morton , Johannes Weiner , Jan Kara , Christopher Lameter , "Eric W . Biederman" , Davidlohr Bueso , linux-mm , Linux Kernel Mailing List
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 29, 2017 at 5:57 AM, Liang, Kan wrote:
>>
>> Attached is an ALMOST COMPLETELY UNTESTED forward-port of those two
>> patches, now without that nasty WQ_FLAG_ARRIVALS logic, because we now
>> always put the new entries at the end of the waitqueue.
>
> The patches fix the long wait issue.
>
> Tested-by: Kan Liang

Ok. I'm not 100% comfortable applying them at rc7, so let me think
about it. There's only one known load triggering this, and by "known"
I mean "not really known" since we don't even know what the heck it
does outside of intel and whoever your customer is.

So I suspect I'll apply the patches next merge window, and we can
maybe mark them for stable if this actually ends up mattering.

Can you tell if the problem is actually hitting _production_ use or
was some kind of benchmark stress-test?

               Linus