Date: Tue, 26 Nov 2019 08:15:39 -0800 (PST)
From: "Kenneth R. Crudup"
To: "Rafael J. Wysocki"
cc: "Rafael J. Wysocki", Rafael Wysocki, Linux PM
Subject: Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
References: <2977390.9qzeJo7xji@kreacher>
X-Mailing-List: linux-pm@vger.kernel.org

On Tue, 26 Nov 2019, Rafael J.
Wysocki wrote:

> > BTW, I've got it broken down to a science now:
> > - Let the battery drain pretty well down (< ~80%)
> > - Boot the kernel I wish to test, while still on battery
> > - Initiate a suspend/resume, which will come back OK
> > - Initiate another suspend
> > - Plug in the charger. If I have my power meter in, I see it do the PD
> >   negotiation (it'll start off at 20V/150mA, then it'll PD ramp up to
> >   a full 2.5-2.75A @20V to charge the battery)
> > - Try to resume. It'll be totally dead and I have to long-power-button
> >   to get it back

Huh.

So ... I run bleeding edge and grab Linus' tip as it's pushed; I'd seen some
changes to workqueues in last night's merge. Personally, I feel that if
workqueues were AFU the kernel would be a hot mess, but since there's one in
the path of acpi_ec_flush_work() called from acpi_s2idle_sync() and I'm kinda
desperate at this point for at least something to help fix this, I put your
2nd patch back in earnest (uncommented acpi_ec_flush_work() in
acpi_s2idle_sync()) and added a pair of WARN_ON(1)s (which should dump to
pstore, or is that "BUG_ON()"?) around the spin_lock/unlock in
acpi_ec_query_flushed().

Who knows if this is a race condition, the extra (slight) overhead of the
output of the WARN_ON, or just dumb luck, but it ... here I go, tempting
fate ... seems to be working. I drained the battery to 75% and ran the
procedure a couple of times yesterday, and so far, so good.

Of course, I'll have to give it the "car ride test" (and I'll be damned if I
know why that, flipping it around on its axes randomly for a while, brings
this bug out so consistently), but I'm hopeful (and again, a public
announcement is usually guaranteed to make it break).

Stay tuned, again,

	-Kenny

--
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley