Date: Tue, 18 Aug 2020 15:50:45 +0200
From: Greg KH
To: Hugh Dickins
Cc: Linus Torvalds, Oleg Nesterov, Michal Hocko, Linux-MM, LKML, Andrew Morton, Tim Chen, Michal Hocko
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Message-ID: <20200818135045.GA495837@kroah.com>
References: <20200724152424.GC17209@redhat.com> <20200725101445.GB3870@redhat.com> <20200727193512.GA236164@kroah.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 05, 2020 at 10:46:12PM -0700, Hugh Dickins wrote:
> On Mon, 27 Jul 2020, Greg KH wrote:
> >
> > Linus just pointed me at this thread.
> >
> > If you could run:
> > 	echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
> > and run the same workload to see if anything shows up in the log when
> > xhci crashes, that would be great.
>
> Thanks, I tried that, and indeed it did have a story to tell:
>
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> a very large number of lines like the above, then
> Cancel URB 00000000d81602f7, dev 4, ep 0x0, starting at offset 0xfffd42c0
> // Ding dong!
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> Stopped on No-op or Link TRB for slot 1 ep 0
> xhci_drop_endpoint called for udev 000000005bc07fa6
> drop ep 0x81, slot id 1, new drop flags = 0x8, new add flags = 0x0
> add ep 0x81, slot id 1, new drop flags = 0x8, new add flags = 0x8
> xhci_check_bandwidth called for udev 000000005bc07fa6
> // Ding dong!
> Successful Endpoint Configure command
> Cancel URB 000000006b77d490, dev 4, ep 0x81, starting at offset 0x0
> // Ding dong!
> Stopped on No-op or Link TRB for slot 1 ep 2
> Removing canceled TD starting at 0x0 (dma).
> list_del corruption: prev(ffff8fdb4de7a130)->next should be ffff8fdb41697f88,
> but is 6b6b6b6b6b6b6b6b; next(ffff8fdb4de7a130)->prev is 6b6b6b6b6b6b6b6b.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:53!
> RIP: 0010:__list_del_entry_valid+0x8e/0xb0
> Call Trace:
>
> handle_cmd_completion+0x7d4/0x14f0 [xhci_hcd]
> xhci_irq+0x242/0x1ea0 [xhci_hcd]
> xhci_msi_irq+0x11/0x20 [xhci_hcd]
> __handle_irq_event_percpu+0x48/0x2c0
> handle_irq_event_percpu+0x32/0x80
> handle_irq_event+0x4a/0x80
> handle_edge_irq+0xd8/0x1b0
> handle_irq+0x2b/0x50
> do_IRQ+0xb6/0x1c0
> common_interrupt+0x90/0x90
>
>
> Info provided for your interest, not expecting any response.
> The list_del info in there is non-standard, from a patch of mine:
> I find hashed addresses in debug output less than helpful.

Thanks for this, that is really odd.

> >
> > Although if you are using an "older version" of the driver, there's not
> > much I can suggest except update to a newer one :)
>
> Yes, I was reluctant to post any info, since really the ball is at our
> end of the court, not yours. I did have a go at bringing in the latest
> xhci driver instead, but quickly saw that was not a sensible task for
> me. And I did scan the git log of xhci changes (especially xhci-ring.c
> changes): thought I saw a likely relevant and easily applied fix commit,
> but in fact it made no difference here.
>
> I suspect it's in part a hardware problem, but driver not recovering
> correctly. I've replaced the machine (but also noticed that the same
> crash has occasionally been seen on other machines). I'm sure it has
> no relevance to this unlock_page() thread, though it's quite possible
> that it's triggered under stress, and Linus's changes allowed greater
> stress.

I will be willing to blame hardware problems for this as well, but will
save this report in case something else shows up in the future, thanks!

greg k-h