From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751838AbbKJR1o (ORCPT ); Tue, 10 Nov 2015 12:27:44 -0500
Received: from mx1.redhat.com ([209.132.183.28]:56925 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751473AbbKJR1m (ORCPT ); Tue, 10 Nov 2015 12:27:42 -0500
Date: Tue, 10 Nov 2015 12:27:41 -0500
From: Mike Snitzer
To: Boštjan Škufca @ Teon.si
Cc: device-mapper development, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Linux >= 4.2 dm_any_congested bug due to bad data from vfs/mm?
	[was: Bug in dm_any_congested?]
Message-ID: <20151110172740.GA5450@redhat.com>
References: <56420188.7030406@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

[Cc'ing LKML and linux-fsdevel to cast a wider net and raise awareness]

On Tue, Nov 10 2015 at 10:02am -0500,
Boštjan Škufca @ Teon.si wrote:

> On 10 November 2015 at 15:39, Zdenek Kabelac wrote:
> > On 10.11.2015 at 14:14, Boštjan Škufca @ Teon.si wrote:
> >>
> >> Hi all,
> >>
> >> HW is a bit dated, but had no problems with it up to now, and SW raid
> >> is used here. Kernel was 4.2.4.
> >>
> >> Is this the right mlist for such bug?
> >
> >
> > Hi
> >
> > Yes the issue is known - but source is not fully known.
> > I've opened public BZ: https://bugzilla.redhat.com/1279941
> > There is some potential fix - but unclear what it solves:
> > http://git.kernel.org/linus/ad5f498f610
>
> So 4.1.13 is ok in this respect, or is this unknown ATM?
>
> Does it depend on underlying storage at all, or not? MD does not seem
> to be listed in stack trace.

We don't yet have a reliable reproducer.
So if your test proves to reliably reproduce the issue for you then we
may be able to make much quicker progress.

While the bug manifests as a crash in dm_any_congested (either NULL
pointer or GPF) it _seems_ that the problem is further up the stack in
the vfs and/or mm (by passing garbage into dm_any_congested via call to
queue->backing_dev_info.congested_fn). But all possibilities are still
on the table... again not much to go on yet.

Please feel free to test using the 4.4 stable@ commit Zdenek referenced
(but I'm skeptical it'll fix this issue if you aren't reactivating
volumes or anything):
http://git.kernel.org/linus/ad5f498f610

Also, you're welcome to update this BZ as you collect additional info:
https://bugzilla.redhat.com/1279941

Thanks,
Mike