Date: Mon, 11 Feb 2019 11:46:06 -0800
From: Luis Chamberlain
To: Sasha Levin
Cc: Dave Chinner, linux-xfs@vger.kernel.org, gregkh@linuxfoundation.org,
    Alexander.Levin@microsoft.com, stable@vger.kernel.org,
    amir73il@gmail.com, hch@infradead.org
Subject: Re: [PATCH v2 00/10] xfs: stable fixes for v4.19.y
Message-ID: <20190211194606.GO11489@garbanzo.do-not-panic.com>
References: <20190204165427.23607-1-mcgrof@kernel.org>
 <20190205220655.GF14116@dastard>
 <20190206040559.GA4119@sasha-vm>
 <20190206215454.GG14116@dastard>
 <20190208060620.GA31898@sasha-vm>
 <20190208221726.GM11489@garbanzo.do-not-panic.com>
 <20190209215627.GB69686@sasha-vm>
In-Reply-To: <20190209215627.GB69686@sasha-vm>

On Sat, Feb 09, 2019 at 04:56:27PM -0500, Sasha Levin wrote:
> On Fri, Feb 08, 2019 at 02:17:26PM -0800, Luis Chamberlain wrote:
> > On Fri, Feb 08, 2019 at 01:06:20AM -0500, Sasha Levin wrote:
> > Have you found pmem
> > issues not present on other sections?
>
> Originally I've added this because the xfs folks suggested that pmem vs
> block exercises very different code paths and we should be testing both
> of them.
>
> Looking at the baseline I have, it seems that there are differences
> between the failing tests. For example, with "MKFS_OPTIONS='-f -m
> crc=1,reflink=0,rmapbt=0, -i sparse=0'",

That's my "xfs" section.

> generic/524 seems to fail on pmem but not on block.

This is useful, thanks! Can you get the failure rate? How often does it
fail when you run the test? Always? Does it *never* fail on block? How
many consecutive runs did you do on block?

To help with this, oscheck has naggy-check.sh; you can run it until a
failure is hit:

./naggy-check.sh -f -s xfs generic/524

And on another host:

./naggy-check.sh -f -s xfs_pmem generic/524
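
FWIW, the idea behind it is just a retry loop around fstests' ./check.
A minimal sketch of that idea, assuming you run it from an fstests
checkout whose config defines the section you pass in (the section and
test names below are only the examples from above, and this is not
naggy-check.sh itself):

#!/bin/bash
# Sketch only, not oscheck's actual implementation: rerun one fstests
# test against one config section until it fails, counting how many
# passing runs it took. Assumes an fstests checkout with a config file
# that defines the given section.
SECTION=${1:-xfs}
TEST=${2:-generic/524}

runs=0
while ./check -s "$SECTION" "$TEST"; do
        runs=$((runs + 1))
        echo "$TEST passed $runs run(s) so far on section $SECTION"
done
echo "$TEST failed after $runs passing run(s) on section $SECTION"

naggy-check.sh does more on top of oscheck's own setup, so treat the
above only as a rough way to estimate a failure rate for a flaky test.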

> > Any reason you don't name the sections with more finer granularity?
> > It would help me in ensuring when we revise both of tests we can more
> > easily ensure we're talking about apples, pears, or bananas.
>
> Nope, I'll happily rename them if there are "official" names for it :)

Well, since I am pushing out the stable fixes and am using oscheck to be
transparent about how I test and what I track, and since I'm using
section names, yes, it would be useful to me. Simply adding a _pmem
postfix to the pmem ones would suffice.

> > FWIW, I run two different bare metal hosts now, and each has a VM guest
> > per section above. One host I use for tracking stable, the other host for
> > my changes. This ensures I don't mess things up easier and I can re-test
> > any time fast.
> >
> > I dedicate a VM guest to test *one* section. I do this with oscheck
> > easily:
> >
> > ./oscheck.sh --test-section xfs_nocrc | tee log-xfs-4.19.18+
> >
> > For instance will just test xfs_nocrc section. On average each section
> > takes about 1 hour to run.
>
> We have a similar setup then. I just spawn the VM on azure for each
> section and run them all in parallel that way.

Indeed.

> I thought oscheck runs everything on a single VM,

By default it does.

> is it a built in
> mechanism to spawn a VM for each config?

Yes:

./oscheck.sh --test-section xfs_nocrc_512

For instance, that will test section xfs_nocrc_512 *only* on that host.

> If so, I can add some code in
> to support azure and we can use the same codebase.

Groovy. I believe the next step will be for you to send me your delta of
expunges, and then I can run naggy-check.sh on them to see if I can
reach similar results. I believe you have a larger expunge list. I
suspect some of this may be that you don't have certain quirks handled.
We will see. But getting this right and syncing our testing should yield
good confirmation of failures.

> > I could run the tests on raw nvme and do away with the guests, but
> > that loses some of my ability to debug on crashes easily and out to
> > baremetal.. but curious, how long do your tests takes? How about per
> > section? Say just the default "xfs" section?
>
> I think that the longest config takes about 5 hours, otherwise
> everything tends to take about 2 hours.

Oh wow, mine are only 1 hour each. Guess I got a decent rig now :)

> I basically run these on "repeat" until I issue a stop order, so in a
> timespan of 48 hours some configs run ~20 times and some only ~10.

I see... so you iterate over all tests many times a day, and this is how
you've built your expunge list. Correct? That could explain how you end
up with a larger set.

This can mean some tests only fail at a non-100% failure rate; for these
I'm annotating the failure rate as a comment on each expunge line.
Having a consistent format for this and a properly agreed-upon term
would be good. Right now I just mention how often I have to run a test
before reaching a failure. This provides a rough estimate of how many
times one should iterate running the test in a loop before detecting a
failure. Of course this may not always be accurate, given that systems
vary and this could have an impact on the failure rate... but at least
it provides some guidance.

It would be curious to see if we end up with similar failure rates for
tests that don't always fail. And if there is a divergence, how big it
could be.

  Luis