Subject: Re: Potential bug in soft-dirty bits (with test case)
To: Mohamed Alzayat, Peter Zijlstra, "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org
From: Vlastimil Babka
Message-ID: <2e60ad6d-7bdf-e19e-4ad9-6942f76088d3@suse.cz>
Date: Mon, 30 Nov 2020 12:51:59 +0100

On 11/30/20 11:37 AM, Mohamed Alzayat wrote:
> On Fri, Nov 27, 2020 at 5:40 PM Vlastimil Babka wrote:
>>
>> On 11/25/20 3:15 PM, Mohamed Alzayat wrote:
>> > Hi Everyone,
>> >
>> > I have noticed a change in the synchrony of updating the soft-dirty
>> > bits in recent kernel versions (5.6+). More precisely, up to kernel
>> > v5.5, the soft-dirty bits as parsed from /proc/pid/pagemap accurately
>> > capture the dirtied pages. Recently, I started testing on kernels v5.6
>> > - v5.9, and I noticed that the soft-dirty bits are not immediately
>> > updated.
>> >
>> > I have prepared a short test that repeatedly causes at least one
>> > memory page to be dirtied, then scans /proc/pid/pagemap counting the
>> > soft-dirty bits. The test fails if this count is zero. In my
>> > observation, this test fails once in every 10-20 trials.
>> > The test defaults to 100 trials and can be found at
>> > https://gitlab.mpi-sws.org/-/snippets/1696
>> >
>> > Is this non-synchronous propagation of soft dirty bits intended? If
>>
>> AFAIK, not. The tracking is done by write-protecting the pages to cause a page
>> fault, so it should be quite synchronous update of page table entries, and
>> reading pagemap is a page table walk of those very entries.
>>
>> But as you have the test, it should be possible to git bisect it? Just do enough
>> trials to be sure enough that no fail means indeed a "good" kernel.
>
> Thanks for confirming, Vlastimil!
>
> The first bad commit is: 0758cd8304942292e95a0f750c374533db378b32
> asm-generic/tlb: avoid potential double flush
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0758cd8304942292e95a0f750c374533db378b32
>
> Reverting this commit solves the problem, but this might not be the
> right way of fixing it.

Thanks for bisecting! Let's CC the people involved in that commit. Everything
important should be in the quoted conversation above.

Vlastimil

>
>>
>> > yes, is there a way to force the soft-dirty bits to be propagated to
>> > the page map entries immediately, or is there an alternative interface
>> > that has the synchronous behavior?
>> >
>> > Thanks in advance,
>> > Mohamed Alzayat
>> >
>>
>>
>
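For anyone new to the thread, here is a minimal sketch of the kind of check the quoted test describes: clear the soft-dirty bits, touch memory, then count soft-dirty entries in pagemap. It is not the test from the snippet linked above. The helper names (pagemap_offset, is_soft_dirty, count_soft_dirty, clear_soft_dirty) are made up for illustration. It assumes a kernel built with soft-dirty support, 64-bit pagemap entries with bit 55 as the soft-dirty flag, and that writing "4" to clear_refs resets those bits, as described in the kernel's soft-dirty documentation.

```python
import os
import struct

PAGEMAP_ENTRY_SIZE = 8      # pagemap holds one 64-bit entry per virtual page
SOFT_DIRTY_BIT = 1 << 55    # bit 55 of an entry: the PTE is soft-dirty


def pagemap_offset(vaddr: int, page_size: int) -> int:
    """Byte offset in /proc/<pid>/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def is_soft_dirty(entry: int) -> bool:
    """True if a raw 64-bit pagemap entry has the soft-dirty bit set."""
    return bool(entry & SOFT_DIRTY_BIT)


def clear_soft_dirty(pid="self") -> None:
    """Reset soft-dirty bits for the task by writing 4 to clear_refs."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def count_soft_dirty(start: int, length: int, pid="self", page_size=None) -> int:
    """Count soft-dirty pages in the virtual range [start, start + length).

    start is a virtual address, e.g. taken from /proc/<pid>/maps.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    first_page = start // page_size
    last_page = (start + length - 1) // page_size
    npages = last_page - first_page + 1
    # Unbuffered, so we only read the entries we actually ask for.
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as f:
        f.seek(pagemap_offset(start, page_size))
        data = f.read(npages * PAGEMAP_ENTRY_SIZE)
    nread = len(data) // PAGEMAP_ENTRY_SIZE
    # Entries are in native byte order; "=" selects standard 8-byte Q.
    entries = struct.unpack(f"={nread}Q", data[: nread * PAGEMAP_ENTRY_SIZE])
    return sum(is_soft_dirty(e) for e in entries)
```

A trial in the spirit of the report would call clear_soft_dirty(), write to a page inside a known mapping, and then expect count_soft_dirty() over that mapping to be non-zero.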