From: SeongJae Park
CC: Yunjae Lee
Subject: Re: [PATCH 2/8] mm/damon: Implement region based sampling
Date: Wed, 22 Jan 2020 09:15:35 +0100
Message-ID: <20200122081535.23080-1-sjpark@amazon.com>
In-Reply-To: <20200120162757.32375-3-sjpark@amazon.com>
On Mon, 20 Jan 2020 17:27:51 +0100 SeongJae Park wrote:

> From: SeongJae Park
>
> This commit implements DAMON's basic access check and region based
> sampling mechanisms.
>
> Basic Access Check
> ------------------
>
> DAMON basically reports which pages are how frequently accessed.  Note
> that the frequency is not an absolute number of accesses, but a relative
> frequency among the pages of the target workloads.
>
> Users can control the resolution of the reports by setting two time
> intervals, ``sampling interval`` and ``aggregation interval``.  In
> detail, DAMON checks access to each page per ``sampling interval``,
> aggregates the results (counts the number of accesses to each page),
> and reports the aggregated results per ``aggregation interval``.  For
> the access check of each page, DAMON uses the Accessed bits of PTEs.
>
> This is thus similar to common periodic access-check based access
> tracking mechanisms, whose overhead increases as the size of the
> target process grows.
>
> Region Based Sampling
> ---------------------
>
> To avoid the unbounded increase of the overhead, DAMON groups a number
> of adjacent pages that are assumed to have the same access frequency
> into a region.  As long as the assumption (pages in a region have the
> same access frequency) holds, only one page in the region needs to be
> checked.  Thus, for each ``sampling interval``, DAMON randomly picks
> one page in each region and clears its Accessed bit.  After one more
> ``sampling interval``, DAMON reads the Accessed bit of the page and
> increases the access frequency of the region if the bit has been set
> meanwhile.
> Therefore, the monitoring overhead is controllable by setting the
> number of regions.
>
> Nonetheless, this scheme cannot preserve the quality of the output if
> the assumption does not hold.
>
> Signed-off-by: SeongJae Park
> ---
>  mm/damon.c | 599 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 599 insertions(+)
>
> diff --git a/mm/damon.c b/mm/damon.c
> index 064ec1f6ded9..2a0c010291f8 100644
> --- a/mm/damon.c
> +++ b/mm/damon.c
> @@ -9,9 +9,14 @@
>
>  #define pr_fmt(fmt) "damon: " fmt
>
[...]
> +
> +/*
> + * Check whether the given region has been accessed since the last check
> + *
> + * mm	'mm_struct' for the given virtual address space
> + * r	the region to be checked
> + */
> +static void kdamond_check_access(struct mm_struct *mm, struct damon_region *r)
> +{
> +	pte_t *pte = NULL;
> +	pmd_t *pmd = NULL;
> +	spinlock_t *ptl;
> +
> +	if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
> +		goto mkold;
> +
> +	/* Read the page table Accessed bit of the page */
> +	if (pte && pte_young(*pte))
> +		r->nr_accesses++;
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	else if (pmd && pmd_young(*pmd))
> +		r->nr_accesses++;
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +
> +	spin_unlock(ptl);
> +
> +mkold:
> +	/* mkold next target */
> +	r->sampling_addr = damon_rand(r->vm_start, r->vm_end);
> +
> +	if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl))
> +		return;
> +
> +	if (pte) {
> +		if (pte_young(*pte))
> +			clear_page_idle(pte_page(*pte));

Yunjae has personally pointed out to me that this could interfere with the
reclamation logic, because page_referenced_one() checks the pte Accessed
bits.
As the function also checks PG_young, we agreed to adjust PG_young in
addition to PG_idle here, as below:

diff --git a/mm/damon.c b/mm/damon.c
index 8067ea916f81..55b89a2c0140 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -491,14 +491,18 @@ static void kdamond_check_access(struct mm_struct *mm, struct damon_region *r)
 		return;
 
 	if (pte) {
-		if (pte_young(*pte))
+		if (pte_young(*pte)) {
 			clear_page_idle(pte_page(*pte));
+			set_page_young(pte_page(*pte));
+		}
 		*pte = pte_mkold(*pte);
 	}
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	else if (pmd) {
-		if (pmd_young(*pmd))
+		if (pmd_young(*pmd)) {
 			clear_page_idle(pmd_page(*pmd));
+			set_page_young(pmd_page(*pmd));
+		}
 		*pmd = pmd_mkold(*pmd);
 	}
 #endif

This change will be merged into this patch in the next spin.

Also, adding CC for the page_idle.c related people.

Thanks,
SeongJae Park

> +		*pte = pte_mkold(*pte);
> +	}
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	else if (pmd) {
> +		if (pmd_young(*pmd))
> +			clear_page_idle(pmd_page(*pmd));
> +		*pmd = pmd_mkold(*pmd);
> +	}
> +#endif
> +
> +	spin_unlock(ptl);
> +}
> +
> +/*
> + * Check whether a time interval has elapsed
> + *
> + * baseline	the time to check whether the interval has elapsed since
> + * interval	the time interval (microseconds)
> + *
> + * See whether the given time interval has passed since the given baseline
> + * time.  If so, it also updates the baseline to the current time for the
> + * next check.
> + *
> + * Returns true if the time interval has passed, or false otherwise.
> + */
> +static bool damon_check_reset_time_interval(struct timespec64 *baseline,
> +		unsigned long interval)
> +{
> +	struct timespec64 now;
> +
> +	ktime_get_coarse_ts64(&now);
> +	if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) / 1000 <
> +			interval)
> +		return false;
> +	*baseline = now;
> +	return true;
> +}
> +
> +/*
> + * Check whether it is time to flush the aggregated information
> + */
> +static bool kdamond_aggregate_interval_passed(void)
> +{
> +	return damon_check_reset_time_interval(&last_aggregate_time,
> +			aggr_interval);
> +}
> +
> +/*
> + * Flush the content in the result buffer to the result file
> + */
> +static void damon_flush_rbuffer(void)
> +{
> +	ssize_t sz;
> +	loff_t pos;
> +	struct file *rfile;
> +
> +	while (damon_rbuf_offset) {
> +		pos = 0;
> +		rfile = filp_open(rfile_path, O_CREAT | O_RDWR | O_APPEND,
> +				0644);
> +		if (IS_ERR(rfile)) {
> +			pr_err("Cannot open the result file %s\n", rfile_path);
> +			return;
> +		}
> +
> +		sz = kernel_write(rfile, damon_rbuf, damon_rbuf_offset, &pos);
> +		filp_close(rfile, NULL);
> +
> +		damon_rbuf_offset -= sz;
> +	}
> +}
> +
> +/*
> + * Write data into the result buffer
> + */
> +static void damon_write_rbuf(void *data, ssize_t size)
> +{
> +	if (damon_rbuf_offset + size > DAMON_LEN_RBUF)
> +		damon_flush_rbuffer();
> +
> +	memcpy(&damon_rbuf[damon_rbuf_offset], data, size);
> +	damon_rbuf_offset += size;
> +}
> +
> +/*
> + * Flush the aggregated monitoring results to the result buffer
> + *
> + * Stores current tracking results to the result buffer and resets
> + * 'nr_accesses' of each region.  The format for the result buffer is as
> + * below:
> + *