From mboxrd@z Thu Jan 1 00:00:00 1970
From: Minchan Kim
To: Andrew Morton
Cc: LKML, linux-mm, Michal Hocko, Johannes Weiner, Tim Murray,
 Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt,
 Sonny Rao, Brian Geffon, Minchan Kim
Subject: [RFC 6/7] mm: extend process_madvise syscall to support vector arrary
Date: Mon, 20 May 2019 12:52:53 +0900
Message-Id: <20190520035254.57579-7-minchan@kernel.org>
X-Mailer: git-send-email 2.21.0.1020.gf2820cf01a-goog
In-Reply-To: <20190520035254.57579-1-minchan@kernel.org>
References: <20190520035254.57579-1-minchan@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

Currently, the process_madvise syscall works on only one address range, so
the user has to call it several times to give hints for multiple address
ranges.

This patch extends the process_madvise syscall to support multiple hints,
address ranges and return values, so the user can give all the hints at once.

struct pr_madvise_param {
	int size;			/* the size of this structure */
	const struct iovec __user *vec;	/* address range array */
};

int process_madvise(int pidfd, ssize_t nr_elem,
		    int *behavior,
		    struct pr_madvise_param *results,
		    struct pr_madvise_param *ranges,
		    unsigned long flags);

- pidfd

  target process fd

- nr_elem

  the number of elements in the behavior, results and ranges arrays

- behavior

  hints for each address range in the remote process, so that the user can
  give a different hint for each range.

- results

  array of buffers that receive the result of the action on the associated
  remote address range.

- ranges

  array of buffers holding the remote process's address ranges to be
  processed

- flags

  extra argument for future use. It must be zero for now.

Example)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>

/*
 * MADV_COLD, MADV_COOL and __NR_process_madvise come from this series.
 * ALLOC_SIZE is not defined in the original example; 8MB is a placeholder.
 */
#define ALLOC_SIZE	(8UL << 20)

struct pr_madvise_param {
	int size;
	const struct iovec *vec;
};

int main(int argc, char *argv[])
{
	struct pr_madvise_param retp, rangep;
	struct iovec result_vec[2], range_vec[2];
	int hints[2];
	long ret[2];
	void *addr[2];

	pid_t pid;
	char cmd[64] = {0,};

	/* anonymous mappings take fd -1 */
	addr[0] = mmap(NULL, ALLOC_SIZE, PROT_READ|PROT_WRITE,
			  MAP_POPULATE|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

	if (MAP_FAILED == addr[0])
		return 1;

	addr[1] = mmap(NULL, ALLOC_SIZE, PROT_READ|PROT_WRITE,
			  MAP_POPULATE|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

	if (MAP_FAILED == addr[1])
		return 1;

	hints[0] = MADV_COLD;
	range_vec[0].iov_base = addr[0];
	range_vec[0].iov_len = ALLOC_SIZE;
	result_vec[0].iov_base = &ret[0];
	result_vec[0].iov_len = sizeof(long);
	retp.vec = result_vec;
	retp.size = sizeof(struct pr_madvise_param);

	hints[1] = MADV_COOL;
	range_vec[1].iov_base = addr[1];
	range_vec[1].iov_len = ALLOC_SIZE;
	result_vec[1].iov_base = &ret[1];
	result_vec[1].iov_len = sizeof(long);
	rangep.vec = range_vec;
	rangep.size = sizeof(struct pr_madvise_param);

	pid = fork();
	if (!pid) {
		sleep(10);
	} else {
		int pidfd;

		/* the pidfd is an fd on the child's /proc/<pid> directory */
		snprintf(cmd, sizeof(cmd), "/proc/%d", pid);
		pidfd = open(cmd, O_DIRECTORY | O_CLOEXEC);
		if (pidfd < 0)
			return 1;

		/* munmap to make pages private for the child */
		munmap(addr[0], ALLOC_SIZE);
		munmap(addr[1], ALLOC_SIZE);
		system("cat /proc/vmstat | egrep 'pswpout|deactivate'");
		/* pass the hints[] array filled in above */
		if (syscall(__NR_process_madvise, pidfd, 2, hints,
						&retp, &rangep, 0))
			perror("process_madvise fail");
		system("cat /proc/vmstat | egrep 'pswpout|deactivate'");
	}

	return 0;
}

Signed-off-by: Minchan Kim
---
 include/uapi/asm-generic/mman-common.h |   5 +
 mm/madvise.c                           | 184 +++++++++++++++++++++----
 2 files changed, 166 insertions(+), 23 deletions(-)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index b9b51eeb8e1a..b8e230de84a6 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -74,4 +74,9 @@
 #define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
 				 PKEY_DISABLE_WRITE)
 
+struct pr_madvise_param {
+	int size;			/* the size of this structure */
+	const struct iovec __user *vec;	/* address range array */
+};
+
 #endif /* __ASM_GENERIC_MMAN_COMMON_H */
diff --git a/mm/madvise.c b/mm/madvise.c
index af02aa17e5c1..f4f569dac2bd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -320,6 +320,7 @@ static int madvise_cool_pte_range(pmd_t *pmd, unsigned long addr,
 	struct page *page;
 	struct vm_area_struct *vma = walk->vma;
 	unsigned long next;
+	long nr_pages = 0;
 
 	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
@@ -380,9 +381,12 @@ static int madvise_cool_pte_range(pmd_t *pmd, unsigned long addr,
 
 		ptep_test_and_clear_young(vma, addr, pte);
 		deactivate_page(page);
+		nr_pages++;
+
 	}
 
 	pte_unmap_unlock(orig_pte, ptl);
+	*(long *)walk->private += nr_pages;
 	cond_resched();
 
 	return 0;
@@ -390,11 +394,13 @@ static int madvise_cool_pte_range(pmd_t *pmd, unsigned long addr,
 
 static void madvise_cool_page_range(struct mmu_gather *tlb,
 			     struct vm_area_struct *vma,
-			     unsigned long addr, unsigned long end)
+			     unsigned long addr, unsigned long end,
+			     long *nr_pages)
 {
 	struct mm_walk cool_walk = {
 		.pmd_entry = madvise_cool_pte_range,
 		.mm = vma->vm_mm,
+		.private = nr_pages
 	};
 
 	tlb_start_vma(tlb, vma);
@@ -403,7 +409,8 @@ static void madvise_cool_page_range(struct mmu_gather *tlb,
 }
 
 static long madvise_cool(struct vm_area_struct *vma,
-			unsigned long start_addr, unsigned long end_addr)
+			unsigned long start_addr, unsigned long end_addr,
+			long *nr_pages)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather tlb;
@@ -413,7 +420,7 @@ static long madvise_cool(struct vm_area_struct *vma,
 
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
-	madvise_cool_page_range(&tlb, vma, start_addr, end_addr);
+	madvise_cool_page_range(&tlb, vma, start_addr, end_addr, nr_pages);
 	tlb_finish_mmu(&tlb, start_addr, end_addr);
 
 	return 0;
@@ -429,6 +436,7 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 	int isolated = 0;
 	struct vm_area_struct *vma = walk->vma;
 	unsigned long next;
+	long nr_pages = 0;
 
 	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
@@ -492,7 +500,7 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 		list_add(&page->lru, &page_list);
 		if (isolated >= SWAP_CLUSTER_MAX) {
 			pte_unmap_unlock(orig_pte, ptl);
-			reclaim_pages(&page_list);
+			nr_pages += reclaim_pages(&page_list);
 			isolated = 0;
 			pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 			orig_pte = pte;
@@ -500,19 +508,22 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 	}
 
 	pte_unmap_unlock(orig_pte, ptl);
-	reclaim_pages(&page_list);
+	nr_pages += reclaim_pages(&page_list);
 	cond_resched();
 
+	*(long *)walk->private += nr_pages;
 	return 0;
 }
 
 static void madvise_cold_page_range(struct mmu_gather *tlb,
 			     struct vm_area_struct *vma,
-			     unsigned long addr, unsigned long end)
+			     unsigned long addr, unsigned long end,
+			     long *nr_pages)
 {
 	struct mm_walk warm_walk = {
 		.pmd_entry = madvise_cold_pte_range,
 		.mm = vma->vm_mm,
+		.private = nr_pages,
 	};
 
 	tlb_start_vma(tlb, vma);
@@ -522,7 +533,8 @@ static void madvise_cold_page_range(struct mmu_gather *tlb,
 
 
 static long madvise_cold(struct vm_area_struct *vma,
-			unsigned long start_addr, unsigned long end_addr)
+			unsigned long start_addr, unsigned long end_addr,
+			long *nr_pages)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather tlb;
@@ -532,7 +544,7 @@ static long madvise_cold(struct vm_area_struct *vma,
 
 	lru_add_drain();
 	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
-	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+	madvise_cold_page_range(&tlb, vma, start_addr, end_addr, nr_pages);
 	tlb_finish_mmu(&tlb, start_addr, end_addr);
 
 	return 0;
@@ -922,7 +934,7 @@ static int madvise_inject_error(int behavior,
 static long
 madvise_vma(struct task_struct *tsk, struct vm_area_struct *vma,
 		struct vm_area_struct **prev, unsigned long start,
-		unsigned long end, int behavior)
+		unsigned long end, int behavior, long *nr_pages)
 {
 	switch (behavior) {
 	case MADV_REMOVE:
@@ -930,9 +942,9 @@ madvise_vma(struct task_struct *tsk, struct vm_area_struct *vma,
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
 	case MADV_COOL:
-		return madvise_cool(vma, start, end);
+		return madvise_cool(vma, start, end, nr_pages);
 	case MADV_COLD:
-		return madvise_cold(vma, start, end);
+		return madvise_cold(vma, start, end, nr_pages);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(tsk, vma, prev, start,
@@ -981,7 +993,7 @@ madvise_behavior_valid(int behavior)
 }
 
 static int madvise_core(struct task_struct *tsk, unsigned long start,
-			size_t len_in, int behavior)
+			size_t len_in, int behavior, long *nr_pages)
 {
 	unsigned long end, tmp;
 	struct vm_area_struct *vma, *prev;
@@ -996,6 +1008,7 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
 
 	if (start & ~PAGE_MASK)
 		return error;
+
 	len = (len_in + ~PAGE_MASK) & PAGE_MASK;
 
 	/* Check to see whether len was rounded up from small -ve to zero */
@@ -1035,6 +1048,8 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
 	blk_start_plug(&plug);
 	for (;;) {
 		/* Still start < end. */
+		long pages = 0;
+
 		error = -ENOMEM;
 		if (!vma)
 			goto out;
@@ -1053,9 +1068,11 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
 			tmp = end;
 
 		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
-		error = madvise_vma(tsk, vma, &prev, start, tmp, behavior);
+		error = madvise_vma(tsk, vma, &prev, start, tmp,
+					behavior, &pages);
 		if (error)
 			goto out;
+		*nr_pages += pages;
 		start = tmp;
 		if (prev && start < prev->vm_end)
 			start = prev->vm_end;
@@ -1140,26 +1157,137 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
  */
 SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 {
-	return madvise_core(current, start, len_in, behavior);
+	unsigned long dummy;
+
+	return madvise_core(current, start, len_in, behavior, &dummy);
 }
 
-SYSCALL_DEFINE4(process_madvise, int, pidfd, unsigned long, start,
-		size_t, len_in, int, behavior)
+static int pr_madvise_copy_param(struct pr_madvise_param __user *u_param,
+		struct pr_madvise_param *param)
+{
+	u32 size;
+	int ret;
+
+	memset(param, 0, sizeof(*param));
+
+	ret = get_user(size, &u_param->size);
+	if (ret)
+		return ret;
+
+	if (size > PAGE_SIZE)
+		return -E2BIG;
+
+	if (!size || size > sizeof(struct pr_madvise_param))
+		return -EINVAL;
+
+	ret = copy_from_user(param, u_param, size);
+	if (ret)
+		return -EFAULT;
+
+	return ret;
+}
+
+static int process_madvise_core(struct task_struct *tsk, int *behaviors,
+				struct iov_iter *iter,
+				const struct iovec *range_vec,
+				unsigned long riovcnt,
+				unsigned long flags)
+{
+	int i;
+	long err;
+
+	for (err = 0, i = 0; i < riovcnt && iov_iter_count(iter); i++) {
+		long ret = 0;
+
+		err = madvise_core(tsk, (unsigned long)range_vec[i].iov_base,
+				range_vec[i].iov_len, behaviors[i],
+				&ret);
+		if (err)
+			ret = err;
+
+		if (copy_to_iter(&ret, sizeof(long), iter) !=
+				sizeof(long)) {
+			err = -EFAULT;
+			break;
+		}
+
+		err = 0;
+	}
+
+	return err;
+}
+
+SYSCALL_DEFINE6(process_madvise, int, pidfd, ssize_t, nr_elem,
+			const int __user *, hints,
+			struct pr_madvise_param __user *, results,
+			struct pr_madvise_param __user *, ranges,
+			unsigned long, flags)
 {
 	int ret;
 	struct fd f;
 	struct pid *pid;
 	struct task_struct *tsk;
 	struct mm_struct *mm;
+	struct pr_madvise_param result_p, range_p;
+	const struct iovec __user *result_vec, __user *range_vec;
+	int *behaviors;
+	struct iovec iovstack_result[UIO_FASTIOV];
+	struct iovec iovstack_r[UIO_FASTIOV];
+	struct iovec *iov_l = iovstack_result;
+	struct iovec *iov_r = iovstack_r;
+	struct iov_iter iter;
+
+	if (flags != 0)
+		return -EINVAL;
+
+	ret = pr_madvise_copy_param(results, &result_p);
+	if (ret)
+		return ret;
+
+	ret = pr_madvise_copy_param(ranges, &range_p);
+	if (ret)
+		return ret;
+
+	result_vec = result_p.vec;
+	range_vec = range_p.vec;
+
+	if (result_p.size != sizeof(struct pr_madvise_param) ||
+			range_p.size != sizeof(struct pr_madvise_param))
+		return -EINVAL;
+
+	behaviors = kmalloc_array(nr_elem, sizeof(int), GFP_KERNEL);
+	if (!behaviors)
+		return -ENOMEM;
+
+	ret = copy_from_user(behaviors, hints, sizeof(int) * nr_elem);
+	if (ret < 0)
+		goto free_behavior_vec;
+
+	ret = import_iovec(READ, result_vec, nr_elem, UIO_FASTIOV,
+				&iov_l, &iter);
+	if (ret < 0)
+		goto free_behavior_vec;
+
+	if (!iov_iter_count(&iter)) {
+		ret = -EINVAL;
+		goto free_iovecs;
+	}
+
+	ret = rw_copy_check_uvector(CHECK_IOVEC_ONLY, range_vec, nr_elem,
+				UIO_FASTIOV, iovstack_r, &iov_r);
+	if (ret <= 0)
+		goto free_iovecs;
 
 	f = fdget(pidfd);
-	if (!f.file)
-		return -EBADF;
+	if (!f.file) {
+		ret = -EBADF;
+		goto free_iovecs;
+	}
 
 	pid = pidfd_to_pid(f.file);
 	if (IS_ERR(pid)) {
 		ret = PTR_ERR(pid);
-		goto err;
+		goto put_fd;
 	}
 
 	ret = -EINVAL;
@@ -1167,7 +1295,7 @@ SYSCALL_DEFINE4(process_madvise, int, pidfd, unsigned long, start,
 	tsk = pid_task(pid, PIDTYPE_PID);
 	if (!tsk) {
 		rcu_read_unlock();
-		goto err;
+		goto put_fd;
 	}
 	get_task_struct(tsk);
 	rcu_read_unlock();
@@ -1176,12 +1304,22 @@ SYSCALL_DEFINE4(process_madvise, int, pidfd, unsigned long, start,
 		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		if (ret == -EACCES)
 			ret = -EPERM;
-		goto err;
+		goto put_task;
 	}
-	ret = madvise_core(tsk, start, len_in, behavior);
+
+	ret = process_madvise_core(tsk, behaviors, &iter, iov_r,
+					nr_elem, flags);
 	mmput(mm);
+put_task:
 	put_task_struct(tsk);
-err:
+put_fd:
 	fdput(f);
+free_iovecs:
+	if (iov_r != iovstack_r)
+		kfree(iov_r);
+	kfree(iov_l);
+free_behavior_vec:
+	kfree(behaviors);
+
 	return ret;
 }
-- 
2.21.0.1020.gf2820cf01a-goog

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hillf Danton
To: Minchan Kim
Cc: Andrew Morton, LKML, linux-mm, Michal Hocko, Johannes Weiner,
 Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione,
 Shakeel Butt, Sonny Rao, Brian Geffon
Subject: Re: [RFC 6/7] mm: extend process_madvise syscall to support vector arrary
Date: Wed, 29 May 2019 12:14:47 +0800
Message-Id: <20190520035254.57579-7-minchan@kernel.org>
In-Reply-To: <20190520035254.57579-1-minchan@kernel.org>
References: <20190520035254.57579-1-minchan@kernel.org>
X-Mailer: git-send-email 2.21.0.1020.gf2820cf01a-goog
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="UTF-8"
Message-ID: <20190529041447.yS-hNnmHaZzavxxZ3nqUepUZaP3KF5kZrSyKZcKdKIQ@z>

On Mon, 20 May 2019 12:52:53 +0900 Minchan Kim wrote:
> Example)
>
Better if the following stuff is stored somewhere under the
tools/testing directory.

BR
Hillf
> struct pr_madvise_param {
>	int size;
>	const struct iovec *vec;
> };
>
> int main(int argc, char *argv[])
> {
>	struct pr_madvise_param retp, rangep;
>	struct iovec result_vec[2], range_vec[2];
>	int hints[2];
>	long ret[2];
>	void *addr[2];
>
>	pid_t pid;
>	char cmd[64] = {0,};
>	addr[0] = mmap(NULL, ALLOC_SIZE, PROT_READ|PROT_WRITE,
>			  MAP_POPULATE|MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
>
>	if (MAP_FAILED == addr[0])
>		return 1;
>
>	addr[1] = mmap(NULL, ALLOC_SIZE, PROT_READ|PROT_WRITE,
>			  MAP_POPULATE|MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
>
>	if (MAP_FAILED == addr[1])
>		return 1;
>
>	hints[0] = MADV_COLD;
>	range_vec[0].iov_base = addr[0];
>	range_vec[0].iov_len = ALLOC_SIZE;
>	result_vec[0].iov_base = &ret[0];
>	result_vec[0].iov_len = sizeof(long);
>	retp.vec = result_vec;
>	retp.size = sizeof(struct pr_madvise_param);
>
>	hints[1] = MADV_COOL;
>	range_vec[1].iov_base = addr[1];
>	range_vec[1].iov_len = ALLOC_SIZE;
>	result_vec[1].iov_base = &ret[1];
>	result_vec[1].iov_len = sizeof(long);
>	rangep.vec = range_vec;
>	rangep.size = sizeof(struct pr_madvise_param);
>
>	pid = fork();
>	if (!pid) {
>		sleep(10);
>	} else {
>		int pidfd = open(cmd, O_DIRECTORY | O_CLOEXEC);
>		if (pidfd < 0)
>			return 1;
>
>		/* munmap to make pages private for the child */
>		munmap(addr[0], ALLOC_SIZE);
>		munmap(addr[1], ALLOC_SIZE);
>		system("cat /proc/vmstat | egrep 'pswpout|deactivate'");
>		if (syscall(__NR_process_madvise, pidfd, 2, behaviors,
>						&retp, &rangep, 0))
>			perror("process_madvise fail\n");
>		system("cat /proc/vmstat | egrep 'pswpout|deactivate'");
>	}
>
>	return 0;
> }
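
A note on reading the results back (a sketch only, not part of the patch or
of the reply above). Going by process_madvise_core() in the diff, each result
slot appears to receive either a negative errno for that range or the number
of pages counted for it (only MADV_COOL and MADV_COLD update the count). A
test under tools/testing could fold the ret[] array from the example into one
summary along these lines. The names pr_madvise_summary and
pr_madvise_summarize are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical summary type for illustration; not defined by the patch. */
struct pr_madvise_summary {
	long pages;	/* pages reported across the ranges that succeeded */
	int failed;	/* number of ranges that reported an error */
	int first_err;	/* positive errno of the first failing range, or 0 */
};

/*
 * Fold the per-range results into one summary.
 *
 * Assumption taken from process_madvise_core() in the diff: a negative
 * slot is -errno for that range, anything else is a page count.
 */
static struct pr_madvise_summary
pr_madvise_summarize(const long *results, size_t nr_elem)
{
	struct pr_madvise_summary s = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < nr_elem; i++) {
		if (results[i] < 0) {
			/* remember only the first error, count all of them */
			if (!s.failed)
				s.first_err = (int)-results[i];
			s.failed++;
		} else {
			s.pages += results[i];
		}
	}

	return s;
}
```

In the example program this would be called as pr_madvise_summarize(ret, 2)
after the syscall returns, so a partial failure on one range does not hide
the page count reported for the other.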