Subject: Re: [PATCH 3/4] numa: introduce numa group per task group
To: Peter Zijlstra
Cc: hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
    Ingo Molnar, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    mcgrof@kernel.org, keescook@chromium.org, linux-fsdevel@vger.kernel.org,
    cgroups@vger.kernel.org, Mel Gorman, riel@surriel.com
References: <209d247e-c1b2-3235-2722-dd7c1f896483@linux.alibaba.com>
    <60b59306-5e36-e587-9145-e90657daec41@linux.alibaba.com>
    <93cf9333-2f9a-ca1e-a4a6-54fc388d1673@linux.alibaba.com>
    <20190711141038.GE3402@hirez.programming.kicks-ass.net>
From: 王贇 <yun.wang@linux.alibaba.com>
Message-ID: <50a5ae9e-6dbd-51b6-a374-1b0e45588abf@linux.alibaba.com>
Date: Fri, 12 Jul 2019 12:03:23 +0800
In-Reply-To: <20190711141038.GE3402@hirez.programming.kicks-ass.net>

On 2019/7/11 10:10 PM, Peter Zijlstra wrote:
> On Wed, Jul 03, 2019 at 11:32:32AM +0800, 王贇 wrote:
>> By tracing NUMA page faults, we recognize tasks sharing the same page
>> and try to pack them together into a single NUMA group.
>>
>> However, when two tasks share lots of page cache pages but few
>> anonymous pages, they have no chance to join the same group, since
>> NUMA balancing does not trace page cache pages.
>>
>> Since tracing page cache pages costs too much, we could use some hints from
>
> I forgot; where again do we skip shared pages? task_numa_work() doesn't
> seem to skip file vmas.

I mean the page cache populated by file read/write, rather than the
pages of file mappings. Those pages are accessed through read()/write()
rather than through the task's page tables, so NUMA hinting faults never
see them. Likewise, memory pages backing I/O won't be considered shared
between tasks, since they don't belong to any particular task but may
serve several.

>
>> userland, and the cpu cgroup could be a good one.
>>
>> This patch introduces a new entry 'numa_group' for the cpu cgroup; by
>> echoing non-zero into the entry, we can force all the tasks of this
>> cgroup to join the same NUMA group serving the task group.
>>
>> This way, tasks are more likely to settle on the same node, share a
>> closer CPU cache and benefit from NUMA locality for both file and
>> anonymous pages.
>>
>> Besides, when multiple cgroups enable a NUMA group, they will be able
>> to exchange task locations through NUMA migration, and in this way
>> achieve single-node placement without breaking load balance.
>
> I dislike cgroup-only interfaces; is there really nothing else we could
> use for this?

Me too... but at the moment this is the best approach we have. We also
tried a separate module to handle this automatically, but that needs a
very good understanding of the system, its configuration and its
workloads, which only the owner has.

So maybe just providing the functionality and leaving the choice to the
user is not that bad?

Regards,
Michael Wang
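A toy model may help show why forcing tasks into one group matters. The sketch below assumes a heavily simplified view of NUMA balancing: each task's preferred node is the node with the most NUMA hinting faults, and a numa group sums its members' fault counts. The function names `preferred_node` and `group_faults`, the task names and the fault numbers are all hypothetical; this is an illustration, not kernel code.

```python
from collections import Counter

# Hypothetical, simplified model; not the kernel's task_numa_placement() logic.


def preferred_node(faults: dict[int, int]) -> int:
    """Return the node with the most hinting faults; the lowest node id wins ties."""
    return min(faults, key=lambda nid: (-faults[nid], nid))


def group_faults(members: dict[str, dict[int, int]]) -> dict[int, int]:
    """Sum each member task's per-node fault counts into one group-wide total."""
    total: Counter = Counter()
    for faults in members.values():
        total.update(faults)  # Counter.update adds counts rather than replacing them
    return dict(total)


if __name__ == "__main__":
    # Made-up counts: the tasks mostly share page cache (invisible to hinting
    # faults), so only each task's few anonymous-page faults show up here.
    tasks = {"task-a": {0: 30, 1: 10}, "task-b": {0: 12, 1: 20}}
    for name, faults in tasks.items():
        print(f"{name} alone prefers node {preferred_node(faults)}")
    print(f"grouped, both prefer node {preferred_node(group_faults(tasks))}")
```

Alone, task-a leans to node 0 and task-b to node 1; summed as one group (42 vs 30 faults), both lean to node 0. The kernel's real placement logic weighs additional factors, so treat this only as an illustration of why forced grouping can co-locate tasks whose sharing is invisible to fault tracking.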