From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 11 Sep 2019 14:25:26 +0200
From: Michal Hocko
To: "Michael S. Tsirkin"
Cc: Alexander Duyck, Alexander Duyck, virtio-dev@lists.oasis-open.org,
 kvm list, Catalin Marinas, David Hildenbrand, Dave Hansen, LKML,
 Matthew Wilcox, linux-mm, Andrew Morton, will@kernel.org,
 linux-arm-kernel@lists.infradead.org, Oscar Salvador, Yang Zhang,
 Pankaj Gupta, Konrad Rzeszutek Wilk, Nitesh Narayan Lal, Rik van Riel,
 lcapitulino@redhat.com, "Wang, Wei W", Andrea Arcangeli,
 ying.huang@intel.com, Paolo Bonzini, Dan Williams, Fengguang Wu,
 "Kirill A. Shutemov"
Subject: Re: [PATCH v9 0/8] stg mail -e --version=v9 \
Message-ID: <20190911122526.GV4023@dhcp22.suse.cz>
References: <20190907172225.10910.34302.stgit@localhost.localdomain>
 <20190910124209.GY2063@dhcp22.suse.cz>
 <20190910144713.GF2063@dhcp22.suse.cz>
 <20190910175213.GD4023@dhcp22.suse.cz>
 <1d7de9f9f4074f67c567dbb4cc1497503d739e30.camel@linux.intel.com>
 <20190911113619.GP4023@dhcp22.suse.cz>
 <20190911080804-mutt-send-email-mst@kernel.org>
 <20190911121941.GU4023@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190911121941.GU4023@dhcp22.suse.cz>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 11-09-19 14:19:41, Michal Hocko wrote:
> On Wed 11-09-19 08:08:38, Michael S. Tsirkin wrote:
> > On Wed, Sep 11, 2019 at 01:36:19PM +0200, Michal Hocko wrote:
> > > On Tue 10-09-19 14:23:40, Alexander Duyck wrote:
> > > [...]
> > > > We don't put any limitations on the allocator other than that it needs to
> > > > clean up the metadata on allocation, and that it cannot allocate a page
> > > > that is in the process of being reported since we pulled it from the
> > > > free_list.
> > > > If the page is a "Reported" page then it decrements the
> > > > reported_pages count for the free_area and makes sure the page doesn't
> > > > exist in the "Boundary" array pointer value; if it does, it moves the
> > > > "Boundary" since it is pulling the page.
> > >
> > > This is still a non-trivial limitation on the page allocation from
> > > external code IMHO. I cannot give any explicit reason why an ordering on
> > > the free list might matter (well, except for page shuffling, which uses it
> > > to make physical memory pattern allocation more random) but the
> > > architecture seems hacky and dubious to be honest. It sounds like the
> > > whole interface has been developed around a very particular and
> > > single-purpose optimization.
> > >
> > > I remember that there was an attempt to report free memory that provided
> > > a callback mechanism [1], which was much less intrusive to the internals
> > > of the allocator yet it should provide similar functionality. Did you
> > > see that approach? How does this compare to it? Or am I completely off
> > > when comparing them?
> > >
> > > [1] most likely not the latest version of the patchset
> > > http://lkml.kernel.org/r/1502940416-42944-5-git-send-email-wei.w.wang@intel.com
> >
> > Linus nacked that one. He thinks invoking callbacks with lots of
> > internal mm locks is too fragile.
>
> I would be really curious how happy he would be about injecting
> other restrictions on the allocator like this patch proposes. This is
> more intrusive as it has a higher maintenance cost long term IMHO.

Btw. I do agree that callbacks with internal mm locks are not great
either. We do have a model for that in mmu_notifiers and it is something
I do consider PITA. On the other hand, it is mostly the sleepable part
of the interface which makes it the real pain.
The above callback mechanism was explicitly documented with
restrictions: the context is essentially atomic, with no access to
particular struct pages and no expensive operations possible. So in the
end I've considered it acceptably painful. Not that I want to override
Linus' nack, but if virtualization use cases really require some form of
reporting, and having no other way to do that pushes people to invent
even more interesting approaches, then we should simply give them/you
something reasonable and least intrusive to our internals.
-- 
Michal Hocko
SUSE Labs
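For readers following the thread, the allocation-side bookkeeping that Alexander describes in the quoted text (clear the "Reported" state, decrement reported_pages for the free_area, and move the "Boundary" if it points at the page being pulled) can be modelled roughly as below. This is only a userspace sketch under stated assumptions, not code from the v9 patch set: the type and function names (page_model, free_area_model, add_page, take_page) are hypothetical, and advancing the boundary to the next list entry is an assumption about what "moves the Boundary" means.

```c
/*
 * Illustrative userspace model only; NOT code from the patch set.
 * All names below are hypothetical and chosen for this sketch.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page_model {
	struct page_model *prev, *next;	/* neighbours on the free list */
	bool reported;			/* models the "Reported" marker */
};

struct free_area_model {
	struct page_model *head;	/* first page on the free list */
	struct page_model *boundary;	/* models "Boundary"; NULL if unset */
	unsigned long nr_free;
	unsigned long reported_pages;
};

/* Push @page onto the head of the list, optionally marked as reported. */
static void add_page(struct free_area_model *area, struct page_model *page,
		     bool reported)
{
	page->prev = NULL;
	page->next = area->head;
	if (area->head)
		area->head->prev = page;
	area->head = page;
	area->nr_free++;

	page->reported = reported;
	if (reported) {
		area->reported_pages++;
		area->boundary = page;	/* newest reported page leads the run */
	}
}

/* Pull @page off the list, doing the metadata cleanup on allocation. */
static void take_page(struct free_area_model *area, struct page_model *page)
{
	/* Assumption: a boundary pointing at this page advances to the next entry. */
	if (area->boundary == page)
		area->boundary = page->next;

	/* Unlink from the doubly linked free list. */
	if (page->prev)
		page->prev->next = page->next;
	else
		area->head = page->next;
	if (page->next)
		page->next->prev = page->prev;
	page->prev = page->next = NULL;
	area->nr_free--;

	/* A reported page that leaves the list no longer counts as reported. */
	if (page->reported) {
		assert(area->reported_pages > 0);
		page->reported = false;
		area->reported_pages--;
	}
}
```

Even in this toy form, every path that removes a page from the free list has to go through take_page() (or repeat its steps) to keep the counter and the boundary consistent, which is the kind of restriction on the allocator being debated above.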