From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932069AbcEZOQC (ORCPT ); Thu, 26 May 2016 10:16:02 -0400 Received: from mail-pf0-f193.google.com ([209.85.192.193]:34772 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753813AbcEZOP7 (ORCPT ); Thu, 26 May 2016 10:15:59 -0400 Message-ID: <1464272149.5939.92.camel@edumazet-glaptop3.roam.corp.google.com> Subject: Re: [PATCH RESEND 7/8] pipe: account to kmemcg From: Eric Dumazet To: Vladimir Davydov Cc: Minchan Kim , Andrew Morton , Alexander Viro , Johannes Weiner , Michal Hocko , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org Date: Thu, 26 May 2016 07:15:49 -0700 In-Reply-To: <20160526135930.GA26059@esperanza> References: <2c2545563b6201f118946f96dd8cfc90e564aff6.1464079538.git.vdavydov@virtuozzo.com> <1464094742.5939.46.camel@edumazet-glaptop3.roam.corp.google.com> <20160524161336.GA11150@esperanza> <1464120273.5939.53.camel@edumazet-glaptop3.roam.corp.google.com> <20160525103011.GF11150@esperanza> <20160526070455.GF9661@bbox> <20160526135930.GA26059@esperanza> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 2016-05-26 at 16:59 +0300, Vladimir Davydov wrote: > On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote: > > On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote: > > > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote: > > > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote: > > > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote: > > > > > ... > > > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe, > > > > > > > + struct pipe_buffer *buf) > > > > > > > +{ > > > > > > > + struct page *page = buf->page; > > > > > > > + > > > > > > > + if (page_count(page) == 1) { > > > > > > > > > > > > This looks racy : some cpu could have temporarily elevated page count. > > > > > > > > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are > > > > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page > > > > > with refcount of 1 in ->steal, that means that we are the only its user > > > > > and it can't be spliced to another pipe. > > > > > > > > > > In fact, I just copied the code from generic_pipe_buf_steal, adding > > > > > kmemcg related checks along the way, so it should be fine. > > > > > > > > So you guarantee that no other cpu might have done > > > > get_page_unless_zero() right before this test ? > > > > > > Each pipe_buffer holds a reference to its page. If we find page's > > > refcount to be 1 here, then it can be referenced only by our > > > pipe_buffer. And the refcount cannot be increased by a parallel thread, > > > because we hold pipe_lock, which rules out splice, and otherwise it's > > > impossible to reach the page as it is not on lru. That said, I think I > > > guarantee that this should be safe. > > > > I don't know kmemcg internal and pipe stuff so my comment might be > > totally crap. > > > > No one cannot guarantee any CPU cannot held a reference of a page. > > Look at get_page_unless_zero usecases. > > > > 1. balloon_page_isolate > > > > It can hold a reference in random page and then verify the page > > is balloon page. Otherwise, just put. > > > > 2. page_idle_get_page > > > > It has PageLRU check but it's racy so it can hold a reference > > of randome page and then verify within zone->lru_lock. If it's > > not LRU page, just put. > > Well, I see your concern now - even if a page is not on lru and we > locked all structs pointing to it, it can always get accessed by pfn in > a completely unrelated thread, like in examples you gave above. That's a > fair point. > > However, I still think that it's OK in case of pipe buffers. What can > happen if somebody takes a transient reference to a pipe buffer page? At > worst, we'll see page_count > 1 due to temporary ref and abort stealing, > falling back on copying instead. That's OK, because stealing is not > guaranteed. Can a function that takes a transient ref to page by pfn > mistakenly assume that this is a page it's interested in? I don't think > so, because this page has no marks on it except special _mapcount value, > which should only be set on kmemcg pages. Well, all this information deserve to be in the changelog. Maybe in 6 months, this will be incredibly useful for bug hunting. pipes can be used to exchange data (or pages) between processes in different domains. If kmemcg is not precise, this could be used by some attackers to force some processes to consume all their budget and eventually not be able to allocate new pages. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: <1464272149.5939.92.camel@edumazet-glaptop3.roam.corp.google.com> Subject: Re: [PATCH RESEND 7/8] pipe: account to kmemcg From: Eric Dumazet To: Vladimir Davydov Cc: Minchan Kim , Andrew Morton , Alexander Viro , Johannes Weiner , Michal Hocko , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org Date: Thu, 26 May 2016 07:15:49 -0700 In-Reply-To: <20160526135930.GA26059@esperanza> References: <2c2545563b6201f118946f96dd8cfc90e564aff6.1464079538.git.vdavydov@virtuozzo.com> <1464094742.5939.46.camel@edumazet-glaptop3.roam.corp.google.com> <20160524161336.GA11150@esperanza> <1464120273.5939.53.camel@edumazet-glaptop3.roam.corp.google.com> <20160525103011.GF11150@esperanza> <20160526070455.GF9661@bbox> <20160526135930.GA26059@esperanza> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: On Thu, 2016-05-26 at 16:59 +0300, Vladimir Davydov wrote: > On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote: > > On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote: > > > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote: > > > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote: > > > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote: > > > > > ... > > > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe, > > > > > > > + struct pipe_buffer *buf) > > > > > > > +{ > > > > > > > + struct page *page = buf->page; > > > > > > > + > > > > > > > + if (page_count(page) == 1) { > > > > > > > > > > > > This looks racy : some cpu could have temporarily elevated page count. > > > > > > > > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are > > > > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page > > > > > with refcount of 1 in ->steal, that means that we are the only its user > > > > > and it can't be spliced to another pipe. > > > > > > > > > > In fact, I just copied the code from generic_pipe_buf_steal, adding > > > > > kmemcg related checks along the way, so it should be fine. > > > > > > > > So you guarantee that no other cpu might have done > > > > get_page_unless_zero() right before this test ? > > > > > > Each pipe_buffer holds a reference to its page. If we find page's > > > refcount to be 1 here, then it can be referenced only by our > > > pipe_buffer. And the refcount cannot be increased by a parallel thread, > > > because we hold pipe_lock, which rules out splice, and otherwise it's > > > impossible to reach the page as it is not on lru. That said, I think I > > > guarantee that this should be safe. > > > > I don't know kmemcg internal and pipe stuff so my comment might be > > totally crap. > > > > No one cannot guarantee any CPU cannot held a reference of a page. > > Look at get_page_unless_zero usecases. > > > > 1. balloon_page_isolate > > > > It can hold a reference in random page and then verify the page > > is balloon page. Otherwise, just put. > > > > 2. page_idle_get_page > > > > It has PageLRU check but it's racy so it can hold a reference > > of randome page and then verify within zone->lru_lock. If it's > > not LRU page, just put. > > Well, I see your concern now - even if a page is not on lru and we > locked all structs pointing to it, it can always get accessed by pfn in > a completely unrelated thread, like in examples you gave above. That's a > fair point. > > However, I still think that it's OK in case of pipe buffers. What can > happen if somebody takes a transient reference to a pipe buffer page? At > worst, we'll see page_count > 1 due to temporary ref and abort stealing, > falling back on copying instead. That's OK, because stealing is not > guaranteed. Can a function that takes a transient ref to page by pfn > mistakenly assume that this is a page it's interested in? I don't think > so, because this page has no marks on it except special _mapcount value, > which should only be set on kmemcg pages. Well, all this information deserve to be in the changelog. Maybe in 6 months, this will be incredibly useful for bug hunting. pipes can be used to exchange data (or pages) between processes in different domains. If kmemcg is not precise, this could be used by some attackers to force some processes to consume all their budget and eventually not be able to allocate new pages. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f69.google.com (mail-pa0-f69.google.com [209.85.220.69]) by kanga.kvack.org (Postfix) with ESMTP id 5353C6B007E for ; Thu, 26 May 2016 10:15:55 -0400 (EDT) Received: by mail-pa0-f69.google.com with SMTP id yl2so113141658pac.2 for ; Thu, 26 May 2016 07:15:55 -0700 (PDT) Received: from mail-pf0-x244.google.com (mail-pf0-x244.google.com. [2607:f8b0:400e:c00::244]) by mx.google.com with ESMTPS id vx8si20979822pac.107.2016.05.26.07.15.54 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 26 May 2016 07:15:54 -0700 (PDT) Received: by mail-pf0-x244.google.com with SMTP id f144so2524026pfa.2 for ; Thu, 26 May 2016 07:15:54 -0700 (PDT) Message-ID: <1464272149.5939.92.camel@edumazet-glaptop3.roam.corp.google.com> Subject: Re: [PATCH RESEND 7/8] pipe: account to kmemcg From: Eric Dumazet Date: Thu, 26 May 2016 07:15:49 -0700 In-Reply-To: <20160526135930.GA26059@esperanza> References: <2c2545563b6201f118946f96dd8cfc90e564aff6.1464079538.git.vdavydov@virtuozzo.com> <1464094742.5939.46.camel@edumazet-glaptop3.roam.corp.google.com> <20160524161336.GA11150@esperanza> <1464120273.5939.53.camel@edumazet-glaptop3.roam.corp.google.com> <20160525103011.GF11150@esperanza> <20160526070455.GF9661@bbox> <20160526135930.GA26059@esperanza> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Vladimir Davydov Cc: Minchan Kim , Andrew Morton , Alexander Viro , Johannes Weiner , Michal Hocko , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org On Thu, 2016-05-26 at 16:59 +0300, Vladimir Davydov wrote: > On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote: > > On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote: > > > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote: > > > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote: > > > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote: > > > > > ... > > > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe, > > > > > > > + struct pipe_buffer *buf) > > > > > > > +{ > > > > > > > + struct page *page = buf->page; > > > > > > > + > > > > > > > + if (page_count(page) == 1) { > > > > > > > > > > > > This looks racy : some cpu could have temporarily elevated page count. > > > > > > > > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are > > > > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page > > > > > with refcount of 1 in ->steal, that means that we are the only its user > > > > > and it can't be spliced to another pipe. > > > > > > > > > > In fact, I just copied the code from generic_pipe_buf_steal, adding > > > > > kmemcg related checks along the way, so it should be fine. > > > > > > > > So you guarantee that no other cpu might have done > > > > get_page_unless_zero() right before this test ? > > > > > > Each pipe_buffer holds a reference to its page. If we find page's > > > refcount to be 1 here, then it can be referenced only by our > > > pipe_buffer. And the refcount cannot be increased by a parallel thread, > > > because we hold pipe_lock, which rules out splice, and otherwise it's > > > impossible to reach the page as it is not on lru. That said, I think I > > > guarantee that this should be safe. > > > > I don't know kmemcg internal and pipe stuff so my comment might be > > totally crap. > > > > No one cannot guarantee any CPU cannot held a reference of a page. > > Look at get_page_unless_zero usecases. > > > > 1. balloon_page_isolate > > > > It can hold a reference in random page and then verify the page > > is balloon page. Otherwise, just put. > > > > 2. page_idle_get_page > > > > It has PageLRU check but it's racy so it can hold a reference > > of randome page and then verify within zone->lru_lock. If it's > > not LRU page, just put. > > Well, I see your concern now - even if a page is not on lru and we > locked all structs pointing to it, it can always get accessed by pfn in > a completely unrelated thread, like in examples you gave above. That's a > fair point. > > However, I still think that it's OK in case of pipe buffers. What can > happen if somebody takes a transient reference to a pipe buffer page? At > worst, we'll see page_count > 1 due to temporary ref and abort stealing, > falling back on copying instead. That's OK, because stealing is not > guaranteed. Can a function that takes a transient ref to page by pfn > mistakenly assume that this is a page it's interested in? I don't think > so, because this page has no marks on it except special _mapcount value, > which should only be set on kmemcg pages. Well, all this information deserve to be in the changelog. Maybe in 6 months, this will be incredibly useful for bug hunting. pipes can be used to exchange data (or pages) between processes in different domains. If kmemcg is not precise, this could be used by some attackers to force some processes to consume all their budget and eventually not be able to allocate new pages. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org