From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754030AbcEZN7m (ORCPT ); Thu, 26 May 2016 09:59:42 -0400 Received: from mail-am1on0128.outbound.protection.outlook.com ([157.56.112.128]:36018 "EHLO emea01-am1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753715AbcEZN7k (ORCPT ); Thu, 26 May 2016 09:59:40 -0400 Authentication-Results: kernel.org; dkim=none (message not signed) header.d=none;kernel.org; dmarc=none action=none header.from=virtuozzo.com; Date: Thu, 26 May 2016 16:59:30 +0300 From: Vladimir Davydov To: Minchan Kim CC: Eric Dumazet , Andrew Morton , Alexander Viro , Johannes Weiner , Michal Hocko , , , , , Subject: Re: [PATCH RESEND 7/8] pipe: account to kmemcg Message-ID: <20160526135930.GA26059@esperanza> References: <2c2545563b6201f118946f96dd8cfc90e564aff6.1464079538.git.vdavydov@virtuozzo.com> <1464094742.5939.46.camel@edumazet-glaptop3.roam.corp.google.com> <20160524161336.GA11150@esperanza> <1464120273.5939.53.camel@edumazet-glaptop3.roam.corp.google.com> <20160525103011.GF11150@esperanza> <20160526070455.GF9661@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20160526070455.GF9661@bbox> X-Originating-IP: [195.214.232.10] X-ClientProxiedBy: AM2PR09CA0057.eurprd09.prod.outlook.com (2a01:111:e400:8413::25) To AM3PR08MB0580.eurprd08.prod.outlook.com (2a01:111:e400:c408::14) X-MS-Office365-Filtering-Correlation-Id: f5ee199c-5ab2-4318-068c-08d3856df8e7 X-Microsoft-Exchange-Diagnostics: 1;AM3PR08MB0580;2:Xe0EBTmFSi4gqtwZoMY4uXyXTR83RsHCN3FIj/kWNDRUIVqoZpby3aDquoYw2FEMvohl6iuoVzZNPgQ9Iw97ZVqh7h2GJR/eW9OQBTYzSqeWgXWW4+tblkWKQ+iRIvl3acp0yJOjlaCNqTHVHhdUr4lpvmifQhyz8gnFC+oAFvijZjM6aU4CVqVSWXBrtI2Y;3:AJLo/+aMA2RbPs2mcUK5jiqlclaLy0mXqDpikhXP4KSeQ3X8U8WTc0XJXRNbHTg2NSUMq+AepHH0qrRX3NzLuOyLUNNN3uZs1T7tsg3jQOH1DBWaYgtFEaJiGucZANbc X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:AM3PR08MB0580; X-Microsoft-Exchange-Diagnostics: 1;AM3PR08MB0580;25:EKHwVy7NeKAHAENrz8kujtiBXmwg67pLE48ZOLcG9CZeF+Q1ZpjHnGikIoISGKjAWJrMVHM8Qemo8nC0UXUrbK59TzqdtrVadS/3cxQjTBWEDChhsPABix2iQVDMZumOSehnE1q5Zow/2fz/kmxhvRFL/y1VMiMtLVaKjCh8/E1jN4ya7HnpUizmtfJf7lAXgz/0tW0TqCog0De3/JwZYnayR8b0bGVnFCvNTYo5Q78+cBVv8uU1lzNbeXMLFI6BrzkIq8FzqaC1NZ8UWzj8yJCeDefdKAbRGcwNziBrN1CQou/4Wo5lGT5oaKSxOTbuR3IsDGdLZY4/OXFHK7e9a6sHV5CmH5kE6bjkhkYy2r0k7xWHoZRdt3z6V5Ln2/o5XU8FuKDcnukoa1T+SM9KY5tqTAwU7mDcRMsWleqJUJsQiIJIRrk0GEARQOIpiZDVngI51I138gQ1W6LMuu8XmWsMjqqjbynq42CJUABX1AGByolfDs/x8JQgngYsf3KOA1dtS337hq6seCbtGIaGlLQ2j1c2szrSPoW6dcAl0CcLfhX1bnO/6KiivsZ3Flb8UrjbNW6qMVHIwOzlXSBpRzOaj89FyN1uKNmOohfe2vPw7dBG9FSU9vFceuh3DUKaw7z+tJ2njK2Sfb/RB9bvzIKSRUv8b6VHK2FY5UTyIRuXolj/WKFnnauZTgQHbXIf9geU3sm2NhpJ7yLBKd2AuVDtnGoIpftO0aBjzmUgCy2quPX3G3WSvXNGIJVms8yY X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040130)(601004)(2401047)(5005006)(8121501046)(10201501046)(3002001)(6041072)(6043046);SRVR:AM3PR08MB0580;BCL:0;PCL:0;RULEID:;SRVR:AM3PR08MB0580; X-Microsoft-Exchange-Diagnostics: 1;AM3PR08MB0580;4:Ne22+ZqEMtZVw1gYO8aXGqCpk5ATvFXDbNI7Tu5J6okNwpW8LJFJNjo89ojvWMJ468j55n8B4vcV7jCKyK1C1e5ZAkkSUFjx5jLFhbjAMXK2twrdUq0yS7vWxoUnzTv3P5zwexGJqgxUUw3QWR7jTX1jgiwj7uW1PS9yhaOyIye6chJwCvm/GtCHxJrl/R7vxDLmOYzIi5iZEOFdL5/iVxoXTUIT2qUjHKiaW2wED94l8q5dnQnPF2e8tHfeUKZuXg6EBdDz9GSiEcaSCmhKacR6YKq95J4yQ58Up3pW9j/bJLvFTNhnAMBUf+ZNpU1ageWrG+utBHvcjLiT9VsaiG7qLU0jMgEz/+hrguYcSLptNUdRRYnrP34aRwq9aNT6FP5Xuc0wap6nSHbrH5EQ4lK9WJaI56lAOFTIQqSrcVs= X-Forefront-PRVS: 0954EE4910 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(4630300001)(6009001)(24454002)(377424004)(586003)(5008740100001)(86362001)(42186005)(33716001)(6116002)(9686002)(3846002)(4326007)(81166006)(33656002)(50466002)(23726003)(1076002)(5004730100002)(2906002)(110136002)(2950100001)(66066001)(93886004)(189998001)(54356999)(46406003)(92566002)(76176999)(47776003)(97756001)(15650500001)(50986999)(80792005)(8676002)(217873001);DIR:OUT;SFP:1102;SCL:1;SRVR:AM3PR08MB0580;H:esperanza;FPR:;SPF:None;MLV:sfv;LANG:en; X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1;AM3PR08MB0580;23:mAm1z8d/OFuC7m9KSYDs8vQXGr0qe6t4a9f6uQbRA?= =?us-ascii?Q?OF9fR0p3RW+5xqYO22hwc6JkcCPBYXHhHA5k6ltg1QlyQcFwjHFqs1yoclCw?= =?us-ascii?Q?vd6/HBKhp6Ap/JuwknYT4HYZn+6oNgAb8VFSVVWltGvCQfKylAglgQh2TZuK?= =?us-ascii?Q?6pZCPH6+WuwDB/cJnigUeTTTLQnFJoWpvOCk9W2KVGQ0rJeYGVs1DzdPsK07?= =?us-ascii?Q?LHDn5Gfhri9JyespMuuyuwgJbY4hQ0WPKuqpEVd4pGtpviFz9QKR0LI1CNrM?= =?us-ascii?Q?ZHIplOLEVMogG+KcP6CVVkVhMqHZzH0hzrHxIsyPBUum46IliF2YRZ7vk14E?= =?us-ascii?Q?wQFaEyl/DNoZQv3vjkO081JfkbwmL+5wilhiRXruRQonV7sVlrLJ8TtPyhGq?= =?us-ascii?Q?uR2LYfti5LLYOTRBf9+/AN3qCFjmeoqapHrg9chuOD2A5BUkCgKNudG+KcND?= =?us-ascii?Q?+Pi7ZbOqyig0aLinWXzQAYwzbkQG883MODv0aWDNdhOe7NcVE82GlbHS1YKM?= =?us-ascii?Q?yj8aUKr2BJ79VRAE83H9PWzvlidazP7XOlLxaofg3ui3C4O775MCGNIJWPed?= =?us-ascii?Q?qcn4FEvvz9trblOqSDawVRzbAfruHV13m6dcDMSMpCxkK5aEtIg96w2WmPin?= =?us-ascii?Q?jby3X4VvuQI5BTgXUwzaXLoZI0Tz1hM9J0JAy8Yp6atBzCYWPOJifgYjefMU?= =?us-ascii?Q?IvACFP+IYDz1E5fr96KGNnEr7BEz01CtecnrlIdFPEdw2pQAeJncQV+DCaEH?= =?us-ascii?Q?/pqQUfDUxA5Huwz1eGIPrqNwXNZjYtUQsCfbYYiUOu6GOHrQlXWIGJ0iLt6k?= =?us-ascii?Q?ZFu4hganhpVQAzeuTsPAFml8+cjgUWCDMEoXBsU9YQuWxF4q8/RgyQOTefuS?= =?us-ascii?Q?EBaChL2MlC5JYtiEghO9HxB4ibnjmlvsUj7iRHZ31pS6RMUkc4tE0egEoxJ6?= =?us-ascii?Q?sioHDovvUuccy/XRjym0NsS48tYSAvgG9rEt3lAoQ=3D=3D?= X-Microsoft-Exchange-Diagnostics: 1;AM3PR08MB0580;5:skqciGhbWU5LfDWLEydAdCNprqAwtQDPlHs0aLhEs9jTcqAqdE2AeMWDjL2Iz+VkVFYX3SnFGBnHk1PIcFiqEZdcWYifl6nUnqXlwmUwWiPSK4chlcfYDlhKfFeHc45fotDKWXWiHf8LdJsFOEms1w==;24:zEfx8vRGAhEM5CmaTpAlgjvy9xBmYGfWHRPMCnP9isBnUjTwURqsto0i26d87rgvE/rPuwFiBsmCi3uG0HqIJsMFafywHcn9o9h9aZK0EZs=;7:WKfx2Dky+nzk2eA44g1zjhzCeLP+HQdNq9vj6ANpjOP+e8RjHMhp8AZOl1JkAnRRxoqz1M/dhjsdt7lT7H2tQJtjV/zseEJWB7qG3L8Cf7k1ggJVTCySNWukBjPevyvaElAr7+Lxm7IaWCXR+8zB2p0Pknpep+DnO9k5qgfrNZg33KpT9Mzct0xxJNhgcBvH;20:wIiXBPwT/AYT3FNqvboVRucUlT14yFmStcY+R0Bta7htkO+ODNpRDBuV2VApXh97/Oo32pDdHVvt7RF25hO4GfXgTtDTdZkDuTmaF+296e2ygWeYQEWolxacUV4pAlzkO9A/N3HYjIMlRdluJ4tkgx8TkxdRAgLEIYj0L0tAVlI= SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: virtuozzo.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 26 May 2016 13:59:35.3191 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM3PR08MB0580 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote: > On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote: > > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote: > > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote: > > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote: > > > > ... > > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe, > > > > > > + struct pipe_buffer *buf) > > > > > > +{ > > > > > > + struct page *page = buf->page; > > > > > > + > > > > > > + if (page_count(page) == 1) { > > > > > > > > > > This looks racy : some cpu could have temporarily elevated page count. > > > > > > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are > > > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page > > > > with refcount of 1 in ->steal, that means that we are the only its user > > > > and it can't be spliced to another pipe. > > > > > > > > In fact, I just copied the code from generic_pipe_buf_steal, adding > > > > kmemcg related checks along the way, so it should be fine. > > > > > > So you guarantee that no other cpu might have done > > > get_page_unless_zero() right before this test ? > > > > Each pipe_buffer holds a reference to its page. If we find page's > > refcount to be 1 here, then it can be referenced only by our > > pipe_buffer. And the refcount cannot be increased by a parallel thread, > > because we hold pipe_lock, which rules out splice, and otherwise it's > > impossible to reach the page as it is not on lru. That said, I think I > > guarantee that this should be safe. > > I don't know kmemcg internal and pipe stuff so my comment might be > totally crap. > > No one cannot guarantee any CPU cannot held a reference of a page. > Look at get_page_unless_zero usecases. > > 1. balloon_page_isolate > > It can hold a reference in random page and then verify the page > is balloon page. Otherwise, just put. > > 2. page_idle_get_page > > It has PageLRU check but it's racy so it can hold a reference > of randome page and then verify within zone->lru_lock. If it's > not LRU page, just put. Well, I see your concern now - even if a page is not on lru and we locked all structs pointing to it, it can always get accessed by pfn in a completely unrelated thread, like in examples you gave above. That's a fair point. However, I still think that it's OK in case of pipe buffers. What can happen if somebody takes a transient reference to a pipe buffer page? At worst, we'll see page_count > 1 due to temporary ref and abort stealing, falling back on copying instead. That's OK, because stealing is not guaranteed. Can a function that takes a transient ref to page by pfn mistakenly assume that this is a page it's interested in? I don't think so, because this page has no marks on it except special _mapcount value, which should only be set on kmemcg pages. Thanks, Vladimir From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 26 May 2016 16:59:30 +0300 From: Vladimir Davydov To: Minchan Kim CC: Eric Dumazet , Andrew Morton , Alexander Viro , Johannes Weiner , Michal Hocko , , , , , Subject: Re: [PATCH RESEND 7/8] pipe: account to kmemcg Message-ID: <20160526135930.GA26059@esperanza> References: <2c2545563b6201f118946f96dd8cfc90e564aff6.1464079538.git.vdavydov@virtuozzo.com> <1464094742.5939.46.camel@edumazet-glaptop3.roam.corp.google.com> <20160524161336.GA11150@esperanza> <1464120273.5939.53.camel@edumazet-glaptop3.roam.corp.google.com> <20160525103011.GF11150@esperanza> <20160526070455.GF9661@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20160526070455.GF9661@bbox> Sender: owner-linux-mm@kvack.org List-ID: On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote: > On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote: > > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote: > > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote: > > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote: > > > > ... > > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe, > > > > > > + struct pipe_buffer *buf) > > > > > > +{ > > > > > > + struct page *page = buf->page; > > > > > > + > > > > > > + if (page_count(page) == 1) { > > > > > > > > > > This looks racy : some cpu could have temporarily elevated page count. > > > > > > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are > > > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page > > > > with refcount of 1 in ->steal, that means that we are the only its user > > > > and it can't be spliced to another pipe. > > > > > > > > In fact, I just copied the code from generic_pipe_buf_steal, adding > > > > kmemcg related checks along the way, so it should be fine. > > > > > > So you guarantee that no other cpu might have done > > > get_page_unless_zero() right before this test ? > > > > Each pipe_buffer holds a reference to its page. If we find page's > > refcount to be 1 here, then it can be referenced only by our > > pipe_buffer. And the refcount cannot be increased by a parallel thread, > > because we hold pipe_lock, which rules out splice, and otherwise it's > > impossible to reach the page as it is not on lru. That said, I think I > > guarantee that this should be safe. > > I don't know kmemcg internal and pipe stuff so my comment might be > totally crap. > > No one cannot guarantee any CPU cannot held a reference of a page. > Look at get_page_unless_zero usecases. > > 1. balloon_page_isolate > > It can hold a reference in random page and then verify the page > is balloon page. Otherwise, just put. > > 2. page_idle_get_page > > It has PageLRU check but it's racy so it can hold a reference > of randome page and then verify within zone->lru_lock. If it's > not LRU page, just put. Well, I see your concern now - even if a page is not on lru and we locked all structs pointing to it, it can always get accessed by pfn in a completely unrelated thread, like in examples you gave above. That's a fair point. However, I still think that it's OK in case of pipe buffers. What can happen if somebody takes a transient reference to a pipe buffer page? At worst, we'll see page_count > 1 due to temporary ref and abort stealing, falling back on copying instead. That's OK, because stealing is not guaranteed. Can a function that takes a transient ref to page by pfn mistakenly assume that this is a page it's interested in? I don't think so, because this page has no marks on it except special _mapcount value, which should only be set on kmemcg pages. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f198.google.com (mail-ob0-f198.google.com [209.85.214.198]) by kanga.kvack.org (Postfix) with ESMTP id A601D6B007E for ; Thu, 26 May 2016 09:59:39 -0400 (EDT) Received: by mail-ob0-f198.google.com with SMTP id yu3so127342835obb.3 for ; Thu, 26 May 2016 06:59:39 -0700 (PDT) Received: from emea01-am1-obe.outbound.protection.outlook.com (mail-am1on0113.outbound.protection.outlook.com. [157.56.112.113]) by mx.google.com with ESMTPS id t131si9881886oig.155.2016.05.26.06.59.38 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 26 May 2016 06:59:38 -0700 (PDT) Date: Thu, 26 May 2016 16:59:30 +0300 From: Vladimir Davydov Subject: Re: [PATCH RESEND 7/8] pipe: account to kmemcg Message-ID: <20160526135930.GA26059@esperanza> References: <2c2545563b6201f118946f96dd8cfc90e564aff6.1464079538.git.vdavydov@virtuozzo.com> <1464094742.5939.46.camel@edumazet-glaptop3.roam.corp.google.com> <20160524161336.GA11150@esperanza> <1464120273.5939.53.camel@edumazet-glaptop3.roam.corp.google.com> <20160525103011.GF11150@esperanza> <20160526070455.GF9661@bbox> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20160526070455.GF9661@bbox> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Eric Dumazet , Andrew Morton , Alexander Viro , Johannes Weiner , Michal Hocko , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, netdev@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org On Thu, May 26, 2016 at 04:04:55PM +0900, Minchan Kim wrote: > On Wed, May 25, 2016 at 01:30:11PM +0300, Vladimir Davydov wrote: > > On Tue, May 24, 2016 at 01:04:33PM -0700, Eric Dumazet wrote: > > > On Tue, 2016-05-24 at 19:13 +0300, Vladimir Davydov wrote: > > > > On Tue, May 24, 2016 at 05:59:02AM -0700, Eric Dumazet wrote: > > > > ... > > > > > > +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe, > > > > > > + struct pipe_buffer *buf) > > > > > > +{ > > > > > > + struct page *page = buf->page; > > > > > > + > > > > > > + if (page_count(page) == 1) { > > > > > > > > > > This looks racy : some cpu could have temporarily elevated page count. > > > > > > > > All pipe operations (pipe_buf_operations->get, ->release, ->steal) are > > > > supposed to be called under pipe_lock. So, if we see a pipe_buffer->page > > > > with refcount of 1 in ->steal, that means that we are the only its user > > > > and it can't be spliced to another pipe. > > > > > > > > In fact, I just copied the code from generic_pipe_buf_steal, adding > > > > kmemcg related checks along the way, so it should be fine. > > > > > > So you guarantee that no other cpu might have done > > > get_page_unless_zero() right before this test ? > > > > Each pipe_buffer holds a reference to its page. If we find page's > > refcount to be 1 here, then it can be referenced only by our > > pipe_buffer. And the refcount cannot be increased by a parallel thread, > > because we hold pipe_lock, which rules out splice, and otherwise it's > > impossible to reach the page as it is not on lru. That said, I think I > > guarantee that this should be safe. > > I don't know kmemcg internal and pipe stuff so my comment might be > totally crap. > > No one cannot guarantee any CPU cannot held a reference of a page. > Look at get_page_unless_zero usecases. > > 1. balloon_page_isolate > > It can hold a reference in random page and then verify the page > is balloon page. Otherwise, just put. > > 2. page_idle_get_page > > It has PageLRU check but it's racy so it can hold a reference > of randome page and then verify within zone->lru_lock. If it's > not LRU page, just put. Well, I see your concern now - even if a page is not on lru and we locked all structs pointing to it, it can always get accessed by pfn in a completely unrelated thread, like in examples you gave above. That's a fair point. However, I still think that it's OK in case of pipe buffers. What can happen if somebody takes a transient reference to a pipe buffer page? At worst, we'll see page_count > 1 due to temporary ref and abort stealing, falling back on copying instead. That's OK, because stealing is not guaranteed. Can a function that takes a transient ref to page by pfn mistakenly assume that this is a page it's interested in? I don't think so, because this page has no marks on it except special _mapcount value, which should only be set on kmemcg pages. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org