From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Vagin
Subject: Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
Date: Mon, 1 Aug 2016 16:01:49 -0700
Message-ID: <20160801230147.GA32309@outlook.office365.com>
References: <87r3ahepb4.fsf@x220.int.ebiederm.org>
 <20160726025455.GC26206@outlook.office365.com>
 <3390535b-0660-757f-aeba-c03d936b3485@gmail.com>
 <20160726182524.GA328@outlook.office365.com>
 <20160726203955.GA9415@outlook.office365.com>
 <87popxkjjp.fsf@x220.int.ebiederm.org>
 <40e35f1a-10e6-b7a5-936e-a09f008be0d0@gmail.com>
 <87h9b8e2v7.fsf@x220.int.ebiederm.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <87h9b8e2v7.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: "Eric W. Biederman"
Cc: James Bottomley, Andrey Vagin, Linux API, Linux Containers, LKML,
 Alexander Viro, "criu-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org",
 "Michael Kerrisk (man-pages)", linux-fsdevel
List-Id: containers.vger.kernel.org

On Fri, Jul 29, 2016 at 01:05:48PM -0500, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
> > Hi Eric,
> >
> > On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
> >> "Michael Kerrisk (man-pages)" writes:
> >>
> >>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
> >>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
> >>
> >>>> If we want to compare two file descriptors of the current process,
> >>>> it is one of the cases for which kcmp can be used. We can call kcmp to
> >>>> compare two namespaces which are opened in other processes.
> >>>
> >>> Is there really a use case there?
> >>> I assume we're talking about the
> >>> scenario where a process in one namespace opens a /proc/PID/ns/*
> >>> file descriptor and passes that FD to another process via a UNIX
> >>> domain socket. Is that correct?
> >>>
> >>> So, supposing that we want to build a map of the relationships
> >>> between namespaces using the proposed kcmp() API, and there are
> >>> say N namespaces? Does this mean we make (N * (N-1) / 2) calls
> >>> to kcmp()?
> >>
> >> Potentially. The numbers are small enough O(N^2) isn't fatal.
> >
> > Define "small", please.
> >
> > O(N^2) makes me nervous about what other use cases lurk out
> > there that may get bitten by this.
>
> Worst case for N (one namespace per thread) is about 60k.
> A typical heavy use case may be 1000 namespaces of any type.
> So we are talking about O(N^2) that rarely happens and should be done in
> a couple of seconds.
>
> >> Where kcmp shines is that it allows migration to happen, inode numbers
> >> to change (which they very much will today), and still have things work.
> >
> >
> >> We can keep it O(N log(N)) by taking advantage of not just the equality
> >> but the ordering relationship. Although, ugh.
> >
> > Yes, that sounds pretty ugly...
>
> Actually, having thought about this a little more, if kcmp returns an
> ordering by inode and migration preserves the relative order of
> the inodes (which should just be a creation order) it should be quite
> solvable.
>
> Switch from an order by inode number to an order by object creation
> time, and guarantee that all creations have an order (which with
> tasklist_lock we practically already have) and it should be even easier
> to create. (A 64bit nanosecond resolution timestamp is good for 584
> years of uptime). A 64bit number that increments each time an object is
> created should have an even better lifespan.
>
> I don't know if we can find a way to give that guarantee for other kcmp
> comparisons but it is worth a thought.
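
To put rough numbers on the O(N^2) versus O(N log N) point above, here is a small illustration in Python. It is only a sketch, not kernel code and not the proposed kcmp() interface: `make_counting_cmp` is a hypothetical stand-in for a kcmp-style call that returns an ordering (negative, zero, positive), and the hidden ids it compares are made up.

```python
from functools import cmp_to_key


def pairwise_calls(n):
    """Number of comparisons needed to compare every pair once."""
    return n * (n - 1) // 2


def make_counting_cmp(hidden_ids):
    """Return (cmp, calls). cmp orders handles by an id the caller never sees.

    hidden_ids is a hypothetical mapping handle -> kernel-side identity;
    it stands in for whatever a kcmp-style ordering would be based on.
    """
    calls = {"n": 0}

    def cmp(a, b):
        calls["n"] += 1
        return (hidden_ids[a] > hidden_ids[b]) - (hidden_ids[a] < hidden_ids[b])

    return cmp, calls


def group_by_sorting(handles, cmp):
    """Sort with the three-way comparator, then merge equal neighbours."""
    ordered = sorted(handles, key=cmp_to_key(cmp))
    groups = []
    for handle in ordered:
        if groups and cmp(groups[-1][0], handle) == 0:
            groups[-1].append(handle)   # same object as the previous group
        else:
            groups.append([handle])     # first handle of a new object
    return groups


# 1000 handles that refer to 100 distinct (made-up) objects.
hidden = {h: h % 100 for h in range(1000)}
cmp, calls = make_counting_cmp(hidden)
groups = group_by_sorting(list(hidden), cmp)
```

With N = 1000 handles the sort needs roughly N log2(N) comparator calls plus N - 1 for the merge pass, against 1000 * 999 / 2 = 499500 for the pairwise approach. At the 60k worst case the pairwise count is 60000 * 59999 / 2 = 1799970000.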
>
> >> One disadvantage of
> >> kcmp currently is that the way the ordering relationship is defined
> >> the order is not preserved over migration :(
> >
> > So, does kcmp() fully solve the problem(s) at hand? It sounds like
> > not, if I understand your last point correctly.
>
> There are 3 possibilities I see for migration in migration, ordered
> in order of implementation difficulty.
> 1) Have a clear signal that migration happened and a nested migration
>    needs to restart.
> 2) Use kcmp so that only the relative order needs to be preserved.
> 3) Preserve the device number and inode numbers.
>
> At a practical level I think (2) may actually in net be the simplest.
> It requires a little more care to implement and you have to opt in,
> but it should not require any rolling back of activity (merely careful
> ordering of object creation).
>
> I definitely like kcmp knowing how to compare things by inode
> (aka st_dev, st_ino) because then even if you have to restart
> the comparisons after a migration the exact details you are comparing
> are hidden and so it is easier to support and harder to get wrong.
>
> I can imagine how to preserve inode numbers by creating a new instance
> of nsfs and using the old inode numbers upon restore. I don't
> currently see how we could possibly preserve st_dev over migration short of
> a device number namespace.

I think we can avoid comparing st_dev if we compare the inode numbers of
the parent user namespaces as well. Namespaces form a tree in which user
namespaces are directories and all other namespaces are files. A namespace
can then be described by a path in this imaginary file system, which looks
like /userns1/userns2/XXXns. In this case we need to guarantee that names
are unique inside each directory and that they do not change over
migration.

>
> So if we are going to continue with making device numbers be a legacy
> attribute applications should not care about we need a way to compare
> things by not looking at st_dev.
> Which brings us back to kcmp.
>
> Hmm. Hotplugging a disk and plugging it back in will likely change the
> device number and give the same kind of challenge with st_dev (although
> you can't keep a file descriptor open across that kind of event). So
> certainly a hotplug event on a device should be enough to say don't care
> about the device number.
>
> Eric
>
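
To make the path idea above more concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not an existing interface: the `Ns` class, its fields, and the inode values are all hypothetical, and how the owning user namespace of a namespace would be obtained is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ns:
    """Hypothetical description of one namespace."""
    kind: str                      # e.g. "user", "net", "pid"
    ino: int                       # inode number, unique within its owner
    owner: Optional["Ns"] = None   # owning user namespace, None at the root


def ns_path(ns):
    """Build a /userns1/userns2/XXXns style path from the chain of owners."""
    parts = []
    node = ns
    while node is not None:
        parts.append(f"{node.kind}:{node.ino}")
        node = node.owner
    return "/" + "/".join(reversed(parts))


# Made-up example tree: two user namespaces under a root user namespace,
# each owning a network namespace that happens to reuse inode number 7.
root = Ns("user", 1)
userns_a = Ns("user", 2, root)
userns_b = Ns("user", 3, root)
net_a = Ns("net", 7, userns_a)
net_b = Ns("net", 7, userns_b)
```

Here `ns_path(net_a)` and `ns_path(net_b)` differ even though both leaves carry the same inode number, which is the point: only names inside one directory have to be unique and stable over migration, and st_dev never enters the comparison.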