linux-fsdevel.vger.kernel.org archive mirror
* [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
@ 2017-01-15 23:38 Oleg Drokin
  2017-01-16 17:17 ` J. Bruce Fields
  2017-01-16 17:32 ` [Lsf-pc] " James Bottomley
  0 siblings, 2 replies; 19+ messages in thread
From: Oleg Drokin @ 2017-01-15 23:38 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-fsdevel

Hello!

   I would like to attend the filesystem track at LSF/MM this year.

   Other than the obvious Lustre-related stuff (i.e. hearing from Christoph
   how bad Lustre is and what other parts of it we need to remove),
   I can hopefully share some useful testing methods we came up with in our
   group; apparently more people could benefit from them, as evidenced by the
   interest from NFS people after a bunch of problems I was able to uncover.
   I suspect other network filesystems would benefit here too.

   I also see there's potentially going to be a caching discussion that sounds
   pretty relevant to Lustre too.
   This would probably go hand in hand with a somewhat recent discussion with
   Al Viro about potentially redoing the "unmount the subtrees on dentry
   invalidation" behaviour, which appears to be overly aggressive now.

   Container support in filesystems is also very relevant to us, since Lustre
   is used more and more in such settings.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-15 23:38 [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems Oleg Drokin
@ 2017-01-16 17:17 ` J. Bruce Fields
  2017-01-16 17:23   ` Jeffrey Altman
  2017-01-16 17:32 ` [Lsf-pc] " James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: J. Bruce Fields @ 2017-01-16 17:17 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: lsf-pc, linux-fsdevel

On Sun, Jan 15, 2017 at 06:38:43PM -0500, Oleg Drokin wrote:
> Hello!
> 
>    I would like to attend filesystem track in the LSF/MM this year.
> 
>    Other than the obvious Lustre related stuff (ie hearing from Christoph
>    how bad Lustre is and what other parts of it we need to remove),
>    I can share hopefully useful testing methods we came up with in our group
>    that more people can benefit from apparently, as evidenced by some interest
>    from NFS people due to a bunch of problems I was able to uncover.

Yes, I remember that this, at least, found some races after the server's
NFSv4 state locking was rewritten.

--b.

>    I suspect other networking filesystems would benefit here.
> 
>    I also see there's potentially going to be a caching discussion that sounds
>    pretty relevant to Lustre too.
>    This probably would go hand-in-hand with a somewhat recent discusison with Al Viro
>    about potentially redoing "unmount the subtrees on dentry invalidation" that
>    appears to be overly aggressive now.
> 
>    A container support from filesystems is also very relevant to us since Lustre
>    is used more and more in such settings.
> 
> Bye,
>     Oleg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 17:17 ` J. Bruce Fields
@ 2017-01-16 17:23   ` Jeffrey Altman
  2017-01-16 17:42     ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Jeffrey Altman @ 2017-01-16 17:23 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: lsf-pc, linux-fsdevel


> On Sun, Jan 15, 2017 at 06:38:43PM -0500, Oleg Drokin wrote:
>>    A container support from filesystems is also very relevant to us since Lustre
>>    is used more and more in such settings.

I too would be interested in participating in a discussion of filesystem
support for containers.  In particular, how to manage container identity
for network filesystems so that network filesystem modules such as kafs
can be used to provide persistent location-independent storage to
containerized processes.

Jeffrey Altman
AuriStor, Inc.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-15 23:38 [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems Oleg Drokin
  2017-01-16 17:17 ` J. Bruce Fields
@ 2017-01-16 17:32 ` James Bottomley
  2017-01-16 18:02   ` Oleg Drokin
  1 sibling, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-01-16 17:32 UTC (permalink / raw)
  To: Oleg Drokin, lsf-pc; +Cc: linux-fsdevel, containers

On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
>    A container support from filesystems is also very relevant to us 
> since Lustre    is used more and more in such settings.

I've added the containers ML to the cc just in case.  Can you add more
colour to this, please?  What container support for filesystems do you
think we need beyond the user namespace in the superblock?

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 17:23   ` Jeffrey Altman
@ 2017-01-16 17:42     ` Chuck Lever
  2017-01-16 17:46       ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2017-01-16 17:42 UTC (permalink / raw)
  To: Jeffrey Altman; +Cc: Oleg Drokin, lsf-pc, linux-fsdevel


> On Jan 16, 2017, at 12:23 PM, Jeffrey Altman <jaltman@auristor.com> wrote:
> 
>> On Sun, Jan 15, 2017 at 06:38:43PM -0500, Oleg Drokin wrote:
>>>   A container support from filesystems is also very relevant to us since Lustre
>>>   is used more and more in such settings.
> 
> I too would be interested in participating in a discussion of filesystem
> support for containers.  In particular, how to manage container identity
> for network filesystems so that network filesystem modules such as kafs
> can be used to provide persistent location-independent storage to
> containerized processes.

I'm also interested in that discussion.


> Jeffrey Altman
> AuriStor, Inc.
> 
> 
> <jaltman.vcf>

--
Chuck Lever




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 17:42     ` Chuck Lever
@ 2017-01-16 17:46       ` James Bottomley
  2017-01-16 20:39         ` Authentication Contexts for network file systems and Containers was " Jeffrey Altman
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-01-16 17:46 UTC (permalink / raw)
  To: Chuck Lever, Jeffrey Altman; +Cc: Oleg Drokin, lsf-pc, linux-fsdevel

On Mon, 2017-01-16 at 12:42 -0500, Chuck Lever wrote:
> > On Jan 16, 2017, at 12:23 PM, Jeffrey Altman <jaltman@auristor.com>
> > wrote:
> > 
> > > On Sun, Jan 15, 2017 at 06:38:43PM -0500, Oleg Drokin wrote:
> > > >   A container support from filesystems is also very relevant to
> > > > us since Lustre
> > > >   is used more and more in such settings.
> > 
> > I too would be interested in participating in a discussion of 
> > filesystem support for containers.  In particular, how to manage 
> > container identity for network filesystems so that network 
> > filesystem modules such as kafs can be used to provide persistent 
> > location-independent storage to containerized processes.
> 
> I'm also interested in that discussion.

For identity, doesn't the UTS namespace do this?  If not, what is
missing?

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 17:32 ` [Lsf-pc] " James Bottomley
@ 2017-01-16 18:02   ` Oleg Drokin
  2017-01-16 18:21     ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Drokin @ 2017-01-16 18:02 UTC (permalink / raw)
  To: James Bottomley; +Cc: lsf-pc, linux-fsdevel, containers


On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:

> On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
>>   A container support from filesystems is also very relevant to us 
>> since Lustre    is used more and more in such settings.
> 
> I've added the containers ML to the cc just in case.  Can you add more
> colour to this, please?  What container support for filesystems do you
> think we need beyond the user namespace in the superblock?

Namespace access is necessary; we might need it even before the superblock
exists (say, during mount we might need Kerberos credentials fetched to
properly authenticate this mount instance to the server).

Separately, I know that e.g. NFS tries to match underlying mounts to share
them "under the hood", so I imagine there might potentially be a single
superblock used with several namespaces.
In Lustre it might be beneficial to do something like this too, in order to
conserve resources and potentially get better fs cache sharing.
In fact the whole caching question is already somewhat complicated with
memory cgroups, and if we allow shared caching between several containers
it would become even more complicated.

I am sure there is also a bunch of pitfalls here that we are not aware of yet
but that other people have already encountered, and it would be useful to
find out about them.
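
For illustration only, here is a minimal userspace sketch of the kind of
thing I mean by fetching credentials before the superblock exists.  The
"user" key type and the "lustre:mgs@EXAMPLE.COM" description are made up,
and this is not how Lustre actually obtains its credentials today; it just
shows a lookup in the mounting context's session keyring:

  /* build with: gcc -o premount premount.c -lkeyutils */
  #include <keyutils.h>
  #include <stdio.h>

  int main(void)
  {
          /* search the keyrings visible to whoever runs the mount,
           * i.e. the container's context if invoked in its namespaces */
          key_serial_t key = request_key("user", "lustre:mgs@EXAMPLE.COM",
                                         NULL, KEY_SPEC_SESSION_KEYRING);
          if (key < 0) {
                  perror("request_key");
                  return 1;
          }

          char payload[256];
          long n = keyctl_read(key, payload, sizeof(payload));
          if (n < 0) {
                  perror("keyctl_read");
                  return 1;
          }
          printf("credential key %d, %ld byte payload\n", key, n);
          /* the payload (or the key id) would then be handed to mount(2) */
          return 0;
  }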

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 18:02   ` Oleg Drokin
@ 2017-01-16 18:21     ` James Bottomley
  2017-01-16 18:39       ` Oleg Drokin
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-01-16 18:21 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-fsdevel, containers, lsf-pc

On Mon, 2017-01-16 at 13:02 -0500, Oleg Drokin wrote:
> On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:
> 
> > On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
> > >   A container support from filesystems is also very relevant to
> > > us 
> > > since Lustre    is used more and more in such settings.
> > 
> > I've added the containers ML to the cc just in case.  Can you add
> > more
> > colour to this, please?  What container support for filesystems do
> > you
> > think we need beyond the user namespace in the superblock?
> 
> Namespace access is necessary, we might need it before the superblock 
> is there too (say during mount we might need kerberos credentials 
> fetched to properly authenticate this mount instance to the server).

The superblock namespace is mostly for uid/gid changes across the
kernel <-> filesystem boundary.

The actual container namespaces will already be set up by the time the
mount is done (assuming mount within a container), so you have them all
present.  Usually you get the information for credentials from a
combination of the UTS namespace (host/domain name) and the mount
namespace (credentials provisioned to container filesystem).
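
For illustration only, a trivial userspace sketch of that combination: it
derives a principal-style name from the container's UTS namespace and
checks for a credential file provisioned in its mount namespace.  The
"host/..." principal format and the /etc/krb5.keytab path are purely
assumptions:

  #include <stdio.h>
  #include <sys/utsname.h>
  #include <unistd.h>

  int main(void)
  {
          struct utsname u;

          if (uname(&u) < 0) {            /* this process's UTS namespace */
                  perror("uname");
                  return 1;
          }
          /* hypothetical service principal built from the container's
           * hostname; real deployments will differ */
          printf("principal: host/%s\n", u.nodename);

          /* credential material provisioned into the container's mount ns */
          if (access("/etc/krb5.keytab", R_OK) == 0)
                  printf("keytab visible in this mount namespace\n");
          return 0;
  }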

Perhaps if you described the actual problem you're seeing, rather than
trying to relate it to what I said about the superblock namespace (which
is probably irrelevant), we could figure out what the issue is.

James


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 18:21     ` James Bottomley
@ 2017-01-16 18:39       ` Oleg Drokin
  2017-01-16 20:58         ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Drokin @ 2017-01-16 18:39 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-fsdevel, containers, lsf-pc


On Jan 16, 2017, at 1:21 PM, James Bottomley wrote:

> On Mon, 2017-01-16 at 13:02 -0500, Oleg Drokin wrote:
>> On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:
>> 
>>> On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
>>>>  A container support from filesystems is also very relevant to
>>>> us 
>>>> since Lustre    is used more and more in such settings.
>>> 
>>> I've added the containers ML to the cc just in case.  Can you add
>>> more
>>> colour to this, please?  What container support for filesystems do
>>> you
>>> think we need beyond the user namespace in the superblock?
>> 
>> Namespace access is necessary, we might need it before the superblock 
>> is there too (say during mount we might need kerberos credentials 
>> fetched to properly authenticate this mount instance to the server).
> 
> The superblock namespace is mostly for uid/gid changes across the
> kernel <-> filesystem boundary.

That's at the kernel<->filesystem boundary, but inside the FS there might
be other considerations that you want to attach there.
Say, when you are encrypting the traffic to the server, you want
to use the right keys.
It's all relatively easy when you have a separate mount, since you can
store the credentials in the superblock, but then we lose cache sharing,
for example (I don't know how important that is).

> The actual container namespaces will already be set up by the time the
> mount is done (assuming mount within a container), so you have them all
> present.  Usually you get the information for credentials from a
> combination of the UTS namespace (host/domain name) and the mount
> namespace (credentials provisioned to container filesystem).

Yes, when mounting from a container it's possible to fetch this info
and pass it around; is mounting from outside of the container important too?

> Perhaps if you described the actual problem you're seeing rather than
> try to relate it to what I said about superblock namespace (which is
> probably irrelevant), we could figure out what the issue is.

Right now the deployments are simple and we do not have any major issues
(other than a certain caching overzealousness that throws cgroup memory
accounting off), but I would like to learn what other problems exist in
this space and what we should be looking out for.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Authentication Contexts for network file systems and Containers was Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 17:46       ` James Bottomley
@ 2017-01-16 20:39         ` Jeffrey Altman
  2017-01-16 21:03           ` [Lsf-pc] " James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Jeffrey Altman @ 2017-01-16 20:39 UTC (permalink / raw)
  To: James Bottomley; +Cc: containers, lsf-pc, linux-fsdevel



On 1/16/2017 12:46 PM, James Bottomley wrote:
>
> For identity, doesn't the UTS namespace do this?  If not, what is
> missing?
> 
> James

James,

Thanks for posing the question.

Unless I'm missing something, the UTS namespace permits an alternate
'hostname' and NIS 'domainname' to be specified for local visibility to
the processes running in the container.

For an /afs network file system client (kafs, OpenAFS or AuriStorFS) the
kernel module must be able to associate each process with an
authentication context.  The AFS family of file systems has implemented
this binding as part of its Process Authentication Group (PAG) concept.
A PAG is a set of processes that share an authentication context.  The
authentication context includes:

 * network credentials necessary to establish new server connections
   to requisite network-based services.  These include not only the
   backing store for files and directories but any distributed database
   services managing location independence, replication, failover, etc.

 * established server connections to individual servers.  These
   connections are re-used for all requests from a process that shares
   the authentication context.

The network credentials might be a Kerberos ticket, a public key, the
result of a GSS-API exchange, or something else.   It depends on the
requirements of the security class.

The security properties of a PAG are:

 * a new PAG may be created by any process.  When a new PAG is
   created its membership is only the process that created it.

 * a process may remove itself from a PAG that it is a member of.

 * when a child process is created, it inherits a single PAG
   membership from the parent process.

 * it should not be possible to join a process to a PAG after
   process creation, although due to implementation limitations
   on some platforms you will find references to a child process
   being able to set the PAG of its parent process.

In the traditional PAG implementations used by AFS unix clients, there
has been a restriction of one PAG membership per process.  The Windows
client implements an extended model which is better suited to
multi-threaded processes.

 * a process can be a member of more than one PAG at a time

 * a process can select one of its PAGs as the default PAG

 * a thread can select one of the process' PAGs as its active
   PAG and if there is no active PAG, the process default PAG
   is used

This extended Authentication Group model works well for processes such
as web servers that need to execute requests in the authentication
context of the delegated identity and be able to rapidly switch contexts
for each request.

It is important to note that the network credentials stored in an
authentication context do not necessarily have any relationship to the
local machine.  It is also important to remember that network
credentials often have a relatively short lifetime and must be renewed
or replaced on a regular basis.

For containers I envision PAGs being used in the following manner:

 * A process running in the context of the host OS or one that has
   access to keys stored in a TPM or other secure keystore
   creates a new PAG for each container it is going to launch.

 * This process will then obtain the initial network credentials
   required by the container processes and store them into the PAG.

 * The initial Container process will then be created as a child
   process and inherits the PAG membership.  Each subsequent child
   process in the Container will in turn inherit the same PAG.

 * Periodically the host OS process will renew the network credentials
   for the PAG.  This avoids the need for the processes in the container
   to have any access to or knowledge of the network identity under
   which they are executing.

 * A process in the container could decide to resign from the
   inherited PAG and create its own PAG using credentials available
   to that process.  For example, a web server running in a container.

The end result is a PAG which spans both the host OS and the Container
processes.  The Container processes might not even know what credentials
they are running with.
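
As a rough sketch of that flow, using a new session keyring purely as a
stand-in for a PAG (keyrings are not a full authentication context, as
noted below), with an invented key description and token:

  /* build with: gcc -o pag-launch pag-launch.c -lkeyutils */
  #include <keyutils.h>
  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void)
  {
          /* 1. the host-side launcher creates a fresh "PAG" */
          if (keyctl_join_session_keyring(NULL) < 0) {
                  perror("keyctl_join_session_keyring");
                  return 1;
          }

          /* 2. obtain the initial network credentials and store them
           *    (an opaque made-up token here) */
          const char token[] = "opaque-afs-token";
          if (add_key("user", "afs@EXAMPLE.COM", token, sizeof(token),
                      KEY_SPEC_SESSION_KEYRING) < 0) {
                  perror("add_key");
                  return 1;
          }

          /* 3. the container init inherits the keyring across fork/exec
           *    and every descendant shares the same context */
          pid_t pid = fork();
          if (pid == 0) {
                  execlp("keyctl", "keyctl", "show", "@s", (char *)NULL);
                  _exit(127);
          }
          waitpid(pid, NULL, 0);

          /* 4. the launcher would periodically add_key() a renewed token
           *    here, without the container ever handling the identity */
          return 0;
  }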

Keyrings were created as a storage facility for the network credentials,
https://www.infradead.org/~dhowells/kafs/#keyrings, but keyrings are not
an authentication context.

While a file system can internally associate an authentication context
with a file descriptor once it is created, and with pages for write-back,
I believe there would be benefit from a more generic method of tracking
authentication contexts in file descriptors and pages.  In particular,
there would be better-defined behavior when a file has been opened for
"write" by processes associated with more than one authentication
context.

PAG creation and PAG token set manipulation in the AFS family of file
systems traditionally took place via the use of path-based ioctls.
Providing equivalent functionality to user-land is an open topic that
David Howells's submitted as a topic for LSF/MM.  See afs(setpag),
VIOC_GETPAG, VIOCUNPAG, VIC*TOK* and VIOCUNLOG:

  https://www.infradead.org/~dhowells/kafs/user_interface.html

While the PAG model has worked well for many decades, it does
periodically run into problems with system designs that assume that
local system identities have the same meaning to network resources.  One
example is the problems that AFS is currently experiencing with systemd.
A good description of the problem by Jonathan Billings can be found at


https://docs.google.com/document/d/1P27fP1uj-C8QdxDKMKtI-Qh00c5_9zJa4YHjnpB6ODM/pub

I hope this letter is helpful in describing the issues that the AFS
community has experienced and how we believe that authentication context
management can be used to enhance the usability of containers.






^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 18:39       ` Oleg Drokin
@ 2017-01-16 20:58         ` James Bottomley
  2017-01-17  7:00           ` Oleg Drokin
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-01-16 20:58 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-fsdevel, containers, lsf-pc

On Mon, 2017-01-16 at 13:39 -0500, Oleg Drokin wrote:
> On Jan 16, 2017, at 1:21 PM, James Bottomley wrote:
> 
> > On Mon, 2017-01-16 at 13:02 -0500, Oleg Drokin wrote:
> > > On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:
> > > 
> > > > On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
> > > > >  A container support from filesystems is also very relevant 
> > > > > to us since Lustre    is used more and more in such settings.
> > > > 
> > > > I've added the containers ML to the cc just in case.  Can you 
> > > > add more colour to this, please?  What container support for 
> > > > filesystems do you think we need beyond the user namespace in
> > > > the superblock?
> > > 
> > > Namespace access is necessary, we might need it before the 
> > > superblock is there too (say during mount we might need kerberos 
> > > credentials fetched to properly authenticate this mount instance 
> > > to the server).
> > 
> > The superblock namespace is mostly for uid/gid changes across the
> > kernel <-> filesystem boundary.
> 
> That's on the kernel<->filesystem, but inside of the FS there might 
> be other considerations that you might want to attach there.
> Say when you are encrypting the traffic to the server you want
> to use the right keys.

So this is the keyring namespace?  It was mentioned at KS, but, as far
as I can tell, not discussed in the Containers MC that followed, so
I've no idea what the status is.

> It's all relatively easy when you have a separate mount there, so
> you can store the credentials in the superblock, but we lose on the
> cache sharing, for example (I don't know how important that is).

It depends what you mean by "cache sharing". If you're thinking of the
page cache, then it all just works, provided the underlying inode
doesn't change.  If you're in the situation where the container
orchestration system knows that two files are the same but there's been
a change of underlying device (fuse passthrough, say) so the inode is
different (the docker double caching problem) and you need some way of
forcibly combining them in the page cache, that was discussed a couple
of years ago, and Virtuozzo people have patches, but I haven't seen
much upstream agreement.

> > The actual container namespaces will already be set up by the time 
> > the mount is done (assuming mount within a container), so you have 
> > them all present.  Usually you get the information for credentials 
> > from a combination of the UTS namespace (host/domain name) and the 
> > mount namespace (credentials provisioned to container filesystem).
> 
> Yes, when mounting from a container it's possible to fetch this info
> and pass it around, is mounting from outside of the container
> important too?

Mounting from outside the container usually involves entering the
container and performing the mount there.  However, the way you enter the
container can pull stuff in from outside (like file descriptors).
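
For illustration, a minimal nsenter-style sketch of "entering the
container and performing the mount".  The pid is a placeholder, it needs
CAP_SYS_ADMIN in the target namespaces, and tmpfs stands in for the real
network filesystem (which would normally go through its own mount helper):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <sched.h>
  #include <stdio.h>
  #include <sys/mount.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = open("/proc/12345/ns/mnt", O_RDONLY); /* container init */

          if (fd < 0) {
                  perror("open ns/mnt");
                  return 1;
          }
          if (setns(fd, CLONE_NEWNS) < 0) {   /* step into its mount ns */
                  perror("setns");
                  return 1;
          }
          close(fd);

          /* from here on, the mount lands in the container's mount table */
          if (mount("tmpfs", "/mnt/data", "tmpfs", 0, "size=16m") < 0) {
                  perror("mount");
                  return 1;
          }
          return 0;
  }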

> > Perhaps if you described the actual problem you're seeing rather 
> > than try to relate it to what I said about superblock namespace 
> > (which is probably irrelevant), we could figure out what the issue
> > is.
> 
> Right now the deployments are simple and we do not have any major 
> issues (other than certain caching overzealousness that throws cgroup 
> memory accounting off), but learning what other problems are there in 
> this space and what we should be looking for.

You might need to canvass the other users to see if there is anything
viable to discuss.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] Authentication Contexts for network file systems and Containers was Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 20:39         ` Authentication Contexts for network file systems and Containers was " Jeffrey Altman
@ 2017-01-16 21:03           ` James Bottomley
  2017-01-17 16:29             ` Jeffrey Altman
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-01-16 21:03 UTC (permalink / raw)
  To: Jeffrey Altman; +Cc: linux-fsdevel, containers, lsf-pc

On Mon, 2017-01-16 at 15:39 -0500, Jeffrey Altman wrote:
> 
> On 1/16/2017 12:46 PM, James Bottomley wrote:
> > 
> > For identity, doesn't the UTS namespace do this?  If not, what is
> > missing?
> >
> > James
> 
> James,
> 
> Thanks for posing the question.
> 
> Unless I'm missing something, the UTS namespace permits an alternate
> 'hostname' and NIS 'domainname' to be specified for local visibility 
> to the processes running in the container.
> 
> For an /afs network file system client (kafs, OpenAFS or AuriStorFS) 
> the kernel module must be able to associate each process with an
> authentication context.  The AFS family of file systems have 
> implemented this binding as part of its Process Authentication Group 
> (PAG) concept. A PAG is a set of processes that share an 
> authentication context.   The authentication context includes:
[...]

OK, so snipping all the details: it's a per-process, inherited property;
I don't even see that it needs anything container-specific.
The pid namespace should be sufficient to keep any potential security
leaks contained, and the inheritance model should just work with
containers.

> While a file system can internally create an association between an
> authentication content with a file descriptor once it is created and
> with pages for write-back, I believe there would be benefit from a 
> more generic method of tracking authentication contexts in file
> descriptors and pages.  In particular would be better defined 
> behavior when a file has been opened for "write" from processes 
> associated with more than one authentication context.

As long as an "authentication" becomes a property of a file descriptor
(like a token), I don't see any container problems: fds are
namespace-blind, so they can be passed between containers and your
authorizations would go with them.  If you need to go back to a process
as part of the authorization, then there would be problems, because
processes are namespaced.
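
A self-contained sketch of that point, with fork() standing in for the
two containers; the access travels with the open file description, not
with the receiving process's namespaces:

  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>
  #include <sys/wait.h>
  #include <unistd.h>

  /* wrap one fd in an SCM_RIGHTS control message */
  static void send_fd(int sock, int fd)
  {
          char dummy = 'x';
          struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
          union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
          struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                                .msg_control = u.buf,
                                .msg_controllen = sizeof(u.buf) };
          struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

          cm->cmsg_level = SOL_SOCKET;
          cm->cmsg_type = SCM_RIGHTS;
          cm->cmsg_len = CMSG_LEN(sizeof(int));
          memcpy(CMSG_DATA(cm), &fd, sizeof(int));
          sendmsg(sock, &msg, 0);
  }

  static int recv_fd(int sock)
  {
          char dummy;
          struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
          union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
          struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                                .msg_control = u.buf,
                                .msg_controllen = sizeof(u.buf) };
          int fd;

          recvmsg(sock, &msg, 0);
          memcpy(&fd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof(fd));
          return fd;
  }

  int main(void)
  {
          int sv[2];

          socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
          if (fork() == 0) {
                  /* "the other container": gets a working fd carrying
                   * whatever access the sender's open conferred */
                  char buf[64];
                  ssize_t n = read(recv_fd(sv[1]), buf, sizeof(buf));
                  printf("received fd, read %zd bytes\n", n);
                  _exit(0);
          }
          send_fd(sv[0], STDIN_FILENO);   /* any open fd would do */
          wait(NULL);
          return 0;
  }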

> For example, the problems that AFS is currently experiencing with
> systemd. A good description of problem by Jonathan Billings can be
> found at
> 
> 
> https://docs.google.com/document/d/1P27fP1uj-C8QdxDKMKtI-Qh00c5_9zJa4
> YHjn=pB6ODM/pub

This is giving me "Sorry, the file you have requested does not exist."

James


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 20:58         ` James Bottomley
@ 2017-01-17  7:00           ` Oleg Drokin
  2017-01-17 14:26             ` James Bottomley
  2017-01-17 14:56             ` James Bottomley
  0 siblings, 2 replies; 19+ messages in thread
From: Oleg Drokin @ 2017-01-17  7:00 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-fsdevel, containers, lsf-pc


On Jan 16, 2017, at 3:58 PM, James Bottomley wrote:

> On Mon, 2017-01-16 at 13:39 -0500, Oleg Drokin wrote:
>> On Jan 16, 2017, at 1:21 PM, James Bottomley wrote:
>> 
>>> On Mon, 2017-01-16 at 13:02 -0500, Oleg Drokin wrote:
>>>> On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:
>>>> 
>>>>> On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
>>>>>> A container support from filesystems is also very relevant 
>>>>>> to us since Lustre    is used more and more in such settings.
>>>>> 
>>>>> I've added the containers ML to the cc just in case.  Can you 
>>>>> add more colour to this, please?  What container support for 
>>>>> filesystems do you think we need beyond the user namespace in
>>>>> the superblock?
>>>> 
>>>> Namespace access is necessary, we might need it before the 
>>>> superblock is there too (say during mount we might need kerberos 
>>>> credentials fetched to properly authenticate this mount instance 
>>>> to the server).
>>> 
>>> The superblock namespace is mostly for uid/gid changes across the
>>> kernel <-> filesystem boundary.
>> 
>> That's on the kernel<->filesystem, but inside of the FS there might 
>> be other considerations that you might want to attach there.
>> Say when you are encrypting the traffic to the server you want
>> to use the right keys.
> 
> So this is the keyring namespace?  It was mentioned at KS, but, as far
> as I can tell, not discussed in the Containers MC that followed, so
> I've no idea what the status is.

Could be keyring or other mechanisms.

>> It's all relatively easy when you have a separate mount there, so
>> you can store the credentials in the superblock, but we lose on the
>> cache sharing, for example (I don't know how important that is).
> 
> It depends what you mean by "cache sharing". If you're thinking of the
> page cache, then it all just works, provided the underlying inode
> doesn't change.  If you're in the situation where the container

It only "just works" if the superblock is the same; if there's a separate
mount per container with a separate superblock, then there's no sharing
at all.
Accounting for such a "shared" cache might be interesting too: which of
the containers would you account it against?  All of them?

>>> Perhaps if you described the actual problem you're seeing rather 
>>> than try to relate it to what I said about superblock namespace 
>>> (which is probably irrelevant), we could figure out what the issue
>>> is.
>> 
>> Right now the deployments are simple and we do not have any major 
>> issues (other than certain caching overzealousness that throws cgroup 
>> memory accounting off), but learning what other problems are there in 
>> this space and what we should be looking for.
> 
> You might need to canvas the other users to see if there is anything
> viable to discuss.

This is what I am trying to do with this email in part, I guess.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-17  7:00           ` Oleg Drokin
@ 2017-01-17 14:26             ` James Bottomley
  2017-01-17 17:41               ` Oleg Drokin
  2017-01-17 14:56             ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: James Bottomley @ 2017-01-17 14:26 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-fsdevel, containers, lsf-pc

On Tue, 2017-01-17 at 02:00 -0500, Oleg Drokin wrote:
> On Jan 16, 2017, at 3:58 PM, James Bottomley wrote:
> 
> > On Mon, 2017-01-16 at 13:39 -0500, Oleg Drokin wrote:
> > > On Jan 16, 2017, at 1:21 PM, James Bottomley wrote:
> > > 
> > > > On Mon, 2017-01-16 at 13:02 -0500, Oleg Drokin wrote:
> > > > > On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:
> > > > > 
> > > > > > On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
> > > > > > > A container support from filesystems is also very 
> > > > > > > relevant to us since Lustre    is used more and more in 
> > > > > > > such settings.
> > > > > > 
> > > > > > I've added the containers ML to the cc just in case.  Can 
> > > > > > you add more colour to this, please?  What container 
> > > > > > support for filesystems do you think we need beyond the 
> > > > > > user namespace in the superblock?
> > > > > 
> > > > > Namespace access is necessary, we might need it before the 
> > > > > superblock is there too (say during mount we might need 
> > > > > kerberos credentials fetched to properly authenticate this 
> > > > > mount instance to the server).
> > > > 
> > > > The superblock namespace is mostly for uid/gid changes across 
> > > > the kernel <-> filesystem boundary.
> > > 
> > > That's on the kernel<->filesystem, but inside of the FS there 
> > > might be other considerations that you might want to attach 
> > > there. Say when you are encrypting the traffic to the server you 
> > > want to use the right keys.
> > 
> > So this is the keyring namespace?  It was mentioned at KS, but, as 
> > far as I can tell, not discussed in the Containers MC that 
> > followed, so I've no idea what the status is.
> 
> Could be keyring or other mechanisms.

OK, you need to agree on the mechanism first; then we can discuss whether
it needs OS virtualization.  A large number of mechanisms in the kernel
actually don't (because the current OS protections are strong enough),
like file descriptors.  Once you understand the mechanism there are
usually four main ways to do OS virtualization:

   1. Do nothing because the object doesn't need it (fd)
   2. Label Namespace because it needs isolation (network)
   3. add to user namespace because you need privileged access (setns
      call)
   4. add to cgroup because the resource needs to be accounted (mem)

But before we get into that we need to know the properties of the
mechanism.

James


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-17  7:00           ` Oleg Drokin
  2017-01-17 14:26             ` James Bottomley
@ 2017-01-17 14:56             ` James Bottomley
  1 sibling, 0 replies; 19+ messages in thread
From: James Bottomley @ 2017-01-17 14:56 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-fsdevel, containers, lsf-pc

On Tue, 2017-01-17 at 02:00 -0500, Oleg Drokin wrote:
> On Jan 16, 2017, at 3:58 PM, James Bottomley wrote:
> 
> > On Mon, 2017-01-16 at 13:39 -0500, Oleg Drokin wrote:
> > > It's all relatively easy when you have a separate mount there, so
> > > you can store the credentials in the superblock, but we lose on 
> > > the cache sharing, for example (I don't know how important that
> > > is).
> > 
> > It depends what you mean by "cache sharing". If you're thinking of 
> > the page cache, then it all just works, provided the underlying 
> > inode doesn't change.  If you're in the situation where the
> > container
> 
> It only "just works" if the superblock is the same, if there's a 
> separate mount per container with separate superblock, then there's 
> no sharing at all. Accounting of said "shared" cache might be 
> interesting too, which of the containers would you account against?
> All of them?

Well, caching is done per address_space, which can be per inode, and as
you found, inodes are usually per superblock.  There are (dirty) tricks
you can do to force sharing at the address_space level if you know it's
the same file.  There was also mention of a ksm-like mechanism to force
the sharing.  Like I said, it was the VZ people who had patches.

James


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] Authentication Contexts for network file systems and Containers was Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-16 21:03           ` [Lsf-pc] " James Bottomley
@ 2017-01-17 16:29             ` Jeffrey Altman
  2017-01-17 16:34               ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Jeffrey Altman @ 2017-01-17 16:29 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-fsdevel, containers, lsf-pc



On 1/16/2017 4:03 PM, James Bottomley wrote:
> [...]
> 
> OK, so snipping all the details: it's a per process property and
> inherited, I don't even see that it needs anything container specific. 
> The pid namespace should be sufficient to keep any potential security
> leaks contained and the inheritance model should just work with
> containers.

Agreed.

>> While a file system can internally create an association between an
>> authentication content with a file descriptor once it is created and
>> with pages for write-back, I believe there would be benefit from a 
>> more generic method of tracking authentication contexts in file
>> descriptors and pages.  In particular would be better defined 
>> behavior when a file has been opened for "write" from processes 
>> associated with more than one authentication context.
> 
> As long as an "authentication" becomes a property of a file descriptor
> (like a token), then I don't see any container problems: fds are
> namespace blind, so they can be passed between containers and your
> authorizations would go with them.  If you need to go back to a process
> as part of the authorization, then there would be problems because
> processes are namespaced.
> 
>> For example, the problems that AFS is currently experiencing with
>> systemd. A good description of problem by Jonathan Billings can be
>> found at
>>
>>
>> https://docs.google.com/document/d/1P27fP1uj-C8QdxDKMKtI-Qh00c5_9zJa4
>> YHjn=pB6ODM/pub
> 
> This is giving me "Sorry, the file you have requested does not exist."

Not sure how an extra '=' got in there.

https://docs.google.com/document/d/1P27fP1uj-C8QdxDKMKtI-Qh00c5_9zJa4YHjnpB6ODM/pub

Jeffrey Altman



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] Authentication Contexts for network file systems and Containers was Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-17 16:29             ` Jeffrey Altman
@ 2017-01-17 16:34               ` Trond Myklebust
  2017-01-17 17:10                 ` Jeffrey Altman
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2017-01-17 16:34 UTC (permalink / raw)
  To: jaltman, James.Bottomley; +Cc: containers, lsf-pc, linux-fsdevel

On Tue, 2017-01-17 at 11:29 -0500, Jeffrey Altman wrote:
> On 1/16/2017 4:03 PM, James Bottomley wrote:
> > [...]
> > 
> > OK, so snipping all the details: it's a per process property and
> > inherited, I don't even see that it needs anything container
> > specific. 
> > The pid namespace should be sufficient to keep any potential
> > security
> > leaks contained and the inheritance model should just work with
> > containers.
> 
> Agreed.
> 
> > > While a file system can internally create an association between
> > > an
> > > authentication content with a file descriptor once it is created
> > > and
> > > with pages for write-back, I believe there would be benefit from
> > > a 
> > > more generic method of tracking authentication contexts in file
> > > descriptors and pages.  In particular would be better defined 
> > > behavior when a file has been opened for "write" from processes 
> > > associated with more than one authentication context.
> > 
> > As long as an "authentication" becomes a property of a file
> > descriptor
> > (like a token), then I don't see any container problems: fds are
> > namespace blind, so they can be passed between containers and your
> > authorizations would go with them.  If you need to go back to a
> > process
> > as part of the authorization, then there would be problems because
> > processes are namespaced.
> > 
> > > For example, the problems that AFS is currently experiencing with
> > > systemd. A good description of problem by Jonathan Billings can
> > > be
> > > found at
> > > 
> > > 
> > > https://docs.google.com/document/d/1P27fP1uj-C8QdxDKMKtI-Qh00c5_9
> > > zJa4
> > > YHjn=pB6ODM/pub
> > 
> > This is giving me "Sorry, the file you have requested does not
> > exist."
> 
> Not sure how an extra '=' got in there.
> 
> https://docs.google.com/document/d/1P27fP1uj-C8QdxDKMKtI-Qh00c5_9zJa4
> YHjnpB6ODM/pub
> 
> Jeffrey Altman
> 


There is the usual problem when you have to do an upcall in order to
set up the authentication context for session based protocols, such as
RPCSEC_GSS.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] Authentication Contexts for network file systems and Containers was Re: [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-17 16:34               ` Trond Myklebust
@ 2017-01-17 17:10                 ` Jeffrey Altman
  0 siblings, 0 replies; 19+ messages in thread
From: Jeffrey Altman @ 2017-01-17 17:10 UTC (permalink / raw)
  To: Trond Myklebust, James.Bottomley; +Cc: containers, lsf-pc, linux-fsdevel



On 1/17/2017 11:34 AM, Trond Myklebust wrote:
>>
>> https://docs.google.com/document/d/1P27fP1uj-C8QdxDKMKtI-Qh00c5_9zJa4
>> YHjnpB6ODM/pub
>>
>> Jeffrey Altman
>>
> 
> 
> There is the usual problem when you have to do an upcall in order to
> set up the authentication context for session based protocols, such as
> RPCSEC_GSS.
> 

Trond,

Thanks for the thought, but that is not the issue here.  systemd --user
launches processes as the user, but those processes do not share the
same keyring as the processes started from the PAM stack at logon.
Since the keyring doesn't match, the processes started by systemd --user
are in a different authentication context.

Setting the effective 'uid' is insufficient to gain access to the proper
authentication context.
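
A tiny way to see the mismatch: the sketch below just prints the session
keyring the calling process is attached to.  Run from a login shell and
from a "systemd --user" unit it will typically report different keyrings
for the same uid, i.e. different authentication contexts:

  /* build with: gcc -o whichkeyring whichkeyring.c -lkeyutils */
  #include <keyutils.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          key_serial_t s = keyctl_get_keyring_ID(KEY_SPEC_SESSION_KEYRING, 0);

          if (s < 0) {
                  perror("keyctl_get_keyring_ID");
                  return 1;
          }
          printf("uid %d, session keyring %d\n", (int)getuid(), s);
          return 0;
  }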

I agree that upcalls are often a problem, which is why the AFS family of
protocols does not use them.  Typically a userland process is created for
each PAG to push refreshed credentials to the kernel module.

Jeffrey Altman



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Lsf-pc] [LSF/MM ATTEND] FS jitter testing, network caching, Lustre, cluster filesystems.
  2017-01-17 14:26             ` James Bottomley
@ 2017-01-17 17:41               ` Oleg Drokin
  0 siblings, 0 replies; 19+ messages in thread
From: Oleg Drokin @ 2017-01-17 17:41 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-fsdevel, containers, lsf-pc


On Jan 17, 2017, at 9:26 AM, James Bottomley wrote:

> On Tue, 2017-01-17 at 02:00 -0500, Oleg Drokin wrote:
>> On Jan 16, 2017, at 3:58 PM, James Bottomley wrote:
>> 
>>> On Mon, 2017-01-16 at 13:39 -0500, Oleg Drokin wrote:
>>>> On Jan 16, 2017, at 1:21 PM, James Bottomley wrote:
>>>> 
>>>>> On Mon, 2017-01-16 at 13:02 -0500, Oleg Drokin wrote:
>>>>>> On Jan 16, 2017, at 12:32 PM, James Bottomley wrote:
>>>>>> 
>>>>>>> On Sun, 2017-01-15 at 18:38 -0500, Oleg Drokin wrote:
>>>>>>>> A container support from filesystems is also very 
>>>>>>>> relevant to us since Lustre    is used more and more in 
>>>>>>>> such settings.
>>>>>>> 
>>>>>>> I've added the containers ML to the cc just in case.  Can 
>>>>>>> you add more colour to this, please?  What container 
>>>>>>> support for filesystems do you think we need beyond the 
>>>>>>> user namespace in the superblock?
>>>>>> 
>>>>>> Namespace access is necessary, we might need it before the 
>>>>>> superblock is there too (say during mount we might need 
>>>>>> kerberos credentials fetched to properly authenticate this 
>>>>>> mount instance to the server).
>>>>> 
>>>>> The superblock namespace is mostly for uid/gid changes across 
>>>>> the kernel <-> filesystem boundary.
>>>> 
>>>> That's on the kernel<->filesystem, but inside of the FS there 
>>>> might be other considerations that you might want to attach 
>>>> there. Say when you are encrypting the traffic to the server you 
>>>> want to use the right keys.
>>> 
>>> So this is the keyring namespace?  It was mentioned at KS, but, as 
>>> far as I can tell, not discussed in the Containers MC that 
>>> followed, so I've no idea what the status is.
>> 
>> Could be keyring or other mechanisms.
> 
> OK, you need to agree on the mechanism first, then we can discuss if it
> needs OS virtualization.  A large number of mechanisms in the kernel
> actually don't (because the current OS protections are strong enough)
> like file descriptors.  After you understand the mechanism there are
> usually four main ways to do OS virtualization:
> 
>   1. Do nothing becuase the object doesn't need it (fd)
>   2. Label Namespace because it needs isolation (network)
>   3. add to user namespace because you need privileged access (setns
>      call)
>   4. add to cgroup because the resource needs to be accounted (mem)
> 
> But before we get into that we need to know the properties of the
> mechanism.

Right, I just checked and we are actually using a keyring that is
per-namespace, even for Kerberos, so that's enough for us there so far,
as long as we can attach to it (and we can, when we know where the
request originated from).


^ permalink raw reply	[flat|nested] 19+ messages in thread
