All of lore.kernel.org
 help / color / mirror / Atom feed
From: Davidlohr Bueso <dave-h16yJtLeMjHk1uMJSBkQmQ@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	Davidlohr Bueso <dbueso-l3A5Bk7waGM@public.gmane.org>
Subject: Re: [RFC PATCH] namespaces: Avoid task_lock when setting tsk->nsproxy
Date: Fri, 26 Feb 2016 10:19:26 -0800	[thread overview]
Message-ID: <20160226181926.GA4166@linux-uzut.site> (raw)
In-Reply-To: <87wpprgyz7.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

On Fri, 26 Feb 2016, Eric W. Biederman wrote:

>I would really appreciate it, if you are going to come up with a new
>locking primitive that you implement that locking primitive separately.

Using some rmw insn to avoid a lock is hardly implementing a new locking
primitive. This was never the purpose, and we use similar techniques throughout
the kernel to force certain paths.

>A fresh locking primitive comingled with other code is a good way to get
>something wrong, and generally to get code that is not well maintained
>because there is not a separation of concerns.
>
>Furthermore there is a world of difference between a 1+jiffy delay
>waiting for rcu_synchronize and the short hold times of task_lock.

I am aware of that, this is an optimization only based on that task_lock
is mainly unconteded, and hoped I made it clear in the changelog.

>Looking at your locking it appears to be a complete hack.  Always taking
>task_lock on read (but now you have an extra atomic op where you call
>xchg on the pointer).  Just calling compare_xchg on write if there
>are no concurrent readers.

The task on read is considered a slowpath, the extra atomic op is what I
had mentioned in the overhead, but for the fastpath we save two ops, otoh.

>For raw performance you would do better to have a separate lock, or
>potentially a specialized locking primitive that just used the low
>pointer bits.
>
>I don't know what motivates this work are you actually seeing
>performance problems with setns?

Motivation is towards other alloc_lock users, more than nsproxy itself.
I had just come across your mentioned change from using rcu and given
that there is practically 0 concurrency, though we could do better.

>I am very uncomofortable with a novel, and very awkward new locking
>primitive that does not clearly show improvements in even it's target
>workloads.
>
>Hmm.  After thinking about this a little more your set_reader_nsproxy is
>completely unsafe.  Most readers of nsproxy are from the same task.
>Changing the low bits of the pointer of from another task will cause
>those readers to segfault, and if not segfault they are reading from the
>wrong memory locations.

Ok, this part I was not sure of and was the simplest way I could think of
atomically checking for readers with a single [Rmw]. fwiw, as an alternative
to the pointer tagging I had considered doing some sorf of spin_is_locked()
check on the alloc_lock for the writer, but that has its own can of worms on
SMP anyway.

Thanks,
Davidlohr

WARNING: multiple messages have this Message-ID (diff)
From: Davidlohr Bueso <dave@stgolabs.net>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dbueso@suse.de>,
	viro@zeniv.linux.org.uk, linux-kernel@vger.kernel.org,
	Linux Containers <containers@lists.linux-foundation.org>,
	"Serge E. Hallyn" <serge@hallyn.com>
Subject: Re: [RFC PATCH] namespaces: Avoid task_lock when setting tsk->nsproxy
Date: Fri, 26 Feb 2016 10:19:26 -0800	[thread overview]
Message-ID: <20160226181926.GA4166@linux-uzut.site> (raw)
In-Reply-To: <87wpprgyz7.fsf@x220.int.ebiederm.org>

On Fri, 26 Feb 2016, Eric W. Biederman wrote:

>I would really appreciate it, if you are going to come up with a new
>locking primitive that you implement that locking primitive separately.

Using some rmw insn to avoid a lock is hardly implementing a new locking
primitive. This was never the purpose, and we use similar techniques throughout
the kernel to force certain paths.

>A fresh locking primitive comingled with other code is a good way to get
>something wrong, and generally to get code that is not well maintained
>because there is not a separation of concerns.
>
>Furthermore there is a world of difference between a 1+jiffy delay
>waiting for rcu_synchronize and the short hold times of task_lock.

I am aware of that, this is an optimization only based on that task_lock
is mainly unconteded, and hoped I made it clear in the changelog.

>Looking at your locking it appears to be a complete hack.  Always taking
>task_lock on read (but now you have an extra atomic op where you call
>xchg on the pointer).  Just calling compare_xchg on write if there
>are no concurrent readers.

The task on read is considered a slowpath, the extra atomic op is what I
had mentioned in the overhead, but for the fastpath we save two ops, otoh.

>For raw performance you would do better to have a separate lock, or
>potentially a specialized locking primitive that just used the low
>pointer bits.
>
>I don't know what motivates this work are you actually seeing
>performance problems with setns?

Motivation is towards other alloc_lock users, more than nsproxy itself.
I had just come across your mentioned change from using rcu and given
that there is practically 0 concurrency, though we could do better.

>I am very uncomofortable with a novel, and very awkward new locking
>primitive that does not clearly show improvements in even it's target
>workloads.
>
>Hmm.  After thinking about this a little more your set_reader_nsproxy is
>completely unsafe.  Most readers of nsproxy are from the same task.
>Changing the low bits of the pointer of from another task will cause
>those readers to segfault, and if not segfault they are reading from the
>wrong memory locations.

Ok, this part I was not sure of and was the simplest way I could think of
atomically checking for readers with a single [Rmw]. fwiw, as an alternative
to the pointer tagging I had considered doing some sorf of spin_is_locked()
check on the alloc_lock for the writer, but that has its own can of worms on
SMP anyway.

Thanks,
Davidlohr

  parent reply	other threads:[~2016-02-26 18:19 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-26 10:10 [RFC PATCH] namespaces: Avoid task_lock when setting tsk->nsproxy Davidlohr Bueso
     [not found] ` <1456481448-3925-1-git-send-email-dbueso-l3A5Bk7waGM@public.gmane.org>
2016-02-26 17:31   ` Eric W. Biederman
2016-02-26 17:31     ` Eric W. Biederman
     [not found]     ` <87wpprgyz7.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-02-26 18:19       ` Davidlohr Bueso [this message]
2016-02-26 18:19         ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160226181926.GA4166@linux-uzut.site \
    --to=dave-h16yjtlemjhk1umjsbkqmq@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=dbueso-l3A5Bk7waGM@public.gmane.org \
    --cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.