From: Manfred Spraul <manfred@colorfullife.com>
To: Davidlohr Bueso <dave@stgolabs.net>,
Waiman Long <longman@redhat.com>,
Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
"Luis R. Rodriguez" <mcgrof@kernel.org>,
Kees Cook <keescook@chromium.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
Stanislav Kinsbursky <skinsbursky@parallels.com>,
Linux Containers <containers@lists.linux-foundation.org>,
linux-api@vger.kernel.org
Subject: Re: [RFC][PATCH] ipc: Remove IPCMNI
Date: Thu, 29 Mar 2018 10:47:45 +0200
Message-ID: <3e201de2-bed2-6f7d-0783-700d095142e0@colorfullife.com>
In-Reply-To: <20180329021409.gcjjrmviw2lckbfk@linux-n805>
Hello everyone,
On 03/29/2018 04:14 AM, Davidlohr Bueso wrote:
> Cc'ing mtk, Manfred and linux-api.
>
> See below.
>
> On Thu, 15 Mar 2018, Waiman Long wrote:
>
>> On 03/15/2018 03:00 PM, Eric W. Biederman wrote:
>>> Waiman Long <longman@redhat.com> writes:
>>>
>>>> On 03/14/2018 08:49 PM, Eric W. Biederman wrote:
>>>>> The define IPCMNI was originally the size of a statically sized array in
>>>>> the kernel and that has long since been removed. Therefore there is no
>>>>> fundamental reason for IPCMNI.
>>>>>
>>>>> The only remaining use IPCMNI serves is as a convoluted way to format
>>>>> the ipc id to userspace. It does not appear that anything except for
>>>>> the CHECKPOINT_RESTORE code even cares about this variety of assignment
>>>>> and the CHECKPOINT_RESTORE code only cares about this weirdness because
>>>>> it has to restore these peculiar ids.
>>>>>
My assumption is that if an array is recreated, it should get a
different id.
a=semget(1234,,);
semctl(a,,IPC_RMID);
b=semget(1234,,);
Now a != b.
Rationale: semop() calls refer to the array only by its id.
If there is a stale process in the system that tries to access the "old"
array and the new array has the same id, then the locking gets corrupted.
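A minimal sketch of the sequence above as compilable C, assuming a Linux
system with SysV IPC enabled and that key 1234 is currently unused. The
helper name recreated_id_differs is hypothetical, and the 0600 permissions
and the single-semaphore array size are illustrative choices that do not
come from the snippet above.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical helper: create a one-semaphore array for `key`, remove it,
 * then create it again under the same key.
 *
 * Returns  1 if the two ids differ,
 *          0 if the kernel handed out the same id again,
 *         -1 if any call failed (for example, the key already exists
 *            or SysV IPC is not available).
 */
int recreated_id_differs(key_t key)
{
    /* IPC_EXCL makes sure we really create a new array each time. */
    int a = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (a == -1)
        return -1;
    if (semctl(a, 0, IPC_RMID) == -1)
        return -1;

    int b = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (b == -1)
        return -1;
    /* Remove the second array too, so nothing is left behind. */
    if (semctl(b, 0, IPC_RMID) == -1)
        return -1;

    return a != b;
}
```

If the assumption above holds, recreated_id_differs(1234) should return 1;
a return value of 0 would mean that a stale process still holding the old
id could operate on the new array.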
>>>>> Therefore make the assignment of ipc ids match the description in
>>>>> Advanced Programming in the Unix Environment and assign the next id
>>>>> until INT_MAX is hit then loop around to the lower ids.
>>>>>
Ok, sounds good.
That way we really cycle through INT_MAX; right now, a==b would happen
after 128k RMID calls.
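The wrap-around assignment can be modelled in user space roughly as
follows. This is only a sketch of the idea and not the kernel's
idr_alloc_cyclic() implementation: alloc_cyclic, its cursor argument and
the linear in_use array are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Linear scan; good enough for a model, far too slow for real use. */
static bool is_used(const int *in_use, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (in_use[i] == id)
            return true;
    return false;
}

/*
 * Hand out the first free id at or after *cursor, in the range 0..max.
 * Once max has been passed, the search continues from 0.
 * On success, *cursor is moved just past the returned id.
 * Returns -1 if every id in 0..max is in use.
 */
int alloc_cyclic(int *cursor, int max, const int *in_use, size_t n)
{
    int start = (*cursor < 0 || *cursor > max) ? 0 : *cursor;
    int id = start;

    do {
        if (!is_used(in_use, n, id)) {
            /* Compare against max first, so id + 1 cannot overflow. */
            *cursor = (id == max) ? 0 : id + 1;
            return id;
        }
        id = (id == max) ? 0 : id + 1;
    } while (id != start);

    return -1;
}
```

With max set to INT_MAX, an id that was just freed is not handed out
again until the cursor has travelled through the rest of the range, which
is the property wanted here.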
>>>>> This can be implemented trivially with the current code using
>>>>> idr_alloc_cyclic.
>>>>>
Is there a performance impact?
Right now, the idr tree is only large if there are lots of objects.
What happens if we have only 1 object, with id=INT_MAX-1?
semop() calls that do not sleep are fairly fast.
The same applies to msgsnd/msgrcv, if the message is small enough.
@Davidlohr:
Do you know whether there are applications that frequently call semop()
without it having to sleep?
Given the scalability work that was pushed into the kernel, I assume that
such applications exist.
I have only checked postgresql myself, and postgresql always sleeps
(and this was long ago).
>>>>> To make it possible to keep checkpoint/restore working I have renamed
>>>>> the sysctls from xxx_next_id to xxx_nextid. That is enough change that
>>>>> a smart CRIU implementation can see that what is exported has changed,
>>>>> and act accordingly. New kernels will be able to restore the old id's.
>>>>>
>>>>> This code still needs some real world testing to verify my assumptions.
>>>>> And some work with the CRIU implementations to actually add the code
>>>>> that deals with the new form of id assignment.
>>>>>
It means that no existing checkpoint/restore application will work
with a new kernel.
Everyone must first update the checkpoint/restore application, then
update the kernel.
Is this acceptable?
--
Manfred