linux-fsdevel.vger.kernel.org archive mirror
From: Al Viro <viro@zeniv.linux.org.uk>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Wed, 6 Feb 2019 00:56:38 +0000
Message-ID: <20190206005638.GU2217@ZenIV.linux.org.uk>
In-Reply-To: <40b27e78-9ee8-1395-feb3-a73aac87c9a7@kernel.dk>

On Tue, Feb 05, 2019 at 12:08:25PM -0700, Jens Axboe wrote:
> Proof is in the pudding, here's the main commit introducing io_uring
> and now wiring it up to the AF_UNIX garbage collection:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=io_uring&id=158e6f42b67d0abe9ee84886b96ca8c4b3d3dfd5
> 
> How does that look?

In a word - wrong.  Some theory: the garbage collector assumes that there is
a subset of file references such that
	* for all files with such references there's an associated unix_sock.
	* all such references are stored in SCM_RIGHTS datagrams that can be
found by the garbage collector (currently: for data-bearing AF_UNIX sockets -
queued SCM_RIGHTS datagrams, for listeners - SCM_RIGHTS datagrams sent via
yet-to-be-accepted connections).
	* there is an efficient way to count those references for given file
(->inflight of the corresponding unix_sock).
	* removal of those references would render the graph acyclic.
	* file can _NOT_ be subject to syscalls unless there are references
to it outside of that subset.

unix_inflight() moves a reference into the subset.
unix_notinflight() moves a reference out of the subset.
Any activity that might add such references ought to call wait_for_unix_gc()
first (basically, to stall massive insertions while gc is running).
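
A toy model of that bookkeeping, as a rough sketch only: ToyFile,
toy_unix_inflight(), toy_unix_notinflight() and only_inflight_refs() are
made-up names for illustration, not kernel interfaces, and locking is ignored.
It assumes the datagram's reference is taken separately and is then marked as
being in the subset.

```python
from dataclasses import dataclass


@dataclass
class ToyFile:
    """Toy stand-in for a file that has an associated unix_sock."""
    name: str
    refs: int = 0      # every reference to the file
    inflight: int = 0  # references that belong to the in-flight subset


def toy_unix_inflight(f: ToyFile) -> None:
    # An existing reference moves into the subset; the total is unchanged.
    f.inflight += 1


def toy_unix_notinflight(f: ToyFile) -> None:
    # A reference moves back out of the subset.
    f.inflight -= 1


def only_inflight_refs(f: ToyFile) -> bool:
    # True when no reference exists outside of the subset, i.e. the file
    # can no longer be the subject of syscalls.
    return f.inflight > 0 and f.refs == f.inflight


f = ToyFile("sock", refs=1)       # one reference from a descriptor table
f.refs += 1                       # reference taken for an SCM_RIGHTS datagram
toy_unix_inflight(f)              # ... and moved into the subset
before_close = only_inflight_refs(f)
f.refs -= 1                       # close() drops the descriptor-table reference
after_close = only_inflight_refs(f)
print(before_close, after_close)  # prints: False True
```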

Note that unix_gc() does *NOT* work in terms of dropping file references -
the primary effect is locating the SCM_RIGHTS datagrams that can be disposed
of and taking them out.  It simply won't do anything to your file references,
no matter what.  Add a printk into your ->release() and try to register an io_uring
descriptor into itself, then close it.  And observe ->release() not being
called for that object.  Ever.
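
The same experiment as a toy model, under the assumption that registration
stores a plain file reference outside of any SCM_RIGHTS datagram.  ToyFile,
its "registered" table and toy_gc_discard() are hypothetical names used only
for illustration; this is not the io_uring code.

```python
class ToyFile:
    def __init__(self, name):
        self.name = name
        self.refs = 1          # the descriptor handed to userspace
        self.registered = []   # hypothetical registered-file table
        self.released = False

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.released = True   # stands in for ->release()
            held, self.registered = self.registered, []
            for f in held:
                f.put()


def toy_gc_discard(datagrams):
    """All the collector does: throw away the datagrams it has picked."""
    for dgram in datagrams:
        for f in dgram:
            f.put()
        dgram.clear()


ring = ToyFile("io_uring")
ring.registered.append(ring.get())  # register the ring into itself
ring.put()                          # close() the descriptor
toy_gc_discard([])                  # no datagram holds the ring, nothing to do
print(ring.refs, ring.released)     # prints: 1 False
```

The remaining reference sits in the ring's own table, which is only emptied
from the release path, so in this model the count never reaches zero.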

PS: The algorithm used by unix_gc() is basically this -

	grab unix_gc_lock (giving exclusion with unix_inflight/unix_notinflight
			   and stabilizing ->inflight counters)

	Candidates = {}
	for all unix_sock u such that u->inflight > 0
		if file corresponding to u has no other references
			Candidates += u

	/* everything else already is reachable; due to unix_gc_lock these
	   can't die or get syscall-visible references under us */
	Might_Die = Candidates

	/* invariant to maintain: for u in Candidates u->inflight will be equal
	   to the number of references from SCM_RIGHTS datagrams *except*
	   those immediately reachable from elements of Might_Die */

	for all u in Candidates
		for each file reference v in SCM_RIGHTS datagrams
					immediately reachable from u
			if v in Candidates
				v->inflight--

	To_Scan = ()	// stuff reachable from those must live
	for all u in Might_Die
		if u->inflight > 0
			queue u into To_Scan

	while To_Scan is non-empty
		u = dequeue(To_Scan)
		Might_Die -= u
		for each file reference v in SCM_RIGHTS datagrams
					immediately reachable from u
			if v in Candidates
				v->inflight++	// maintain the invariant
				if v in Might_Die
					queue v into To_Scan

	/* at that point nothing in Might_Die is reachable from the outside */

	/* restore the original values of ->inflight */
	for all u in Might_Die
		for each file reference v in SCM_RIGHTS datagrams
					immediately reachable from u
			if v in Candidates
				v->inflight++

	hitlist = ()
	for all u in Might_Die
		for each SCM_RIGHTS datagram D immediately reachable from u
			if D contains references to something in Candidates
				move D to hitlist
	/* all those datagrams would've never become reachable */

	drop unix_gc_lock

	discard all datagrams in hitlist.
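
For anyone who wants to play with it, here is a rough executable sketch of
the steps above.  It is a toy: Sock, send() and toy_unix_gc() are invented
for illustration, unix_gc_lock and listeners are not modelled, and the
"queued" set (so that nothing is put on To_Scan twice) is an assumption of
this sketch rather than something spelled out above.

```python
from collections import deque


class Sock:
    def __init__(self, name, refs=0):
        self.name = name
        self.refs = refs    # all references to the file
        self.inflight = 0   # references held in SCM_RIGHTS datagrams
        self.queue = []     # queued datagrams, each a list of Sock


def send(receiver, *files):
    """Queue one datagram on receiver carrying references to files."""
    for f in files:
        f.refs += 1
        f.inflight += 1
    receiver.queue.append(list(files))


def reachable(u):
    """File references immediately reachable from u."""
    for dgram in u.queue:
        yield from dgram


def toy_unix_gc(socks):
    candidates = {u for u in socks if u.inflight > 0 and u.refs == u.inflight}
    might_die = set(candidates)

    # Discount references coming from other candidates.
    for u in candidates:
        for v in reachable(u):
            if v in candidates:
                v.inflight -= 1

    # Anything still in flight is referenced from outside and must live,
    # together with everything reachable from it.
    to_scan = deque(u for u in might_die if u.inflight > 0)
    queued = set(to_scan)
    while to_scan:
        u = to_scan.popleft()
        might_die.discard(u)
        for v in reachable(u):
            if v in candidates:
                v.inflight += 1          # maintain the invariant
                if v in might_die and v not in queued:
                    queued.add(v)
                    to_scan.append(v)

    # Restore the original values of inflight.
    for u in might_die:
        for v in reachable(u):
            if v in candidates:
                v.inflight += 1

    hitlist = []
    for u in might_die:
        keep = []
        for dgram in u.queue:
            hit = any(v in candidates for v in dgram)
            (hitlist if hit else keep).append(dgram)
        u.queue = keep

    # Discarding a datagram drops the references it carried.
    for dgram in hitlist:
        for f in dgram:
            f.refs -= 1
            f.inflight -= 1
    return sorted(u.name for u in might_die), len(hitlist)


a, b = Sock("a"), Sock("b")              # unreachable cycle
send(a, b)
send(b, a)
c, d = Sock("c", refs=1), Sock("d", refs=1)
e = Sock("e")
send(d, c)                               # c also has an outside reference
send(c, e)                               # e is only reachable through c
result = toy_unix_gc([a, b, c, d, e])
print(result)                            # prints: (['a', 'b'], 2)
```

Here a and b reference only each other, so both of their datagrams land on
the hitlist; e has nothing but an in-flight reference too, yet it survives
because the datagram holding it is queued on c, which is not a candidate.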
