linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: bfields <bfields@fieldses.org>
To: Richard Weinberger <richard@nod.at>
Cc: linux-nfs <linux-nfs@vger.kernel.org>,
	david <david@sigma-star.at>,
	luis turcitu <luis.turcitu@appsbroker.com>,
	david young <david.young@appsbroker.com>,
	david oberhollenzer <david.oberhollenzer@sigma-star.at>,
	trond myklebust <trond.myklebust@hammerspace.com>,
	anna schumaker <anna.schumaker@netapp.com>,
	chris chilvers <chris.chilvers@appsbroker.com>
Subject: Re: [RFC PATCH 1/6] Implement reexport helper library
Date: Wed, 9 Mar 2022 10:28:46 -0500	[thread overview]
Message-ID: <20220309152846.GC6633@fieldses.org> (raw)
In-Reply-To: <2098326047.128064.1646838131178.JavaMail.zimbra@nod.at>

On Wed, Mar 09, 2022 at 04:02:11PM +0100, Richard Weinberger wrote:
> Bruce,
> 
> ----- Ursprüngliche Mail -----
> > Von: "bfields" <bfields@fieldses.org>
> >> Concurrent access to the database is synchronized using a shared rwlock (on
> >> shared memory).
> >> reexpdb_init.lock is used to make sure that creating and initializing the shared
> >> memory/lock
> >> happens once.
> > 
> > Could you point me to sqlite documentation that explains why the user
> > would need to do their own locking?
> 
> https://www.sqlite.org/rescode.html#busy
>  
> > I assumed sqlite would do any necessary locking for you.  It seems like
> > a core function for a database.
> 
> Well, SQLite does locking but no queuing.
> So, as soon somebody is writing the data base it is locked and all other
> read/writes will fail either with SQLITE_BUSY or SQLITE_LOCKED.
> It is up to the user how to react on that.
>  
> That's why I chose to use a shared rwlock where a task can *wait* upon
> conflicting access.
> 
> Maybe there is a better way do it, dunno.

Oh, got it, thanks for the explanation.

Assuming writes are rare, maybe a dumb retry loop would be adequate.
Sounds like that's what we'd need anyway if we were to share the
database between cooperating re-export servers.  (Would we have a
performance problem in that case, if several reexport servers start at
once and all start trying to populate the shared database?  I don't
know.)

Anyway, it's a judgement call, fair enough.  Might be worth a brief
comment, at least.

> >> > What are the two tables used for?  Naively I'd've thought the
> >> > "subvolumes" table was redundant.
> >> 
> >> fsidnums is used to store generated and predefined fsid numbers.
> >> It is only used in reexport modes auto-fsidnum and predefined-fsidnum.
> >> 
> >> subvolumes contains a list of subvolumes which a are likely in use by
> >> a client. Up start all these paths will get touched such that they can
> >> be exported.
> > 
> > The fsidnums also contains that same list of paths, right?  So I don't
> > understand why we need both.
> 
> In the current design generated fsidnums will stay forever while the paths
> in subvolumes can get cleaned.
>  
> > Also, if we're depending on touching all the paths on startup, something
> > is wrong.
> 
> I think we talked about that already and agreed that it should work without
> touching. So far I didn't had a chance to investigate into this.

OK.  Do you think you could look into that, and strip this down to the
one auto-fsidnum case, and then continue the discussion?  I think that'd
clarify things.

As I say, I wouldn't necessarily be opposed to later adding a reexport=
option back in, but for now I'd first like to see if we can find the
simplest patches that will solve the problem in one good-enough way.

> > What we want to do is touch the path when we get an upcall for the given
> > fsid.  That way we don't have to assume, for example, that the system
> > will never expire mounts that haven't been used recently.
> > 
> >> >> +/*
> >> >> + * This query is a little tricky. We use SQL to find and claim the smallest
> >> >> free fsid number.
> >> > 
> >> > Yes, that is a little tricky.  Is it necessary?  My SQL Is rusty, but
> >> > the database should be able to pick a unique value for us, shouldn't it?
> >> 
> >> SQLite can generate a unique value, but we cannot select the range.
> >> It will give a value between 0 and 2^64.
> >> We need an id between 1 and 2^32.
> > 
> > Surely that CHECK constraint doesn't somehow cause sqlite to generate
> > non-unique primary keys?  At worst I'd think it would cause INSERTs to
> > fail if the ordinary primary-key-choosing algorithm chooses something
> > over 2^32.
> 
> The CHECK is just a paranoid check. My SQL INSERT generates ids starting with 1.
> Sure, if you run it 2^32 times, it will fail due to the CHECK.

OK.

--b.

  reply	other threads:[~2022-03-09 15:28 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-17 13:15 [RFC PATCH 0/6] nfs-utils: Improving NFS re-exports Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 1/6] Implement reexport helper library Richard Weinberger
2022-03-08 21:44   ` J. Bruce Fields
2022-03-09  9:43     ` Richard Weinberger
2022-03-09 14:19       ` bfields
2022-03-09 15:02         ` Richard Weinberger
2022-03-09 15:28           ` bfields [this message]
2022-02-17 13:15 ` [RFC PATCH 2/6] exports: Implement new export option reexport= Richard Weinberger
2022-03-08 22:10   ` J. Bruce Fields
2022-03-09  9:43     ` Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 3/6] export: Implement logic behind reexport= Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 4/6] export: Record mounted volumes Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 5/6] nfsd: statfs() every known subvolume upon start Richard Weinberger
2022-02-17 13:15 ` [RFC PATCH 6/6] export: Garbage collect orphaned subvolumes " Richard Weinberger
2022-02-17 16:33 ` [RFC PATCH 0/6] nfs-utils: Improving NFS re-exports J. Bruce Fields
2022-02-17 17:27   ` Richard Weinberger
2022-02-17 19:27     ` bfields
2022-02-17 20:15       ` Richard Weinberger
2022-02-17 20:18         ` bfields
2022-02-17 20:29           ` Richard Weinberger
2022-03-07  9:25   ` Richard Weinberger
2022-03-07 22:29     ` bfields
2022-04-19 20:20       ` Steve Dickson
2022-04-19 20:31         ` Richard Weinberger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220309152846.GC6633@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=anna.schumaker@netapp.com \
    --cc=chris.chilvers@appsbroker.com \
    --cc=david.oberhollenzer@sigma-star.at \
    --cc=david.young@appsbroker.com \
    --cc=david@sigma-star.at \
    --cc=linux-nfs@vger.kernel.org \
    --cc=luis.turcitu@appsbroker.com \
    --cc=richard@nod.at \
    --cc=trond.myklebust@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).