linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* lock_kernel() usage and sync_*() functions
@ 2001-03-22  3:11 Nigel Gamble
  2001-03-22  3:42 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Nigel Gamble @ 2001-03-22  3:11 UTC (permalink / raw)
  To: linux-kernel

Why is the kernel lock held around sync_supers() and sync_inodes() in
sync_old_buffers() and fsync_dev(), but not in sync_dev()?  Is it just
to serialize calls to these functions, or is there some other reason?

Since this use of the BKL is one of the causes of high preemption
latency in a preemptible kernel, I'm hoping it would be OK to replace
them with a semaphore.  Please let me know if this is not the case.

Thanks!

Nigel Gamble                                    nigel@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             nigel@mvista.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lock_kernel() usage and sync_*() functions
  2001-03-22  3:11 lock_kernel() usage and sync_*() functions Nigel Gamble
@ 2001-03-22  3:42 ` Linus Torvalds
  2001-03-22  5:18   ` Alexander Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2001-03-22  3:42 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.05.10103211901070.705-100000@cosmic.nrg.org>,
Nigel Gamble  <nigel@nrg.org> wrote:
>
>Why is the kernel lock held around sync_supers() and sync_inodes() in
>sync_old_buffers() and fsync_dev(), but not in sync_dev()?  Is it just
>to serialize calls to these functions, or is there some other reason?

A lot of the FS locks need the kernel lock and are not SMP-safe on their
own.  Look at "lock_super()" for the worst offender (I think most of the
other ones have been converted to properly lock on SMP).

sync_inodes() _shouldn't_ need it. sync_supers() definitely does.

The fact that sync_dev() doesn't get the kernel lock looks worrisome. 
Of course, I don't think much of anything actually _uses_ "sync_dev()"
anyway (quick grep shows it up in revalidate, which gets the kernel lock
earlier)

But it might be a good idea to try to (a) remove the bkl from the
functions, and push it down into sync_supers() that definitely needs it
now (and remove it when it doesn't any more).

The long-term plan (ie 2.5.x) is to basically remove the bkl from all
the VFS interfaces. For 2.4.x, only the truly performance-critical stuff
was done (ie mainly name lookup and read/write page).

		Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lock_kernel() usage and sync_*() functions
  2001-03-22  3:42 ` Linus Torvalds
@ 2001-03-22  5:18   ` Alexander Viro
  2001-03-22 14:45     ` Ingo Oeser
  0 siblings, 1 reply; 5+ messages in thread
From: Alexander Viro @ 2001-03-22  5:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel



On 21 Mar 2001, Linus Torvalds wrote:

> In article <Pine.LNX.4.05.10103211901070.705-100000@cosmic.nrg.org>,
> Nigel Gamble  <nigel@nrg.org> wrote:
> >
> >Why is the kernel lock held around sync_supers() and sync_inodes() in
> >sync_old_buffers() and fsync_dev(), but not in sync_dev()?  Is it just
> >to serialize calls to these functions, or is there some other reason?
> 
> A lot of the FS locks need the kernel lock and are not SMP-safe on their
> own.  Look at "lock_super()" for the worst offender (I think most of the
> other ones have been converted to properly lock on SMP).

Fixed in namespaces patch.

> sync_inodes() _shouldn't_ need it. sync_supers() definitely does.

Unfortunately, sync_inodes() involves scanning the list of superblocks.
And that's a source of big PITA.

> The fact that sync_dev() doesn't get the kernel lock looks worrisome. 
> Of course, I don't think much of anything actually _uses_ "sync_dev()"
> anyway (quick grep shows it up in revalidate, which gets the kernel lock
> earlier)

sync_dev() is called only under BKL, AFAICS.
 
> But it might be a good idea to try to (a) remove the bkl from the
> functions, and push it down into sync_supers() that definitely needs it
> now (and remove it when it doesn't any more).

Again, done in namespaces patch (->s_lock is semaphore there).

> The long-term plan (ie 2.5.x) is to basically remove the bkl from all
> the VFS interfaces. For 2.4.x, only the truly performance-critical stuff
> was done (ie mainly name lookup and read/write page).

Ehh... Linus, the main problem is in get_super(). Want a nice race?
sys_ustat() vs. sys_umount(). The former does get_super(), finds
struct super_block and does nothing to guarantee that it will stay.
Then it calls ->statfs(). In the meanwhile, you umount the thing
and do rmmod. Oops..

We need to refcount the struct super_block. I went for the following:
rw-semaphore ->s_umount protects the superblock "contents", ->s_count 
is a refcount.
	get_super() grabs a spinlock, looks through the list of
superblocks and increments s_count before dropping the spinlock.
Then it does down_read() on ->s_umount and checks ->s_root once it
got the semaphore. non-NULL - OK, NULL - repeat the search.
	drop_super() - read_up() and atomic_dec_and_test(), followed
by kfree().
	kill_super() - grab the ->s_umount for write(), remove from
list and set ->s_root to NULL while we are holding the ->s_umount and
only then drop it.
	->s_count gets contributions from each get_super() (temp.
reference) _and_ from having vfsmounts over that superblock. Same
scheme as with mm_struct - it's a number of temp refs + one more if
there are perm. ones.

	It works (I'm running such kernel for more than a month on
my boxen) and I'm going to show this beast in details in San Jose. 
Patch is about 150K unpacked, but most of that stuff is in cleanup
of last stages of boot sequence and general cleanup of fs/super.c,
so we could deal with get_super() races with smaller patch. However,
it requires changes to drivers - get_super() needs to be balanced.
Sorry - no way around tha, AFAICS.

	I started with adding 
void invalidate_dev(kdev_t dev, int sync_flag)
{
        struct super_block *sb = get_super(dev);
        if (sync_flag == 1)
                sync_dev(dev);
        else if (sync_flag == 2)
                fsync_dev(dev);
        if (sb) {
                invalidate_inodes(sb);
                /* drop_super(sb); here */
        }
        invalidate_buffers(dev);
}
in fs/buffer.c and converted drivers to that - all uses of get_super()
are in the kernel proper, so after that it was easier to fix without
excessive pain. We could do the same in the main tree as the first
step of get_super() fixes. Mind if I submit such patch?

						Cheers,
							Al


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lock_kernel() usage and sync_*() functions
  2001-03-22  5:18   ` Alexander Viro
@ 2001-03-22 14:45     ` Ingo Oeser
  2001-03-22 17:20       ` Alexander Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Oeser @ 2001-03-22 14:45 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel

On Thu, Mar 22, 2001 at 12:18:27AM -0500, Alexander Viro wrote:
> 
> 	I started with adding 
> void invalidate_dev(kdev_t dev, int sync_flag)
> {
>         struct super_block *sb = get_super(dev);
>         if (sync_flag == 1)
>                 sync_dev(dev);
>         else if (sync_flag == 2)
>                 fsync_dev(dev);
>         if (sb) {
>                 invalidate_inodes(sb);
>                 /* drop_super(sb); here */
>         }
>         invalidate_buffers(dev);
> }

Could we remove the "magic" sync_flag from the exported interface?

Do sth. like renaming your invalidate_dev() to
_invalidate_dev() and adding 3 defines:

#define invalidate_dev(dev) _invalidate_dev(dev,0)
#define invalidate_dev_sync(dev) _invalidate_dev(dev,1)
#define invalidate_dev_fsync(dev) _invalidate_dev(dev,2)

This would make it quite clear, what will be done.

AFAIR Linus dosn't like these magic numers either, right?

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lock_kernel() usage and sync_*() functions
  2001-03-22 14:45     ` Ingo Oeser
@ 2001-03-22 17:20       ` Alexander Viro
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Viro @ 2001-03-22 17:20 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Linus Torvalds, linux-kernel



On Thu, 22 Mar 2001, Ingo Oeser wrote:

> Could we remove the "magic" sync_flag from the exported interface?

Sure. But I seriously suspect that sync_dev() is wrong in 100% of cases.
So "flag" is eventually going to become "do we want to sync it or not?"
thing. However, I don't want to deal with that sort of analysis right now -
callers are in drivers/* and we are in even branch.
 
> Do sth. like renaming your invalidate_dev() to
> _invalidate_dev() and adding 3 defines:
> 
> #define invalidate_dev(dev) _invalidate_dev(dev,0)
> #define invalidate_dev_sync(dev) _invalidate_dev(dev,1)
> #define invalidate_dev_fsync(dev) _invalidate_dev(dev,2)
> 
> This would make it quite clear, what will be done.
> 
> AFAIR Linus dosn't like these magic numers either, right?

I also don't like them. I _don't_ believe that magic #defines are
any better, though. And I would rather localize the get_super() to
kernel proper preserving the current behaviour and left dealing
with the sync vs. fsync to 2.5. It's easy to grep and if my gut
feeling is correct your invalidate_dev_sync() is going to be a ballast.

Again, for 2.4 I would rather do a change that obviously doesn't
change behaviour of drivers, doesn't add functions without need
and is easy to review once drivers become a fair game again (== in 2.5).
Comments?
							Cheers,
								Al


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2001-03-22 17:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-03-22  3:11 lock_kernel() usage and sync_*() functions Nigel Gamble
2001-03-22  3:42 ` Linus Torvalds
2001-03-22  5:18   ` Alexander Viro
2001-03-22 14:45     ` Ingo Oeser
2001-03-22 17:20       ` Alexander Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).