linux-kernel.vger.kernel.org archive mirror
* BUG: enabling psacct breaks fsfreeze
@ 2012-10-23  9:43 Nikola Ciprich
  2012-10-31 12:15 ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Nikola Ciprich @ 2012-10-23  9:43 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel; +Cc: nikola.ciprich

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

Hi,

While trying to create consistent backups of a KVM guest, I've discovered
that fsfreeze always hangs. Deeper investigation revealed psacct to be the culprit:
when psacct is disabled, fsfreeze works fine; when it is enabled, the command never returns.
I suppose the problem is that /var is not on a separate partition (i.e. it's on the same
volume as /), so psacct isn't able to dump process exit information about the fsfreeze
command, thus creating a deadlock.
The problem is 100% reproducible on both the latest 3.0.x kernel and 3.7-rc1.

Should more debugging information be needed, I'll be glad to provide whatever I can.

BR

nik

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: BUG: enabling psacct breaks fsfreeze
  2012-10-23  9:43 BUG: enabling psacct breaks fsfreeze Nikola Ciprich
@ 2012-10-31 12:15 ` Jan Kara
  2012-10-31 12:46   ` Nikola Ciprich
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2012-10-31 12:15 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: linux-fsdevel, linux-kernel

  Hello,

On Tue 23-10-12 11:43:51, Nikola Ciprich wrote:
> While trying to create consistent backups of a KVM guest, I've discovered
> that fsfreeze always hangs. Deeper investigation revealed psacct to be the culprit:
> when psacct is disabled, fsfreeze works fine; when it is enabled, the command never returns.
> I suppose the problem is that /var is not on a separate partition (i.e. it's on the same
> volume as /), so psacct isn't able to dump process exit information about the fsfreeze
> command, thus creating a deadlock.
> The problem is 100% reproducible on both the latest 3.0.x kernel and 3.7-rc1.
> 
> Should more debugging information be needed, I'll be glad to provide whatever I can.
  Thanks for the report. Hmm, I'm not sure how the deadlock can happen because
AFAIU audit sends a message via netlink to userspace and whatever the audit
daemon does with it is its private thing. Can you please run:
  echo w >/proc/sysrq-trigger
after the machine deadlocks and then take dmesg and attach it here? You'll
have to have the shell prepared and use serial console / netconsole to gather
dmesg or try your luck with copying via ssh / netcat.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
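
For anyone scripting the blocked-task dump requested above, the same write
can be done from C. This is only a sketch: the function name sysrq_trigger()
is made up for illustration, and the path is a parameter purely so the helper
can be tried against a scratch file. On a real system the path would be
/proc/sysrq-trigger and the caller needs root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a single SysRq command character (e.g. 'w' for "show blocked
 * tasks") to a trigger file. Hypothetical helper, not part of any library.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int sysrq_trigger(const char *path, char cmd)
{
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputc(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffered byte; report a failed flush too. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling sysrq_trigger("/proc/sysrq-trigger", 'w') is then equivalent to the
echo command, with the output still landing in dmesg.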


* Re: BUG: enabling psacct breaks fsfreeze
  2012-10-31 12:15 ` Jan Kara
@ 2012-10-31 12:46   ` Nikola Ciprich
  2012-11-01  9:37     ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Nikola Ciprich @ 2012-10-31 12:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-kernel, Nikola Ciprich

[-- Attachment #1: Type: text/plain, Size: 1283 bytes --]

Hi Jan,

Thanks for the reply. Sure, I'll gather and post the requested info.
One more note before that: the problem is with psacct, not audit.
If I'm not mistaken, psacct (as opposed to audit) doesn't go through
userspace at all; the kernel dumps the information directly to the fs,
which might be the reason for the deadlock.

BR

nik



[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: BUG: enabling psacct breaks fsfreeze
  2012-10-31 12:46   ` Nikola Ciprich
@ 2012-11-01  9:37     ` Jan Kara
  2012-11-01 11:19       ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2012-11-01  9:37 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: Jan Kara, linux-fsdevel, linux-kernel

On Wed 31-10-12 13:46:00, Nikola Ciprich wrote:
> Hi Jan,
> 
> Thanks for the reply. Sure, I'll gather and post the requested info.
> One more note before that: the problem is with psacct, not audit.
> If I'm not mistaken, psacct (as opposed to audit) doesn't go through
> userspace at all; the kernel dumps the information directly to the fs,
> which might be the reason for the deadlock.
  Ah, right. Now I've looked into the right code and I can see what the
problem is. I'll see what we can do about that... So far I don't have a
better idea than just dropping accounting records that should be written to
a frozen filesystem (as you have nowhere to write those records to).

								Honza



-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-01  9:37     ` Jan Kara
@ 2012-11-01 11:19       ` Jan Kara
  2012-11-01 14:23         ` Nikola Ciprich
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2012-11-01 11:19 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: Jan Kara, linux-fsdevel, linux-kernel

On Thu 01-11-12 10:37:23, Jan Kara wrote:
> On Wed 31-10-12 13:46:00, Nikola Ciprich wrote:
> > Hi Jan,
> > 
> > Thanks for the reply. Sure, I'll gather and post the requested info.
> > One more note before that: the problem is with psacct, not audit.
> > If I'm not mistaken, psacct (as opposed to audit) doesn't go through
> > userspace at all; the kernel dumps the information directly to the fs,
> > which might be the reason for the deadlock.
>   Ah, right. Now I've looked into the right code and I can see what the
> problem is. I'll see what we can do about that... So far I don't have a
> better idea than just dropping accounting records that should be written to
> a frozen filesystem (as you have nowhere to write those records to).
  But I'd still be interested in those traces. I can see how one process
gets blocked, but it's not quite clear which locks the others block on.

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-01 11:19       ` Jan Kara
@ 2012-11-01 14:23         ` Nikola Ciprich
  2012-11-01 22:50           ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Nikola Ciprich @ 2012-11-01 14:23 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-kernel, Nikola Ciprich

[-- Attachment #1: Type: text/plain, Size: 1356 bytes --]

Hi Jan,

here it goes:

Nov  1 14:23:25 vmnci22 [ 1075.178123] SysRq : Show Blocked State
Nov  1 14:23:25 vmnci22 [ 1075.180555]   task                        PC stack   pid father
Nov  1 14:23:25 vmnci22 [ 1075.180592] fsfreeze      D 0000000000000000     0  4215   4195 0x00000000
Nov  1 14:23:25 vmnci22 [ 1075.180599]  ffff8800090b9b28 0000000000000046 0000000000000000 ffffffff00000000
Nov  1 14:23:25 vmnci22 [ 1075.180606]  0000000000013780 ffff8800090b9fd8 ffff88000f716170 ffff88000f715e80
Nov  1 14:23:25 vmnci22 [ 1075.180612]  ffff88000f715dc0 ffffffff81566080 ffff88000f716170 000000010002f405
Nov  1 14:23:25 vmnci22 [ 1075.180619] Call Trace:
Nov  1 14:23:25 vmnci22 [ 1075.180693]  [<ffffffff810e2dbb>] __generic_file_aio_write+0xbb/0x420
Nov  1 14:23:25 vmnci22 [ 1075.180729]  [<ffffffff81079290>] ? autoremove_wake_function+0x0/0x40
Nov  1 14:23:25 vmnci22 [ 1075.180736]  [<ffffffff810e317f>] generic_file_aio_write+0x5f/0xc0

I obtained it using networked syslog, so it shouldn't be truncated.



[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-01 14:23         ` Nikola Ciprich
@ 2012-11-01 22:50           ` Jan Kara
  2012-11-02  9:50             ` Marco Stornelli
  2012-11-07 18:51             ` Jan Kara
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Kara @ 2012-11-01 22:50 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: Jan Kara, linux-fsdevel, linux-kernel

On Thu 01-11-12 15:23:25, Nikola Ciprich wrote:
> Nov  1 14:23:25 vmnci22 [ 1075.178123] SysRq : Show Blocked State
> Nov  1 14:23:25 vmnci22 [ 1075.180555]   task                        PC stack   pid father
> Nov  1 14:23:25 vmnci22 [ 1075.180592] fsfreeze      D 0000000000000000     0  4215   4195 0x00000000
> Nov  1 14:23:25 vmnci22 [ 1075.180599]  ffff8800090b9b28 0000000000000046 0000000000000000 ffffffff00000000
> Nov  1 14:23:25 vmnci22 [ 1075.180606]  0000000000013780 ffff8800090b9fd8 ffff88000f716170 ffff88000f715e80
> Nov  1 14:23:25 vmnci22 [ 1075.180612]  ffff88000f715dc0 ffffffff81566080 ffff88000f716170 000000010002f405
> Nov  1 14:23:25 vmnci22 [ 1075.180619] Call Trace:
> Nov  1 14:23:25 vmnci22 [ 1075.180693]  [<ffffffff810e2dbb>] __generic_file_aio_write+0xbb/0x420
> Nov  1 14:23:25 vmnci22 [ 1075.180729]  [<ffffffff81079290>] ? autoremove_wake_function+0x0/0x40
> Nov  1 14:23:25 vmnci22 [ 1075.180736]  [<ffffffff810e317f>] generic_file_aio_write+0x5f/0xc0
  Thanks. So the system isn't really deadlocked. It's just that the fsfreeze
command hangs, isn't it? OK, I understand that it's a rather inconvenient
situation because every command will hang like this when the filesystem is
frozen.

Now I only have to come up with a way to improve this... It isn't quite
simple - to properly protect against freezing we have to communicate down
into generic_file_aio_write() that we want to bail out if the filesystem is
frozen instead of waiting.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
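
The "bail out instead of waiting" idea can be illustrated with a small
userspace model. Nothing here is kernel API: toy_sb, toy_start_write() and
toy_acct_write() are hypothetical names chosen for this sketch, and since a
single-threaded model cannot actually sleep, the waiting case is reported as
-EDEADLK simply to make the hang visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a superblock: it only tracks whether it is frozen. */
struct toy_sb {
	bool frozen;
};

/*
 * Model of taking freeze protection before a write.
 * Returns 1 when protection was obtained and 0 when the filesystem is
 * frozen and the caller asked not to wait. A real kernel would sleep in
 * the wait == true case; this model returns -EDEADLK instead.
 */
int toy_start_write(struct toy_sb *sb, bool wait)
{
	if (!sb->frozen)
		return 1;
	return wait ? -EDEADLK : 0;
}

/*
 * Accounting-style writer: never waits for a thaw. If the filesystem is
 * frozen, the record is counted as dropped and -EAGAIN is returned.
 */
int toy_acct_write(struct toy_sb *sb, unsigned int *written,
		   unsigned int *dropped)
{
	if (toy_start_write(sb, false) != 1) {
		(*dropped)++;
		return -EAGAIN;
	}
	(*written)++;
	return 0;
}
```

The point of the design is that the exiting task pays with a lost accounting
record rather than with an uninterruptible sleep, so a later "fsfreeze -u"
can still run and exit.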


* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-01 22:50           ` Jan Kara
@ 2012-11-02  9:50             ` Marco Stornelli
  2012-11-02 10:40               ` Nikola Ciprich
  2012-11-07 18:51             ` Jan Kara
  1 sibling, 1 reply; 13+ messages in thread
From: Marco Stornelli @ 2012-11-02  9:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Nikola Ciprich, linux-fsdevel, linux-kernel

On 01/11/2012 23:50, Jan Kara wrote:
> On Thu 01-11-12 15:23:25, Nikola Ciprich wrote:
>> Nov  1 14:23:25 vmnci22 [ 1075.178123] SysRq : Show Blocked State
>> Nov  1 14:23:25 vmnci22 [ 1075.180555]   task                        PC stack   pid father
>> Nov  1 14:23:25 vmnci22 [ 1075.180592] fsfreeze      D 0000000000000000     0  4215   4195 0x00000000
>> Nov  1 14:23:25 vmnci22 [ 1075.180599]  ffff8800090b9b28 0000000000000046 0000000000000000 ffffffff00000000
>> Nov  1 14:23:25 vmnci22 [ 1075.180606]  0000000000013780 ffff8800090b9fd8 ffff88000f716170 ffff88000f715e80
>> Nov  1 14:23:25 vmnci22 [ 1075.180612]  ffff88000f715dc0 ffffffff81566080 ffff88000f716170 000000010002f405
>> Nov  1 14:23:25 vmnci22 [ 1075.180619] Call Trace:
>> Nov  1 14:23:25 vmnci22 [ 1075.180693]  [<ffffffff810e2dbb>] __generic_file_aio_write+0xbb/0x420
>> Nov  1 14:23:25 vmnci22 [ 1075.180729]  [<ffffffff81079290>] ? autoremove_wake_function+0x0/0x40
>> Nov  1 14:23:25 vmnci22 [ 1075.180736]  [<ffffffff810e317f>] generic_file_aio_write+0x5f/0xc0
>    Thanks. So the system isn't really deadlocked. It's just that the fsfreeze
> command hangs, isn't it? OK, I understand that it's a rather inconvenient
> situation because every command will hang like this when the filesystem is
> frozen.
>
> Now I only have to come up with a way to improve this... It isn't quite
> simple - to properly protect against freezing we have to communicate down
> into generic_file_aio_write() that we want to bail out if the filesystem is
> frozen instead of waiting.
>
> 								Honza
>

I saw this behavior (task hang) when I tested the fsfreeze code. I was
writing a little patch to replace fsfreeze's wait queue with a killable
queue, so that the user can at least do "kill -9", but since the
behavior was the same before your patch I didn't send it. I don't know
if we can break the previous behavior. The funny thing here is that it's
as if fsfreeze freezes itself :)

Marco


* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-02  9:50             ` Marco Stornelli
@ 2012-11-02 10:40               ` Nikola Ciprich
  2012-11-03  8:22                 ` Marco Stornelli
  0 siblings, 1 reply; 13+ messages in thread
From: Nikola Ciprich @ 2012-11-02 10:40 UTC (permalink / raw)
  To: Marco Stornelli; +Cc: Jan Kara, linux-fsdevel, linux-kernel, Nikola Ciprich

[-- Attachment #1: Type: text/plain, Size: 873 bytes --]

> I saw this behavior (task hang) when I tested the fsfreeze code. I was
> writing a little patch to replace fsfreeze's wait queue with a killable
> queue, so that the user can at least do "kill -9", but since the
> behavior was the same before your patch I didn't send it. I don't know if we
> can break the previous behavior. The funny thing here is that it's as if
> fsfreeze freezes itself :)

I think freezing all tasks isn't that bad; my problem is that it's not possible to
start fsfreeze -u to thaw the filesystem.



[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]


* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-02 10:40               ` Nikola Ciprich
@ 2012-11-03  8:22                 ` Marco Stornelli
  0 siblings, 0 replies; 13+ messages in thread
From: Marco Stornelli @ 2012-11-03  8:22 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: Jan Kara, linux-fsdevel, linux-kernel

On 02/11/2012 11:40, Nikola Ciprich wrote:
>> I saw this behavior (task hang) when I tested the fsfreeze code. I was
>> writing a little patch to replace fsfreeze's wait queue with a killable
>> queue, so that the user can at least do "kill -9", but since the
>> behavior was the same before your patch I didn't send it. I don't know if we
>> can break the previous behavior. The funny thing here is that it's as if
>> fsfreeze freezes itself :)
>
> I think freezing all tasks isn't that bad; my problem is that it's not possible to
> start fsfreeze -u to thaw the filesystem.
>
>

Yes, of course. It was only a general comment.

Marco


* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-01 22:50           ` Jan Kara
  2012-11-02  9:50             ` Marco Stornelli
@ 2012-11-07 18:51             ` Jan Kara
  2012-11-07 21:21               ` Nikola Ciprich
  1 sibling, 1 reply; 13+ messages in thread
From: Jan Kara @ 2012-11-07 18:51 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: Jan Kara, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1639 bytes --]

On Thu 01-11-12 23:50:53, Jan Kara wrote:
> On Thu 01-11-12 15:23:25, Nikola Ciprich wrote:
> > Nov  1 14:23:25 vmnci22 [ 1075.178123] SysRq : Show Blocked State
> > Nov  1 14:23:25 vmnci22 [ 1075.180555]   task                        PC stack   pid father
> > Nov  1 14:23:25 vmnci22 [ 1075.180592] fsfreeze      D 0000000000000000     0  4215   4195 0x00000000
> > Nov  1 14:23:25 vmnci22 [ 1075.180599]  ffff8800090b9b28 0000000000000046 0000000000000000 ffffffff00000000
> > Nov  1 14:23:25 vmnci22 [ 1075.180606]  0000000000013780 ffff8800090b9fd8 ffff88000f716170 ffff88000f715e80
> > Nov  1 14:23:25 vmnci22 [ 1075.180612]  ffff88000f715dc0 ffffffff81566080 ffff88000f716170 000000010002f405
> > Nov  1 14:23:25 vmnci22 [ 1075.180619] Call Trace:
> > Nov  1 14:23:25 vmnci22 [ 1075.180693]  [<ffffffff810e2dbb>] __generic_file_aio_write+0xbb/0x420
> > Nov  1 14:23:25 vmnci22 [ 1075.180729]  [<ffffffff81079290>] ? autoremove_wake_function+0x0/0x40
> > Nov  1 14:23:25 vmnci22 [ 1075.180736]  [<ffffffff810e317f>] generic_file_aio_write+0x5f/0xc0
>   Thanks. So the system isn't really deadlocked. It's just that the fsfreeze
> command hangs, isn't it? OK, I understand that it's a rather inconvenient
> situation because every command will hang like this when the filesystem is
> frozen.
> 
> Now I only have to come up with a way to improve this... It isn't quite
> simple - to properly protect against freezing we have to communicate down
> into generic_file_aio_write() that we want to bail out if the filesystem is
> frozen instead of waiting.
  OK, can you test attached patch?

								Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-fs-Fix-hang-with-BSD-accounting-on-frozen-filesystem.patch --]
[-- Type: text/x-patch, Size: 6137 bytes --]

>From 1cc937c5a850b2f9f0c2a83fdf757911602db198 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 7 Nov 2012 19:26:45 +0100
Subject: [PATCH] fs: Fix hang with BSD accounting on frozen filesystem

When BSD process accounting is enabled and logs information to a filesystem
which gets frozen, system easily becomes unusable because each attempt to
account process information blocks. Thus e.g. every task gets blocked in exit.

It seems better to drop accounting information (which can already happen when
filesystem is running out of space) instead of locking system up. This is
implemented using a special flag FMODE_NO_FREEZE_WAIT in file->f_mode of a
file to which accounting information is written.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/btrfs/file.c    |    3 ++-
 fs/cifs/file.c     |    3 ++-
 fs/fuse/file.c     |    3 ++-
 fs/ntfs/file.c     |    3 ++-
 fs/ocfs2/file.c    |    3 ++-
 fs/open.c          |    2 +-
 fs/xfs/xfs_file.c  |    3 ++-
 include/linux/fs.h |   14 ++++++++++++++
 kernel/acct.c      |    1 +
 mm/filemap.c       |    3 ++-
 10 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9ab1bed..6eb2e30 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1411,7 +1411,8 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	ssize_t err = 0;
 	size_t count, ocount;
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	mutex_lock(&inode->i_mutex);
 
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index edb25b4..1629e47 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2448,7 +2448,8 @@ cifs_writev(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	/*
 	 * We need to hold the sem to be sure nobody modifies lock list
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 78d2837..641df9e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -947,7 +947,8 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		return err;
 
 	count = ocount;
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 	mutex_lock(&inode->i_mutex);
 
 	/* We can write back this queue in page reclaim */
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 1ecf464..028b349 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2118,7 +2118,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 	mutex_lock(&inode->i_mutex);
 	ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 5a4ee77..93ef34d 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2265,7 +2265,8 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
 	if (iocb->ki_left == 0)
 		return 0;
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	appending = file->f_flags & O_APPEND ? 1 : 0;
 	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
diff --git a/fs/open.c b/fs/open.c
index 59071f5..42bd875 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -808,7 +808,7 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
 		op->mode = 0;
 
 	/* Must never be set by userspace */
-	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
+	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC & ~FMODE_NO_FREEZE_WAIT;
 
 	/*
 	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index aa473fa..7d8af61 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -771,7 +771,8 @@ xfs_file_aio_write(
 	if (ocount == 0)
 		return 0;
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
 		ret = -EIO;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b33cfc9..c040a6c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -123,6 +123,9 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x1000000)
 
+/* Write to file should fail on frozen fs rather than block */
+#define FMODE_NO_FREEZE_WAIT	((__force fmode_t)0x2000000)
+
 /*
  * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
  * that indicates that they should check the contents of the iovec are
@@ -1401,6 +1404,17 @@ static inline int sb_start_write_trylock(struct super_block *sb)
 	return __sb_start_write(sb, SB_FREEZE_WRITE, false);
 }
 
+/*
+ * We use trylock semantics if write originates in kernel and normal lock
+ * semantics otherwise. This is a hack but solves problems with deadlocking
+ * of e.g. psacct when filesystem is frozen.
+ */
+static inline int sb_start_file_write(struct file *file)
+{
+	return __sb_start_write(file->f_mapping->host->i_sb, SB_FREEZE_WRITE,
+				!(file->f_mode & FMODE_NO_FREEZE_WAIT));
+}
+
 /**
  * sb_start_pagefault - get write access to a superblock from a page fault
  * @sb: the super we write to
diff --git a/kernel/acct.c b/kernel/acct.c
index 051e071..0b5f231 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -183,6 +183,7 @@ static void acct_file_reopen(struct bsd_acct_struct *acct, struct file *file,
 		acct->needcheck = jiffies + ACCT_TIMEOUT*HZ;
 		acct->active = 1;
 		list_add(&acct->list, &acct_list);
+		file->f_mode |= FMODE_NO_FREEZE_WAIT;
 	}
 	if (old_acct) {
 		mnt_unpin(old_acct->f_path.mnt);
diff --git a/mm/filemap.c b/mm/filemap.c
index 83efee7..3b2812b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2527,7 +2527,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 	mutex_lock(&inode->i_mutex);
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
-- 
1.7.1



* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-07 18:51             ` Jan Kara
@ 2012-11-07 21:21               ` Nikola Ciprich
  2012-11-07 22:32                 ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Nikola Ciprich @ 2012-11-07 21:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 9090 bytes --]

Hello Jan,

Tried it on 3.7-rc4, works great! Thanks!

Will you submit it as-is, or do you plan any further changes?
Do you plan to backport it to stable kernels? I can try that and send it for review
if you want (although we'll have to wait until it's upstream anyway).

cheers

nik


On Wed, Nov 07, 2012 at 07:51:37PM +0100, Jan Kara wrote:
> On Thu 01-11-12 23:50:53, Jan Kara wrote:
> > On Thu 01-11-12 15:23:25, Nikola Ciprich wrote:
> > > Nov  1 14:23:25 vmnci22 [ 1075.178123] SysRq : Show Blocked State
> > > Nov  1 14:23:25 vmnci22 [ 1075.180555]   task                        PC stack   pid father
> > > Nov  1 14:23:25 vmnci22 [ 1075.180592] fsfreeze      D 0000000000000000     0  4215   4195 0x00000000
> > > Nov  1 14:23:25 vmnci22 [ 1075.180599]  ffff8800090b9b28 0000000000000046 0000000000000000 ffffffff00000000
> > > Nov  1 14:23:25 vmnci22 [ 1075.180606]  0000000000013780 ffff8800090b9fd8 ffff88000f716170 ffff88000f715e80
> > > Nov  1 14:23:25 vmnci22 [ 1075.180612]  ffff88000f715dc0 ffffffff81566080 ffff88000f716170 000000010002f405
> > > Nov  1 14:23:25 vmnci22 [ 1075.180619] Call Trace:
> > > Nov  1 14:23:25 vmnci22 [ 1075.180693]  [<ffffffff810e2dbb>] __generic_file_aio_write+0xbb/0x420
> > > Nov  1 14:23:25 vmnci22 [ 1075.180729]  [<ffffffff81079290>] ? autoremove_wake_function+0x0/0x40
> > > Nov  1 14:23:25 vmnci22 [ 1075.180736]  [<ffffffff810e317f>] generic_file_aio_write+0x5f/0xc0
> >   Thanks. So the system isn't really deadlocked. It's just that the fsfreeze
> > command hangs, isn't it? OK, I understand that it's a rather inconvenient
> > situation because every command will hang like this when the filesystem is
> > frozen.
> > 
> > Now I only have to come up with a way to improve this... It isn't quite
> > simple - to properly protect against freezing we have to communicate down
> > into generic_file_aio_write() that we want to bail out if the filesystem is
> > frozen instead of waiting.
>   OK, can you test attached patch?
> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

> From 1cc937c5a850b2f9f0c2a83fdf757911602db198 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Wed, 7 Nov 2012 19:26:45 +0100
> Subject: [PATCH] fs: Fix hang with BSD accounting on frozen filesystem
> 
> When BSD process accounting is enabled and logs information to a filesystem
> which gets frozen, system easily becomes unusable because each attempt to
> account process information blocks. Thus e.g. every task gets blocked in exit.
> 
> It seems better to drop accounting information (which can already happen when
> filesystem is running out of space) instead of locking system up. This is
> implemented using a special flag FMODE_NO_FREEZE_WAIT in file->f_mode of a
> file to which accounting information is written.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/btrfs/file.c    |    3 ++-
>  fs/cifs/file.c     |    3 ++-
>  fs/fuse/file.c     |    3 ++-
>  fs/ntfs/file.c     |    3 ++-
>  fs/ocfs2/file.c    |    3 ++-
>  fs/open.c          |    2 +-
>  fs/xfs/xfs_file.c  |    3 ++-
>  include/linux/fs.h |   14 ++++++++++++++
>  kernel/acct.c      |    1 +
>  mm/filemap.c       |    3 ++-
>  10 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 9ab1bed..6eb2e30 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1411,7 +1411,8 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
>  	ssize_t err = 0;
>  	size_t count, ocount;
>  
> -	sb_start_write(inode->i_sb);
> +	if (!sb_start_file_write(file))
> +		return -EAGAIN;
>  
>  	mutex_lock(&inode->i_mutex);
>  
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index edb25b4..1629e47 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2448,7 +2448,8 @@ cifs_writev(struct kiocb *iocb, const struct iovec *iov,
>  
>  	BUG_ON(iocb->ki_pos != pos);
>  
> -	sb_start_write(inode->i_sb);
> +	if (!sb_start_file_write(file))
> +		return -EAGAIN;
>  
>  	/*
>  	 * We need to hold the sem to be sure nobody modifies lock list
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 78d2837..641df9e 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -947,7 +947,8 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
>  		return err;
>  
>  	count = ocount;
> -	sb_start_write(inode->i_sb);
> +	if (!sb_start_file_write(file))
> +		return -EAGAIN;
>  	mutex_lock(&inode->i_mutex);
>  
>  	/* We can write back this queue in page reclaim */
> diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
> index 1ecf464..028b349 100644
> --- a/fs/ntfs/file.c
> +++ b/fs/ntfs/file.c
> @@ -2118,7 +2118,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
>  
>  	BUG_ON(iocb->ki_pos != pos);
>  
> -	sb_start_write(inode->i_sb);
> +	if (!sb_start_file_write(file))
> +		return -EAGAIN;
>  	mutex_lock(&inode->i_mutex);
>  	ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
>  	mutex_unlock(&inode->i_mutex);
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 5a4ee77..93ef34d 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2265,7 +2265,8 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
>  	if (iocb->ki_left == 0)
>  		return 0;
>  
> -	sb_start_write(inode->i_sb);
> +	if (!sb_start_file_write(file))
> +		return -EAGAIN;
>  
>  	appending = file->f_flags & O_APPEND ? 1 : 0;
>  	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
> diff --git a/fs/open.c b/fs/open.c
> index 59071f5..42bd875 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -808,7 +808,7 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
>  		op->mode = 0;
>  
>  	/* Must never be set by userspace */
> -	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
> +	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC & ~FMODE_NO_FREEZE_WAIT;
>  
>  	/*
>  	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index aa473fa..7d8af61 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -771,7 +771,8 @@ xfs_file_aio_write(
>  	if (ocount == 0)
>  		return 0;
>  
> -	sb_start_write(inode->i_sb);
> +	if (!sb_start_file_write(file))
> +		return -EAGAIN;
>  
>  	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
>  		ret = -EIO;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b33cfc9..c040a6c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -123,6 +123,9 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  /* File was opened by fanotify and shouldn't generate fanotify events */
>  #define FMODE_NONOTIFY		((__force fmode_t)0x1000000)
>  
> +/* Write to file should fail on frozen fs rather than block */
> +#define FMODE_NO_FREEZE_WAIT	((__force fmode_t)0x2000000)
> +
>  /*
>   * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
>   * that indicates that they should check the contents of the iovec are
> @@ -1401,6 +1404,17 @@ static inline int sb_start_write_trylock(struct super_block *sb)
>  	return __sb_start_write(sb, SB_FREEZE_WRITE, false);
>  }
>  
> +/*
> + * We use trylock semantics if write originates in kernel and normal lock
> + * semantics otherwise. This is a hack but solves problems with deadlocking
> + * of e.g. psacct when filesystem is frozen.
> + */
> +static inline int sb_start_file_write(struct file *file)
> +{
> +	return __sb_start_write(file->f_mapping->host->i_sb, SB_FREEZE_WRITE,
> +				!(file->f_mode & FMODE_NO_FREEZE_WAIT));
> +}
> +
>  /**
>   * sb_start_pagefault - get write access to a superblock from a page fault
>   * @sb: the super we write to
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 051e071..0b5f231 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -183,6 +183,7 @@ static void acct_file_reopen(struct bsd_acct_struct *acct, struct file *file,
>  		acct->needcheck = jiffies + ACCT_TIMEOUT*HZ;
>  		acct->active = 1;
>  		list_add(&acct->list, &acct_list);
> +		file->f_mode |= FMODE_NO_FREEZE_WAIT;
>  	}
>  	if (old_acct) {
>  		mnt_unpin(old_acct->f_path.mnt);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 83efee7..3b2812b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2527,7 +2527,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
>  
>  	BUG_ON(iocb->ki_pos != pos);
>  
> -	sb_start_write(inode->i_sb);
> +	if (!sb_start_file_write(file))
> +		return -EAGAIN;
>  	mutex_lock(&inode->i_mutex);
>  	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
>  	mutex_unlock(&inode->i_mutex);
> -- 
> 1.7.1
> 
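
For readers following the patch quoted above, the decision made by
sb_start_file_write() can be modelled in plain userspace C. This is only a
sketch under stated assumptions, not the kernel implementation: struct
model_sb, struct model_file, MODEL_NO_FREEZE_WAIT and the model_* functions
are hypothetical names, and the "would sleep" result stands in for the point
where the real __sb_start_write() blocks until the filesystem is thawed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for FMODE_NO_FREEZE_WAIT; not the kernel value. */
#define MODEL_NO_FREEZE_WAIT 0x1u

/* Toy superblock: a frozen flag plus a count of writers currently inside. */
struct model_sb {
	bool frozen;
	int writers;
};

/* Toy file: points at its superblock and carries mode bits. */
struct model_file {
	struct model_sb *sb;
	unsigned int mode;
};

enum model_start {
	MODEL_START_OK,          /* write access granted */
	MODEL_START_REFUSED,     /* frozen and the file asked not to wait */
	MODEL_START_WOULD_SLEEP  /* frozen; a real kernel would block until thaw */
};

/* Mirrors the patch's idea: wait unless the file carries the no-wait bit. */
static enum model_start model_start_file_write(struct model_file *file)
{
	bool wait = !(file->mode & MODEL_NO_FREEZE_WAIT);

	if (!file->sb->frozen) {
		file->sb->writers++;
		return MODEL_START_OK;
	}
	return wait ? MODEL_START_WOULD_SLEEP : MODEL_START_REFUSED;
}

/* Drop the write access taken by a successful model_start_file_write(). */
static void model_end_write(struct model_sb *sb)
{
	assert(sb->writers > 0);
	sb->writers--;
}

/*
 * Returns 0 when the write went through, -EAGAIN when it was refused
 * because the filesystem is frozen, and 1 when a real kernel would have
 * put the caller to sleep instead of returning.
 */
static int model_file_write(struct model_file *file)
{
	switch (model_start_file_write(file)) {
	case MODEL_START_OK:
		/* The data copy itself is elided in this model. */
		model_end_write(file->sb);
		return 0;
	case MODEL_START_REFUSED:
		return -EAGAIN;
	default:
		return 1;
	}
}
```

With the toy superblock marked frozen, a file carrying MODEL_NO_FREEZE_WAIT
gets -EAGAIN back immediately, which corresponds to the accounting record
being dropped, while an ordinary file reports that the caller would sleep
until thaw, which is the behaviour the patch keeps for other writers.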





* Re: BUG: enabling psacct breaks fsfreeze
  2012-11-07 21:21               ` Nikola Ciprich
@ 2012-11-07 22:32                 ` Jan Kara
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2012-11-07 22:32 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: Jan Kara, linux-fsdevel, linux-kernel

On Wed 07-11-12 22:21:19, Nikola Ciprich wrote:
> Hello Jan,
> 
> tried on 3.7-rc4, works great! Thanks!
> 
> Will you submit it as-is, or do you plan any further changes?
> Do you plan to backport it to stable kernels? I can try it and send it for review
> if you want (although we'll have to wait till it's upstream anyway).
  Thanks for testing. I've sent the patch and will see what others
say.

								Honza

> On Wed, Nov 07, 2012 at 07:51:37PM +0100, Jan Kara wrote:
> > On Thu 01-11-12 23:50:53, Jan Kara wrote:
> > > On Thu 01-11-12 15:23:25, Nikola Ciprich wrote:
> > > > Nov  1 14:23:25 vmnci22 [ 1075.178123] SysRq : Show Blocked State
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180555]   task                        PC stack   pid father
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180592] fsfreeze      D 0000000000000000     0  4215   4195 0x00000000
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180599]  ffff8800090b9b28 0000000000000046 0000000000000000 ffffffff00000000
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180606]  0000000000013780 ffff8800090b9fd8 ffff88000f716170 ffff88000f715e80
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180612]  ffff88000f715dc0 ffffffff81566080 ffff88000f716170 000000010002f405
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180619] Call Trace:
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180693]  [<ffffffff810e2dbb>] __generic_file_aio_write+0xbb/0x420
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180729]  [<ffffffff81079290>] ? autoremove_wake_function+0x0/0x40
> > > > Nov  1 14:23:25 vmnci22 [ 1075.180736]  [<ffffffff810e317f>] generic_file_aio_write+0x5f/0xc0
> > > >   Thanks. So the system isn't really deadlocked. It's just that the fsfreeze
> > > > command hangs, isn't it? OK, I understand that it's a rather inconvenient
> > > > situation because every command will hang like this while the filesystem is
> > > > frozen.
> > > > 
> > > > Now I only have to come up with a way to improve this... It isn't quite
> > > > simple - to properly protect against freezing we have to communicate down
> > > > into generic_file_aio_write() that we want to bail out if the filesystem is
> > > > frozen instead of waiting.
> >   OK, can you test attached patch?
> > 
> > 								Honza
> > 
> > -- 
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
> 
> > From 1cc937c5a850b2f9f0c2a83fdf757911602db198 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Wed, 7 Nov 2012 19:26:45 +0100
> > Subject: [PATCH] fs: Fix hang with BSD accounting on frozen filesystem
> > 
> > When BSD process accounting is enabled and logs information to a filesystem
> > which gets frozen, system easily becomes unusable because each attempt to
> > account process information blocks. Thus e.g. every task gets blocked in exit.
> > 
> > It seems better to drop accounting information (which can already happen when
> > filesystem is running out of space) instead of locking system up. This is
> > implemented using a special flag FMODE_NO_FREEZE_WAIT in file->f_mode of a
> > file to which accounting information is written.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/btrfs/file.c    |    3 ++-
> >  fs/cifs/file.c     |    3 ++-
> >  fs/fuse/file.c     |    3 ++-
> >  fs/ntfs/file.c     |    3 ++-
> >  fs/ocfs2/file.c    |    3 ++-
> >  fs/open.c          |    2 +-
> >  fs/xfs/xfs_file.c  |    3 ++-
> >  include/linux/fs.h |   14 ++++++++++++++
> >  kernel/acct.c      |    1 +
> >  mm/filemap.c       |    3 ++-
> >  10 files changed, 30 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index 9ab1bed..6eb2e30 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -1411,7 +1411,8 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
> >  	ssize_t err = 0;
> >  	size_t count, ocount;
> >  
> > -	sb_start_write(inode->i_sb);
> > +	if (!sb_start_file_write(file))
> > +		return -EAGAIN;
> >  
> >  	mutex_lock(&inode->i_mutex);
> >  
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index edb25b4..1629e47 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -2448,7 +2448,8 @@ cifs_writev(struct kiocb *iocb, const struct iovec *iov,
> >  
> >  	BUG_ON(iocb->ki_pos != pos);
> >  
> > -	sb_start_write(inode->i_sb);
> > +	if (!sb_start_file_write(file))
> > +		return -EAGAIN;
> >  
> >  	/*
> >  	 * We need to hold the sem to be sure nobody modifies lock list
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 78d2837..641df9e 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -947,7 +947,8 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> >  		return err;
> >  
> >  	count = ocount;
> > -	sb_start_write(inode->i_sb);
> > +	if (!sb_start_file_write(file))
> > +		return -EAGAIN;
> >  	mutex_lock(&inode->i_mutex);
> >  
> >  	/* We can write back this queue in page reclaim */
> > diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
> > index 1ecf464..028b349 100644
> > --- a/fs/ntfs/file.c
> > +++ b/fs/ntfs/file.c
> > @@ -2118,7 +2118,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> >  
> >  	BUG_ON(iocb->ki_pos != pos);
> >  
> > -	sb_start_write(inode->i_sb);
> > +	if (!sb_start_file_write(file))
> > +		return -EAGAIN;
> >  	mutex_lock(&inode->i_mutex);
> >  	ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
> >  	mutex_unlock(&inode->i_mutex);
> > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> > index 5a4ee77..93ef34d 100644
> > --- a/fs/ocfs2/file.c
> > +++ b/fs/ocfs2/file.c
> > @@ -2265,7 +2265,8 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
> >  	if (iocb->ki_left == 0)
> >  		return 0;
> >  
> > -	sb_start_write(inode->i_sb);
> > +	if (!sb_start_file_write(file))
> > +		return -EAGAIN;
> >  
> >  	appending = file->f_flags & O_APPEND ? 1 : 0;
> >  	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
> > diff --git a/fs/open.c b/fs/open.c
> > index 59071f5..42bd875 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -808,7 +808,7 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
> >  		op->mode = 0;
> >  
> >  	/* Must never be set by userspace */
> > -	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
> > +	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC & ~FMODE_NO_FREEZE_WAIT;
> >  
> >  	/*
> >  	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index aa473fa..7d8af61 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -771,7 +771,8 @@ xfs_file_aio_write(
> >  	if (ocount == 0)
> >  		return 0;
> >  
> > -	sb_start_write(inode->i_sb);
> > +	if (!sb_start_file_write(file))
> > +		return -EAGAIN;
> >  
> >  	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
> >  		ret = -EIO;
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index b33cfc9..c040a6c 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -123,6 +123,9 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> >  /* File was opened by fanotify and shouldn't generate fanotify events */
> >  #define FMODE_NONOTIFY		((__force fmode_t)0x1000000)
> >  
> > +/* Write to file should fail on frozen fs rather than block */
> > +#define FMODE_NO_FREEZE_WAIT	((__force fmode_t)0x2000000)
> > +
> >  /*
> >   * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
> >   * that indicates that they should check the contents of the iovec are
> > @@ -1401,6 +1404,17 @@ static inline int sb_start_write_trylock(struct super_block *sb)
> >  	return __sb_start_write(sb, SB_FREEZE_WRITE, false);
> >  }
> >  
> > +/*
> > + * We use trylock semantics if write originates in kernel and normal lock
> > + * semantics otherwise. This is a hack but solves problems with deadlocking
> > + * of e.g. psacct when filesystem is frozen.
> > + */
> > +static inline int sb_start_file_write(struct file *file)
> > +{
> > +	return __sb_start_write(file->f_mapping->host->i_sb, SB_FREEZE_WRITE,
> > +				!(file->f_mode & FMODE_NO_FREEZE_WAIT));
> > +}
> > +
> >  /**
> >   * sb_start_pagefault - get write access to a superblock from a page fault
> >   * @sb: the super we write to
> > diff --git a/kernel/acct.c b/kernel/acct.c
> > index 051e071..0b5f231 100644
> > --- a/kernel/acct.c
> > +++ b/kernel/acct.c
> > @@ -183,6 +183,7 @@ static void acct_file_reopen(struct bsd_acct_struct *acct, struct file *file,
> >  		acct->needcheck = jiffies + ACCT_TIMEOUT*HZ;
> >  		acct->active = 1;
> >  		list_add(&acct->list, &acct_list);
> > +		file->f_mode |= FMODE_NO_FREEZE_WAIT;
> >  	}
> >  	if (old_acct) {
> >  		mnt_unpin(old_acct->f_path.mnt);
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 83efee7..3b2812b 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2527,7 +2527,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> >  
> >  	BUG_ON(iocb->ki_pos != pos);
> >  
> > -	sb_start_write(inode->i_sb);
> > +	if (!sb_start_file_write(file))
> > +		return -EAGAIN;
> >  	mutex_lock(&inode->i_mutex);
> >  	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
> >  	mutex_unlock(&inode->i_mutex);
> > -- 
> > 1.7.1
> > 
> 
> 


-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 13+ messages
2012-10-23  9:43 BUG: enabling psacct breaks fsfreeze Nikola Ciprich
2012-10-31 12:15 ` Jan Kara
2012-10-31 12:46   ` Nikola Ciprich
2012-11-01  9:37     ` Jan Kara
2012-11-01 11:19       ` Jan Kara
2012-11-01 14:23         ` Nikola Ciprich
2012-11-01 22:50           ` Jan Kara
2012-11-02  9:50             ` Marco Stornelli
2012-11-02 10:40               ` Nikola Ciprich
2012-11-03  8:22                 ` Marco Stornelli
2012-11-07 18:51             ` Jan Kara
2012-11-07 21:21               ` Nikola Ciprich
2012-11-07 22:32                 ` Jan Kara
