* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-27 22:54 ` Chris Mason
@ 2010-09-27 23:51 ` Jan Kara
2010-09-28 7:05 ` Artem Bityutskiy
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-27 23:51 UTC (permalink / raw)
To: Chris Mason
Cc: Jan Kara, Cesar Eduardo Barros, Andrew Morton, hch, linux-kernel,
Jens Axboe, linux-btrfs, Alexander Viro, linux-fsdevel, stable,
Jens Axboe, Michał Piotrowski, Chuck Ebbert, kernel
On Mon 27-09-10 18:54:52, Chris Mason wrote:
> On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > [Added CCs for similar ecryptfs warning]
> > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not
> > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have
> > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the
> > > > warning.
> > > > [...]
> > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342
> > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > ------------[ cut here ]------------
> > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > Hardware name: Inspiron N4010
> > > > Dirtiable inode bdi default != sb bdi btrfs
> > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb
> > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel
> > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq
> > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb
> > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd
> > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc
> > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64
> > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm
> > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > Call Trace:
> > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > Thanks for the report. These bdi pointers are a mess. As Chris pointed
> > out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info
> > for directories and special inodes and thus these were previously attached
> > to default_backing_dev_info which probably isn't what Chris would like to
> > see.
>
> There's no actual writeback for these, so it works fine for btrfs either
> way.
Ah, OK. I see you do all the writing already at inode dirtying time so
you don't really care what happens after the dirtying for inodes without
page cache. So it's really just a cosmetic issue for btrfs.
> > I've also got a similar report for ecryptfs which also does not
> > initialize inode->i_mapping.backing_dev_info although it sets sb->s_bdi and
> > thus again its inodes get filed to default_backing_dev_info lists. Quick
> > search seems to reveal that other filesystems using handcrafted bdi's get
> > it wrong as well and thus their inodes end up in the default_backing_dev_info
> > lists which is generally undesirable (this was happening already before my
> > patch, my code just started complaining about that).
> > That suggests that we should probably handle such cases in a more generic
> > way by changing the code in inode_init_always(). The patch below makes at
> > least btrfs happy for me... Could you maybe test it? Thanks.
>
> Christoph had a slightly different plan for this, I've cc'd him and kept
> the patch below for comment.
OK, thanks.
Honza
> > ---
> >
> > From 29f60c2b08ff9637a10439d1513805835ddcc746 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Mon, 27 Sep 2010 23:56:48 +0200
> > Subject: [PATCH] bdi: Initialize inode->i_mapping.backing_dev_info to sb->s_bdi
> >
> > Currently, we initialize inode->i_mapping.backing_dev_info to the bdi of device
> > sb->s_bdev points to. However there is quite a big number of filesystems that
> > do not set sb->s_bdev (because they do not have one) but do set sb->s_bdi.
> > These filesystems would generally benefit from setting
> > inode->i_mapping.backing_dev_info to their s_bdi because otherwise their inodes
> > would point to default_backing_dev_info and thus dirty inode tracking would
> > happen there. So change inode initialization code to use sb->s_bdi if it
> > is available.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/inode.c | 22 ++++++++++++++--------
> > 1 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 8646433..e415be4 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -172,15 +172,21 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
> > mapping->writeback_index = 0;
> >
> > /*
> > - * If the block_device provides a backing_dev_info for client
> > - * inodes then use that. Otherwise the inode share the bdev's
> > - * backing_dev_info.
> > + * If the filesystem provides a backing_dev_info for client inodes
> > + * then use that. Otherwise inodes share default_backing_dev_info.
> > */
> > - if (sb->s_bdev) {
> > - struct backing_dev_info *bdi;
> > -
> > - bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
> > - mapping->backing_dev_info = bdi;
> > + if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
> > + /*
> > + * Catch cases where filesystem might be bitten by using s_bdi
> > + * instead of sb->s_bdev. Can be removed in 2.6.38.
> > + */
> > + if (sb->s_bdev) {
> > + struct backing_dev_info *bdi =
> > + sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
> > + WARN(bdi != sb->s_bdi, "s_bdev bdi %s != s_bdi %s\n",
> > + bdi->name, sb->s_bdi->name);
> > + }
> > + mapping->backing_dev_info = sb->s_bdi;
> > }
> > inode->i_private = NULL;
> > inode->i_mapping = mapping;
> > --
> > 1.6.4.2
> >
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-27 23:51 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-27 23:51 UTC (permalink / raw)
To: Chris Mason
Cc: Jens Axboe, Jan Kara, kernel-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A,
Jens Axboe, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
stable-DgEjT+Ai2ygdnm+yROfE0A, Michał Piotrowski,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
hch-jcswGhMUV9g, Cesar Eduardo Barros, Alexander Viro
On Mon 27-09-10 18:54:52, Chris Mason wrote:
> On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > [Added CCs for similar ecryptfs warning]
> > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not
> > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have
> > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the
> > > > warning.
> > > > [...]
> > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342
> > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > ------------[ cut here ]------------
> > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > Hardware name: Inspiron N4010
> > > > Dirtiable inode bdi default != sb bdi btrfs
> > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb
> > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel
> > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq
> > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb
> > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd
> > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc
> > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64
> > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm
> > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > Call Trace:
> > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > Thanks for the report. These bdi pointers are a mess. As Chris pointed
> > out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info
> > for directories and special inodes and thus these were previously attached
> > to default_backing_dev_info which probably isn't what Chris would like to
> > see.
>
> There's no actual writeback for these, so it works fine for btrfs either
> way.
Ah, OK. I see you do all the writing already at inode dirtying time so
you don't really care what happens after the dirtying for inodes without
page cache. So it's really just a cosmetic issue for btrfs.
> > I've also got a similar report for ecryptfs which also does not
> > initialize inode->i_mapping.backing_dev_info although it sets sb->s_bdi and
> > thus again its inodes get filed to default_backing_dev_info lists. Quick
> > search seems to reveal that other filesystems using handcrafted bdi's get
> > it wrong as well and thus their inodes end up in the default_backing_dev_info
> > lists which is generally undesirable (this was happening already before my
> > patch, my code just started complaining about that).
> > That suggests that we should probably handle such cases in a more generic
> > way by changing the code in inode_init_always(). The patch below makes at
> > least btrfs happy for me... Could you maybe test it? Thanks.
>
> Christoph had a slightly different plan for this, I've cc'd him and kept
> the patch below for comment.
OK, thanks.
Honza
> > ---
> >
> > From 29f60c2b08ff9637a10439d1513805835ddcc746 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> > Date: Mon, 27 Sep 2010 23:56:48 +0200
> > Subject: [PATCH] bdi: Initialize inode->i_mapping.backing_dev_info to sb->s_bdi
> >
> > Currently, we initialize inode->i_mapping.backing_dev_info to the bdi of device
> > sb->s_bdev points to. However there is quite a big number of filesystems that
> > do not set sb->s_bdev (because they do not have one) but do set sb->s_bdi.
> > These filesystems would generally benefit from setting
> > inode->i_mapping.backing_dev_info to their s_bdi because otherwise their inodes
> > would point to default_backing_dev_info and thus dirty inode tracking would
> > happen there. So change inode initialization code to use sb->s_bdi if it
> > is available.
> >
> > Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> > ---
> > fs/inode.c | 22 ++++++++++++++--------
> > 1 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 8646433..e415be4 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -172,15 +172,21 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
> > mapping->writeback_index = 0;
> >
> > /*
> > - * If the block_device provides a backing_dev_info for client
> > - * inodes then use that. Otherwise the inode share the bdev's
> > - * backing_dev_info.
> > + * If the filesystem provides a backing_dev_info for client inodes
> > + * then use that. Otherwise inodes share default_backing_dev_info.
> > */
> > - if (sb->s_bdev) {
> > - struct backing_dev_info *bdi;
> > -
> > - bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
> > - mapping->backing_dev_info = bdi;
> > + if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
> > + /*
> > + * Catch cases where filesystem might be bitten by using s_bdi
> > + * instead of sb->s_bdev. Can be removed in 2.6.38.
> > + */
> > + if (sb->s_bdev) {
> > + struct backing_dev_info *bdi =
> > + sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
> > + WARN(bdi != sb->s_bdi, "s_bdev bdi %s != s_bdi %s\n",
> > + bdi->name, sb->s_bdi->name);
> > + }
> > + mapping->backing_dev_info = sb->s_bdi;
> > }
> > inode->i_private = NULL;
> > inode->i_mapping = mapping;
> > --
> > 1.6.4.2
> >
--
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-27 22:54 ` Chris Mason
@ 2010-09-28 7:05 ` Artem Bityutskiy
2010-09-28 7:05 ` Artem Bityutskiy
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: Artem Bityutskiy @ 2010-09-28 7:05 UTC (permalink / raw)
To: Chris Mason
Cc: Jan Kara, Cesar Eduardo Barros, Andrew Morton, hch, linux-kernel,
Jens Axboe, linux-btrfs, Alexander Viro, linux-fsdevel, stable,
Jens Axboe, Michał Piotrowski, Chuck Ebbert, kernel
On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
> On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > [Added CCs for similar ecryptfs warning]
> > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; it di=
d not=20
> > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not=
have=20
> > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduce=
s the=20
> > > > warning.
> > > > [...]
> > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22=
342=20
> > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > ------------[ cut here ]------------
> > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > Hardware name: Inspiron N4010
> > > > Dirtiable inode bdi default !=3D sb bdi btrfs
> > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb=20
> > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_in=
tel=20
> > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_=
seq=20
> > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 bt=
usb=20
> > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_la=
ptop snd=20
> > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page=
_alloc=20
> > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes=
_x86_64=20
> > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_help=
er drm=20
> > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_sc=
an]
> > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > Call Trace:
> > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > Thanks for the report. These bdi pointers are a mess. As Chris po=
inted
> > out, btrfs forgets to properly initialize inode->i_mapping.backing_=
dev_info
> > for directories and special inodes and thus these were previously a=
ttached
> > to default_backing_dev_info which probably isn't what Chris would l=
ike to
> > see.
>=20
> There's no actual writeback for these, so it works fine for btrfs eit=
her
> way.
Side note: every time inode is marked as dirty, we wake up a bdi thread
or the default bdi thread. So if we have inodes which do not need
write-back, we should never mark them as dirty.
--=20
Best Regards,
Artem Bityutskiy (=D0=90=D1=80=D1=82=D1=91=D0=BC =D0=91=D0=B8=D1=82=D1=8E=
=D1=86=D0=BA=D0=B8=D0=B9)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-28 7:05 ` Artem Bityutskiy
0 siblings, 0 replies; 30+ messages in thread
From: Artem Bityutskiy @ 2010-09-28 7:05 UTC (permalink / raw)
To: Chris Mason
Cc: Jan Kara, Cesar Eduardo Barros, Andrew Morton, hch, linux-kernel,
Jens Axboe, linux-btrfs, Alexander Viro, linux-fsdevel, stable,
Jens Axboe, Michał Piotrowski, Chuck Ebbert, kernel
On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
> On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > [Added CCs for similar ecryptfs warning]
> > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not
> > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have
> > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the
> > > > warning.
> > > > [...]
> > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342
> > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > ------------[ cut here ]------------
> > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > Hardware name: Inspiron N4010
> > > > Dirtiable inode bdi default != sb bdi btrfs
> > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb
> > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel
> > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq
> > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb
> > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd
> > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc
> > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64
> > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm
> > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > Call Trace:
> > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > Thanks for the report. These bdi pointers are a mess. As Chris pointed
> > out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info
> > for directories and special inodes and thus these were previously attached
> > to default_backing_dev_info which probably isn't what Chris would like to
> > see.
>
> There's no actual writeback for these, so it works fine for btrfs either
> way.
Side note: every time inode is marked as dirty, we wake up a bdi thread
or the default bdi thread. So if we have inodes which do not need
write-back, we should never mark them as dirty.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-28 7:05 ` Artem Bityutskiy
(?)
@ 2010-09-29 13:00 ` Jan Kara
2010-09-29 13:40 ` Artem Bityutskiy
-1 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2010-09-29 13:00 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Chris Mason, Jan Kara, Cesar Eduardo Barros, Andrew Morton, hch,
linux-kernel, Jens Axboe, linux-btrfs, Alexander Viro,
linux-fsdevel, stable, Jens Axboe, Michał Piotrowski,
Chuck Ebbert, kernel
On Tue 28-09-10 10:05:49, Artem Bityutskiy wrote:
> On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
> > On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > > [Added CCs for similar ecryptfs warning]
> > > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not
> > > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have
> > > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the
> > > > > warning.
> > > > > [...]
> > > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342
> > > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > > ------------[ cut here ]------------
> > > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > > Hardware name: Inspiron N4010
> > > > > Dirtiable inode bdi default != sb bdi btrfs
> > > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb
> > > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel
> > > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq
> > > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb
> > > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd
> > > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc
> > > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64
> > > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm
> > > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> > > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > > Call Trace:
> > > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > > Thanks for the report. These bdi pointers are a mess. As Chris pointed
> > > out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info
> > > for directories and special inodes and thus these were previously attached
> > > to default_backing_dev_info which probably isn't what Chris would like to
> > > see.
> >
> > There's no actual writeback for these, so it works fine for btrfs either
> > way.
>
> Side note: every time inode is marked as dirty, we wake up a bdi thread
> or the default bdi thread. So if we have inodes which do not need
> write-back, we should never mark them as dirty.
Are you sure? I think we wake up the thread only when it's the first
dirty inode for the bdi...
And a side side note ;): It's harder not to dirty the inode than it seems.
E.g. btrfs (or similarly ext3) add the new inode data to the journal already
at inode dirty time but still they need to track that the transaction
carrying the inode is still uncommitted. Thus the inode *is* dirty in some
sense. Only it does not need any writeout to happen...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-29 13:00 ` Jan Kara
2010-09-29 13:40 ` Artem Bityutskiy
@ 2010-09-29 13:40 ` Artem Bityutskiy
0 siblings, 0 replies; 30+ messages in thread
From: Artem Bityutskiy @ 2010-09-29 13:40 UTC (permalink / raw)
To: Jan Kara
Cc: Chris Mason, Cesar Eduardo Barros, Andrew Morton, hch,
linux-kernel, Jens Axboe, linux-btrfs, Alexander Viro,
linux-fsdevel, stable, Jens Axboe, Michał Piotrowski,
Chuck Ebbert, kernel
On Wed, 2010-09-29 at 15:00 +0200, Jan Kara wrote:
> On Tue 28-09-10 10:05:49, Artem Bityutskiy wrote:
> > On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
> > > On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > > > [Added CCs for similar ecryptfs warning]
> > > > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; i=
t did not=20
> > > > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does=
not have=20
> > > > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which intro=
duces the=20
> > > > > > warning.
> > > > > > [...]
> > > > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transi=
d 22342=20
> > > > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > > > Hardware name: Inspiron N4010
> > > > > > Dirtiable inode bdi default !=3D sb bdi btrfs
> > > > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb=20
> > > > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hd=
a_intel=20
> > > > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev =
snd_seq=20
> > > > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl3=
2 btusb=20
> > > > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi del=
l_laptop snd=20
> > > > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_=
page_alloc=20
> > > > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd=
aes_x86_64=20
> > > > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_=
helper drm=20
> > > > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wai=
t_scan]
> > > > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > > > Call Trace:
> > > > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > > > Thanks for the report. These bdi pointers are a mess. As Chri=
s pointed
> > > > out, btrfs forgets to properly initialize inode->i_mapping.back=
ing_dev_info
> > > > for directories and special inodes and thus these were previous=
ly attached
> > > > to default_backing_dev_info which probably isn't what Chris wou=
ld like to
> > > > see.
> > >=20
> > > There's no actual writeback for these, so it works fine for btrfs=
either
> > > way.
> >=20
> > Side note: every time inode is marked as dirty, we wake up a bdi th=
read
> > or the default bdi thread. So if we have inodes which do not need
> > write-back, we should never mark them as dirty.
> Are you sure? I think we wake up the thread only when it's the firs=
t
> dirty inode for the bdi...
Err, right. If no one ever marks it as clean then we won't wake-up the
thread. But I thought that marking it as dirty even once is bad because
this causes bdi thread creation, which consumes resources.
Sorry for my ignorance, I did not really follow the conversation, I jus=
t
remember that when I looked at bdi stuff, I noticed that during boot th=
e
kernel created many bdi threads which were never used then. They
eventually exited. But I thought that creating useless bdi threads it
about concuming resources and slowing down the boot.
As I remember, the reason was touch_atime() for some of the threads.
But really, I did not dig this, I just noticed this conversation and
wanted to let you know about the issue I noticed this summer.
--=20
Best Regards,
Artem Bityutskiy (=D0=90=D1=80=D1=82=D1=91=D0=BC =D0=91=D0=B8=D1=82=D1=8E=
=D1=86=D0=BA=D0=B8=D0=B9)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-29 13:40 ` Artem Bityutskiy
0 siblings, 0 replies; 30+ messages in thread
From: Artem Bityutskiy @ 2010-09-29 13:40 UTC (permalink / raw)
To: Jan Kara
Cc: Chris Mason, Cesar Eduardo Barros, Andrew Morton, hch,
linux-kernel, Jens Axboe, linux-btrfs, Alexander Viro,
linux-fsdevel, stable, Jens Axboe, Michał Piotrowski,
Chuck Ebbert, kernel
On Wed, 2010-09-29 at 15:00 +0200, Jan Kara wrote:
> On Tue 28-09-10 10:05:49, Artem Bityutskiy wrote:
> > On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
> > > On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > > > [Added CCs for similar ecryptfs warning]
> > > > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not
> > > > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have
> > > > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the
> > > > > > warning.
> > > > > > [...]
> > > > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342
> > > > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > > > Hardware name: Inspiron N4010
> > > > > > Dirtiable inode bdi default != sb bdi btrfs
> > > > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb
> > > > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel
> > > > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq
> > > > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb
> > > > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd
> > > > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc
> > > > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64
> > > > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm
> > > > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> > > > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > > > Call Trace:
> > > > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > > > Thanks for the report. These bdi pointers are a mess. As Chris pointed
> > > > out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info
> > > > for directories and special inodes and thus these were previously attached
> > > > to default_backing_dev_info which probably isn't what Chris would like to
> > > > see.
> > >
> > > There's no actual writeback for these, so it works fine for btrfs either
> > > way.
> >
> > Side note: every time inode is marked as dirty, we wake up a bdi thread
> > or the default bdi thread. So if we have inodes which do not need
> > write-back, we should never mark them as dirty.
> Are you sure? I think we wake up the thread only when it's the first
> dirty inode for the bdi...
Err, right. If no one ever marks it as clean then we won't wake-up the
thread. But I thought that marking it as dirty even once is bad because
this causes bdi thread creation, which consumes resources.
Sorry for my ignorance, I did not really follow the conversation, I just
remember that when I looked at bdi stuff, I noticed that during boot the
kernel created many bdi threads which were never used then. They
eventually exited. But I thought that creating useless bdi threads it
about concuming resources and slowing down the boot.
As I remember, the reason was touch_atime() for some of the threads.
But really, I did not dig this, I just noticed this conversation and
wanted to let you know about the issue I noticed this summer.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-29 13:40 ` Artem Bityutskiy
0 siblings, 0 replies; 30+ messages in thread
From: Artem Bityutskiy @ 2010-09-29 13:40 UTC (permalink / raw)
To: Jan Kara
Cc: Chris Mason, Cesar Eduardo Barros, Andrew Morton, hch,
linux-kernel, Jens Axboe, linux-btrfs, Alexander Viro,
linux-fsdevel, stable, Jens Axboe, Michał Piotrowski,
Chuck Ebbert, kernel
On Wed, 2010-09-29 at 15:00 +0200, Jan Kara wrote:
> On Tue 28-09-10 10:05:49, Artem Bityutskiy wrote:
> > On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
> > > On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
> > > > [Added CCs for similar ecryptfs warning]
> > > > On Thu 23-09-10 12:38:49, Andrew Morton wrote:
> > > > > > This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not
> > > > > > happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have
> > > > > > commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the
> > > > > > warning.
> > > > > > [...]
> > > > > > device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342
> > > > > > /dev/mapper/vg_cesarbinspiro-lv_home
> > > > > > SELinux: initialized (dev dm-3, type btrfs), uses xattr
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d()
> > > > > > Hardware name: Inspiron N4010
> > > > > > Dirtiable inode bdi default != sb bdi btrfs
> > > > > > Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb
> > > > > > snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel
> > > > > > iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq
> > > > > > snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb
> > > > > > cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd
> > > > > > pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc
> > > > > > rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64
> > > > > > aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm
> > > > > > i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> > > > > > Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8
> > > > > > Call Trace:
> > > > > > [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d
> > > > > > [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48
> > > > > > [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d
> > > > > > [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177
> > > > > > [<ffffffff81127168>] touch_atime+0x107/0x12a
> > > > > > [<ffffffff81122384>] ? filldir+0x0/0xd0
> > > > > > [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4
> > > > > > [<ffffffff8112270b>] sys_getdents+0x81/0xd1
> > > > > > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
> > > > Thanks for the report. These bdi pointers are a mess. As Chris pointed
> > > > out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info
> > > > for directories and special inodes and thus these were previously attached
> > > > to default_backing_dev_info which probably isn't what Chris would like to
> > > > see.
> > >
> > > There's no actual writeback for these, so it works fine for btrfs either
> > > way.
> >
> > Side note: every time inode is marked as dirty, we wake up a bdi thread
> > or the default bdi thread. So if we have inodes which do not need
> > write-back, we should never mark them as dirty.
> Are you sure? I think we wake up the thread only when it's the first
> dirty inode for the bdi...
Err, right. If no one ever marks it as clean then we won't wake-up the
thread. But I thought that marking it as dirty even once is bad because
this causes bdi thread creation, which consumes resources.
Sorry for my ignorance, I did not really follow the conversation, I just
remember that when I looked at bdi stuff, I noticed that during boot the
kernel created many bdi threads which were never used then. They
eventually exited. But I thought that creating useless bdi threads it
about concuming resources and slowing down the boot.
As I remember, the reason was touch_atime() for some of the threads.
But really, I did not dig this, I just noticed this conversation and
wanted to let you know about the issue I noticed this summer.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-27 22:54 ` Chris Mason
2010-09-27 23:51 ` Jan Kara
2010-09-28 7:05 ` Artem Bityutskiy
@ 2010-09-29 8:19 ` Christoph Hellwig
2010-09-29 8:19 ` Christoph Hellwig
3 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-09-29 8:19 UTC (permalink / raw)
To: Chris Mason, Jan Kara, Cesar Eduardo Barros, Andrew Morton, hch,
linux-kernel
Here is the patch that I already proposed a while ago. I've tested
xfstests on btrfs and xfstests to make sure the btrfs issue is fixed,
and I've also tested the original dirtying of device files issue
and I/O operations on block device files to test the special case
in the patch.
---
From: Christoph Hellwig <hch@lst.de>
Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags. We're also using for tracking dirty inodes for
writeback.
To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointeded to by
inode->i_mapping->backing_dev which can be overriden by special devices
or might not be set at all by some filesystems.
Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overriden by devices.
The one exception for now is the block device filesystem which really
wants different writeback contexts for it's different (internal) inodes
to handle the writeout more efficiently. For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c 2010-09-29 16:58:41.750557721 +0900
+++ linux-2.6/fs/fs-writeback.c 2010-09-29 17:11:35.040557719 +0900
@@ -72,22 +72,10 @@ int writeback_in_progress(struct backing
static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
- struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
- /*
- * For inodes on standard filesystems, we use superblock's bdi. For
- * inodes on virtual filesystems, we want to use inode mapping's bdi
- * because they can possibly point to something useful (think about
- * block_dev filesystem).
- */
- if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
- /* Some device inodes could play dirty tricks. Catch them... */
- WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi),
- "Dirtiable inode bdi %s != sb bdi %s\n",
- bdi->name, sb->s_bdi->name);
- return sb->s_bdi;
- }
- return bdi;
+ if (strcmp(sb->s_type->name, "bdev") == 0)
+ return inode->i_mapping->backing_dev_info;
+ return sb->s_bdi;
}
static void bdi_queue_work(struct backing_dev_info *bdi,
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-27 22:54 ` Chris Mason
@ 2010-09-29 8:19 ` Christoph Hellwig
2010-09-28 7:05 ` Artem Bityutskiy
` (2 subsequent siblings)
3 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-09-29 8:19 UTC (permalink / raw)
To: Chris Mason, Jan Kara, Cesar Eduardo Barros, Andrew Morton, hch,
linux-kernel
Here is the patch that I already proposed a while ago. I've tested
xfstests on btrfs and xfstests to make sure the btrfs issue is fixed,
and I've also tested the original dirtying of device files issue
and I/O operations on block device files to test the special case
in the patch.
---
From: Christoph Hellwig <hch@lst.de>
Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags. We're also using for tracking dirty inodes for
writeback.
To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointeded to by
inode->i_mapping->backing_dev which can be overriden by special devices
or might not be set at all by some filesystems.
Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overriden by devices.
The one exception for now is the block device filesystem which really
wants different writeback contexts for it's different (internal) inodes
to handle the writeout more efficiently. For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c 2010-09-29 16:58:41.750557721 +0900
+++ linux-2.6/fs/fs-writeback.c 2010-09-29 17:11:35.040557719 +0900
@@ -72,22 +72,10 @@ int writeback_in_progress(struct backing
static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
- struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
- /*
- * For inodes on standard filesystems, we use superblock's bdi. For
- * inodes on virtual filesystems, we want to use inode mapping's bdi
- * because they can possibly point to something useful (think about
- * block_dev filesystem).
- */
- if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
- /* Some device inodes could play dirty tricks. Catch them... */
- WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi),
- "Dirtiable inode bdi %s != sb bdi %s\n",
- bdi->name, sb->s_bdi->name);
- return sb->s_bdi;
- }
- return bdi;
+ if (strcmp(sb->s_type->name, "bdev") == 0)
+ return inode->i_mapping->backing_dev_info;
+ return sb->s_bdi;
}
static void bdi_queue_work(struct backing_dev_info *bdi,
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-29 8:19 ` Christoph Hellwig
0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-09-29 8:19 UTC (permalink / raw)
To: Chris Mason, Jan Kara, Cesar Eduardo Barros, Andrew Morton, hch,
linux-kernel, Jens Axboe, linux-btrfs, Alexander Viro,
linux-fsdevel, stable, Jens Axboe, Micha?? Piotrowski,
Chuck Ebbert, kernel
Here is the patch that I already proposed a while ago. I've tested
xfstests on btrfs and xfstests to make sure the btrfs issue is fixed,
and I've also tested the original dirtying of device files issue
and I/O operations on block device files to test the special case
in the patch.
---
From: Christoph Hellwig <hch@lst.de>
Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags. We're also using for tracking dirty inodes for
writeback.
To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointeded to by
inode->i_mapping->backing_dev which can be overriden by special devices
or might not be set at all by some filesystems.
Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overriden by devices.
The one exception for now is the block device filesystem which really
wants different writeback contexts for it's different (internal) inodes
to handle the writeout more efficiently. For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c 2010-09-29 16:58:41.750557721 +0900
+++ linux-2.6/fs/fs-writeback.c 2010-09-29 17:11:35.040557719 +0900
@@ -72,22 +72,10 @@ int writeback_in_progress(struct backing
static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
- struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
- /*
- * For inodes on standard filesystems, we use superblock's bdi. For
- * inodes on virtual filesystems, we want to use inode mapping's bdi
- * because they can possibly point to something useful (think about
- * block_dev filesystem).
- */
- if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
- /* Some device inodes could play dirty tricks. Catch them... */
- WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi),
- "Dirtiable inode bdi %s != sb bdi %s\n",
- bdi->name, sb->s_bdi->name);
- return sb->s_bdi;
- }
- return bdi;
+ if (strcmp(sb->s_type->name, "bdev") == 0)
+ return inode->i_mapping->backing_dev_info;
+ return sb->s_bdi;
}
static void bdi_queue_work(struct backing_dev_info *bdi,
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-29 12:18 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-29 12:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chris Mason, Jan Kara, Cesar Eduardo Barros, Andrew Morton,
linux-kernel, Jens Axboe, linux-btrfs, Alexander Viro,
linux-fsdevel, stable, Jens Axboe, Micha?? Piotrowski,
Chuck Ebbert, kernel
On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
> ---
> From: Christoph Hellwig <hch@lst.de>
> Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
>
...
> The one exception for now is the block device filesystem which really
> wants different writeback contexts for it's different (internal) inodes
> to handle the writeout more efficiently. For now we do this with
> a hack in fs-writeback.c because we're so late in the cycle, but in
> the future I plan to replace this with a superblock method that allows
> for multiple writeback contexts per filesystem.
Another exception I know about is mtd_inodefs filesystem
(drivers/mtd/mtdchar.c).
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: linux-2.6/fs/fs-writeback.c
> ===================================================================
> --- linux-2.6.orig/fs/fs-writeback.c 2010-09-29 16:58:41.750557721 +0900
> +++ linux-2.6/fs/fs-writeback.c 2010-09-29 17:11:35.040557719 +0900
> @@ -72,22 +72,10 @@ int writeback_in_progress(struct backing
> static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
> {
> struct super_block *sb = inode->i_sb;
> - struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
>
> - /*
> - * For inodes on standard filesystems, we use superblock's bdi. For
> - * inodes on virtual filesystems, we want to use inode mapping's bdi
> - * because they can possibly point to something useful (think about
> - * block_dev filesystem).
> - */
> - if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
> - /* Some device inodes could play dirty tricks. Catch them... */
> - WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi),
> - "Dirtiable inode bdi %s != sb bdi %s\n",
> - bdi->name, sb->s_bdi->name);
> - return sb->s_bdi;
> - }
> - return bdi;
> + if (strcmp(sb->s_type->name, "bdev") == 0)
> + return inode->i_mapping->backing_dev_info;
> + return sb->s_bdi;
So at least here you'd need also add a similar exception for
"mtd_inodefs". Because of these exeptions I've chosen the
(sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) check rather than your
exception based check. All in all I don't care much what ends up in the
kernel as it's just a temporary solution...
Also I've added the warning to catch situations where inodes would get
filed to a different backing device after the patch. So far the reported
warnings were harmless but still I'm more comfortable when it's there
because otherwise we can so easily miss some device-driver-invented
filesystem like mtd_inodefs which would break silently after the change...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-29 12:18 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-29 12:18 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Jan Kara, kernel-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A,
Jens Axboe, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
Micha?? Piotrowski, Chris Mason,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
stable-DgEjT+Ai2ygdnm+yROfE0A, Cesar Eduardo Barros,
Alexander Viro
On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
> ---
> From: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
>
...
> The one exception for now is the block device filesystem which really
> wants different writeback contexts for it's different (internal) inodes
> to handle the writeout more efficiently. For now we do this with
> a hack in fs-writeback.c because we're so late in the cycle, but in
> the future I plan to replace this with a superblock method that allows
> for multiple writeback contexts per filesystem.
Another exception I know about is mtd_inodefs filesystem
(drivers/mtd/mtdchar.c).
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>
> Index: linux-2.6/fs/fs-writeback.c
> ===================================================================
> --- linux-2.6.orig/fs/fs-writeback.c 2010-09-29 16:58:41.750557721 +0900
> +++ linux-2.6/fs/fs-writeback.c 2010-09-29 17:11:35.040557719 +0900
> @@ -72,22 +72,10 @@ int writeback_in_progress(struct backing
> static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
> {
> struct super_block *sb = inode->i_sb;
> - struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
>
> - /*
> - * For inodes on standard filesystems, we use superblock's bdi. For
> - * inodes on virtual filesystems, we want to use inode mapping's bdi
> - * because they can possibly point to something useful (think about
> - * block_dev filesystem).
> - */
> - if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
> - /* Some device inodes could play dirty tricks. Catch them... */
> - WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi),
> - "Dirtiable inode bdi %s != sb bdi %s\n",
> - bdi->name, sb->s_bdi->name);
> - return sb->s_bdi;
> - }
> - return bdi;
> + if (strcmp(sb->s_type->name, "bdev") == 0)
> + return inode->i_mapping->backing_dev_info;
> + return sb->s_bdi;
So at least here you'd need also add a similar exception for
"mtd_inodefs". Because of these exeptions I've chosen the
(sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) check rather than your
exception based check. All in all I don't care much what ends up in the
kernel as it's just a temporary solution...
Also I've added the warning to catch situations where inodes would get
filed to a different backing device after the patch. So far the reported
warnings were harmless but still I'm more comfortable when it's there
because otherwise we can so easily miss some device-driver-invented
filesystem like mtd_inodefs which would break silently after the change...
Honza
--
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-29 12:18 ` Jan Kara
(?)
@ 2010-09-29 14:10 ` Christoph Hellwig
2010-09-29 23:38 ` Jan Kara
-1 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2010-09-29 14:10 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Chris Mason, Cesar Eduardo Barros,
Andrew Morton, linux-kernel, Jens Axboe, linux-btrfs,
Alexander Viro, linux-fsdevel, stable, Jens Axboe,
Micha?? Piotrowski, Chuck Ebbert, kernel
On Wed, Sep 29, 2010 at 02:18:08PM +0200, Jan Kara wrote:
> On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
> > ---
> > From: Christoph Hellwig <hch@lst.de>
> > Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
> >
> ...
> > The one exception for now is the block device filesystem which really
> > wants different writeback contexts for it's different (internal) inodes
> > to handle the writeout more efficiently. For now we do this with
> > a hack in fs-writeback.c because we're so late in the cycle, but in
> > the future I plan to replace this with a superblock method that allows
> > for multiple writeback contexts per filesystem.
> Another exception I know about is mtd_inodefs filesystem
> (drivers/mtd/mtdchar.c).
No, it's not. MTD only has three different backing_dev_info instances
which have different flags in the mapping-relevant portion of the
backing_dev.
> So at least here you'd need also add a similar exception for
> "mtd_inodefs".
No. For one thing we don't need any exception for correctnes alone -
even the block device variant would work fine with the default case.
But for mtd specificly we don't need an exception for performance either
given that there are no per-device bdis in mtd.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-29 14:10 ` Christoph Hellwig
@ 2010-09-29 23:38 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-29 23:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Chris Mason, Cesar Eduardo Barros, Andrew Morton,
linux-kernel, Jens Axboe, linux-btrfs, Alexander Viro,
linux-fsdevel, stable, Jens Axboe, Micha?? Piotrowski,
Chuck Ebbert, kernel, linux-mtd, David Woodhouse
On Wed 29-09-10 16:10:06, Christoph Hellwig wrote:
> On Wed, Sep 29, 2010 at 02:18:08PM +0200, Jan Kara wrote:
> > On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
> > > ---
> > > From: Christoph Hellwig <hch@lst.de>
> > > Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
> > >
> > ...
> > > The one exception for now is the block device filesystem which really
> > > wants different writeback contexts for it's different (internal) inodes
> > > to handle the writeout more efficiently. For now we do this with
> > > a hack in fs-writeback.c because we're so late in the cycle, but in
> > > the future I plan to replace this with a superblock method that allows
> > > for multiple writeback contexts per filesystem.
> > Another exception I know about is mtd_inodefs filesystem
> > (drivers/mtd/mtdchar.c).
>
> No, it's not. MTD only has three different backing_dev_info instances
> which have different flags in the mapping-relevant portion of the
> backing_dev.
In the end I agree I was probably wrong but it's not that simple ;)
> > So at least here you'd need also add a similar exception for
> > "mtd_inodefs".
>
> No. For one thing we don't need any exception for correctnes alone -
> even the block device variant would work fine with the default case.
Here I don't agree. If you don't have some kind of exception, sb->s_bdi
for both "block" and "mtd_inodefs" filesystems points to
noop_backing_dev_info and you get no writeback for that one. So it isn't
just a performance issue but also a correctness one.
Regarding mtd_inodefs I now looked in more detail what MTD actually does
and it seems to me that MTD device inodes do not seem to carry any
cached state that flusher threads could write back. So returning
noop_backing_dev_info might be the right thing for them after all...
(added David Woodhouse and MTD list to CC so that they can shout if it's
not the case). Coming to this conclusion, I'm happy with your patch going
in as is...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-29 23:38 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2010-09-29 23:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Jan Kara, kernel, Jens Axboe, linux-kernel,
Micha?? Piotrowski, linux-mtd, Chris Mason, Chuck Ebbert,
linux-fsdevel, Andrew Morton, linux-btrfs, David Woodhouse,
stable, Cesar Eduardo Barros, Alexander Viro
On Wed 29-09-10 16:10:06, Christoph Hellwig wrote:
> On Wed, Sep 29, 2010 at 02:18:08PM +0200, Jan Kara wrote:
> > On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
> > > ---
> > > From: Christoph Hellwig <hch@lst.de>
> > > Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
> > >
> > ...
> > > The one exception for now is the block device filesystem which really
> > > wants different writeback contexts for it's different (internal) inodes
> > > to handle the writeout more efficiently. For now we do this with
> > > a hack in fs-writeback.c because we're so late in the cycle, but in
> > > the future I plan to replace this with a superblock method that allows
> > > for multiple writeback contexts per filesystem.
> > Another exception I know about is mtd_inodefs filesystem
> > (drivers/mtd/mtdchar.c).
>
> No, it's not. MTD only has three different backing_dev_info instances
> which have different flags in the mapping-relevant portion of the
> backing_dev.
In the end I agree I was probably wrong but it's not that simple ;)
> > So at least here you'd need also add a similar exception for
> > "mtd_inodefs".
>
> No. For one thing we don't need any exception for correctnes alone -
> even the block device variant would work fine with the default case.
Here I don't agree. If you don't have some kind of exception, sb->s_bdi
for both "block" and "mtd_inodefs" filesystems points to
noop_backing_dev_info and you get no writeback for that one. So it isn't
just a performance issue but also a correctness one.
Regarding mtd_inodefs I now looked in more detail what MTD actually does
and it seems to me that MTD device inodes do not seem to carry any
cached state that flusher threads could write back. So returning
noop_backing_dev_info might be the right thing for them after all...
(added David Woodhouse and MTD list to CC so that they can shout if it's
not the case). Coming to this conclusion, I'm happy with your patch going
in as is...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
2010-09-29 23:38 ` Jan Kara
@ 2010-09-30 0:06 ` Christoph Hellwig
-1 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-09-30 0:06 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Chris Mason, Cesar Eduardo Barros,
Andrew Morton, linux-kernel, Jens Axboe, linux-btrfs,
Alexander Viro, linux-fsdevel, stable, Jens Axboe,
Micha?? Piotrowski, Chuck Ebbert, kernel, linux-mtd,
David Woodhouse
On Thu, Sep 30, 2010 at 01:38:07AM +0200, Jan Kara wrote:
> > No. For one thing we don't need any exception for correctnes alone -
> > even the block device variant would work fine with the default case.
> Here I don't agree. If you don't have some kind of exception, sb->s_bdi
> for both "block" and "mtd_inodefs" filesystems points to
> noop_backing_dev_info and you get no writeback for that one. So it isn't
> just a performance issue but also a correctness one.
Indeed - for internal filesystems that require writeback the change
causes trouble if they haven't registered a s_bdi. But for all user
visible filesystems that doesn't happen as we require s_bdi for
sync or even unmounts to work.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Dirtiable inode bdi default != sb bdi btrfs
@ 2010-09-30 0:06 ` Christoph Hellwig
0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2010-09-30 0:06 UTC (permalink / raw)
To: Jan Kara
Cc: Jens Axboe, kernel, Jens Axboe, linux-kernel, stable,
Micha?? Piotrowski, linux-mtd, Cesar Eduardo Barros,
Chuck Ebbert, linux-fsdevel, Andrew Morton, linux-btrfs,
David Woodhouse, Christoph Hellwig, Chris Mason, Alexander Viro
On Thu, Sep 30, 2010 at 01:38:07AM +0200, Jan Kara wrote:
> > No. For one thing we don't need any exception for correctnes alone -
> > even the block device variant would work fine with the default case.
> Here I don't agree. If you don't have some kind of exception, sb->s_bdi
> for both "block" and "mtd_inodefs" filesystems points to
> noop_backing_dev_info and you get no writeback for that one. So it isn't
> just a performance issue but also a correctness one.
Indeed - for internal filesystems that require writeback the change
causes trouble if they haven't registered a s_bdi. But for all user
visible filesystems that doesn't happen as we require s_bdi for
sync or even unmounts to work.
^ permalink raw reply [flat|nested] 30+ messages in thread