linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Martijn Coenen <maco@android.com>,
	Christoph Hellwig <hch@lst.de>, Jan Kara <jack@suse.cz>
Subject: [PATCH 4.9 60/78] writeback: Avoid skipping inode writeback
Date: Tue,  1 Sep 2020 17:10:36 +0200	[thread overview]
Message-ID: <20200901150927.794380135@linuxfoundation.org> (raw)
In-Reply-To: <20200901150924.680106554@linuxfoundation.org>

From: Jan Kara <jack@suse.cz>

commit 5afced3bf28100d81fb2fe7e98918632a08feaf5 upstream.

Inode's i_io_list list head is used to attach inode to several different
lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
to b_io list. Thus it is critical for sync(2) data integrity guarantees
that inode is not requeued to any other writeback list when inode is
queued for processing by flush worker. That's the reason why
writeback_single_inode() does not touch i_io_list (unless the inode is
completely clean) and why __mark_inode_dirty() does not touch i_io_list
if I_SYNC flag is set.

However there are two flaws in the current logic:

1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
can still move inode back to b_dirty list resulting in skipping
writeback of inode time stamps during sync(2).

2) When inode is on b_dirty_time list and writeback_single_inode() races
with __mark_inode_dirty() like:

writeback_single_inode()		__mark_inode_dirty(inode, I_DIRTY_PAGES)
  inode->i_state |= I_SYNC
  __writeback_single_inode()
					  inode->i_state |= I_DIRTY_PAGES;
					  if (inode->i_state & I_SYNC)
					    bail
  if (!(inode->i_state & I_DIRTY_ALL))
  - not true so nothing done

We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus
standard background writeback will not writeback this inode leading to
possible dirty throttling stalls etc. (thanks to Martijn Coenen for this
analysis).

Fix these problems by tracking whether inode is queued in b_io or
b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
know flush worker has queued inode and we should not touch i_io_list.
On the other hand we also know that once flush worker is done with the
inode it will requeue the inode to appropriate dirty list. When
I_SYNC_QUEUED is not set, __mark_inode_dirty() can (and must) move inode
to appropriate dirty list.

Reported-by: Martijn Coenen <maco@android.com>
Reviewed-by: Martijn Coenen <maco@android.com>
Tested-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/fs-writeback.c  |   17 ++++++++++++-----
 include/linux/fs.h |    8 ++++++--
 2 files changed, 18 insertions(+), 7 deletions(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -162,6 +162,7 @@ static void inode_io_list_del_locked(str
 	assert_spin_locked(&wb->list_lock);
 	assert_spin_locked(&inode->i_lock);
 
+	inode->i_state &= ~I_SYNC_QUEUED;
 	list_del_init(&inode->i_io_list);
 	wb_io_lists_depopulated(wb);
 }
@@ -1103,6 +1104,7 @@ static void redirty_tail_locked(struct i
 			inode->dirtied_when = jiffies;
 	}
 	inode_io_list_move_locked(inode, wb, &wb->b_dirty);
+	inode->i_state &= ~I_SYNC_QUEUED;
 }
 
 static void redirty_tail(struct inode *inode, struct bdi_writeback *wb)
@@ -1178,8 +1180,11 @@ static int move_expired_inodes(struct li
 			break;
 		list_move(&inode->i_io_list, &tmp);
 		moved++;
+		spin_lock(&inode->i_lock);
 		if (flags & EXPIRE_DIRTY_ATIME)
-			set_bit(__I_DIRTY_TIME_EXPIRED, &inode->i_state);
+			inode->i_state |= I_DIRTY_TIME_EXPIRED;
+		inode->i_state |= I_SYNC_QUEUED;
+		spin_unlock(&inode->i_lock);
 		if (sb_is_blkdev_sb(inode->i_sb))
 			continue;
 		if (sb && sb != inode->i_sb)
@@ -1354,6 +1359,7 @@ static void requeue_inode(struct inode *
 	} else if (inode->i_state & I_DIRTY_TIME) {
 		inode->dirtied_when = jiffies;
 		inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		inode->i_state &= ~I_SYNC_QUEUED;
 	} else {
 		/* The inode is clean. Remove from writeback lists. */
 		inode_io_list_del_locked(inode, wb);
@@ -2188,11 +2194,12 @@ void __mark_inode_dirty(struct inode *in
 		inode->i_state |= flags;
 
 		/*
-		 * If the inode is being synced, just update its dirty state.
-		 * The unlocker will place the inode on the appropriate
-		 * superblock list, based upon its state.
+		 * If the inode is queued for writeback by flush worker, just
+		 * update its dirty state. Once the flush worker is done with
+		 * the inode it will place it on the appropriate superblock
+		 * list, based upon its state.
 		 */
-		if (inode->i_state & I_SYNC)
+		if (inode->i_state & I_SYNC_QUEUED)
 			goto out_unlock_inode;
 
 		/*
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1954,6 +1954,10 @@ static inline bool HAS_UNMAPPED_ID(struc
  *			wb stat updates to grab mapping->tree_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
+ * I_SYNC_QUEUED	Inode is queued in b_io or b_more_io writeback lists.
+ *			Used to detect that mark_inode_dirty() should not move
+ * 			inode between dirty lists.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -1971,9 +1975,9 @@ static inline bool HAS_UNMAPPED_ID(struc
 #define I_DIO_WAKEUP		(1 << __I_DIO_WAKEUP)
 #define I_LINKABLE		(1 << 10)
 #define I_DIRTY_TIME		(1 << 11)
-#define __I_DIRTY_TIME_EXPIRED	12
-#define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
+#define I_DIRTY_TIME_EXPIRED	(1 << 12)
 #define I_WB_SWITCH		(1 << 13)
+#define I_SYNC_QUEUED		(1 << 17)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)



  parent reply	other threads:[~2020-09-01 17:09 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-01 15:09 [PATCH 4.9 00/78] 4.9.235-rc1 review Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 01/78] bonding: fix a potential double-unregister Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 02/78] bonding: show saner speed for broadcast mode Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 03/78] net: Fix potential wrong skb->protocol in skb_vlan_untag() Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 04/78] tipc: fix uninit skb->data in tipc_nl_compat_dumpit() Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 05/78] ipvlan: fix device features Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 06/78] gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 07/78] ALSA: pci: delete repeated words in comments Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 08/78] ASoC: tegra: Fix reference count leaks Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 09/78] arm64: dts: qcom: msm8916: Pull down PDM GPIOs during sleep Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 10/78] media: pci: ttpci: av7110: fix possible buffer overflow caused by bad DMA value in debiirq() Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 11/78] scsi: target: tcmu: Fix crash on ARM during cmd completion Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 12/78] iommu/iova: Dont BUG on invalid PFNs Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 13/78] drm/amdkfd: Fix reference count leaks Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 14/78] drm/radeon: fix multiple reference count leak Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 15/78] drm/amdgpu: fix ref count leak in amdgpu_driver_open_kms Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 16/78] drm/amd/display: fix ref count leak in amdgpu_drm_ioctl Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 17/78] drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 18/78] drm/amdgpu/display: fix ref count leak when pm_runtime_get_sync fails Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 19/78] scsi: lpfc: Fix shost refcount mismatch when deleting vport Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 20/78] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 21/78] omapfb: fix multiple reference count leaks due to pm_runtime_get_sync Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 22/78] PCI: Fix pci_create_slot() reference count leak Greg Kroah-Hartman
2020-09-01 15:09 ` [PATCH 4.9 23/78] rtlwifi: rtl8192cu: Prevent leaking urb Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 24/78] mips/vdso: Fix resource leaks in genvdso.c Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 25/78] cec-api: prevent leaking memory through hole in structure Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 26/78] drm/nouveau/drm/noveau: fix reference count leak in nouveau_fbcon_open Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 27/78] drm/nouveau: Fix reference count leak in nouveau_connector_detect Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 28/78] locking/lockdep: Fix overflow in presentation of average lock-time Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 29/78] scsi: iscsi: Do not put host in iscsi_set_flashnode_param() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 30/78] ceph: fix potential mdsc use-after-free crash Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 31/78] scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 32/78] EDAC/ie31200: Fallback if host bridge device is already initialized Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 33/78] media: davinci: vpif_capture: fix potential double free Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 34/78] KVM: arm64: Fix symbol dependency in __hyp_call_panic_nvhe Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 35/78] powerpc/spufs: add CONFIG_COREDUMP dependency Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 36/78] USB: sisusbvga: Fix a potential UB casued by left shifting a negative value Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 37/78] efi: provide empty efi_enter_virtual_mode implementation Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 38/78] Revert "ath10k: fix DMA related firmware crashes on multiple devices" Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 39/78] i2c: rcar: in slave mode, clear NACK earlier Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 40/78] usb: gadget: f_tcm: Fix some resource leaks in some error paths Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 41/78] jbd2: make sure jh have b_transaction set in refile/unfile_buffer Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 42/78] jbd2: abort journal if free a async write error metadata buffer Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 43/78] fs: prevent BUG_ON in submit_bh_wbc() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 44/78] s390/cio: add cond_resched() in the slow_eval_known_fn() loop Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 45/78] scsi: ufs: Fix possible infinite loop in ufshcd_hold Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 46/78] scsi: ufs: Improve interrupt handling for shared interrupts Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 47/78] net: gianfar: Add of_node_put() before goto statement Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 48/78] powerpc/perf: Fix soft lockups due to missed interrupt accounting Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 49/78] HID: i2c-hid: Always sleep 60ms after I2C_HID_PWR_ON commands Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 50/78] btrfs: fix space cache memory leak after transaction abort Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 51/78] fbcon: prevent user font height or width change from causing potential out-of-bounds access Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 52/78] USB: lvtest: return proper error code in probe Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 53/78] vt: defer kfree() of vc_screenbuf in vc_do_resize() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 54/78] vt_ioctl: change VT_RESIZEX ioctl to check for error return from vc_resize() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 55/78] serial: samsung: Removes the IRQ not found warning Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 56/78] serial: pl011: Fix oops on -EPROBE_DEFER Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 57/78] serial: pl011: Dont leak amba_ports entry on driver register error Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 58/78] serial: 8250: change lock order in serial8250_do_startup() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 59/78] writeback: Protect inode->i_io_list with inode->i_lock Greg Kroah-Hartman
2020-09-01 15:10 ` Greg Kroah-Hartman [this message]
2020-09-01 15:10 ` [PATCH 4.9 61/78] writeback: Fix sync livelock due to b_dirty_time processing Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 62/78] XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 63/78] xhci: Do warm-reset when both CAS and XDEV_RESUME are set Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 64/78] PM: sleep: core: Fix the handling of pending runtime resume requests Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 65/78] device property: Fix the secondary firmware node handling in set_primary_fwnode() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 66/78] USB: yurex: Fix bad gfp argument Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 67/78] usb: uas: Add quirk for PNY Pro Elite Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 68/78] USB: quirks: Add no-lpm quirk for another Raydium touchscreen Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 69/78] USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 70/78] usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 71/78] overflow.h: Add allocation size calculation helpers Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 72/78] USB: gadget: u_f: add overflow checks to VLA macros Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 73/78] USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 74/78] USB: gadget: u_f: Unbreak offset calculation in VLAs Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 75/78] usb: storage: Add unusual_uas entry for Sony PSZ drives Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 76/78] btrfs: check the right error variable in btrfs_del_dir_entries_in_log Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 77/78] HID: hiddev: Fix slab-out-of-bounds write in hiddev_ioctl_usage() Greg Kroah-Hartman
2020-09-01 15:10 ` [PATCH 4.9 78/78] ALSA: usb-audio: Update documentation comment for MS2109 quirk Greg Kroah-Hartman
2020-09-01 22:25 ` [PATCH 4.9 00/78] 4.9.235-rc1 review Shuah Khan
2020-09-02  7:21 ` Naresh Kamboju
2020-09-02 16:46 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200901150927.794380135@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maco@android.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).