linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH V4] iomap: add support to track dirty state of sub pages
@ 2020-08-21 12:33 Yu Kuai
  2020-08-21 12:44 ` Matthew Wilcox
  0 siblings, 1 reply; 13+ messages in thread
From: Yu Kuai @ 2020-08-21 12:33 UTC (permalink / raw)
  To: hch, darrick.wong, willy, david
  Cc: linux-xfs, linux-fsdevel, linux-kernel, yukuai3, yi.zhang

changes from v3:
 - add IOMAP_STATE_ARRAY_SIZE
 - replace set_bit / clear_bit with bitmap_set / bitmap_clear
 - move iomap_set_page_dirty() out of 'iop->state_lock'
 - merge iomap_set/clear_range_dirty() and iomap_iop_clear/clear_range_dirty()

changes from v2:
 as suggested by Mathew:
 - move iomap_set_page_dirty() into iomap_set_range_dirty()
 - add DIRTY_BITS()
 - move ioamp_set_rannge_dirty() from iomap_page_mkwrite_actor() to
   iomap_page_mkwrite()
 - clear the dirty bits of entire page in iomap_writepage_map

changes from v1:
 - separate set dirty and clear dirty functions
 - don't test uptodate bit in iomap_writepage_map()
 - use one bitmap array for uptodate and dirty.

commit 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O
without buffer heads") replace the per-block structure buffer_head with
the per-page structure iomap_page. However, iomap_page can't track the
dirty state of sub pages, which will cause performance issue since sub
pages will be writeback even if they are not dirty.

For example, if block size is 4k and page size is 64k:

dd if=/dev/zero of=testfile bs=4k count=16 oflag=sync

With buffer_head implementation, the above dd cmd will writeback 4k in
each round. However, with iomap_page implementation, the range of
writeback in each round is from the start of the page to the end offset
we just wrote.

Thus add support to track dirty state in iomap_page.

I tested this path with:
test environment:
	platform:	arm64
	kernel:		v5.8
	pagesize:	64k
	blocksize:	4k

test case:
	dd if=/dev/zero of=/mnt/testfile bs=1M count=128
	fio --ioengine=sync --rw=randwrite --iodepth=64 --name=test --filename=/mnt/testfile --bs=4k --fsync=1

The test result is:
a. with patch

```
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=4576KiB/s][r=0,w=1144 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=233608: Fri Aug 21 08:09:21 2020
  write: IOPS=1152, BW=4609KiB/s (4720kB/s)(128MiB/28437msec)
    clat (nsec): min=3000, max=23580, avg=4944.15, stdev=1589.23
     lat (nsec): min=3160, max=24500, avg=5109.51, stdev=1592.11
    clat percentiles (nsec):
     |  1.00th=[ 3376],  5.00th=[ 3600], 10.00th=[ 3824], 20.00th=[ 4016],
     | 30.00th=[ 4128], 40.00th=[ 4192], 50.00th=[ 4256], 60.00th=[ 4320],
     | 70.00th=[ 4448], 80.00th=[ 7200], 90.00th=[ 7968], 95.00th=[ 8160],
     | 99.00th=[ 8512], 99.50th=[ 8768], 99.90th=[10688], 99.95th=[12224],
     | 99.99th=[20352]
   bw (  KiB/s): min= 1920, max= 4800, per=99.97%, avg=4607.39, stdev=395.03, samples=56
   iops        : min=  480, max= 1200, avg=1151.84, stdev=98.76, samples=56
  lat (usec)   : 4=18.41%, 10=81.45%, 20=0.13%, 50=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=651, max=22374, avg=852.24, stdev=430.25
    sync percentiles (usec):
     |  1.00th=[  660],  5.00th=[  660], 10.00th=[  668], 20.00th=[  668],
     | 30.00th=[  676], 40.00th=[  676], 50.00th=[  685], 60.00th=[  685],
     | 70.00th=[  693], 80.00th=[ 1401], 90.00th=[ 1418], 95.00th=[ 1434],
     | 99.00th=[ 1467], 99.50th=[ 1680], 99.90th=[ 7767], 99.95th=[ 7767],
     | 99.99th=[ 7767]
  cpu          : usr=0.41%, sys=2.18%, ctx=98400, majf=0, minf=4
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,32768,0,32767 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=4609KiB/s (4720kB/s), 4609KiB/s-4609KiB/s (4720kB/s-4720kB/s), io=128MiB (134MB), run=28437-28437msec

Disk stats (read/write):
  sda: ios=4/65441, merge=0/5, ticks=3/28859, in_queue=54849, util=99.70%
```

b. without patch

```
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=3003KiB/s][r=0,w=750 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=9174: Tue Aug 18 04:17:16 2020
  write: IOPS=678, BW=2714KiB/s (2780kB/s)(128MiB/48286msec)
    clat (nsec): min=3420, max=26240, avg=5898.60, stdev=1824.49
     lat (nsec): min=3600, max=26860, avg=6065.21, stdev=1826.90
    clat percentiles (nsec):
     |  1.00th=[ 3792],  5.00th=[ 4128], 10.00th=[ 4320], 20.00th=[ 4512],
     | 30.00th=[ 4576], 40.00th=[ 4704], 50.00th=[ 4832], 60.00th=[ 4960],
     | 70.00th=[ 7968], 80.00th=[ 8256], 90.00th=[ 8512], 95.00th=[ 8768],
     | 99.00th=[ 9152], 99.50th=[ 9408], 99.90th=[11840], 99.95th=[13376],
     | 99.99th=[18560]
   bw (  KiB/s): min= 1016, max= 3128, per=99.92%, avg=2711.92, stdev=357.89, samples=96
   iops        : min=  254, max=  782, avg=677.98, stdev=89.47, samples=96
  lat (usec)   : 4=3.14%, 10=96.66%, 20=0.20%, 50=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=814, max=24221, avg=1456.82, stdev=543.48
    sync percentiles (usec):
     |  1.00th=[  988],  5.00th=[  996], 10.00th=[  996], 20.00th=[ 1012],
     | 30.00th=[ 1029], 40.00th=[ 1221], 50.00th=[ 1270], 60.00th=[ 1287],
     | 70.00th=[ 1795], 80.00th=[ 1844], 90.00th=[ 2245], 95.00th=[ 2278],
     | 99.00th=[ 2442], 99.50th=[ 2737], 99.90th=[ 5407], 99.95th=[ 5538],
     | 99.99th=[ 5735]
  cpu          : usr=0.19%, sys=1.54%, ctx=98412, majf=0, minf=4
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,32768,0,32767 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=2714KiB/s (2780kB/s), 2714KiB/s-2714KiB/s (2780kB/s-2780kB/s), io=128MiB (134MB), run=48286-48286msec

Disk stats (read/write):
  sda: ios=4/65344, merge=0/5, ticks=2/48198, in_queue=88938, util=99.83%
```

c. ext4

```
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=3919KiB/s][r=0,w=979 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=8682: Tue Aug 18 04:15:43 2020
  write: IOPS=960, BW=3840KiB/s (3932kB/s)(128MiB/34133msec)
    clat (usec): min=4, max=349, avg= 8.92, stdev= 2.94
     lat (usec): min=4, max=349, avg= 9.06, stdev= 2.94
    clat percentiles (nsec):
     |  1.00th=[ 6112],  5.00th=[ 6624], 10.00th=[ 6880], 20.00th=[ 7200],
     | 30.00th=[ 7456], 40.00th=[ 7712], 50.00th=[ 8032], 60.00th=[ 8384],
     | 70.00th=[ 9024], 80.00th=[11712], 90.00th=[12608], 95.00th=[13120],
     | 99.00th=[14272], 99.50th=[14656], 99.90th=[17536], 99.95th=[20352],
     | 99.99th=[33536]
   bw (  KiB/s): min= 1344, max= 3992, per=100.00%, avg=3839.88, stdev=314.69, samples=68
   iops        : min=  336, max=  998, avg=959.97, stdev=78.67, samples=68
  lat (usec)   : 10=74.64%, 20=25.31%, 50=0.05%, 100=0.01%, 500=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=666, max=25174, avg=1021.69, stdev=871.62
    sync percentiles (usec):
     |  1.00th=[  685],  5.00th=[  693], 10.00th=[  701], 20.00th=[  701],
     | 30.00th=[  709], 40.00th=[  717], 50.00th=[  717], 60.00th=[  725],
     | 70.00th=[  734], 80.00th=[ 1500], 90.00th=[ 1516], 95.00th=[ 1532],
     | 99.00th=[ 6128], 99.50th=[ 6128], 99.90th=[ 7832], 99.95th=[ 8225],
     | 99.99th=[ 9634]
  cpu          : usr=0.32%, sys=2.87%, ctx=90254, majf=0, minf=4
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,32768,0,32767 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=3840KiB/s (3932kB/s), 3840KiB/s-3840KiB/s (3932kB/s-3932kB/s), io=128MiB (134MB), run=34133-34133msec

Disk stats (read/write):
  sda: ios=0/75055, merge=0/8822, ticks=0/40565, in_queue=68469, util=99.80%
```

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/iomap/buffered-io.c | 88 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 72 insertions(+), 16 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcfc288dba3f..4539f98f3c12 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -21,15 +21,22 @@
 
 #include "../internal.h"
 
+#define DIRTY_BITS(x)	((x) + PAGE_SIZE / SECTOR_SIZE)
+#define IOMAP_STATE_ARRAY_SIZE	(PAGE_SIZE * 2 / SECTOR_SIZE)
+
 /*
  * Structure allocated for each page when block size < PAGE_SIZE to track
- * sub-page uptodate status and I/O completions.
+ * sub-page status and I/O completions.
  */
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
-	spinlock_t		uptodate_lock;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	spinlock_t		state_lock;
+	/*
+	 * The first half bits are used to track sub-page uptodate status,
+	 * the second half bits are for dirty status.
+	 */
+	DECLARE_BITMAP(state, IOMAP_STATE_ARRAY_SIZE);
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
@@ -52,8 +59,8 @@ iomap_page_create(struct inode *inode, struct page *page)
 	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
 	atomic_set(&iop->read_count, 0);
 	atomic_set(&iop->write_count, 0);
-	spin_lock_init(&iop->uptodate_lock);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+	spin_lock_init(&iop->state_lock);
+	bitmap_zero(iop->state, IOMAP_STATE_ARRAY_SIZE);
 
 	/*
 	 * migrate_page_move_mapping() assumes that pages with private data have
@@ -101,7 +108,7 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 
 		/* move forward for each leading block marked uptodate */
 		for (i = first; i <= last; i++) {
-			if (!test_bit(i, iop->uptodate))
+			if (!test_bit(i, iop->state))
 				break;
 			*pos += block_size;
 			poff += block_size;
@@ -111,7 +118,7 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 
 		/* truncate len if we find any trailing uptodate block(s) */
 		for ( ; i <= last; i++) {
-			if (test_bit(i, iop->uptodate)) {
+			if (test_bit(i, iop->state)) {
 				plen -= (last - i + 1) * block_size;
 				last = i - 1;
 				break;
@@ -135,6 +142,53 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 	*lenp = plen;
 }
 
+static void
+iomap_set_range_dirty(struct page *page, unsigned int off,
+		unsigned int len)
+{
+	struct inode *inode = page->mapping->host;
+	unsigned int first = DIRTY_BITS(off >> inode->i_blkbits);
+	unsigned int last = DIRTY_BITS((off + len - 1) >> inode->i_blkbits);
+	unsigned long flags;
+	struct iomap_page *iop;
+
+	if (PageError(page))
+		return;
+
+	if (len)
+		iomap_set_page_dirty(page);
+
+	if (!page_has_private(page))
+		return;
+
+	iop = to_iomap_page(page);
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_set(iop->state, first, last - first + 1);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
+static void
+iomap_clear_range_dirty(struct page *page, unsigned int off,
+		unsigned int len)
+{
+	struct inode *inode = page->mapping->host;
+	unsigned int first = DIRTY_BITS(off >> inode->i_blkbits);
+	unsigned int last = DIRTY_BITS((off + len - 1) >> inode->i_blkbits);
+	unsigned long flags;
+	struct iomap_page *iop;
+
+	if (PageError(page))
+		return;
+
+	if (!page_has_private(page))
+		return;
+
+	iop = to_iomap_page(page);
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_clear(iop->state, first, last - first + 1);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
 static void
 iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 {
@@ -146,17 +200,17 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	unsigned long flags;
 	unsigned int i;
 
-	spin_lock_irqsave(&iop->uptodate_lock, flags);
+	spin_lock_irqsave(&iop->state_lock, flags);
 	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
 		if (i >= first && i <= last)
-			set_bit(i, iop->uptodate);
-		else if (!test_bit(i, iop->uptodate))
+			set_bit(i, iop->state);
+		else if (!test_bit(i, iop->state))
 			uptodate = false;
 	}
 
 	if (uptodate)
 		SetPageUptodate(page);
-	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
 }
 
 static void
@@ -466,7 +520,7 @@ iomap_is_partially_uptodate(struct page *page, unsigned long from,
 
 	if (iop) {
 		for (i = first; i <= last; i++)
-			if (!test_bit(i, iop->uptodate))
+			if (!test_bit(i, iop->state))
 				return 0;
 		return 1;
 	}
@@ -705,7 +759,7 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	if (unlikely(copied < len && !PageUptodate(page)))
 		return 0;
 	iomap_set_range_uptodate(page, offset_in_page(pos), len);
-	iomap_set_page_dirty(page);
+	iomap_set_range_dirty(page, offset_in_page(pos), len);
 	return copied;
 }
 
@@ -1029,7 +1083,6 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 	} else {
 		WARN_ON_ONCE(!PageUptodate(page));
 		iomap_page_create(inode, page);
-		set_page_dirty(page);
 	}
 
 	return length;
@@ -1039,7 +1092,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 {
 	struct page *page = vmf->page;
 	struct inode *inode = file_inode(vmf->vma->vm_file);
-	unsigned long length;
+	unsigned int length, bytes_in_page;
 	loff_t offset;
 	ssize_t ret;
 
@@ -1048,6 +1101,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 	if (ret < 0)
 		goto out_unlock;
 	length = ret;
+	bytes_in_page = ret;
 
 	offset = page_offset(page);
 	while (length > 0) {
@@ -1060,6 +1114,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 		length -= ret;
 	}
 
+	iomap_set_range_dirty(page, 0, bytes_in_page);
 	wait_for_stable_page(page);
 	return VM_FAULT_LOCKED;
 out_unlock:
@@ -1386,7 +1441,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	for (i = 0, file_offset = page_offset(page);
 	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
 	     i++, file_offset += len) {
-		if (iop && !test_bit(i, iop->uptodate))
+		if (iop && !test_bit(DIRTY_BITS(i), iop->state))
 			continue;
 
 		error = wpc->ops->map_blocks(wpc, inode, file_offset);
@@ -1435,6 +1490,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		 */
 		set_page_writeback_keepwrite(page);
 	} else {
+		iomap_clear_range_dirty(page, 0, PAGE_SIZE);
 		clear_page_dirty_for_io(page);
 		set_page_writeback(page);
 	}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH V4] iomap: add support to track dirty state of sub pages
  2020-08-21 12:33 [RFC PATCH V4] iomap: add support to track dirty state of sub pages Yu Kuai
@ 2020-08-21 12:44 ` Matthew Wilcox
  2020-08-21 12:46   ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
  2020-09-11  8:27   ` [RFC PATCH V4] iomap: add support to track dirty state of sub pages yukuai (C)
  0 siblings, 2 replies; 13+ messages in thread
From: Matthew Wilcox @ 2020-08-21 12:44 UTC (permalink / raw)
  To: Yu Kuai
  Cc: hch, darrick.wong, david, linux-xfs, linux-fsdevel, linux-kernel,
	yi.zhang

On Fri, Aug 21, 2020 at 08:33:06PM +0800, Yu Kuai wrote:
> changes from v3:
>  - add IOMAP_STATE_ARRAY_SIZE
>  - replace set_bit / clear_bit with bitmap_set / bitmap_clear
>  - move iomap_set_page_dirty() out of 'iop->state_lock'
>  - merge iomap_set/clear_range_dirty() and iomap_iop_clear/clear_range_dirty()

I'm still working on the iomap parts of the THP series (fixing up
invalidatepage right now), but here are some of the relevant bits (patch
series to follow)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page
  2020-08-21 12:44 ` Matthew Wilcox
@ 2020-08-21 12:46   ` Matthew Wilcox (Oracle)
  2020-08-21 12:46     ` [PATCH 2/3] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
                       ` (2 more replies)
  2020-09-11  8:27   ` [RFC PATCH V4] iomap: add support to track dirty state of sub pages yukuai (C)
  1 sibling, 3 replies; 13+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-21 12:46 UTC (permalink / raw)
  To: Yu Kuai, hch, darrick.wong, david, linux-xfs, linux-fsdevel,
	linux-kernel, yi.zhang
  Cc: Matthew Wilcox (Oracle)

We can skip most of the initialisation, although spinlocks still
need explicit initialisation as architectures may use a non-zero
value to indicate unlocked.  The comment is no longer useful as
attach_page_private() handles the refcount now.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 13d5cdab8dcd..639d54a4177e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -49,16 +49,8 @@ iomap_page_create(struct inode *inode, struct page *page)
 	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
-	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-	atomic_set(&iop->read_count, 0);
-	atomic_set(&iop->write_count, 0);
+	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
-
-	/*
-	 * migrate_page_move_mapping() assumes that pages with private data have
-	 * their count elevated by 1.
-	 */
 	attach_page_private(page, iop);
 	return iop;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] iomap: Use bitmap ops to set uptodate bits
  2020-08-21 12:46   ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
@ 2020-08-21 12:46     ` Matthew Wilcox (Oracle)
  2020-08-22  5:59       ` Christoph Hellwig
  2020-08-21 12:46     ` [PATCH 3/3] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
  2020-08-22  5:57     ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Christoph Hellwig
  2 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-21 12:46 UTC (permalink / raw)
  To: Yu Kuai, hch, darrick.wong, david, linux-xfs, linux-fsdevel,
	linux-kernel, yi.zhang
  Cc: Matthew Wilcox (Oracle)

Now that the bitmap is protected by a spinlock, we can use the
more efficient bitmap ops instead of individual test/set bit ops.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 639d54a4177e..dbf9572dabe9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -134,19 +134,11 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	struct inode *inode = page->mapping->host;
 	unsigned first = off >> inode->i_blkbits;
 	unsigned last = (off + len - 1) >> inode->i_blkbits;
-	bool uptodate = true;
 	unsigned long flags;
-	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-		if (i >= first && i <= last)
-			set_bit(i, iop->uptodate);
-		else if (!test_bit(i, iop->uptodate))
-			uptodate = false;
-	}
-
-	if (uptodate)
+	bitmap_set(iop->uptodate, first, last - first + 1);
+	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
 		SetPageUptodate(page);
 	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] iomap: Support arbitrarily many blocks per page
  2020-08-21 12:46   ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
  2020-08-21 12:46     ` [PATCH 2/3] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
@ 2020-08-21 12:46     ` Matthew Wilcox (Oracle)
  2020-08-21 12:48       ` Matthew Wilcox
  2020-08-22  6:01       ` Christoph Hellwig
  2020-08-22  5:57     ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Christoph Hellwig
  2 siblings, 2 replies; 13+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-21 12:46 UTC (permalink / raw)
  To: Yu Kuai, hch, darrick.wong, david, linux-xfs, linux-fsdevel,
	linux-kernel, yi.zhang
  Cc: Matthew Wilcox (Oracle)

Size the uptodate array dynamically and add a few debugging assertions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index dbf9572dabe9..844e95cacea8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -22,18 +22,19 @@
 #include "../internal.h"
 
 /*
- * Structure allocated for each page when block size < PAGE_SIZE to track
+ * Structure allocated for each page when block size < page size to track
  * sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
 	spinlock_t		uptodate_lock;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	unsigned long		uptodate[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
 {
+	VM_BUG_ON_PGFLAGS(PageTail(page), page);
 	if (page_has_private(page))
 		return (struct iomap_page *)page_private(page);
 	return NULL;
@@ -45,11 +46,13 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
+	unsigned int nr_blocks = i_blocks_per_page(inode, page);
 
-	if (iop || i_blocks_per_page(inode, page) <= 1)
+	if (iop || nr_blocks <= 1)
 		return iop;
 
-	iop = kzalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
+	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+			GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
 	attach_page_private(page, iop);
 	return iop;
@@ -59,11 +62,14 @@ static void
 iomap_page_release(struct page *page)
 {
 	struct iomap_page *iop = detach_page_private(page);
+	unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
 
 	if (!iop)
 		return;
 	WARN_ON_ONCE(atomic_read(&iop->read_count));
 	WARN_ON_ONCE(atomic_read(&iop->write_count));
+	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+			PageUptodate(page));
 	kfree(iop);
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] iomap: Support arbitrarily many blocks per page
  2020-08-21 12:46     ` [PATCH 3/3] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
@ 2020-08-21 12:48       ` Matthew Wilcox
  2020-08-22  6:01       ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Matthew Wilcox @ 2020-08-21 12:48 UTC (permalink / raw)
  To: Yu Kuai, hch, darrick.wong, david, linux-xfs, linux-fsdevel,
	linux-kernel, yi.zhang

On Fri, Aug 21, 2020 at 01:46:06PM +0100, Matthew Wilcox (Oracle) wrote:
> @@ -45,11 +46,13 @@ static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct page *page)
>  {
>  	struct iomap_page *iop = to_iomap_page(page);
> +	unsigned int nr_blocks = i_blocks_per_page(inode, page);
>  

i_blocks_per_page isn't part of this series.  It's here:

http://git.infradead.org/users/willy/pagecache.git/commitdiff/e3177f30c0d0906bec49586a86704a2a5736b6c3

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page
  2020-08-21 12:46   ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
  2020-08-21 12:46     ` [PATCH 2/3] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
  2020-08-21 12:46     ` [PATCH 3/3] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
@ 2020-08-22  5:57     ` Christoph Hellwig
  2 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2020-08-22  5:57 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Yu Kuai, hch, darrick.wong, david, linux-xfs, linux-fsdevel,
	linux-kernel, yi.zhang

On Fri, Aug 21, 2020 at 01:46:04PM +0100, Matthew Wilcox (Oracle) wrote:
> We can skip most of the initialisation, although spinlocks still
> need explicit initialisation as architectures may use a non-zero
> value to indicate unlocked.  The comment is no longer useful as
> attach_page_private() handles the refcount now.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] iomap: Use bitmap ops to set uptodate bits
  2020-08-21 12:46     ` [PATCH 2/3] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
@ 2020-08-22  5:59       ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2020-08-22  5:59 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Yu Kuai, hch, darrick.wong, david, linux-xfs, linux-fsdevel,
	linux-kernel, yi.zhang

On Fri, Aug 21, 2020 at 01:46:05PM +0100, Matthew Wilcox (Oracle) wrote:
> Now that the bitmap is protected by a spinlock, we can use the
> more efficient bitmap ops instead of individual test/set bit ops.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 639d54a4177e..dbf9572dabe9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -134,19 +134,11 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  	struct inode *inode = page->mapping->host;
>  	unsigned first = off >> inode->i_blkbits;
>  	unsigned last = (off + len - 1) >> inode->i_blkbits;
> -	bool uptodate = true;
>  	unsigned long flags;
> -	unsigned int i;
>  
>  	spin_lock_irqsave(&iop->uptodate_lock, flags);
> -	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
> -		if (i >= first && i <= last)
> -			set_bit(i, iop->uptodate);
> -		else if (!test_bit(i, iop->uptodate))
> -			uptodate = false;
> -	}
> -
> -	if (uptodate)
> +	bitmap_set(iop->uptodate, first, last - first + 1);
> +	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
>  		SetPageUptodate(page);
>  	spin_unlock_irqrestore(&iop->uptodate_lock, flags);

Can you respin this to be based on current Linus' tree without
i_blocks_per_page?  With that it looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] iomap: Support arbitrarily many blocks per page
  2020-08-21 12:46     ` [PATCH 3/3] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
  2020-08-21 12:48       ` Matthew Wilcox
@ 2020-08-22  6:01       ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2020-08-22  6:01 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Yu Kuai, hch, darrick.wong, david, linux-xfs, linux-fsdevel,
	linux-kernel, yi.zhang

On Fri, Aug 21, 2020 at 01:46:06PM +0100, Matthew Wilcox (Oracle) wrote:
> Size the uptodate array dynamically and add a few debugging assertions.

Needs a rebase to not require i_blocks_per_page, and a somewhat more
details changelog that requires why we change this.  Otherwise this
looks good and should probably go in before the per-block uptodate
tracking.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH V4] iomap: add support to track dirty state of sub pages
  2020-08-21 12:44 ` Matthew Wilcox
  2020-08-21 12:46   ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
@ 2020-09-11  8:27   ` yukuai (C)
  2020-09-11  8:35     ` yukuai (C)
  2020-09-11 12:05     ` Matthew Wilcox
  1 sibling, 2 replies; 13+ messages in thread
From: yukuai (C) @ 2020-09-11  8:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: hch, darrick.wong, david, linux-xfs, linux-fsdevel, linux-kernel,
	yi.zhang

On 2020/08/21 20:44, Matthew Wilcox wrote:
> On Fri, Aug 21, 2020 at 08:33:06PM +0800, Yu Kuai wrote:
>> changes from v3: - add IOMAP_STATE_ARRAY_SIZE - replace set_bit / 
>> clear_bit with bitmap_set / bitmap_clear - move 
>> iomap_set_page_dirty() out of 'iop->state_lock' - merge 
>> iomap_set/clear_range_dirty() and 
>> iomap_iop_clear/clear_range_dirty()
> 
> I'm still working on the iomap parts of the THP series (fixing up 
> invalidatepage right now), but here are some of the relevant bits 
> (patch series to follow)
> 

Hi, Matthew

Since your THP iomap patches were reviewed, I made some modifications
based on these patches.

Best regards,
Yu Kuai

---
  fs/iomap/buffered-io.c | 92 +++++++++++++++++++++++++++++++++---------
  1 file changed, 74 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index edf5eea56cf5..bc7f57748be8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -23,13 +23,17 @@

  /*
   * Structure allocated for each page or THP when block size < page size
- * to track sub-page uptodate status and I/O completions.
+ * to track sub-page status and I/O completions.
   */
  struct iomap_page {
  	atomic_t		read_bytes_pending;
  	atomic_t		write_bytes_pending;
-	spinlock_t		uptodate_lock;
-	unsigned long		uptodate[];
+	spinlock_t		state_lock;
+	/*
+	 * The first half bits are used to track sub-page uptodate status,
+	 * the second half bits are for dirty status.
+	 */
+	unsigned long		state[];
  };

  static inline struct iomap_page *to_iomap_page(struct page *page)
@@ -57,9 +61,9 @@ iomap_page_create(struct inode *inode, struct page *page)
  	if (iop || nr_blocks <= 1)
  		return iop;

-	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+	iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(2 * nr_blocks)),
  			GFP_NOFS | __GFP_NOFAIL);
-	spin_lock_init(&iop->uptodate_lock);
+	spin_lock_init(&iop->state_lock);
  	attach_page_private(page, iop);
  	return iop;
  }
@@ -74,7 +78,7 @@ iomap_page_release(struct page *page)
  		return;
  	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
  	WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
-	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+	WARN_ON_ONCE(bitmap_full(iop->state, nr_blocks) !=
  			PageUptodate(page));
  	kfree(iop);
  }
@@ -105,7 +109,7 @@ iomap_adjust_read_range(struct inode *inode, struct 
iomap_page *iop,

  		/* move forward for each leading block marked uptodate */
  		for (i = first; i <= last; i++) {
-			if (!test_bit(i, iop->uptodate))
+			if (!test_bit(i, iop->state))
  				break;
  			*pos += block_size;
  			poff += block_size;
@@ -115,7 +119,7 @@ iomap_adjust_read_range(struct inode *inode, struct 
iomap_page *iop,

  		/* truncate len if we find any trailing uptodate block(s) */
  		for ( ; i <= last; i++) {
-			if (test_bit(i, iop->uptodate)) {
+			if (test_bit(i, iop->state)) {
  				plen -= (last - i + 1) * block_size;
  				last = i - 1;
  				break;
@@ -139,6 +143,55 @@ iomap_adjust_read_range(struct inode *inode, struct 
iomap_page *iop,
  	*lenp = plen;
  }

+static void
+iomap_set_range_dirty(struct page *page, unsigned int off,
+		unsigned int len)
+{
+	struct inode *inode = page->mapping->host;
+	unsigned int blocks_per_page = i_blocks_per_page(inode, page);
+	unsigned int first = (off >> inode->i_blkbits) + blocks_per_page;
+	unsigned int last = ((off + len - 1) >> inode->i_blkbits) + 
blocks_per_page;
+	unsigned long flags;
+	struct iomap_page *iop;
+
+	if (PageError(page))
+		return;
+
+	if (len)
+		iomap_set_page_dirty(page);
+
+	if (!page_has_private(page))
+		return;
+
+	iop = to_iomap_page(page);
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_set(iop->state, first, last - first + 1);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
+static void
+iomap_clear_range_dirty(struct page *page, unsigned int off,
+		unsigned int len)
+{
+	struct inode *inode = page->mapping->host;
+	unsigned int blocks_per_page = i_blocks_per_page(inode, page);
+	unsigned int first = (off >> inode->i_blkbits) + blocks_per_page;
+	unsigned int last = ((off + len - 1) >> inode->i_blkbits) + 
blocks_per_page;
+	unsigned long flags;
+	struct iomap_page *iop;
+
+	if (PageError(page))
+		return;
+
+	if (!page_has_private(page))
+		return;
+
+	iop = to_iomap_page(page);
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_clear(iop->state, first, last - first + 1);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
  static void
  iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned 
len)
  {
@@ -148,11 +201,11 @@ iomap_iop_set_range_uptodate(struct page *page, 
unsigned off, unsigned len)
  	unsigned last = (off + len - 1) >> inode->i_blkbits;
  	unsigned long flags;

-	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	bitmap_set(iop->uptodate, first, last - first + 1);
-	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
+	spin_lock_irqsave(&iop->state_lock, flags);
+	bitmap_set(iop->state, first, last - first + 1);
+	if (bitmap_full(iop->state, i_blocks_per_page(inode, page)))
  		SetPageUptodate(page);
-	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
+	spin_unlock_irqrestore(&iop->state_lock, flags);
  }

  static void
@@ -445,7 +498,7 @@ iomap_is_partially_uptodate(struct page *page, 
unsigned long from,

  	if (iop) {
  		for (i = first; i <= last; i++)
-			if (!test_bit(i, iop->uptodate))
+			if (!test_bit(i, iop->state))
  				return 0;
  		return 1;
  	}
@@ -683,7 +736,7 @@ static size_t __iomap_write_end(struct inode *inode, 
loff_t pos, size_t len,
  	if (unlikely(copied < len && !PageUptodate(page)))
  		return 0;
  	iomap_set_range_uptodate(page, offset_in_page(pos), len);
-	iomap_set_page_dirty(page);
+	iomap_set_range_dirty(page, offset_in_page(pos), len);
  	return copied;
  }

@@ -997,7 +1050,6 @@ iomap_page_mkwrite_actor(struct inode *inode, 
loff_t pos, loff_t length,
  	} else {
  		WARN_ON_ONCE(!PageUptodate(page));
  		iomap_page_create(inode, page);
-		set_page_dirty(page);
  	}

  	return length;
@@ -1007,7 +1059,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault 
*vmf, const struct iomap_ops *ops)
  {
  	struct page *page = vmf->page;
  	struct inode *inode = file_inode(vmf->vma->vm_file);
-	unsigned long length;
+	unsigned int length, dirty_bits;
  	loff_t offset;
  	ssize_t ret;

@@ -1016,6 +1068,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault 
*vmf, const struct iomap_ops *ops)
  	if (ret < 0)
  		goto out_unlock;
  	length = ret;
+	dirty_bits = ret;

  	offset = page_offset(page);
  	while (length > 0) {
@@ -1028,6 +1081,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault 
*vmf, const struct iomap_ops *ops)
  		length -= ret;
  	}

+	iomap_set_range_dirty(page, 0, dirty_bits);
  	wait_for_stable_page(page);
  	return VM_FAULT_LOCKED;
  out_unlock:
@@ -1340,11 +1394,12 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
  	struct iomap_page *iop = to_iomap_page(page);
  	struct iomap_ioend *ioend, *next;
  	unsigned len = i_blocksize(inode);
+	unsigned int blocks_per_page = i_blocks_per_page(inode, page);
  	u64 file_offset; /* file offset of page */
  	int error = 0, count = 0, i;
  	LIST_HEAD(submit_list);

-	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
+	WARN_ON_ONCE(blocks_per_page > 1 && !iop);
  	WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);

  	/*
@@ -1355,7 +1410,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
  	for (i = 0, file_offset = page_offset(page);
  	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
  	     i++, file_offset += len) {
-		if (iop && !test_bit(i, iop->uptodate))
+		if (iop && !test_bit(i, iop->state + blocks_per_page))
  			continue;

  		error = wpc->ops->map_blocks(wpc, inode, file_offset);
@@ -1404,6 +1459,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
  		 */
  		set_page_writeback_keepwrite(page);
  	} else {
+		iomap_clear_range_dirty(page, 0, PAGE_SIZE);
  		clear_page_dirty_for_io(page);
  		set_page_writeback(page);
  	}
-- 
2.25.4


.





^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH V4] iomap: add support to track dirty state of sub pages
  2020-09-11  8:27   ` [RFC PATCH V4] iomap: add support to track dirty state of sub pages yukuai (C)
@ 2020-09-11  8:35     ` yukuai (C)
  2020-09-11 12:05     ` Matthew Wilcox
  1 sibling, 0 replies; 13+ messages in thread
From: yukuai (C) @ 2020-09-11  8:35 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: hch, darrick.wong, david, linux-xfs, linux-fsdevel, linux-kernel,
	yi.zhang

Hi,

Sorry that after copy and paste, the content of the patch somehow 
changed and looks strange.

Best regards,
Yu Kuai

On 2020/09/11 16:27, yukuai (C) wrote:
> On 2020/08/21 20:44, Matthew Wilcox wrote:
>> On Fri, Aug 21, 2020 at 08:33:06PM +0800, Yu Kuai wrote:
>>> changes from v3: - add IOMAP_STATE_ARRAY_SIZE - replace set_bit / 
>>> clear_bit with bitmap_set / bitmap_clear - move 
>>> iomap_set_page_dirty() out of 'iop->state_lock' - merge 
>>> iomap_set/clear_range_dirty() and iomap_iop_clear/clear_range_dirty()
>>
>> I'm still working on the iomap parts of the THP series (fixing up 
>> invalidatepage right now), but here are some of the relevant bits 
>> (patch series to follow)
>>
> 
> Hi, Matthew
> 
> Since your THP iomap patches were reviewed, I made some modifications
> based on these patches.
> 
> Best regards,
> Yu Kuai
> 
> ---
>   fs/iomap/buffered-io.c | 92 +++++++++++++++++++++++++++++++++---------
>   1 file changed, 74 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index edf5eea56cf5..bc7f57748be8 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -23,13 +23,17 @@
> 
>   /*
>    * Structure allocated for each page or THP when block size < page size
> - * to track sub-page uptodate status and I/O completions.
> + * to track sub-page status and I/O completions.
>    */
>   struct iomap_page {
>       atomic_t        read_bytes_pending;
>       atomic_t        write_bytes_pending;
> -    spinlock_t        uptodate_lock;
> -    unsigned long        uptodate[];
> +    spinlock_t        state_lock;
> +    /*
> +     * The first half bits are used to track sub-page uptodate status,
> +     * the second half bits are for dirty status.
> +     */
> +    unsigned long        state[];
>   };
> 
>   static inline struct iomap_page *to_iomap_page(struct page *page)
> @@ -57,9 +61,9 @@ iomap_page_create(struct inode *inode, struct page *page)
>       if (iop || nr_blocks <= 1)
>           return iop;
> 
> -    iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
> +    iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(2 * nr_blocks)),
>               GFP_NOFS | __GFP_NOFAIL);
> -    spin_lock_init(&iop->uptodate_lock);
> +    spin_lock_init(&iop->state_lock);
>       attach_page_private(page, iop);
>       return iop;
>   }
> @@ -74,7 +78,7 @@ iomap_page_release(struct page *page)
>           return;
>       WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
>       WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
> -    WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> +    WARN_ON_ONCE(bitmap_full(iop->state, nr_blocks) !=
>               PageUptodate(page));
>       kfree(iop);
>   }
> @@ -105,7 +109,7 @@ iomap_adjust_read_range(struct inode *inode, struct 
> iomap_page *iop,
> 
>           /* move forward for each leading block marked uptodate */
>           for (i = first; i <= last; i++) {
> -            if (!test_bit(i, iop->uptodate))
> +            if (!test_bit(i, iop->state))
>                   break;
>               *pos += block_size;
>               poff += block_size;
> @@ -115,7 +119,7 @@ iomap_adjust_read_range(struct inode *inode, struct 
> iomap_page *iop,
> 
>           /* truncate len if we find any trailing uptodate block(s) */
>           for ( ; i <= last; i++) {
> -            if (test_bit(i, iop->uptodate)) {
> +            if (test_bit(i, iop->state)) {
>                   plen -= (last - i + 1) * block_size;
>                   last = i - 1;
>                   break;
> @@ -139,6 +143,55 @@ iomap_adjust_read_range(struct inode *inode, struct 
> iomap_page *iop,
>       *lenp = plen;
>   }
> 
> +static void
> +iomap_set_range_dirty(struct page *page, unsigned int off,
> +        unsigned int len)
> +{
> +    struct inode *inode = page->mapping->host;
> +    unsigned int blocks_per_page = i_blocks_per_page(inode, page);
> +    unsigned int first = (off >> inode->i_blkbits) + blocks_per_page;
> +    unsigned int last = ((off + len - 1) >> inode->i_blkbits) + 
> blocks_per_page;
> +    unsigned long flags;
> +    struct iomap_page *iop;
> +
> +    if (PageError(page))
> +        return;
> +
> +    if (len)
> +        iomap_set_page_dirty(page);
> +
> +    if (!page_has_private(page))
> +        return;
> +
> +    iop = to_iomap_page(page);
> +    spin_lock_irqsave(&iop->state_lock, flags);
> +    bitmap_set(iop->state, first, last - first + 1);
> +    spin_unlock_irqrestore(&iop->state_lock, flags);
> +}
> +
> +static void
> +iomap_clear_range_dirty(struct page *page, unsigned int off,
> +        unsigned int len)
> +{
> +    struct inode *inode = page->mapping->host;
> +    unsigned int blocks_per_page = i_blocks_per_page(inode, page);
> +    unsigned int first = (off >> inode->i_blkbits) + blocks_per_page;
> +    unsigned int last = ((off + len - 1) >> inode->i_blkbits) + 
> blocks_per_page;
> +    unsigned long flags;
> +    struct iomap_page *iop;
> +
> +    if (PageError(page))
> +        return;
> +
> +    if (!page_has_private(page))
> +        return;
> +
> +    iop = to_iomap_page(page);
> +    spin_lock_irqsave(&iop->state_lock, flags);
> +    bitmap_clear(iop->state, first, last - first + 1);
> +    spin_unlock_irqrestore(&iop->state_lock, flags);
> +}
> +
>   static void
>   iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned 
> len)
>   {
> @@ -148,11 +201,11 @@ iomap_iop_set_range_uptodate(struct page *page, 
> unsigned off, unsigned len)
>       unsigned last = (off + len - 1) >> inode->i_blkbits;
>       unsigned long flags;
> 
> -    spin_lock_irqsave(&iop->uptodate_lock, flags);
> -    bitmap_set(iop->uptodate, first, last - first + 1);
> -    if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
> +    spin_lock_irqsave(&iop->state_lock, flags);
> +    bitmap_set(iop->state, first, last - first + 1);
> +    if (bitmap_full(iop->state, i_blocks_per_page(inode, page)))
>           SetPageUptodate(page);
> -    spin_unlock_irqrestore(&iop->uptodate_lock, flags);
> +    spin_unlock_irqrestore(&iop->state_lock, flags);
>   }
> 
>   static void
> @@ -445,7 +498,7 @@ iomap_is_partially_uptodate(struct page *page, 
> unsigned long from,
> 
>       if (iop) {
>           for (i = first; i <= last; i++)
> -            if (!test_bit(i, iop->uptodate))
> +            if (!test_bit(i, iop->state))
>                   return 0;
>           return 1;
>       }
> @@ -683,7 +736,7 @@ static size_t __iomap_write_end(struct inode *inode, 
> loff_t pos, size_t len,
>       if (unlikely(copied < len && !PageUptodate(page)))
>           return 0;
>       iomap_set_range_uptodate(page, offset_in_page(pos), len);
> -    iomap_set_page_dirty(page);
> +    iomap_set_range_dirty(page, offset_in_page(pos), len);
>       return copied;
>   }
> 
> @@ -997,7 +1050,6 @@ iomap_page_mkwrite_actor(struct inode *inode, 
> loff_t pos, loff_t length,
>       } else {
>           WARN_ON_ONCE(!PageUptodate(page));
>           iomap_page_create(inode, page);
> -        set_page_dirty(page);
>       }
> 
>       return length;
> @@ -1007,7 +1059,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault 
> *vmf, const struct iomap_ops *ops)
>   {
>       struct page *page = vmf->page;
>       struct inode *inode = file_inode(vmf->vma->vm_file);
> -    unsigned long length;
> +    unsigned int length, dirty_bits;
>       loff_t offset;
>       ssize_t ret;
> 
> @@ -1016,6 +1068,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault 
> *vmf, const struct iomap_ops *ops)
>       if (ret < 0)
>           goto out_unlock;
>       length = ret;
> +    dirty_bits = ret;
> 
>       offset = page_offset(page);
>       while (length > 0) {
> @@ -1028,6 +1081,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault 
> *vmf, const struct iomap_ops *ops)
>           length -= ret;
>       }
> 
> +    iomap_set_range_dirty(page, 0, dirty_bits);
>       wait_for_stable_page(page);
>       return VM_FAULT_LOCKED;
>   out_unlock:
> @@ -1340,11 +1394,12 @@ iomap_writepage_map(struct iomap_writepage_ctx 
> *wpc,
>       struct iomap_page *iop = to_iomap_page(page);
>       struct iomap_ioend *ioend, *next;
>       unsigned len = i_blocksize(inode);
> +    unsigned int blocks_per_page = i_blocks_per_page(inode, page);
>       u64 file_offset; /* file offset of page */
>       int error = 0, count = 0, i;
>       LIST_HEAD(submit_list);
> 
> -    WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
> +    WARN_ON_ONCE(blocks_per_page > 1 && !iop);
>       WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
> 
>       /*
> @@ -1355,7 +1410,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>       for (i = 0, file_offset = page_offset(page);
>            i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
>            i++, file_offset += len) {
> -        if (iop && !test_bit(i, iop->uptodate))
> +        if (iop && !test_bit(i, iop->state + blocks_per_page))
>               continue;
> 
>           error = wpc->ops->map_blocks(wpc, inode, file_offset);
> @@ -1404,6 +1459,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>            */
>           set_page_writeback_keepwrite(page);
>       } else {
> +        iomap_clear_range_dirty(page, 0, PAGE_SIZE);
>           clear_page_dirty_for_io(page);
>           set_page_writeback(page);
>       }


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH V4] iomap: add support to track dirty state of sub pages
  2020-09-11  8:27   ` [RFC PATCH V4] iomap: add support to track dirty state of sub pages yukuai (C)
  2020-09-11  8:35     ` yukuai (C)
@ 2020-09-11 12:05     ` Matthew Wilcox
  2020-09-25  2:30       ` yukuai (C)
  1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2020-09-11 12:05 UTC (permalink / raw)
  To: yukuai (C)
  Cc: hch, darrick.wong, david, linux-xfs, linux-fsdevel, linux-kernel,
	yi.zhang

On Fri, Sep 11, 2020 at 04:27:08PM +0800, yukuai (C) wrote:
> Since your THP iomap patches were reviewed, I made some modifications
> based on these patches.

Thanks!  This looks good to me!

> +static void
> +iomap_set_range_dirty(struct page *page, unsigned int off,
> +		unsigned int len)
> +{
> +	struct inode *inode = page->mapping->host;
> +	unsigned int blocks_per_page = i_blocks_per_page(inode, page);

I might call this nr_blocks.

> +	unsigned int first = (off >> inode->i_blkbits) + blocks_per_page;
> +	unsigned int last = ((off + len - 1) >> inode->i_blkbits) +
> blocks_per_page;
> +	unsigned long flags;
> +	struct iomap_page *iop;
> +
> +	if (PageError(page))
> +		return;

I think this is wrong.  If PageError is set then we've seen an I/O error
on this page (today either a read or a write, although I intend to make
that for writes only soon).  But I don't see a reason for declining to
make a page dirty if we've seen an eror -- indeed, that seems likely to
lead to further data loss as we then won't even try to write parts of
the page back to storage.

> +	if (len)
> +		iomap_set_page_dirty(page);
> +
> +	if (!page_has_private(page))
> +		return;
> +
> +	iop = to_iomap_page(page);

We usually do the last three lines as:

	iop = to_iomap_page(page);
	if (!iop)
		return;

(actually, we usually call to_iomap_page() at the start of the function
and then check it here)

> +static void
> +iomap_clear_range_dirty(struct page *page, unsigned int off,
> +		unsigned int len)
> +{
> +	struct inode *inode = page->mapping->host;
> +	unsigned int blocks_per_page = i_blocks_per_page(inode, page);
> +	unsigned int first = (off >> inode->i_blkbits) + blocks_per_page;
> +	unsigned int last = ((off + len - 1) >> inode->i_blkbits) +
> blocks_per_page;
> +	unsigned long flags;
> +	struct iomap_page *iop;
> +
> +	if (PageError(page))
> +		return;

It does make sense to avoid clearing the dirty state of the page here;
if the write failed, then the page is still dirty, and it'd be nice to
retry writes until they succeed.  So if you take out the PageError in
set_range_dirty, don't take it out here!

> +	if (!page_has_private(page))
> +		return;
> +
> +	iop = to_iomap_page(page);
> +	spin_lock_irqsave(&iop->state_lock, flags);
> +	bitmap_clear(iop->state, first, last - first + 1);
> +	spin_unlock_irqrestore(&iop->state_lock, flags);
> +}
> +
>  static void
>  iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  {
> @@ -148,11 +201,11 @@ iomap_iop_set_range_uptodate(struct page *page,
> unsigned off, unsigned len)
>  	unsigned last = (off + len - 1) >> inode->i_blkbits;
>  	unsigned long flags;
> 
> -	spin_lock_irqsave(&iop->uptodate_lock, flags);
> -	bitmap_set(iop->uptodate, first, last - first + 1);
> -	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
> +	spin_lock_irqsave(&iop->state_lock, flags);
> +	bitmap_set(iop->state, first, last - first + 1);
> +	if (bitmap_full(iop->state, i_blocks_per_page(inode, page)))
>  		SetPageUptodate(page);
> -	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
> +	spin_unlock_irqrestore(&iop->state_lock, flags);
>  }
> 
>  static void
> @@ -445,7 +498,7 @@ iomap_is_partially_uptodate(struct page *page, unsigned
> long from,
> 
>  	if (iop) {
>  		for (i = first; i <= last; i++)
> -			if (!test_bit(i, iop->uptodate))
> +			if (!test_bit(i, iop->state))
>  				return 0;
>  		return 1;
>  	}
> @@ -683,7 +736,7 @@ static size_t __iomap_write_end(struct inode *inode,
> loff_t pos, size_t len,
>  	if (unlikely(copied < len && !PageUptodate(page)))
>  		return 0;
>  	iomap_set_range_uptodate(page, offset_in_page(pos), len);
> -	iomap_set_page_dirty(page);
> +	iomap_set_range_dirty(page, offset_in_page(pos), len);

I _think_ the call to set_range_uptodate here is now unnecessary.  The
blocks should already have been set uptodate in write_begin.  But please
check!

> @@ -1007,7 +1059,7 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
> const struct iomap_ops *ops)
>  {
>  	struct page *page = vmf->page;
>  	struct inode *inode = file_inode(vmf->vma->vm_file);
> -	unsigned long length;
> +	unsigned int length, dirty_bits;

This is dirty_bytes, surely?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH V4] iomap: add support to track dirty state of sub pages
  2020-09-11 12:05     ` Matthew Wilcox
@ 2020-09-25  2:30       ` yukuai (C)
  0 siblings, 0 replies; 13+ messages in thread
From: yukuai (C) @ 2020-09-25  2:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: hch, darrick.wong, david, linux-xfs, linux-fsdevel, linux-kernel,
	yi.zhang


On 2020/09/11 20:05, Matthew Wilcox wrote:
>> @@ -683,7 +736,7 @@ static size_t __iomap_write_end(struct inode *inode,
>> loff_t pos, size_t len,
>>   	if (unlikely(copied < len && !PageUptodate(page)))
>>   		return 0;
>>   	iomap_set_range_uptodate(page, offset_in_page(pos), len);
>> -	iomap_set_page_dirty(page);
>> +	iomap_set_range_dirty(page, offset_in_page(pos), len);
> I_think_  the call to set_range_uptodate here is now unnecessary.  The
> blocks should already have been set uptodate in write_begin.  But please
> check!

Hi, Matthew

Sorry it took me so long to get back to this.

I found that set_range_uptodate() might be skipped in write_begin()
in this case:

                  if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
                   ┊   (from <= poff || from >= poff + plen) &&
                   ┊   (to <= poff || to >= poff + plen))
                           continue;

And iomap_adjust_read_range() can set 'poff' greater than 'from'
and 'poff + len' less than 'to'.

Thanks
Yu Kuai

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-25  2:30 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-21 12:33 [RFC PATCH V4] iomap: add support to track dirty state of sub pages Yu Kuai
2020-08-21 12:44 ` Matthew Wilcox
2020-08-21 12:46   ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Matthew Wilcox (Oracle)
2020-08-21 12:46     ` [PATCH 2/3] iomap: Use bitmap ops to set uptodate bits Matthew Wilcox (Oracle)
2020-08-22  5:59       ` Christoph Hellwig
2020-08-21 12:46     ` [PATCH 3/3] iomap: Support arbitrarily many blocks per page Matthew Wilcox (Oracle)
2020-08-21 12:48       ` Matthew Wilcox
2020-08-22  6:01       ` Christoph Hellwig
2020-08-22  5:57     ` [PATCH 1/3] iomap: Use kzalloc to allocate iomap_page Christoph Hellwig
2020-09-11  8:27   ` [RFC PATCH V4] iomap: add support to track dirty state of sub pages yukuai (C)
2020-09-11  8:35     ` yukuai (C)
2020-09-11 12:05     ` Matthew Wilcox
2020-09-25  2:30       ` yukuai (C)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).