Kernel Newbies archive on lore.kernel.org
* block size vs bvec length
@ 2020-04-05 18:17 Michele Sorcinelli
  2020-04-05 23:35 ` Valdis Klētnieks
  0 siblings, 1 reply; 3+ messages in thread
From: Michele Sorcinelli @ 2020-04-05 18:17 UTC
  To: kernelnewbies

I created a simple block device driver whose queue has a logical block size of
512 bytes.

$ cat /sys/block/myblock/queue/physical_block_size
512
$ cat /sys/block/myblock/queue/logical_block_size
512

I used rq_for_each_segment() to print bvec.bv_len of the segments and it
appears to be 4096.

Why is it 4096 rather than 512?

Also writing a block of 4096 bytes with dd to /dev/myblock will result in a
single write request, while writing a block of 512 bytes will result in a read
request followed by a write request.

Can someone explain this behavior?

Thanks,
   Michele.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


* Re: block size vs bvec length
  2020-04-05 18:17 block size vs bvec length Michele Sorcinelli
@ 2020-04-05 23:35 ` Valdis Klētnieks
  2020-04-05 23:53   ` Michele Sorcinelli
  0 siblings, 1 reply; 3+ messages in thread
From: Valdis Klētnieks @ 2020-04-05 23:35 UTC
  To: Michele Sorcinelli; +Cc: kernelnewbies

On Sun, 05 Apr 2020 19:17:39 +0100, Michele Sorcinelli said:

> I used rq_for_each_segment() to print bvec.bv_len of the segments and it
> appears to be 4096.
>
> Why is it 4096 rather than 512?

What is the actual device backing this block device?

> Also writing a block of 4096 bytes with dd to /dev/myblock will result in a
> single write request, while writing a block of 512 bytes will result in a read
> request followed by a write request.
>
> Can someone explain this behavior?

That's called a read-modify-write (RMW) cycle, and is used when a write request
isn't exactly one physical block long. It happens for file devices as well;
it's just hidden by the file system layer.

Say you have a device/file that has a 4096-byte physical block.  You want to
write 256 bytes, starting at an offset of 512 bytes into the file. To avoid
destroying the *rest* of the 4096-byte block, what happens is:

You read the entire 4096 byte block into a buffer, which now has the entire old
contents of that block.  You then copy the 256 bytes into the appropriate
section of the buffer, so it now contains the old data except where the new
data has been copied.  You then write the entire updated 4096 byte buffer back
to the device.
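
The steps above can be sketched in userspace C. This is only an illustration
under stated assumptions, not how any particular driver or the block layer
implements it: the in-memory array `device`, the helpers dev_read_block() and
dev_write_block(), and rmw_write() are all hypothetical names made up for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096u  /* physical block size from the example above */
#define NUM_BLOCKS 4u     /* arbitrary size for the toy device */

/* Hypothetical in-memory "device" that can only transfer whole blocks. */
static uint8_t device[NUM_BLOCKS * BLOCK_SIZE];
static unsigned reads, writes;  /* count whole-block transfers */

static void dev_read_block(size_t blk, uint8_t *buf)
{
    memcpy(buf, device + blk * BLOCK_SIZE, BLOCK_SIZE);
    reads++;
}

static void dev_write_block(size_t blk, const uint8_t *buf)
{
    memcpy(device + blk * BLOCK_SIZE, buf, BLOCK_SIZE);
    writes++;
}

/*
 * Write len bytes at byte offset off. The range must stay inside one block.
 * Returns 0 on success, -1 if the range is out of bounds or crosses a block.
 */
static int rmw_write(size_t off, const void *data, size_t len)
{
    uint8_t buf[BLOCK_SIZE];
    size_t blk = off / BLOCK_SIZE;
    size_t in_blk = off % BLOCK_SIZE;

    if (blk >= NUM_BLOCKS || len > BLOCK_SIZE - in_blk)
        return -1;
    assert(in_blk + len <= BLOCK_SIZE);

    if (len != BLOCK_SIZE)
        dev_read_block(blk, buf);    /* read: fetch the old block contents */
    memcpy(buf + in_blk, data, len); /* modify: overlay the new bytes */
    dev_write_block(blk, buf);       /* write: put the whole block back */
    return 0;
}
```

Calling rmw_write(512, payload, 256) on this toy device costs one block read
plus one block write, while a full aligned 4096-byte write skips the read.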

This becomes a major headache for high-performance disk I/O.  When you're
trying to write data out at 5 gigabytes/second, the last thing you need is some
researcher using the wrong write buffer size and making every write to a RAID6
into a read-modify-write.

Actually, I take that back - using the wrong buffer size *and* a bollixed
offset so half the writes end up being *two* RMW cycles is the last thing you
need :)
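
To make the "two RMW cycles" case concrete, here is a small sketch that counts
how many blocks a write touches only partially. The function name rmw_cycles()
is made up for this example, and it assumes every partially covered block
costs exactly one read-modify-write.

```c
#include <stdint.h>

/*
 * Count how many blocks a write of len bytes at byte offset off covers only
 * partially. Under the assumption stated above, each such block needs its
 * own read-modify-write cycle.
 */
static unsigned rmw_cycles(uint64_t off, uint64_t len, uint64_t block_size)
{
    uint64_t end, first, last;
    unsigned partial = 0;

    if (len == 0 || block_size == 0)
        return 0;

    end = off + len;               /* one past the last byte written */
    first = off / block_size;      /* first block touched */
    last = (end - 1) / block_size; /* last block touched */

    if (off % block_size != 0)
        partial++;                 /* write starts in the middle of a block */

    /* Count a ragged tail unless it is the same block already counted. */
    if (end % block_size != 0 && (last != first || off % block_size == 0))
        partial++;

    return partial;
}
```

With 4096-byte blocks, an aligned 4096-byte write gives 0, a 512-byte write at
offset 0 gives 1, and a 4096-byte write at offset 2048 gives 2, because it
leaves both the first and the second block half covered.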

And if the researcher manages to screw up the stripe size as well - that
usually results in 3 sysadmins with clue-by-4's visiting the researcher to
advise them on the error of their ways.. :)



* Re: block size vs bvec length
  2020-04-05 23:35 ` Valdis Klētnieks
@ 2020-04-05 23:53   ` Michele Sorcinelli
  0 siblings, 0 replies; 3+ messages in thread
From: Michele Sorcinelli @ 2020-04-05 23:53 UTC
  To: Valdis Klētnieks; +Cc: kernelnewbies

On 4/6/20 12:35 AM, Valdis Klētnieks wrote:
> What is the actual device backing this block device?

There's no real device behind the driver: it just writes the data to memory.
  
> That's called a read-modify-write (RMW) cycle, and is used when a write request
> isn't exactly one physical block long. It happens for file devices as well;
> it's just hidden by the file system layer.

I understand this, but I don't understand why it's using 4096 as the unit
rather than 512, which is the logical (and physical) block size I set for the
queue using blk_queue_logical_block_size(). What's the relation between the
actual block size and the size of a segment (the bv_len field of struct
bio_vec)?

Is it somehow related to the page size? Does a segment need to be at least as
long as the page size? For example, I've noticed that
blk_queue_max_segment_size() will set the max segment size to PAGE_SIZE if the
given size is < PAGE_SIZE.
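
The numbers in question can at least be related with simple arithmetic. The
sketch below is a userspace model, not kernel code: model_max_segment_size()
and blocks_per_segment() are hypothetical helpers, and MODEL_PAGE_SIZE assumes
4 KiB pages. It only mirrors the clamp described above and shows that a
4096-byte segment is a whole number of 512-byte logical blocks. It does not by
itself explain why the segments arrive page-sized.

```c
#define MODEL_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Userspace model of the clamp described above: a requested maximum segment
 * size below the page size is raised to the page size.
 */
static unsigned int model_max_segment_size(unsigned int requested)
{
    return requested < MODEL_PAGE_SIZE ? MODEL_PAGE_SIZE : requested;
}

/* Number of logical blocks carried by one segment of seg_len bytes. */
static unsigned int blocks_per_segment(unsigned int seg_len,
                                       unsigned int logical_block_size)
{
    return logical_block_size ? seg_len / logical_block_size : 0;
}
```

Under these assumptions, asking for a 512-byte maximum segment size still
yields 4096, and one 4096-byte segment carries eight 512-byte logical blocks.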

