* Theoretical Question about commit=n
@ 2017-11-12 20:58 Robert White
  2017-11-12 22:01 ` Hans van Kranenburg
  0 siblings, 1 reply; 5+ messages in thread
From: Robert White @ 2017-11-12 20:58 UTC (permalink / raw)
  To: Btrfs BTRFS

Is the commit interval monotonic, or is it seconds after sync?

What I mean is that if I manually call sync(2) does the commit timer
reset? I'm thinking it does not, but I can imagine a workload where it
ideally would.

(Again, this is purely theoretical, I have no such workload as I am
about to describe.)

So suppose I have some sort of system, like a database, that I know will
do scattered writes and extends through some files and then call some
variant of sync(2). And I know that those sync() calls will be every
forty-to-sixty seconds because of reasons. It would be "neat" to be able
to set the commit=n to some high value, like 90, and then "normally" the
sync() behaviours would follow the application instead of the larger
commit interval.

The value would be that the file system would tend _not_ to go into sync
while the application was still skittering about in the various files.

Of course any other applications could call sync from their own contexts
for their own reasons. And there's an implicit fsync() on just about any
close() (at least if everything is doing its business "correctly")

It may be a strange idea, but I can think of some near-realtime
applications that might be able to leverage a modicum of control over the
sync event. There is no API, and no strong reason to desire one, for
controlling the commit from (low-privilege) applications.

But if the plumbing exists, then having a mode where sync() or fsync()
(which I think causes a general sync because of the journal) resets the
commit timer could be really interesting.

With any kind of delayed block choice/mapping it could actually reduce
the entropy of the individual files for repeated random small writes.
The application would have to be reasonably aware, of course.

Since something is causing a sync(), the commit=N guarantee is still
being met for the whole system for any N, but applications could tend to
avoid mid-write commits by planning their sync()s.

Just a thought.


* Re: Theoretical Question about commit=n
  2017-11-12 20:58 Theoretical Question about commit=n Robert White
@ 2017-11-12 22:01 ` Hans van Kranenburg
  2017-11-13  0:41   ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Hans van Kranenburg @ 2017-11-12 22:01 UTC (permalink / raw)
  To: Robert White, Btrfs BTRFS

On 11/12/2017 09:58 PM, Robert White wrote:
> Is the commit interval monotonic, or is it seconds after sync?
> 
> What I mean is that if I manually call sync(2) does the commit timer
> reset? I'm thinking it does not, but I can imagine a workload where it
> ideally would.

The magic happens inside the transaction kernel thread:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/disk-io.c?h=v4.14#n1925

You can see the delay being computed:
    delay = HZ * fs_info->commit_interval;

Almost at the end of the function, you see:
    schedule_timeout(delay)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?h=v4.14#n1676

This schedule_timeout function sets a timer and then the thread goes to
sleep. If nothing happens, the kernel will wake up the thread after the
timer expires (can be later, but not earlier) and then it will redo the
loop.

If something else wakes up the transaction thread, the timer is
discarded if it's not expired yet.
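
The sleep-with-timeout pattern described above can be modelled in user space. This is a minimal Python sketch, not kernel code: `threading.Event.wait(timeout=...)` stands in for schedule_timeout(), `wake.set()` stands in for whatever wakes the thread, and the names `periodic_worker` and `demo` are made up for illustration. It only shows that an early wake-up discards the rest of the timer; it says nothing about which btrfs code paths actually wake the transaction thread.

```python
import threading
import time


def periodic_worker(wake, stop, interval, log):
    """Sleep up to `interval` seconds per loop; an early wake-up discards the rest."""
    while not stop.is_set():
        started = time.monotonic()
        # Analogue of schedule_timeout(delay): returns True if woken early,
        # False if the full timeout expired.
        woken = wake.wait(timeout=interval)
        wake.clear()
        if stop.is_set():
            break
        log.append(("woken" if woken else "timeout", time.monotonic() - started))


def demo():
    """Run the worker with a 0.5 s interval and wake it once after 0.1 s."""
    wake, stop, log = threading.Event(), threading.Event(), []
    worker = threading.Thread(target=periodic_worker, args=(wake, stop, 0.5, log))
    worker.start()
    time.sleep(0.1)
    wake.set()       # early wake-up, the remaining ~0.4 s is discarded
    time.sleep(0.7)  # long enough for one complete 0.5 s timeout
    stop.set()
    wake.set()       # wake the worker so it notices `stop`
    worker.join()
    return log
```

Calling `demo()` should return one entry for the early wake-up followed by one for a full timeout: the second sleep starts from zero instead of resuming the first one.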

So it works like you would want.

You can test this yourself by looking at the "generation" number of your
filesystem. It's in the output of btrfs inspect dump-super:

This is the little test filesystem I just used:

-# btrfs inspect dump-super /dev/dorothy/mekker | grep ^generation
generation		35

If you print the number in a loop, like every second, you can see it
going up after a transaction happened. Now play around with other things
and see when it changes.
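
The polling loop suggested above could look roughly like this. It is a sketch under assumptions: it calls `btrfs inspect-internal dump-super` (the long form of the command shown above), needs enough privilege to read the device, and the function names and the device path passed in are placeholders.

```python
import re
import subprocess
import time


def parse_generation(dump_super_output):
    """Return the number on the line that starts with 'generation'."""
    for line in dump_super_output.splitlines():
        # Same idea as `grep ^generation`: skip chunk_root_generation etc.
        match = re.match(r"^generation\s+(\d+)\s*$", line)
        if match:
            return int(match.group(1))
    raise ValueError("no generation line found")


def watch_generation(device, period=1.0):
    """Print a timestamped line whenever the superblock generation changes."""
    last = None
    while True:
        result = subprocess.run(
            ["btrfs", "inspect-internal", "dump-super", device],
            check=True, capture_output=True, text=True,
        )
        generation = parse_generation(result.stdout)
        if generation != last:
            print(time.strftime("%H:%M:%S"), "generation", generation)
            last = generation
        time.sleep(period)
```

Running `watch_generation()` on the test device in one terminal while calling sync in another shows how long it takes until the next bump.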

-- 
Hans van Kranenburg


* Re: Theoretical Question about commit=n
  2017-11-12 22:01 ` Hans van Kranenburg
@ 2017-11-13  0:41   ` Qu Wenruo
  2017-11-13  1:17     ` Hans van Kranenburg
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2017-11-13  0:41 UTC (permalink / raw)
  To: Hans van Kranenburg, Robert White, Btrfs BTRFS





On 2017-11-13 06:01, Hans van Kranenburg wrote:
> On 11/12/2017 09:58 PM, Robert White wrote:
>> Is the commit interval monotonic, or is it seconds after sync?
>>
>> What I mean is that if I manually call sync(2) does the commit timer
>> reset? I'm thinking it does not, but I can imagine a workload where it
>> ideally would.
> 
> The magic happens inside the transaction kernel thread:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/disk-io.c?h=v4.14#n1925
> 
> You can see the delay being computed:
>     delay = HZ * fs_info->commit_interval;
> 
> Almost at the end of the function, you see:
>     schedule_timeout(delay)
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?h=v4.14#n1676
> 
> This schedule_timeout function sets a timer and then the thread goes to
> sleep. If nothing happens, the kernel will wake up the thread after the
> timer expires (can be later, but not earlier) and then it will redo the
> loop.
> 
> If something else wakes up the transaction thread, the timer is
> discarded if it's not expired yet.

So far so good.

> 
> So it works like you would want.

Not exactly.

Sync or commit_transaction won't wake up transaction_kthread.

transaction_kthread is mostly woken by a transaction error, a remount, or
in certain cases by btrfs_end_transaction.

So a manual sync will not (at least not always) interrupt the commit interval.

What's more, transaction_kthread only commits the transaction, which
means it only ensures that metadata is consistent.

It won't ensure that a buffered write reaches disk if its extent is not
allocated yet (delalloc).
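
Going by the description above, an application that needs its own file data on disk at a known point should not count on the periodic commit for that; fsync(2) on the file descriptor is the usual tool. A minimal Python sketch, assuming ordinary POSIX semantics; `write_durably` is a made-up helper name:

```python
import os


def write_durably(path, data):
    """Write `data` to `path` and return only after fsync() has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push this file's data to stable storage
        # instead of waiting for a later background flush or commit.
        os.fsync(fd)
    finally:
        os.close(fd)
```

This syncs one file only, so it does not answer the timer question; it just keeps the application from depending on when the next commit happens.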

Thanks,
Qu

* Re: Theoretical Question about commit=n
  2017-11-13  0:41   ` Qu Wenruo
@ 2017-11-13  1:17     ` Hans van Kranenburg
  2017-11-13  1:25       ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Hans van Kranenburg @ 2017-11-13  1:17 UTC (permalink / raw)
  To: Qu Wenruo, Robert White, Btrfs BTRFS

On 11/13/2017 01:41 AM, Qu Wenruo wrote:
> 
> On 2017-11-13 06:01, Hans van Kranenburg wrote:
>> On 11/12/2017 09:58 PM, Robert White wrote:
>>> Is the commit interval monotonic, or is it seconds after sync?
>>>
>>> What I mean is that if I manually call sync(2) does the commit timer
>>> reset? I'm thinking it does not, but I can imagine a workload where it
>>> ideally would.
>>
>> The magic happens inside the transaction kernel thread:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/disk-io.c?h=v4.14#n1925
>>
>> You can see the delay being computed:
>>     delay = HZ * fs_info->commit_interval;
>>
>> Almost at the end of the function, you see:
>>     schedule_timeout(delay)
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?h=v4.14#n1676
>>
>> This schedule_timeout function sets a timer and then the thread goes to
>> sleep. If nothing happens, the kernel will wake up the thread after the
>> timer expires (can be later, but not earlier) and then it will redo the
>> loop.
>>
>> If something else wakes up the transaction thread, the timer is
>> discarded if it's not expired yet.
> 
> So far so good.
> 
>>
>> So it works like you would want.
> 
> Not exactly.

Ah, interesting.

> Sync or commit_transaction won't wake up transaction_kthread.
> 
> transaction_kthread is mostly woken by a transaction error, a remount, or
> in certain cases by btrfs_end_transaction.
> 
> So a manual sync will not (at least not always) interrupt the commit interval.

The fun thing is, when I just do sync, I see that the time it takes for
a next generation bump to happen is reset (while doing something simple
like touch x in a loop in another terminal).

> What's more, transaction_kthread only commits the transaction, which
> means it only ensures that metadata is consistent.
> 
> It won't ensure that a buffered write reaches disk if its extent is not
> allocated yet (delalloc).

Hm, I have seen things like that in BTRFS_IOC_SYNC...

Actually, I first responded on the timer reset question, because that
one was easy to answer. I don't know if I want to descend the path
further into (f)sync. I heard it can get really messy down there. :]

-- 
Hans van Kranenburg


* Re: Theoretical Question about commit=n
  2017-11-13  1:17     ` Hans van Kranenburg
@ 2017-11-13  1:25       ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2017-11-13  1:25 UTC (permalink / raw)
  To: Hans van Kranenburg, Robert White, Btrfs BTRFS





On 2017-11-13 09:17, Hans van Kranenburg wrote:
> On 11/13/2017 01:41 AM, Qu Wenruo wrote:
>>
>> On 2017-11-13 06:01, Hans van Kranenburg wrote:
>>> On 11/12/2017 09:58 PM, Robert White wrote:
>>>> Is the commit interval monotonic, or is it seconds after sync?
>>>>
>>>> What I mean is that if I manually call sync(2) does the commit timer
>>>> reset? I'm thinking it does not, but I can imagine a workload where it
>>>> ideally would.
>>>
>>> The magic happens inside the transaction kernel thread:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/disk-io.c?h=v4.14#n1925
>>>
>>> You can see the delay being computed:
>>>     delay = HZ * fs_info->commit_interval;
>>>
>>> Almost at the end of the function, you see:
>>>     schedule_timeout(delay)
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?h=v4.14#n1676
>>>
>>> This schedule_timeout function sets a timer and then the thread goes to
>>> sleep. If nothing happens, the kernel will wake up the thread after the
>>> timer expires (can be later, but not earlier) and then it will redo the
>>> loop.
>>>
>>> If something else wakes up the transaction thread, the timer is
>>> discarded if it's not expired yet.
>>
>> So far so good.
>>
>>>
>>> So it works like you would want.
>>
>> Not exactly.
> 
> Ah, interesting.
> 
>> Sync or commit_transaction won't wake up transaction_kthread.
>>
>> transaction_kthread is mostly woken by a transaction error, a remount, or
>> in certain cases by btrfs_end_transaction.
>>
>> So a manual sync will not (at least not always) interrupt the commit interval.
> 
> The fun thing is, when I just do sync, I see that the time it takes for
> a next generation bump to happen is reset (while doing something simple
> like touch x in a loop in another terminal).

Maybe something else is involved.

You could dig a little further by tracking which caller committed the
transaction; I could be totally wrong about this.

> 
>> What's more, transaction_kthread only commits the transaction, which
>> means it only ensures that metadata is consistent.
>>
>> It won't ensure that a buffered write reaches disk if its extent is not
>> allocated yet (delalloc).
> 
> Hm, I have seen things like that in BTRFS_IOC_SYNC...
> 
> Actually, I first responded on the timer reset question, because that
> one was easy to answer. I don't know if I want to descend the path
> further into (f)sync. I heard it can get really messy down there. :]

Yep, very messy.
Not only messy within btrfs itself, but also tied to kernel memory management.

And welcome to the hell of filesystem development.

Thanks,
Qu


