* [Ksummit-discuss] Self nomination
@ 2016-07-25 17:11 Johannes Weiner
  2016-07-25 18:15 ` Rik van Riel
  2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
  0 siblings, 2 replies; 81+ messages in thread
From: Johannes Weiner @ 2016-07-25 17:11 UTC (permalink / raw)
  To: ksummit-discuss

Hi,

I would like to nominate myself for this year's kernel summit.

I co-maintain cgroups and the memory controller and have been a
long-time contributor to the memory management subsystem. At Facebook,
I'm in charge of MM scalability and reliability in our fleet. Most
recently I have been working on reviving swap for SSDs and persistent
memory devices (https://lwn.net/Articles/690079/) as part of a bigger
anti-thrashing effort to make the VM recover swiftly and predictably
from load spikes. This has been a bit of a "lock yourself in a
basement" type project, which is why I missed the mechanical
nomination based on sign-offs this year.

Thanks
Johannes

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-25 17:11 [Ksummit-discuss] Self nomination Johannes Weiner
@ 2016-07-25 18:15 ` Rik van Riel
  2016-07-26 10:56   ` Jan Kara
  2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
  1 sibling, 1 reply; 81+ messages in thread
From: Rik van Riel @ 2016-07-25 18:15 UTC (permalink / raw)
  To: Johannes Weiner, ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

On Mon, 2016-07-25 at 13:11 -0400, Johannes Weiner wrote:
> Hi,
> 
> I would like to nominate myself for this year's kernel summit.
> 
> I co-maintain cgroups and the memory controller and have been a
> long-time contributor to the memory management subsystem. At
> Facebook,
> I'm in charge of MM scalability and reliability in our fleet. Most
> recently I have been working on reviving swap for SSDs and persistent
> memory devices (https://lwn.net/Articles/690079/) as part of a bigger
> anti-thrashing effort to make the VM recover swiftly and predictably
> from load spikes. This has been a bit of a "lock yourself in a
> basement" type project, which is why I missed the mechanical
> nomination based on sign-offs this year.

I am interested in discussing that, either at kernel
summit, or at next year's LSF/MM.

-- 

All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]


* Re: [Ksummit-discuss] Self nomination
  2016-07-25 18:15 ` Rik van Riel
@ 2016-07-26 10:56   ` Jan Kara
  2016-07-26 13:10     ` Vlastimil Babka
  0 siblings, 1 reply; 81+ messages in thread
From: Jan Kara @ 2016-07-26 10:56 UTC (permalink / raw)
  To: Rik van Riel; +Cc: ksummit-discuss

On Mon 25-07-16 14:15:18, Rik van Riel wrote:
> On Mon, 2016-07-25 at 13:11 -0400, Johannes Weiner wrote:
> > Hi,
> > 
> > I would like to nominate myself for this year's kernel summit.
> > 
> > I co-maintain cgroups and the memory controller and have been a
> > long-time contributor to the memory management subsystem. At
> > Facebook,
> > I'm in charge of MM scalability and reliability in our fleet. Most
> > recently I have been working on reviving swap for SSDs and persistent
> > memory devices (https://lwn.net/Articles/690079/) as part of a bigger
> > anti-thrashing effort to make the VM recover swiftly and predictably
> > from load spikes. This has been a bit of a "lock yourself in a
> > basement" type project, which is why I missed the mechanical
> > nomination based on sign-offs this year.
> 
> I am interested in discussing that, either at kernel
> summit, or at next year's LSF/MM.

Me as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [Ksummit-discuss] Self nomination
  2016-07-26 10:56   ` Jan Kara
@ 2016-07-26 13:10     ` Vlastimil Babka
  0 siblings, 0 replies; 81+ messages in thread
From: Vlastimil Babka @ 2016-07-26 13:10 UTC (permalink / raw)
  To: Jan Kara, Rik van Riel; +Cc: ksummit-discuss

On 07/26/2016 12:56 PM, Jan Kara wrote:
> On Mon 25-07-16 14:15:18, Rik van Riel wrote:
>> On Mon, 2016-07-25 at 13:11 -0400, Johannes Weiner wrote:
>>> Hi,
>>>
>>> I would like to nominate myself for this year's kernel summit.
>>>
>>> I co-maintain cgroups and the memory controller and have been a
>>> long-time contributor to the memory management subsystem. At
>>> Facebook,
>>> I'm in charge of MM scalability and reliability in our fleet. Most
>>> recently I have been working on reviving swap for SSDs and persistent
>>> memory devices (https://lwn.net/Articles/690079/) as part of a bigger
>>> anti-thrashing effort to make the VM recover swiftly and predictably
>>> from load spikes. This has been a bit of a "lock yourself in a
>>> basement" type project, which is why I missed the mechanical
>>> nomination based on sign-offs this year.
>>
>> I am interested in discussing that, either at kernel
>> summit, or at next year's LSF/MM.
>
> Me as well.

+1

>
> 								Honza
>


* [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-25 17:11 [Ksummit-discuss] Self nomination Johannes Weiner
  2016-07-25 18:15 ` Rik van Riel
@ 2016-07-28 18:55 ` Johannes Weiner
  2016-07-28 21:41   ` James Bottomley
                     ` (3 more replies)
  1 sibling, 4 replies; 81+ messages in thread
From: Johannes Weiner @ 2016-07-28 18:55 UTC (permalink / raw)
  To: ksummit-discuss

On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> Most recently I have been working on reviving swap for SSDs and
> persistent memory devices (https://lwn.net/Articles/690079/) as part
> of a bigger anti-thrashing effort to make the VM recover swiftly and
> predictably from load spikes.

A bit of context, in case we want to discuss this at KS:

We frequently have machines hang and stop responding indefinitely
after they experience memory load spikes. On closer look, we find most
tasks either in page reclaim or majorfaulting parts of an executable
or library. It's a typical thrashing pattern, where everybody
cannibalizes everybody else. The problem is that with fast storage the
cache reloads can be fast enough that there are never enough in-flight
pages at a time to cause page reclaim to fail and trigger the OOM
killer. The livelock persists until external remediation reboots the
box or we get lucky and non-cache allocations eventually suck up the
remaining page cache and trigger the OOM killer.

To avoid hitting this situation, we currently have to keep a generous
memory reserve for occasional spikes, which sucks for utilization the
rest of the time. Swap would be useful here, but the swapout code is
basically only triggering when memory pressure rises - which again
doesn't happen - so I've been working on the swap code to balance
cache reclaim vs. swap based on relative thrashing between the two.

There is usually some cold/unused anonymous memory lying around that
can be unloaded into swap during workload spikes, so that allows us to
drive up the average memory utilization without increasing the risk at
least. But if we screw up and there are not enough unused anon pages,
we are back to thrashing - only now it involves swapping too.

So how do we address this?

A pathological thrashing situation is very obvious to any user, but
it's not quite clear how to quantify it inside the kernel and have it
trigger the OOM killer. It might be useful to talk about
metrics. Could we quantify application progress? Could we quantify the
amount of time a task or the system spends thrashing, and somehow
express it as a percentage of overall execution time? Maybe something
comparable to IO wait time, except tracking the time spent performing
reclaim and waiting on IO that is refetching recently evicted pages?

This question seems to go beyond the memory subsystem and potentially
involve the scheduler and the block layer, so it might be a good tech
topic for KS.

Thanks


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
@ 2016-07-28 21:41   ` James Bottomley
  2016-08-01 15:46     ` Johannes Weiner
  2016-07-29  0:25   ` Rik van Riel
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: James Bottomley @ 2016-07-28 21:41 UTC (permalink / raw)
  To: Johannes Weiner, ksummit-discuss

On Thu, 2016-07-28 at 14:55 -0400, Johannes Weiner wrote:
> On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > Most recently I have been working on reviving swap for SSDs and
> > persistent memory devices (https://lwn.net/Articles/690079/) as
> > part
> > of a bigger anti-thrashing effort to make the VM recover swiftly
> > and
> > predictably from load spikes.
> 
> A bit of context, in case we want to discuss this at KS:
> 
> We frequently have machines hang and stop responding indefinitely
> after they experience memory load spikes. On closer look, we find 
> most tasks either in page reclaim or majorfaulting parts of an 
> executable or library. It's a typical thrashing pattern, where 
> everybody cannibalizes everybody else. The problem is that with fast 
> storage the cache reloads can be fast enough that there are never 
> enough in-flight pages at a time to cause page reclaim to fail and 
> trigger the OOM killer. The livelock persists until external
> remediation reboots the
> box or we get lucky and non-cache allocations eventually suck up the
> remaining page cache and trigger the OOM killer.
> 
> To avoid hitting this situation, we currently have to keep a generous
> memory reserve for occasional spikes, which sucks for utilization the
> rest of the time. Swap would be useful here, but the swapout code is
> basically only triggering when memory pressure rises - which again
> doesn't happen - so I've been working on the swap code to balance
> cache reclaim vs. swap based on relative thrashing between the two.
> 
> There is usually some cold/unused anonymous memory lying around that
> can be unloaded into swap during workload spikes, so that allows us 
> to drive up the average memory utilization without increasing the 
> risk at least. But if we screw up and there are not enough unused 
> anon pages, we are back to thrashing - only now it involves swapping
> too.
> 
> So how do we address this?
> 
> A pathological thrashing situation is very obvious to any user, but
> it's not quite clear how to quantify it inside the kernel and have it
> trigger the OOM killer. It might be useful to talk about metrics. 
> Could we quantify application progress? Could we quantify the amount 
> of time a task or the system spends thrashing, and somehow express it 
> as a percentage of overall execution time? Maybe something comparable 
> to IO wait time, except tracking the time spent performing reclaim
> and waiting on IO that is refetching recently evicted pages?
> 
> This question seems to go beyond the memory subsystem and potentially
> involve the scheduler and the block layer, so it might be a good tech
> topic for KS.

Actually, I'd be interested in this.  We're starting to generate use
cases in the container cloud for swap (I can't believe I'm saying this
since we hitherto regarded swap as wholly evil).  The issue is that we
want to load the system up into its overcommit region (it means two
things: either we're re-using underused resources or, more correctly,
we're reselling resources we sold to one customer, but they're not
using, so we can sell them to another).  From some research done within
IBM, it turns out there's a region where swapping is beneficial.  We
define it as the region where the B/W to swap doesn't exceed the B/W
capacity of the disk (is this the metric you're looking for?).
Surprisingly, this is a stable region, so we can actually operate the
physical system within this region.  It also turns out to be the ideal
region for operating overcommitted systems in because what appears to
be happening is that we're forcing allocated but unused objects (dirty
anonymous memory) out to swap.  The ideal cloud to run this in is one
which has a mix of soak jobs (background, best effort jobs, usually
analytics based) and highly interactive containers (usually web servers
or something).  We find that if we tune the swappiness of the memory
cgroup of the container to 0 for the interactive jobs, they show no
loss of throughput in this region.

Our definition of progress is a bit different from yours above because
the interactive jobs must respond as if they were near bare metal, so
we penalise the soak jobs.  However, we find that the soak jobs also
make reasonable progress according to your measure above (reasonable
enough means the customer is happy to pay for the time they've used).

James


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
  2016-07-28 21:41   ` James Bottomley
@ 2016-07-29  0:25   ` Rik van Riel
  2016-07-29 11:07   ` Mel Gorman
  2016-08-02  9:18   ` Jan Kara
  3 siblings, 0 replies; 81+ messages in thread
From: Rik van Riel @ 2016-07-29  0:25 UTC (permalink / raw)
  To: Johannes Weiner, ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 3148 bytes --]

On Thu, 2016-07-28 at 14:55 -0400, Johannes Weiner wrote:
> On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > Most recently I have been working on reviving swap for SSDs and
> > persistent memory devices (https://lwn.net/Articles/690079/) as
> > part
> > of a bigger anti-thrashing effort to make the VM recover swiftly
> > and
> > predictably from load spikes.
> 
> A bit of context, in case we want to discuss this at KS:
> 
> We frequently have machines hang and stop responding indefinitely
> after they experience memory load spikes. On closer look, we find
> most
> tasks either in page reclaim or majorfaulting parts of an executable
> or library. It's a typical thrashing pattern, where everybody
> cannibalizes everybody else. The problem is that with fast storage
> the
> cache reloads can be fast enough that there are never enough in-
> flight
> pages at a time to cause page reclaim to fail and trigger the OOM
> killer. The livelock persists until external remediation reboots the
> box or we get lucky and non-cache allocations eventually suck up the
> remaining page cache and trigger the OOM killer.
> 
> To avoid hitting this situation, we currently have to keep a generous
> memory reserve for occasional spikes, which sucks for utilization the
> rest of the time. Swap would be useful here, but the swapout code is
> basically only triggering when memory pressure rises - which again
> doesn't happen - so I've been working on the swap code to balance
> cache reclaim vs. swap based on relative thrashing between the two.
> 
> There is usually some cold/unused anonymous memory lying around that
> can be unloaded into swap during workload spikes, so that allows us
> to
> drive up the average memory utilization without increasing the risk
> at
> least. But if we screw up and there are not enough unused anon pages,
> we are back to thrashing - only now it involves swapping too.
> 
> So how do we address this?
> 
> A pathological thrashing situation is very obvious to any user, but
> it's not quite clear how to quantify it inside the kernel and have it
> trigger the OOM killer. It might be useful to talk about
> metrics. Could we quantify application progress? Could we quantify
> the
> amount of time a task or the system spends thrashing, and somehow
> express it as a percentage of overall execution time? Maybe something
> comparable to IO wait time, except tracking the time spent performing
> reclaim and waiting on IO that is refetching recently evicted pages?
> 
> This question seems to go beyond the memory subsystem and potentially
> involve the scheduler and the block layer, so it might be a good tech
> topic for KS.

I would like to discuss this topic, as well.

This is a very fundamental issue that used to be hard
coded in the BSDs (in the 1980s & 1990s), but hard
coding is totally inappropriate with today's memory
sizes and variation in I/O subsystem speeds.

Solving this, even if only on the detection side, could
make a real difference in having systems survive load
spikes.

-- 

All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
  2016-07-28 21:41   ` James Bottomley
  2016-07-29  0:25   ` Rik van Riel
@ 2016-07-29 11:07   ` Mel Gorman
  2016-07-29 16:26     ` Luck, Tony
  2016-08-01 16:55     ` Johannes Weiner
  2016-08-02  9:18   ` Jan Kara
  3 siblings, 2 replies; 81+ messages in thread
From: Mel Gorman @ 2016-07-29 11:07 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: ksummit-discuss

On Thu, Jul 28, 2016 at 02:55:23PM -0400, Johannes Weiner wrote:
> On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > Most recently I have been working on reviving swap for SSDs and
> > persistent memory devices (https://lwn.net/Articles/690079/) as part
> > of a bigger anti-thrashing effort to make the VM recover swiftly and
> > predictably from load spikes.
> 
> A bit of context, in case we want to discuss this at KS:
> 

Even if it's not a dedicated topic, I'm interested in talking about
this.

> We frequently have machines hang and stop responding indefinitely
> after they experience memory load spikes. On closer look, we find most
> tasks either in page reclaim or majorfaulting parts of an executable
> or library. It's a typical thrashing pattern, where everybody
> cannibalizes everybody else. The problem is that with fast storage the
> cache reloads can be fast enough that there are never enough in-flight
> pages at a time to cause page reclaim to fail and trigger the OOM
> killer. The livelock persists until external remediation reboots the
> box or we get lucky and non-cache allocations eventually suck up the
> remaining page cache and trigger the OOM killer.
> 

This is fundamental to how we currently track (or fail to track)
pressure. Unreclaimable is defined as excessive scanning without a page
being reclaimed, which is useless with fast storage. It triggers when
there are so many dirty/writeback pages that reclaim is impossible,
which indirectly depends on storage being slow.

Sure, it can still happen if every page is being activated before reaching
the end of the inactive list but that is close to impossible with large
memory sizes.

> To avoid hitting this situation, we currently have to keep a generous
> memory reserve for occasional spikes, which sucks for utilization the
> rest of the time. Swap would be useful here, but the swapout code is
> basically only triggering when memory pressure rises - which again
> doesn't happen - so I've been working on the swap code to balance
> cache reclaim vs. swap based on relative thrashing between the two.
> 

While we have active and inactive lists, they have no concept of time.
Inactive may be "has not been used in hours" or "deactivated recently due to
memory pressure". If we continually aged pages at a very slow rate (e.g. 1%
of a node per minute) in the absence of memory pressure we could create an
"unused" list without reclaiming it in the absence of pressure. We'd
also have to scan 1% of the unused list at the same time and
reactivate pages if necessary.

Minimally, we'd have a very rough estimate of the true working set size
(WSS) as a bonus. We could then forcibly page out/swap the unused list,
potentially ignoring swappiness for anon pages. With monitoring, an admin
would be able to estimate how large a spike a system can handle without
impact. A crucial aspect would be knowing the average age of the unused
list, though, and I've no good idea right now how to calculate that.

We could side-step the time issue slightly by only adding pages to the
unused list during the "continual background aging" scan and never when
reclaiming. Continual background aging should also not happen if any process
is reclaiming. If we tagged the time the unused list gets its first page
and the time of the most recently added page, that would at least give
us a *very* approximate age of the list. That is unfortunately flawed if
the first page added gets reactivated, but there are a few different ways
we could approximate the age (e.g. unused 1 minute, unused 5 minutes,
unused 30 minutes lists).

> There is usually some cold/unused anonymous memory lying around that
> can be unloaded into swap during workload spikes, so that allows us to
> drive up the average memory utilization without increasing the risk at
> least. But if we screw up and there are not enough unused anon pages,
> we are back to thrashing - only now it involves swapping too.
> 
> So how do we address this?
> 
> A pathological thrashing situation is very obvious to any user, but
> it's not quite clear how to quantify it inside the kernel and have it
> trigger the OOM killer.

The OOM killer is at the extreme end of the spectrum. One unloved piece of
code is vmpressure.c, which we never put much effort into. Ideally, that
would at least be able to notify user space that the system is under
pressure, but I have anecdotal evidence that it gives bad advice on large
systems.

Essentially, we have four bits of information related to memory pressure --
allocations, scans, steals and refaults. A 1:1:1 ratio of allocations, scans
and steals could just be a streaming workload. The refaults distinguish
between streaming and thrashing workloads but we don't use this for
vmpressure calculations or OOM detection.

> It might be useful to talk about
> metrics. Could we quantify application progress?

We can at least calculate if it's stalling on reclaim or refaults. High
amounts of both would indicate that the application is struggling.

> Could we quantify the
> amount of time a task or the system spends thrashing, and somehow
> express it as a percentage of overall execution time?

Potentially, yes, if time spent refaulting or in direct reclaim were
accounted for. What complicates this significantly is kswapd.

> Maybe something
> comparable to IO wait time, except tracking the time spent performing
> reclaim and waiting on IO that is refetching recently evicted pages?
> 

Ideally, yes.

> This question seems to go beyond the memory subsystem and potentially
> involve the scheduler and the block layer, so it might be a good tech
> topic for KS.
> 

I'm on board anyway.

-- 
Mel Gorman
SUSE Labs


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-29 11:07   ` Mel Gorman
@ 2016-07-29 16:26     ` Luck, Tony
  2016-08-01 15:17       ` Rik van Riel
  2016-08-01 16:55     ` Johannes Weiner
  1 sibling, 1 reply; 81+ messages in thread
From: Luck, Tony @ 2016-07-29 16:26 UTC (permalink / raw)
  To: Mel Gorman; +Cc: ksummit-discuss

On Fri, Jul 29, 2016 at 12:07:24PM +0100, Mel Gorman wrote:
> On Thu, Jul 28, 2016 at 02:55:23PM -0400, Johannes Weiner wrote:
> > On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:

> > It might be useful to talk about
> > metrics. Could we quantify application progress?

The most reliable way to do that would be to have an actual
user mode program that runs, accessing some configurable number
of pages, periodically touching some file in /proc/sys/vm to
let the kernel know that some quantum of work had been completed.

Then the kernel would get accurate data on application progress
(at the cost of cpu time and memory consumed by this process, and
increased power usage when the system could otherwise be idle).

-Tony


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-29 16:26     ` Luck, Tony
@ 2016-08-01 15:17       ` Rik van Riel
  0 siblings, 0 replies; 81+ messages in thread
From: Rik van Riel @ 2016-08-01 15:17 UTC (permalink / raw)
  To: Luck, Tony, Mel Gorman; +Cc: ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 1204 bytes --]

On Fri, 2016-07-29 at 09:26 -0700, Luck, Tony wrote:
> On Fri, Jul 29, 2016 at 12:07:24PM +0100, Mel Gorman wrote:
> > On Thu, Jul 28, 2016 at 02:55:23PM -0400, Johannes Weiner wrote:
> > > On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> 
> > > It might be useful to talk about
> > > metrics. Could we quantify application progress?
> 
> The most reliable way to do that would be to have an actual
> user mode program that runs, accessing some configurable number
> of pages, periodically touching some file in /proc/sys/vm to
> let the kernel know that some quantum of work had been completed.

I don't think there is a need for that.

We already keep track of how much user time and how
much system time a program uses, and how much time
it is stalled on IO.

If user time is low, a program is stalled on IO a
lot of the time, and a lot of the faults are refaults
(previously accessed memory), then we are thrashing.

If the program is not stalled on IO much, or is
accessing pages it has not accessed before, it is
not thrashing.

We probably have the right statistics already, unless
I am overlooking something.

-- 

All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-28 21:41   ` James Bottomley
@ 2016-08-01 15:46     ` Johannes Weiner
  2016-08-01 16:06       ` James Bottomley
  0 siblings, 1 reply; 81+ messages in thread
From: Johannes Weiner @ 2016-08-01 15:46 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit-discuss

On Thu, Jul 28, 2016 at 05:41:43PM -0400, James Bottomley wrote:
> On Thu, 2016-07-28 at 14:55 -0400, Johannes Weiner wrote:
> > On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > > Most recently I have been working on reviving swap for SSDs and
> > > persistent memory devices (https://lwn.net/Articles/690079/) as
> > > part
> > > of a bigger anti-thrashing effort to make the VM recover swiftly
> > > and
> > > predictably from load spikes.
> > 
> > A bit of context, in case we want to discuss this at KS:
> > 
> > We frequently have machines hang and stop responding indefinitely
> > after they experience memory load spikes. On closer look, we find 
> > most tasks either in page reclaim or majorfaulting parts of an 
> > executable or library. It's a typical thrashing pattern, where 
> > everybody cannibalizes everybody else. The problem is that with fast 
> > storage the cache reloads can be fast enough that there are never 
> > enough in-flight pages at a time to cause page reclaim to fail and 
> > trigger the OOM killer. The livelock persists until external
> > remediation reboots the
> > box or we get lucky and non-cache allocations eventually suck up the
> > remaining page cache and trigger the OOM killer.
> > 
> > To avoid hitting this situation, we currently have to keep a generous
> > memory reserve for occasional spikes, which sucks for utilization the
> > rest of the time. Swap would be useful here, but the swapout code is
> > basically only triggering when memory pressure rises - which again
> > doesn't happen - so I've been working on the swap code to balance
> > cache reclaim vs. swap based on relative thrashing between the two.
> > 
> > There is usually some cold/unused anonymous memory lying around that
> > can be unloaded into swap during workload spikes, so that allows us 
> > to drive up the average memory utilization without increasing the 
> > risk at least. But if we screw up and there are not enough unused 
> > anon pages, we are back to thrashing - only now it involves swapping
> > too.
> > 
> > So how do we address this?
> > 
> > A pathological thrashing situation is very obvious to any user, but
> > it's not quite clear how to quantify it inside the kernel and have it
> > trigger the OOM killer. It might be useful to talk about metrics. 
> > Could we quantify application progress? Could we quantify the amount 
> > of time a task or the system spends thrashing, and somehow express it 
> > as a percentage of overall execution time? Maybe something comparable 
> > to IO wait time, except tracking the time spent performing reclaim
> > and waiting on IO that is refetching recently evicted pages?
> > 
> > This question seems to go beyond the memory subsystem and potentially
> > involve the scheduler and the block layer, so it might be a good tech
> > topic for KS.
> 
> Actually, I'd be interested in this.  We're starting to generate use
> cases in the container cloud for swap (I can't believe I'm saying this
> since we hitherto regarded swap as wholly evil).  The issue is that we
> want to load the system up into its overcommit region (it means two
> > things: either we're re-using underused resources or, more correctly,
> we're reselling resources we sold to one customer, but they're not
> using, so we can sell them to another).  From some research done within
> IBM, it turns out there's a region where swapping is beneficial.  We
> define it as the region where the B/W to swap doesn't exceed the B/W
> capacity of the disk (is this the metric you're looking for?).

That's an interesting take, I haven't thought about that. But note
that the CPU cost of evicting and refetching pages is not negligible:
even on fairly beefy machines we've seen significant CPU load when the
IO device hits saturation. With persistent memory devices you might
actually run out of CPU capacity while performing basic page aging
before you saturate the storage device (which is why Andi Kleen has
been suggesting to replace LRU reclaim with random replacement for
these devices). So storage device saturation might not be the final
answer to this problem.

> Our definition of progress is a bit different from yours above because
> the interactive jobs must respond as if they were near bare metal, so
> we penalise the soak jobs.  However, we find that the soak jobs also
> make reasonable progress according to your measure above (reasonable
> enough means the customer is happy to pay for the time they've used).

We actually are in the same boat, where most of our services are doing
work within the context of interactive user sessions. So in terms of
quantifying progress, both throughput and latency percentiles would be
necessary to form a full picture of whether we are beyond capacity.


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-08-01 15:46     ` Johannes Weiner
@ 2016-08-01 16:06       ` James Bottomley
  2016-08-01 16:11         ` Dave Hansen
  0 siblings, 1 reply; 81+ messages in thread
From: James Bottomley @ 2016-08-01 16:06 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: ksummit-discuss

On Mon, 2016-08-01 at 11:46 -0400, Johannes Weiner wrote:
> On Thu, Jul 28, 2016 at 05:41:43PM -0400, James Bottomley wrote:
> > On Thu, 2016-07-28 at 14:55 -0400, Johannes Weiner wrote:
> > > On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > > > Most recently I have been working on reviving swap for SSDs and
> > > > persistent memory devices (https://lwn.net/Articles/690079/) as
> > > > part of a bigger anti-thrashing effort to make the VM recover
> > > > swiftly and predictably from load spikes.
> > > 
> > > A bit of context, in case we want to discuss this at KS:
> > > 
> > > We frequently have machines hang and stop responding indefinitely
> > > after they experience memory load spikes. On closer look, we find
> > > most tasks either in page reclaim or majorfaulting parts of an 
> > > executable or library. It's a typical thrashing pattern, where 
> > > everybody cannibalizes everybody else. The problem is that with 
> > > fast storage the cache reloads can be fast enough that there are 
> > > never enough in-flight pages at a time to cause page reclaim to 
> > > fail and trigger the OOM killer. The livelock persists until 
> > > external remediation reboots the box or we get lucky and non
> > > -cache allocations eventually suck up the remaining page cache
> > > and trigger the OOM killer.
> > > 
> > > To avoid hitting this situation, we currently have to keep a 
> > > generous memory reserve for occasional spikes, which sucks for 
> > > utilization the rest of the time. Swap would be useful here, but 
> > > the swapout code is basically only triggering when memory 
> > > pressure rises - which again doesn't happen - so I've been 
> > > working on the swap code to balance cache reclaim vs. swap based 
> > > on relative thrashing between the two.
> > > 
> > > There is usually some cold/unused anonymous memory lying around 
> > > that can be unloaded into swap during workload spikes, so that 
> > > allows us to drive up the average memory utilization without 
> > > increasing the risk at least. But if we screw up and there are 
> > > not enough unused anon pages, we are back to thrashing - only now 
> > > it involves swapping too.
> > > 
> > > So how do we address this?
> > > 
> > > A pathological thrashing situation is very obvious to any user, 
> > > but it's not quite clear how to quantify it inside the kernel and
> > > have it trigger the OOM killer. It might be useful to talk about 
> > > metrics.  Could we quantify application progress? Could we 
> > > quantify the amount of time a task or the system spends 
> > > thrashing, and somehow express it as a percentage of overall 
> > > execution time? Maybe something comparable to IO wait time, 
> > > except tracking the time spent performing reclaim and waiting on
> > > IO that is refetching recently evicted pages?
> > > 
> > > This question seems to go beyond the memory subsystem and 
> > > potentially involve the scheduler and the block layer, so it 
> > > might be a good tech topic for KS.
> > 
> > Actually, I'd be interested in this.  We're starting to generate 
> > use cases in the container cloud for swap (I can't believe I'm 
> > saying this since we hitherto regarded swap as wholly evil).  The 
> > issue is that we want to load the system up into its overcommit 
> > region (it means two things: either we're re-using under used 
> > resources or, more correctly, we're reselling resources we sold to 
> > one customer, but they're not using, so we can sell them to 
> > another).  From some research done within IBM, it turns out there's 
> > a region where swapping is beneficial.   We define it as the region 
> > where the B/W to swap doesn't exceed the B/W capacity of the disk
> > (is this the metric you're looking for?).
> 
> That's an interesting take, I haven't thought about that. But note
> that the CPU cost of evicting and refetching pages is not negligible:
> even on fairly beefy machines we've seen significant CPU load when 
> the IO device hits saturation.

Right, but we're not looking to use swap as a kind of slightly more
expensive memory.  We're looking to push the system aggressively to
find its working set while we load it up with jobs.  This means we
need the rarely referenced anonymous memory out on swap.  We use
standard SSDs, so if the anon memory refault rate goes too high, we
move from region 3 to region 4 (required swap B/W exceeds available
swap B/W) and the system goes unstable (so we'd unload it a bit).

>  With persistent memory devices you might actually run out of CPU 
> capacity while performing basic page aging before you saturate the 
> storage device (which is why Andi Kleen has been suggesting to 
> replace LRU reclaim with random replacement for these devices). So 
> storage device saturation might not be the final answer to this
> problem.

We really wouldn't want this.  All cloud jobs seem to have memory they
allocate but rarely use, so we want the LRU list's properties to push
it out to swap so we can re-use the memory pages for something else.
A random replacement algorithm would play havoc with that.

Our biggest problem is the difficulty in forcing the system to push
anonymous stuff out to swap.  Linux really likes to hang on to its
anonymous pages, and if you get too abrasive with it, it starts dumping
your file-backed pages instead, causing refaults and instability
there.  We haven't yet played with the swappiness patches, but
we're hoping they will go some way towards fixing this.

> > Our definition of progress is a bit different from yours above 
> > because the interactive jobs must respond as if they were near bare 
> > metal, so we penalise the soak jobs.  However, we find that the 
> > soak jobs also make reasonable progress according to your measure 
> > above (reasonable enough means the customer is happy to pay for the 
> > time they've used).
> 
> We actually are in the same boat, where most of our services are 
> doing work within the context of interactive user sessions. So in 
> terms of quantifying progress, both throughput and latency 
> percentiles would be necessary to form a full picture of whether we
> are beyond capacity.

OK, so this region 3 work (where we can get the system stable with an
acceptable refault rate for the anonymous pages) is probably where you
want to be operating as well.

James


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination
  2016-08-01 16:06       ` James Bottomley
@ 2016-08-01 16:11         ` Dave Hansen
  2016-08-01 16:33           ` James Bottomley
  2016-08-01 17:08           ` Johannes Weiner
  0 siblings, 2 replies; 81+ messages in thread
From: Dave Hansen @ 2016-08-01 16:11 UTC (permalink / raw)
  To: James Bottomley, Johannes Weiner; +Cc: Kleen, Andi, ksummit-discuss

On 08/01/2016 09:06 AM, James Bottomley wrote:
> > With persistent memory devices you might actually run out of CPU
> > capacity while performing basic page aging before you saturate the
> > storage device (which is why Andi Kleen has been suggesting to
> > replace LRU reclaim with random replacement for these devices). So
> > storage device saturation might not be the final answer to this
> > problem.
> We really wouldn't want this.  All cloud jobs seem to have memory they
> allocate but rarely use, so we want the properties of the LRU list to
> get this on swap so we can re-use the memory pages for something else. 
>  A random replacement algorithm would play havoc with that.

I don't want to put words in Andi's mouth, but what we want isn't
necessarily something that is random, but it's something that uses less
CPU to swap out a given page.

All the LRU scanning is expensive and doesn't scale particularly well,
and there are some situations where we should be willing to give up some
of the precision of the current LRU in order to increase the throughput
of reclaim in general.


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination
  2016-08-01 16:11         ` Dave Hansen
@ 2016-08-01 16:33           ` James Bottomley
  2016-08-01 18:13             ` Rik van Riel
  2016-08-01 19:51             ` Dave Hansen
  2016-08-01 17:08           ` Johannes Weiner
  1 sibling, 2 replies; 81+ messages in thread
From: James Bottomley @ 2016-08-01 16:33 UTC (permalink / raw)
  To: Dave Hansen, Johannes Weiner; +Cc: Kleen, Andi, ksummit-discuss

On Mon, 2016-08-01 at 09:11 -0700, Dave Hansen wrote:
> On 08/01/2016 09:06 AM, James Bottomley wrote:
> > > With persistent memory devices you might actually run out of CPU
> > > capacity while performing basic page aging before you saturate the
> > > storage device (which is why Andi Kleen has been suggesting to
> > > replace LRU reclaim with random replacement for these devices). So
> > > storage device saturation might not be the final answer to this
> > > problem.
> > We really wouldn't want this.  All cloud jobs seem to have memory 
> > they allocate but rarely use, so we want the properties of the LRU 
> > list to get this on swap so we can re-use the memory pages for 
> > something else.  A random replacement algorithm would play havoc
> > with that.
> 
> I don't want to put words in Andi's mouth, but what we want isn't
> necessarily something that is random, but it's something that uses 
> less CPU to swap out a given page.

OK, if it's more deterministic, I'll wait to see the proposal.

> All the LRU scanning is expensive and doesn't scale particularly
> well, and there are some situations where we should be willing to
> give up some of the precision of the current LRU in order to increase
> the throughput of reclaim in general.

Would some type of hinting mechanism work (say via madvise)? 
 MADV_DONTNEED may be good enough, but we could really do with
MADV_SWAP_OUT_NOW to indicate objects we really don't want.  I suppose
I can lose all my credibility by saying this would be the JVM: it knows
roughly the expected lifetime and access patterns and is well qualified
to mark objects as infrequently enough accessed to reside on swap.

I suppose another question is: do we still want all of this to be page
based?  We moved to extents in filesystems a while ago; wouldn't some
extent-based LRU mechanism be cheaper?  Unfortunately, it means
something has to come up with an idea of what an extent means (I
suspect it would be a bunch of virtually contiguous pages which have
the same expected LRU properties, but I'm thinking from the
application-centric viewpoint).

James


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-29 11:07   ` Mel Gorman
  2016-07-29 16:26     ` Luck, Tony
@ 2016-08-01 16:55     ` Johannes Weiner
  1 sibling, 0 replies; 81+ messages in thread
From: Johannes Weiner @ 2016-08-01 16:55 UTC (permalink / raw)
  To: Mel Gorman; +Cc: ksummit-discuss

On Fri, Jul 29, 2016 at 12:07:24PM +0100, Mel Gorman wrote:
> On Thu, Jul 28, 2016 at 02:55:23PM -0400, Johannes Weiner wrote:
> > To avoid hitting this situation, we currently have to keep a generous
> > memory reserve for occasional spikes, which sucks for utilization the
> > rest of the time. Swap would be useful here, but the swapout code is
> > basically only triggering when memory pressure rises - which again
> > doesn't happen - so I've been working on the swap code to balance
> > cache reclaim vs. swap based on relative thrashing between the two.
> 
> While we have active and inactive lists, they have no concept of time.
> Inactive may be "has not been used in hours" or "deactivated recently due to
> memory pressure". If we continually aged pages at a very slow rate (e.g. 1%
> of a node per minute) in the absence of memory pressure we could create an
> "unused" list without reclaiming it in the absence of pressure. We'd
> also have to scan 1% of the unused list at the same time and
> reactivate pages if necessary.
>
> Minimally, we'd have a very rough estimate of the true WSS as a bonus.

I fear that something like this would get into the "hardcoded"
territory that Rik mentioned. 1% per minute might be plenty to
distinguish hot and cold for some workloads, and too coarse for
others.

For WSS estimates to be meaningful, they need to be based on a
sampling interval that is connected to the time it takes to evict a
page and the time it takes to refetch it. Because if the access
frequencies of a workload are fairly spread out, kicking out the
colder pages and refetching them later to make room for hotter pages
in the meantime might be a good trade-off to make - especially when
stacking multiple (containerized) workloads onto a single machine.

The WSS of a workload over its lifetime might be several times the
available memory, but what you really care about is how much time you
are actually losing due to memory being underprovisioned for that
workload. If the frequency spectrum is compressed, you might be making
almost no progress at all. If it's spread out, the available memory
might still be mostly underutilized.

We don't have a concept of time in page aging right now, but AFAICS
introducing one would be the central part in making WSS estimation and
subsequent resource allocation work without costly trial and error.

> > There is usually some cold/unused anonymous memory lying around that
> > can be unloaded into swap during workload spikes, so that allows us to
> > drive up the average memory utilization without increasing the risk at
> > least. But if we screw up and there are not enough unused anon pages,
> > we are back to thrashing - only now it involves swapping too.
> > 
> > So how do we address this?
> > 
> > A pathological thrashing situation is very obvious to any user, but
> > it's not quite clear how to quantify it inside the kernel and have it
> > trigger the OOM killer.
> 
> The OOM killer is at the extreme end of the spectrum. One unloved piece of
> code is vmpressure.c which we never put that much effort into.  Ideally, that
> would at least be able to notify user space that the system is under pressure
> but I have anecdotal evidence that it gives bad advice on large systems.

Bringing in the OOM killer doesn't preclude advance notification. But
severe thrashing *is* an OOM situation that can only be handled by
reducing the number of concurrent page references going on. If the
user can help out, that's great, but the OOM killer should still be
the last line of defense to bring the system back into a stable state.

> Essentially, we have four bits of information related to memory pressure --
> allocations, scans, steals and refaults. A 1:1:1 ratio of allocations, scans
> and steals could just be a streaming workload. The refaults distinguish
> between streaming and thrashing workloads but we don't use this for
> vmpressure calculations or OOM detection.

The information we have right now can tell us whether the workingset
is stable or not, and thus whether we should challenge the currently
protected pages or not. What we can't do is tell whether the thrashing
is an acceptable transition between two working sets or a sustained
instability. The answer to that lies on a subjective spectrum.

Consider a workload that is accessing two datasets alternatingly, like
a database user that is switching back and forth between two tables to
process their data. If evicting one table and loading the other from
storage takes up 1% of the task's time, and processing the data the
other 99%, then we can likely provision memory such that it can hold
one table at a time. If evicting and reloading takes up 10% of the
time, it might still be fine; they might only care about latency while
the active table is loaded, or they might prioritize another job over
this one. If evicting and refetching consumes 95% of the task's time,
we might want to look into giving it more RAM.

So yes, with mm/workingset.c we finally have all the information to
unambiguously identify which VM events are due to memory being
underprovisioned. But we need a concept of time to put the impact of
these events into perspective. And I'm arguing that that perspective
is overall execution time of the tasks in the system (or container),
to calculate the percentage of time lost due to underprovisioning.

> > It might be useful to talk about
> > metrics. Could we quantify application progress?
> 
> We can at least calculate if it's stalling on reclaim or refaults. High
> amounts of both would indicate that the application is struggling.

Again: or transitioning.

> > Could we quantify the
> > amount of time a task or the system spends thrashing, and somehow
> > express it as a percentage of overall execution time?
> 
> Potentially if time spent refaulting or direct reclaiming was accounted
> for. What complicates this significantly is kswapd.

Kswapd is a shared resource, but memory is as well. Whatever concept
of time we can come up with that works for memory should be on the
same scope as kswapd. E.g. potentially available time slices in the
system (or container).

> > This question seems to go beyond the memory subsystem and potentially
> > involve the scheduler and the block layer, so it might be a good tech
> > topic for KS.
> 
> I'm on board anyway.

Great!


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination
  2016-08-01 16:11         ` Dave Hansen
  2016-08-01 16:33           ` James Bottomley
@ 2016-08-01 17:08           ` Johannes Weiner
  2016-08-01 18:19             ` Johannes Weiner
  1 sibling, 1 reply; 81+ messages in thread
From: Johannes Weiner @ 2016-08-01 17:08 UTC (permalink / raw)
  To: Dave Hansen; +Cc: James Bottomley, Kleen, Andi, ksummit-discuss

On Mon, Aug 01, 2016 at 09:11:32AM -0700, Dave Hansen wrote:
> On 08/01/2016 09:06 AM, James Bottomley wrote:
> > > With persistent memory devices you might actually run out of CPU
> > > capacity while performing basic page aging before you saturate the
> > > storage device (which is why Andi Kleen has been suggesting to
> > > replace LRU reclaim with random replacement for these devices). So
> > > storage device saturation might not be the final answer to this
> > > problem.
> > We really wouldn't want this.  All cloud jobs seem to have memory they
> > allocate but rarely use, so we want the properties of the LRU list to
> > get this on swap so we can re-use the memory pages for something else. 
> >  A random replacement algorithm would play havoc with that.
> 
> I don't want to put words in Andi's mouth, but what we want isn't
> necessarily something that is random, but it's something that uses less
> CPU to swap out a given page.

Random eviction doesn't mean random outcome of what stabilizes in
memory and swap. The idea is to apply pressure on all pages equally
but in no particular order, and then the in-memory set forms based on
reference frequencies and refaults/swapins.

Our anon LRU approximation can be so inaccurate as to be doing that
already anyway, only with all the overhead of having an LRU list.


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination
  2016-08-01 16:33           ` James Bottomley
@ 2016-08-01 18:13             ` Rik van Riel
  2016-08-01 19:51             ` Dave Hansen
  1 sibling, 0 replies; 81+ messages in thread
From: Rik van Riel @ 2016-08-01 18:13 UTC (permalink / raw)
  To: James Bottomley, Dave Hansen, Johannes Weiner
  Cc: Kleen, Andi, ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 2417 bytes --]

On Mon, 2016-08-01 at 12:33 -0400, James Bottomley wrote:
> On Mon, 2016-08-01 at 09:11 -0700, Dave Hansen wrote:
> > On 08/01/2016 09:06 AM, James Bottomley wrote:
> > > > With persistent memory devices you might actually run out of CPU
> > > > capacity while performing basic page aging before you saturate the
> > > > storage device (which is why Andi Kleen has been suggesting to
> > > > replace LRU reclaim with random replacement for these devices). So
> > > > storage device saturation might not be the final answer to this
> > > > problem.
> > > We really wouldn't want this.  All cloud jobs seem to have memory
> > > they allocate but rarely use, so we want the properties of the LRU
> > > list to get this on swap so we can re-use the memory pages for
> > > something else.  A random replacement algorithm would play havoc
> > > with that.
> > 
> > I don't want to put words in Andi's mouth, but what we want isn't
> > necessarily something that is random, but it's something that uses 
> > less CPU to swap out a given page.
> 
> OK, if it's more deterministic, I'll wait to see the proposal.
> 
> > All the LRU scanning is expensive and doesn't scale particularly
> > well, and there are some situations where we should be willing to
> > give up some of the precision of the current LRU in order to
> > increase
> > the throughput of reclaim in general.
> 
> Would some type of hinting mechanism work (say via madvise)? 

I suspect that might introduce overhead in other ways.

> I suppose another question is do we still want all of this to be page
> based?  We moved to extents in filesystems a while ago, wouldn't some
> extent based LRU mechanism be cheaper ... unfortunately it means
> something has to try to come up with an idea of what an extent means
> (I
> suspect it would be a bunch of virtually contiguous pages which have
> the same expected LRU properties, but I'm thinking from the
> application
> centric viewpoint).
> 
On sufficiently fast swap, we could just swap 2MB pages,
or whatever size THP is on the architecture in question,
in and out of memory.

Working with blocks 512x the size of a 4kB page might
be enough of a scalability gain to match the faster IO
speeds of new storage.

-- 

All Rights Reversed.



* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination
  2016-08-01 17:08           ` Johannes Weiner
@ 2016-08-01 18:19             ` Johannes Weiner
  0 siblings, 0 replies; 81+ messages in thread
From: Johannes Weiner @ 2016-08-01 18:19 UTC (permalink / raw)
  To: Dave Hansen; +Cc: James Bottomley, Kleen, Andi, ksummit-discuss

On Mon, Aug 01, 2016 at 01:08:46PM -0400, Johannes Weiner wrote:
> On Mon, Aug 01, 2016 at 09:11:32AM -0700, Dave Hansen wrote:
> > On 08/01/2016 09:06 AM, James Bottomley wrote:
> > > > With persistent memory devices you might actually run out of CPU
> > > > capacity while performing basic page aging before you saturate the
> > > > storage device (which is why Andi Kleen has been suggesting to
> > > > replace LRU reclaim with random replacement for these devices). So
> > > > storage device saturation might not be the final answer to this
> > > > problem.
> > > We really wouldn't want this.  All cloud jobs seem to have memory they
> > > allocate but rarely use, so we want the properties of the LRU list to
> > > get this on swap so we can re-use the memory pages for something else. 
> > >  A random replacement algorithm would play havoc with that.
> > 
> > I don't want to put words in Andi's mouth, but what we want isn't
> > necessarily something that is random, but it's something that uses less
> > CPU to swap out a given page.
> 
> Random eviction doesn't mean random outcome of what stabilizes in
> memory and swap. The idea is to apply pressure on all pages equally
> but in no particular order, and then the in-memory set forms based on
> reference frequencies and refaults/swapins.

Anyway, this is getting a little off-topic.

I only brought up CPU cost to make the point that, while sustained
swap-in rate might be a good signal to unload a machine or reschedule
a job elsewhere, it might not be a generic answer to the question of
how much a system's overall progress is actually impeded due to
somebody swapping; or whether the system is actually in a livelock
state that requires intervention by the OOM killer.


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re: Self nomination
  2016-08-01 16:33           ` James Bottomley
  2016-08-01 18:13             ` Rik van Riel
@ 2016-08-01 19:51             ` Dave Hansen
  1 sibling, 0 replies; 81+ messages in thread
From: Dave Hansen @ 2016-08-01 19:51 UTC (permalink / raw)
  To: James Bottomley, Johannes Weiner; +Cc: Kleen, Andi, ksummit-discuss

On 08/01/2016 09:33 AM, James Bottomley wrote:
>> All the LRU scanning is expensive and doesn't scale particularly
>> well, and there are some situations where we should be willing to
>> give up some of the precision of the current LRU in order to increase
>> the throughput of reclaim in general.
> 
> Would some type of hinting mechanism work (say via madvise)? 
>  MADV_DONTNEED may be good enough, but we could really do with
> MADV_SWAP_OUT_NOW to indicate objects we really don't want.  I suppose
> I can lose all my credibility by saying this would be the JVM: it knows
> roughly the expected lifetime and access patterns and is well qualified
> to mark objects as infrequently enough accessed to reside on swap.

I don't think MADV_DONTNEED is a good fit because it is destructive.  It
does seem like we are missing a true companion to MADV_WILLNEED which
would give memory a push in the direction of being swapped out.

But I don't think it's too crazy to expect apps to participate.  They
certainly have the potential to know more about their data than the
kernel does, and things like GPUs are already pretty actively optimizing
by moving memory around.

> I suppose another question is do we still want all of this to be page
> based?  We moved to extents in filesystems a while ago, wouldn't some
> extent based LRU mechanism be cheaper ... unfortunately it means
> something has to try to come up with an idea of what an extent means (I
> suspect it would be a bunch of virtually contiguous pages which have
> the same expected LRU properties, but I'm thinking from the application
> centric viewpoint).

One part of this (certainly not the _only_ one) is expanding where
transparent huge pages can be used.  That's one extent definition that's
relatively easy to agree on.

Past that, there are lots of things we can try (including something like
you've suggested), but I don't think anybody knows what will work yet.
There is no shortage of ideas.


* Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
  2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
                     ` (2 preceding siblings ...)
  2016-07-29 11:07   ` Mel Gorman
@ 2016-08-02  9:18   ` Jan Kara
  3 siblings, 0 replies; 81+ messages in thread
From: Jan Kara @ 2016-08-02  9:18 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: ksummit-discuss

On Thu 28-07-16 14:55:23, Johannes Weiner wrote:
> On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > Most recently I have been working on reviving swap for SSDs and
> > persistent memory devices (https://lwn.net/Articles/690079/) as part
> > of a bigger anti-thrashing effort to make the VM recover swiftly and
> > predictably from load spikes.
> 
> A bit of context, in case we want to discuss this at KS:
> 
> We frequently have machines hang and stop responding indefinitely
> after they experience memory load spikes. On closer look, we find most
> tasks either in page reclaim or majorfaulting parts of an executable
> or library. It's a typical thrashing pattern, where everybody
> cannibalizes everybody else. The problem is that with fast storage the
> cache reloads can be fast enough that there are never enough in-flight
> pages at a time to cause page reclaim to fail and trigger the OOM
> killer. The livelock persists until external remediation reboots the
> box or we get lucky and non-cache allocations eventually suck up the
> remaining page cache and trigger the OOM killer.
> 
> To avoid hitting this situation, we currently have to keep a generous
> memory reserve for occasional spikes, which sucks for utilization the
> rest of the time. Swap would be useful here, but the swapout code is
> basically only triggering when memory pressure rises - which again
> doesn't happen - so I've been working on the swap code to balance
> cache reclaim vs. swap based on relative thrashing between the two.
> 
> There is usually some cold/unused anonymous memory lying around that
> can be unloaded into swap during workload spikes, so that allows us to
> drive up the average memory utilization without increasing the risk at
> least. But if we screw up and there are not enough unused anon pages,
> we are back to thrashing - only now it involves swapping too.
> 
> So how do we address this?
> 
> A pathological thrashing situation is very obvious to any user, but
> it's not quite clear how to quantify it inside the kernel and have it
> trigger the OOM killer. It might be useful to talk about
> metrics. Could we quantify application progress? Could we quantify the
> amount of time a task or the system spends thrashing, and somehow
> express it as a percentage of overall execution time? Maybe something
> comparable to IO wait time, except tracking the time spent performing
> reclaim and waiting on IO that is refetching recently evicted pages?
> 
> This question seems to go beyond the memory subsystem and potentially
> involve the scheduler and the block layer, so it might be a good tech
> topic for KS.

I'd be interested to join this discussion.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [Ksummit-discuss] Self nomination
  2016-08-08 11:07   ` Lorenzo Pieralisi
@ 2016-09-23 10:42     ` Grant Likely
  0 siblings, 0 replies; 81+ messages in thread
From: Grant Likely @ 2016-09-23 10:42 UTC (permalink / raw)
  To: Lorenzo Pieralisi; +Cc: ksummit-discuss

On Mon, Aug 8, 2016 at 12:07 PM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> On Thu, Jul 28, 2016 at 11:14:51AM +0100, Marc Zyngier wrote:
>> On 26/07/16 23:30, Dmitry Torokhov wrote:
>> > I'd like to nominate myself for the kernel summit this year. I am part
>> > of Chrome OS kernel team and I also maintain drivers/input in mainline.
>>
>> [...]
>>
>> > - I would like to sync up with people and discuss [lack of] progress
>> >   on topic of device probe ordering (including handling of deferred
>> >   probes, asynchronous probes, etc).
>>
>> I'm extremely interested in discussing this.
>>
>> It has wide reaching consequences as (with my irqchip maintainer hat on)
>> we've had to pretend that some bits of HW (timers, interrupt
>> controllers) are not "devices". Not a massive issue for most, except
>> when your interrupt controller has requirements that are very similar to
>> the DMA mapping API (which you cannot use because "not a device"). Other
>> problems are introduced by things like wire-MSI bridges, and most people
>> end-up resorting to hacks like ad-hoc initcalls and sprinkling deferred
>> probes in specific drivers.
>>
>> I've seen a number of proposal so far, but the subject seems to have
>> gone quiet (well, not really, but hardly any progress has been made).
>>
>> Happy to make this a tech discussion or a hallway track.
>
> I am very interested in this discussion too in whatever form it takes
> place and I really think it is the right time to find a way forward
> for DT and ACPI probing alike given that we have started facing
> these probe ordering issues in ARM/ACPI world too, it would be nice to
> find a solution that works seamlessly.

[Responding very late to this thread]

I also want to be involved with this discussion. We've got to come up
with something better than what we've got now.

g.


* Re: [Ksummit-discuss] Self nomination
  2016-08-24 12:12             ` Marek Szyprowski
@ 2016-08-24 17:32               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 81+ messages in thread
From: Rafael J. Wysocki @ 2016-08-24 17:32 UTC (permalink / raw)
  To: Marek Szyprowski; +Cc: ksummit-discuss, Tomeu Vizoso, Linux PM list

On Wednesday, August 24, 2016 02:12:18 PM Marek Szyprowski wrote:
> Hi Rafael,
> 
> 
> On 2016-08-06 02:20, Rafael J. Wysocki wrote:
> > On Wednesday, August 03, 2016 10:12:00 AM Marek Szyprowski wrote:
> >> Dear All,
> >>
> >>
> >> On 2016-08-03 01:00, Rafael J. Wysocki wrote:
> >>> On Tuesday, August 02, 2016 10:09:17 AM Linus Walleij wrote:
> >>>> On Thu, Jul 28, 2016 at 12:14 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >>>>> On 26/07/16 23:30, Dmitry Torokhov wrote:
> >>>>>> - I would like to sync up with people and discuss [lack of] progress
> >>>>>>     on topic of device probe ordering (including handling of deferred
> >>>>>>     probes, asynchronous probes, etc).
> >>>>> I'm extremely interested in discussing this.
> >>>> I've also tried to pitch in on it in the past but I just feel stupid
> >>>> whenever we try to come up with something better than what
> >>>> we have :(
> >>>>
> >>>>> It has wide reaching consequences as (with my irqchip maintainer hat on)
> >>>>> we've had to pretend that some bits of HW (timers, interrupt
> >>>>> controllers) are not "devices". Not a massive issue for most, except
> >>>>> when your interrupt controller has requirements that are very similar to
> >>>>> the DMA mapping API (which you cannot use because "not a device"). Other
> >>>>> problems are introduced by things like wire-MSI bridges, and most people
> >>>>> end-up resorting to hacks like ad-hoc initcalls and sprinkling deferred
> >>>>> probes in specific drivers.
> >>>> Same feeling here. I'm accepting patches for random initcall
> >>>> reordering because there is nothing else I can do, people need to
> >>>> have their systems running. But it feels really fragile.
> >>>>
> >>>> Deferred probe alleviated the problem, but I remember saying at
> >>>> the time that what we really need to do is build a dependency
> >>>> graph and resolve it the same way e.g. systemd does. (Someone
> >>>> may have called me BS on that, either for being wrong about everything
> >>>> as usual or because of mentioning systemd, I don't know which one.)
> >>>>
> >>>> The latest proposal I saw came from Rafael and he had a scratch
> >>>> idea for a dependency graph that I really liked, but I guess he's been
> >>>> sidetracked since. Rafael, what happened with that?
> >>> I got distracted, but Marek Szyprowski has revived it recently.
> >>>
> >>> It needs to be cleaned up somewhat, but other than that I think it's in
> >>> a good enough shape to make some progress in that direction, at least in
> >>> principle.
> >> I really like the idea of pm dependencies between device and the patches
> >> prepared by Rafael. They are exactly what we need for our case (PM for
> >> Exynos IOMMU), but they will also help solving PM issues with complex
> >> devices (like DRM for SoCs and ASoC audio).
> >>
> >> Rafael: do you plan to do any update on them?
> > Yes, I do, but to make some cosmetic changes rather.
> >
> >> Some time ago you wrote, that you had such plan, but real life proved
> >> something else.
> > Well, I was working on other things in the meantime, but I still had that
> > plan. :-)
> >
> >> If needed I can continue works on them, but I need some directions what has
> >> to be improved and fixed.
> > Thanks so much!
> >
> > First off, the networking people claimed the "devlink" term in the meantime
> > and it's better to avoid confusion here, so I'd change it to "devdep" or
> > similar in the patches.
> >
> > In addition to that Tomeu Vizoso complained that the supplier_links and
> > consumer_links list heads in struct device were confusing and I see why that
> > could be the case, so I'd change them to something more direct, like maybe
> > links_to_suppliers and links_to_consumers.
> >
> > Please let me know what you think.
> 
> I think that both name changes (devlink -> devdep and adding "_to_")
> make sense, and such a change will make the code easier to understand.
> 
> I've also managed to find the source of the reboot hang that Tobias
> reported. It was my fault: incorrect use of device links. Adding a link
> to a not-yet-fully-registered device results in trashing the devices_kset
> and dpm lists. I will check if it is possible to add a warning or proper
> support for such a case (IOMMU support for a given device is initialized
> before that device's struct device is added to the system by the
> device_add() function).
> 
> Do you want me to resend the patches with the above-mentioned name
> changes, or do you want to do it on your own and then I will send my
> updated patches?

I'd like to post new versions myself and then let you rebase on top of
them if that's not a problem.

I'll get to that when I get back home from the current travels.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-08-06  0:20           ` Rafael J. Wysocki
@ 2016-08-24 12:12             ` Marek Szyprowski
  -1 siblings, 0 replies; 81+ messages in thread
From: Marek Szyprowski @ 2016-08-24 12:12 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ksummit-discuss, Tomeu Vizoso, Linux PM list

Hi Rafael,


On 2016-08-06 02:20, Rafael J. Wysocki wrote:
> On Wednesday, August 03, 2016 10:12:00 AM Marek Szyprowski wrote:
>> Dear All,
>>
>>
>> On 2016-08-03 01:00, Rafael J. Wysocki wrote:
>>> On Tuesday, August 02, 2016 10:09:17 AM Linus Walleij wrote:
>>>> On Thu, Jul 28, 2016 at 12:14 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>> On 26/07/16 23:30, Dmitry Torokhov wrote:
>>>>>> - I would like to sync up with people and discuss [lack of] progress
>>>>>>     on topic of device probe ordering (including handling of deferred
>>>>>>     probes, asynchronous probes, etc).
>>>>> I'm extremely interested in discussing this.
>>>> I've also tried to pitch in on it in the past but I just feel stupid
>>>> whenever we try to come up with something better than what
>>>> we have :(
>>>>
>>>>> It has wide reaching consequences as (with my irqchip maintainer hat on)
>>>>> we've had to pretend that some bits of HW (timers, interrupt
>>>>> controllers) are not "devices". Not a massive issue for most, except
>>>>> when your interrupt controller has requirements that are very similar to
>>>>> the DMA mapping API (which you cannot use because "not a device"). Other
>>>>> problems are introduced by things like wire-MSI bridges, and most people
>>>>> end-up resorting to hacks like ad-hoc initcalls and sprinkling deferred
>>>>> probes in specific drivers.
>>>> Same feeling here. I'm accepting patches for random initcall
>>>> reordering because there is nothing else I can do, people need to
>>>> have their systems running. But it feels really fragile.
>>>>
>>>> Deferred probe alleviated the problem, but I remember saying at
>>>> the time that what we really need to do is build a dependency
>>>> graph and resolve it the same way e.g. systemd does. (Someone
>>>> may have called me BS on that, either for being wrong about everything
>>>> as usual or because of mentioning systemd, I don't know which one.)
>>>>
>>>> The latest proposal I saw came from Rafael and he had a scratch
>>>> idea for a dependency graph that I really liked, but I guess he's been
>>>> sidetracked since. Rafael, what happened with that?
>>> I got distracted, but Marek Szyprowski has revived it recently.
>>>
>>> It needs to be cleaned up somewhat, but other than that I think it's in
>>> a good enough shape to make some progress in that direction, at least in
>>> principle.
>> I really like the idea of pm dependencies between device and the patches
>> prepared by Rafael. They are exactly what we need for our case (PM for
>> Exynos IOMMU), but they will also help solving PM issues with complex
>> devices (like DRM for SoCs and ASoC audio).
>>
>> Rafael: do you plan to do any update on them?
> Yes, I do, but to make some cosmetic changes rather.
>
>> Some time ago you wrote, that you had such plan, but real life proved
>> something else.
> Well, I was working on other things in the meantime, but I still had that
> plan. :-)
>
>> If needed I can continue works on them, but I need some directions what has
>> to be improved and fixed.
> Thanks so much!
>
> First off, the networking people claimed the "devlink" term in the meantime
> and it's better to avoid confusion here, so I'd change it to "devdep" or
> similar in the patches.
>
> In addition to that Tomeu Vizoso complained that the supplier_links and
> consumer_links list heads in struct device were confusing and I see why that
> could be the case, so I'd change them to something more direct, like maybe
> links_to_suppliers and links_to_consumers.
>
> Please let me know what you think.

I think that both name changes (devlink -> devdep and adding "_to_")
make sense, and such a change will make the code easier to understand.

I've also managed to find the source of the reboot hang that Tobias
reported. It was my fault: incorrect use of device links. Adding a link
to a not-yet-fully-registered device results in trashing the devices_kset
and dpm lists. I will check if it is possible to add a warning or proper
support for such a case (IOMMU support for a given device is initialized
before that device's struct device is added to the system by the
device_add() function).

Do you want me to resend the patches with the above-mentioned name
changes, or do you want to do it on your own and then I will send my
updated patches?

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-28 10:14 ` Marc Zyngier
  2016-08-02  8:09   ` Linus Walleij
@ 2016-08-08 11:07   ` Lorenzo Pieralisi
  2016-09-23 10:42     ` Grant Likely
  1 sibling, 1 reply; 81+ messages in thread
From: Lorenzo Pieralisi @ 2016-08-08 11:07 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: ksummit-discuss

On Thu, Jul 28, 2016 at 11:14:51AM +0100, Marc Zyngier wrote:
> On 26/07/16 23:30, Dmitry Torokhov wrote:
> > I'd like to nominate myself for the kernel summit this year. I am part
> > of Chrome OS kernel team and I also maintain drivers/input in mainline.
> 
> [...]
> 
> > - I would like to sync up with people and discuss [lack of] progress
> >   on topic of device probe ordering (including handling of deferred
> >   probes, asynchronous probes, etc).
> 
> I'm extremely interested in discussing this.
> 
> It has wide reaching consequences as (with my irqchip maintainer hat on)
> we've had to pretend that some bits of HW (timers, interrupt
> controllers) are not "devices". Not a massive issue for most, except
> when your interrupt controller has requirements that are very similar to
> the DMA mapping API (which you cannot use because "not a device"). Other
> problems are introduced by things like wire-MSI bridges, and most people
> end-up resorting to hacks like ad-hoc initcalls and sprinkling deferred
> probes in specific drivers.
> 
> I've seen a number of proposals so far, but the subject seems to have
> gone quiet (well, not really, but hardly any progress has been made).
> 
> Happy to make this a tech discussion or a hallway track.

I am very interested in this discussion too, in whatever form it takes
place, and I really think it is the right time to find a way forward
for DT and ACPI probing alike. We have started facing these probe
ordering issues in the ARM/ACPI world too, so it would be nice to find
a solution that works seamlessly.

Lorenzo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-08-03  8:12       ` Marek Szyprowski
@ 2016-08-06  0:20           ` Rafael J. Wysocki
  0 siblings, 0 replies; 81+ messages in thread
From: Rafael J. Wysocki @ 2016-08-06  0:20 UTC (permalink / raw)
  To: Marek Szyprowski; +Cc: ksummit-discuss, Tomeu Vizoso, Linux PM list

On Wednesday, August 03, 2016 10:12:00 AM Marek Szyprowski wrote:
> Dear All,
> 
> 
> On 2016-08-03 01:00, Rafael J. Wysocki wrote:
> > On Tuesday, August 02, 2016 10:09:17 AM Linus Walleij wrote:
> >> On Thu, Jul 28, 2016 at 12:14 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >>> On 26/07/16 23:30, Dmitry Torokhov wrote:
> >>>> - I would like to sync up with people and discuss [lack of] progress
> >>>>    on topic of device probe ordering (including handling of deferred
> >>>>    probes, asynchronous probes, etc).
> >>> I'm extremely interested in discussing this.
> >> I've also tried to pitch in on it in the past but I just feel stupid
> >> whenever we try to come up with something better than what
> >> we have :(
> >>
> >>> It has wide reaching consequences as (with my irqchip maintainer hat on)
> >>> we've had to pretend that some bits of HW (timers, interrupt
> >>> controllers) are not "devices". Not a massive issue for most, except
> >>> when your interrupt controller has requirements that are very similar to
> >>> the DMA mapping API (which you cannot use because "not a device"). Other
> >>> problems are introduced by things like wire-MSI bridges, and most people
> >>> end-up resorting to hacks like ad-hoc initcalls and sprinkling deferred
> >>> probes in specific drivers.
> >> Same feeling here. I'm accepting patches for random initcall
> >> reordering because there is nothing else I can do, people need to
> >> have their systems running. But it feels really fragile.
> >>
> >> Deferred probe alleviated the problem, but I remember saying at
> >> the time that what we really need to do is build a dependency
> >> graph and resolve it the same way e.g. systemd does. (Someone
> >> may have called me BS on that, either for being wrong about everything
> >> as usual or because of mentioning systemd, I don't know which one.)
> >>
> >> The latest proposal I saw came from Rafael and he had a scratch
> >> idea for a dependency graph that I really liked, but I guess he's been
> >> sidetracked since. Rafael, what happened with that?
> > I got distracted, but Marek Szyprowski has revived it recently.
> >
> > It needs to be cleaned up somewhat, but other than that I think it's in
> > a good enough shape to make some progress in that direction, at least in
> > principle.
> 
> I really like the idea of pm dependencies between device and the patches
> prepared by Rafael. They are exactly what we need for our case (PM for
> Exynos IOMMU), but they will also help solving PM issues with complex
> devices (like DRM for SoCs and ASoC audio).
> 
> Rafael: do you plan to do any update on them?

Yes, I do, but rather to make some cosmetic changes.

> Some time ago you wrote, that you had such plan, but real life proved
> something else.

Well, I was working on other things in the meantime, but I still had that
plan. :-)

> If needed I can continue works on them, but I need some directions what has
> to be improved and fixed.

Thanks so much!

First off, the networking people claimed the "devlink" term in the meantime
and it's better to avoid confusion here, so I'd change it to "devdep" or
similar in the patches.

In addition to that Tomeu Vizoso complained that the supplier_links and
consumer_links list heads in struct device were confusing and I see why that
could be the case, so I'd change them to something more direct, like maybe
links_to_suppliers and links_to_consumers.

Please let me know what you think.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-27  9:25 ` Linus Walleij
  2016-07-27 17:02   ` Darren Hart
@ 2016-08-04 12:30   ` Geert Uytterhoeven
  1 sibling, 0 replies; 81+ messages in thread
From: Geert Uytterhoeven @ 2016-08-04 12:30 UTC (permalink / raw)
  To: Linus Walleij; +Cc: ksummit-discuss, Nicolas Pitre

On Wed, Jul 27, 2016 at 11:25 AM, Linus Walleij
<linus.walleij@linaro.org> wrote:
> Qualifying Linux for Functional Safety
> http://events.linuxfoundation.org/sites/events/files/slides/20160713_SIL2LinuxMP_Min_ALS_0.9_pub.pdf

404

http://events.linuxfoundation.org/sites/events/files/slides/20160713_SIL2LinuxMP_Min_ALS_1.1.pdf

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-08-02 23:00     ` Rafael J. Wysocki
@ 2016-08-03  8:12       ` Marek Szyprowski
  2016-08-06  0:20           ` Rafael J. Wysocki
  0 siblings, 1 reply; 81+ messages in thread
From: Marek Szyprowski @ 2016-08-03  8:12 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linus Walleij; +Cc: ksummit-discuss

Dear All,


On 2016-08-03 01:00, Rafael J. Wysocki wrote:
> On Tuesday, August 02, 2016 10:09:17 AM Linus Walleij wrote:
>> On Thu, Jul 28, 2016 at 12:14 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>> On 26/07/16 23:30, Dmitry Torokhov wrote:
>>>> - I would like to sync up with people and discuss [lack of] progress
>>>>    on topic of device probe ordering (including handling of deferred
>>>>    probes, asynchronous probes, etc).
>>> I'm extremely interested in discussing this.
>> I've also tried to pitch in on it in the past but I just feel stupid
>> whenever we try to come up with something better than what
>> we have :(
>>
>>> It has wide reaching consequences as (with my irqchip maintainer hat on)
>>> we've had to pretend that some bits of HW (timers, interrupt
>>> controllers) are not "devices". Not a massive issue for most, except
>>> when your interrupt controller has requirements that are very similar to
>>> the DMA mapping API (which you cannot use because "not a device"). Other
>>> problems are introduced by things like wire-MSI bridges, and most people
>>> end-up resorting to hacks like ad-hoc initcalls and sprinkling deferred
>>> probes in specific drivers.
>> Same feeling here. I'm accepting patches for random initcall
>> reordering because there is nothing else I can do, people need to
>> have their systems running. But it feels really fragile.
>>
>> Deferred probe alleviated the problem, but I remember saying at
>> the time that what we really need to do is build a dependency
>> graph and resolve it the same way e.g. systemd does. (Someone
>> may have called me BS on that, either for being wrong about everything
>> as usual or because of mentioning systemd, I don't know which one.)
>>
>> The latest proposal I saw came from Rafael and he had a scratch
>> idea for a dependency graph that I really liked, but I guess he's been
>> sidetracked since. Rafael, what happened with that?
> I got distracted, but Marek Szyprowski has revived it recently.
>
> It needs to be cleaned up somewhat, but other than that I think it's in
> a good enough shape to make some progress in that direction, at least in
> principle.

I really like the idea of PM dependencies between devices and the patches
prepared by Rafael. They are exactly what we need for our case (PM for
Exynos IOMMU), but they will also help solve PM issues with complex
devices (like DRM for SoCs and ASoC audio).

Rafael: do you plan to do any update on them? Some time ago you wrote
that you had such a plan, but real life proved otherwise. If needed
I can continue work on them, but I need some direction on what has to be
improved and fixed.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-08-02  8:09   ` Linus Walleij
@ 2016-08-02 23:00     ` Rafael J. Wysocki
  2016-08-03  8:12       ` Marek Szyprowski
  0 siblings, 1 reply; 81+ messages in thread
From: Rafael J. Wysocki @ 2016-08-02 23:00 UTC (permalink / raw)
  To: Linus Walleij; +Cc: ksummit-discuss, Marek Szyprowski

On Tuesday, August 02, 2016 10:09:17 AM Linus Walleij wrote:
> On Thu, Jul 28, 2016 at 12:14 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > On 26/07/16 23:30, Dmitry Torokhov wrote:
> 
> >> - I would like to sync up with people and discuss [lack of] progress
> >>   on topic of device probe ordering (including handling of deferred
> >>   probes, asynchronous probes, etc).
> >
> > I'm extremely interested in discussing this.
> 
> I've also tried to pitch in on it in the past but I just feel stupid
> whenever we try to come up with something better than what
> we have :(
> 
> > It has wide reaching consequences as (with my irqchip maintainer hat on)
> > we've had to pretend that some bits of HW (timers, interrupt
> > controllers) are not "devices". Not a massive issue for most, except
> > when your interrupt controller has requirements that are very similar to
> > the DMA mapping API (which you cannot use because "not a device"). Other
> > problems are introduced by things like wire-MSI bridges, and most people
> > end up resorting to hacks like ad-hoc initcalls and sprinkling deferred
> > probes in specific drivers.
> 
> Same feeling here. I'm accepting patches for random initcall
> reordering because there is nothing else I can do, people need to
> have their systems running. But it feels really fragile.
> 
> Deferred probe alleviated the problem, but I remember saying at
> the time that what we really need to do is build a dependency
> graph and resolve it the same way e.g. systemd does. (Someone
> may have called me BS on that, either for being wrong about everything
> as usual or because of mentioning systemd, I don't know which one.)
> 
> The latest proposal I saw came from Rafael and he had a scratch
> idea for a dependency graph that I really liked, but I guess he's been
> sidetracked since. Rafael, what happened with that?

I got distracted, but Marek Szyprowski has revived it recently.

It needs to be cleaned up somewhat, but other than that I think it's in
a good enough shape to make some progress in that direction, at least in
principle.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-31  6:57 [Ksummit-discuss] " Olof Johansson
@ 2016-08-02 19:56 ` Mark Brown
  0 siblings, 0 replies; 81+ messages in thread
From: Mark Brown @ 2016-08-02 19:56 UTC (permalink / raw)
  To: Olof Johansson; +Cc: ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

On Sun, Jul 31, 2016 at 08:57:18AM +0200, Olof Johansson wrote:

> * Stable tree discussions. Having the discussion at last KS about
> timing of releases was hugely beneficial, picking 4.4 as a solid base
> for the year's devices.

Definitely.  I'm seeing both companies that have been able to adopt v4.4
and companies that are bearing costs from not jumping forward as everyone
else did; the latter are likely to move forward if the same pattern is
followed again.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-28 10:14 ` Marc Zyngier
@ 2016-08-02  8:09   ` Linus Walleij
  2016-08-02 23:00     ` Rafael J. Wysocki
  2016-08-08 11:07   ` Lorenzo Pieralisi
  1 sibling, 1 reply; 81+ messages in thread
From: Linus Walleij @ 2016-08-02  8:09 UTC (permalink / raw)
  To: Marc Zyngier, Rafael J. Wysocki; +Cc: ksummit-discuss

On Thu, Jul 28, 2016 at 12:14 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> On 26/07/16 23:30, Dmitry Torokhov wrote:

>> - I would like to sync up with people and discuss [lack of] progress
>>   on topic of device probe ordering (including handling of deferred
>>   probes, asynchronous probes, etc).
>
> I'm extremely interested in discussing this.

I've also tried to pitch in on it in the past but I just feel stupid
whenever we try to come up with something better than what
we have :(

> It has wide reaching consequences as (with my irqchip maintainer hat on)
> we've had to pretend that some bits of HW (timers, interrupt
> controllers) are not "devices". Not a massive issue for most, except
> when your interrupt controller has requirements that are very similar to
> the DMA mapping API (which you cannot use because "not a device"). Other
> problems are introduced by things like wire-MSI bridges, and most people
> end up resorting to hacks like ad-hoc initcalls and sprinkling deferred
> probes in specific drivers.

Same feeling here. I'm accepting patches for random initcall
reordering because there is nothing else I can do, people need to
have their systems running. But it feels really fragile.

Deferred probe alleviated the problem, but I remember saying at
the time that what we really need to do is build a dependency
graph and resolve it the same way e.g. systemd does. (Someone
may have called me BS on that, either for being wrong about everything
as usual or because of mentioning systemd, I don't know which one.)

The latest proposal I saw came from Rafael and he had a scratch
idea for a dependency graph that I really liked, but I guess he's been
sidetracked since. Rafael, what happened with that?
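The graph-based resolution mentioned above is essentially a topological sort of
device dependencies. A minimal sketch using Kahn's algorithm, with a
hypothetical device graph (illustrative only, not any of the actual proposals):

```python
from collections import deque

def probe_order(deps):
    """deps: {device: set of devices it depends on}.
    Return an order in which every device comes after its dependencies;
    raise ValueError on a dependency cycle (Kahn's algorithm)."""
    indegree = {n: len(needs) for n, needs in deps.items()}
    users = {n: [] for n in deps}   # reverse edges: dependency -> its users
    for n, needs in deps.items():
        for d in needs:
            users[d].append(n)
    queue = deque(n for n, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for u in users[n]:
            indegree[u] -= 1
            if indegree[u] == 0:
                queue.append(u)
    if len(order) != len(deps):
        raise ValueError("dependency cycle")   # deferral could never resolve this
    return order

# e.g. a device needing an irqchip, which in turn needs a clock:
order = probe_order({"dev": {"clk", "irqchip"}, "irqchip": {"clk"}, "clk": set()})
```

Unlike blind re-probing, this surfaces cycles explicitly instead of silently
making no progress.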

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-31  6:57 Olof Johansson
  2016-08-02 19:56 ` Mark Brown
  0 siblings, 1 reply; 81+ messages in thread
From: Olof Johansson @ 2016-07-31  6:57 UTC (permalink / raw)
  To: ksummit-discuss

Hi,

I'd like to nominate myself to the kernel summit this year.

I co-maintain arm-soc together with Arnd Bergmann (and Kevin Hilman),
often merging code that spans between our tree and various other
subsystems. Meeting other maintainers at Kernel Summit is usually
highly beneficial since it makes the whole process run smoother.

Topics I care about:

* Embedded/mobile platforms: We merge embedded as well as server
platform code, and interact with a lot of the various camps, including
the people stuck between the nasty downstream vendor trees and
upstream.
* Stable tree discussions. Having the discussion at last KS about
timing of releases was hugely beneficial, picking 4.4 as a solid base
for the year's devices.
* Maintainership, process discussions. arm-soc is one of the fairly
smooth-running maintainer groups, but we are always looking to improve
(and to share our experiences).
* Testing and making it easier for people to get tests in with their
code, even if we don't mandate it.
* Devicetree and platform description discussions always seem to be
going on, even when we think we've sorted out most existing issues. I
care a good amount about these as well.


-Olof

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-30  0:32 Ben Hutchings
  0 siblings, 0 replies; 81+ messages in thread
From: Ben Hutchings @ 2016-07-30  0:32 UTC (permalink / raw)
  To: ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

I am nominating myself to attend the kernel summit.

I continue to maintain Debian's kernel package (unstable and stable
branches), and the 3.2 and 3.16 longterm branches at kernel.org.

I'm interested in:

- Stable/longterm maintenance
- Tracking (or avoiding) regressions
- Code signing for modules, firmware, etc.  (Debian is now starting to
  implement this.)
  - Relatedly, the securelevel patches or similar restrictions on
    userland to prevent subversion of code signing
- Documentation - making sure the post-DocBook documentation remains
  packageable and reproducible

Ben.

-- 

Ben Hutchings
Experience is directly proportional to the value of equipment
destroyed.
                                                         - Carolyn
Scheppner

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] self nomination
  2016-07-29  6:17 ` Wangnan (F)
@ 2016-07-29 23:53   ` Davidlohr Bueso
  0 siblings, 0 replies; 81+ messages in thread
From: Davidlohr Bueso @ 2016-07-29 23:53 UTC (permalink / raw)
  To: Wangnan (F); +Cc: ksummit-discuss

On Fri, 29 Jul 2016, Wangnan (F) wrote:

>Yes, we need a performance-related session. I posted a proposal but
>it got little response.
>
>I can talk about tracing and tools/perf, especially the progress of
>BPF. We can discuss performance profiling based on these tracing
>results.

Fair enough, although I was referring to something a little more tangible,
such as specific pressure points and bottlenecks in (hopefully) real
workloads/systems. In the past this has worked well, and some issues that
have been brought to our attention have been addressed.

Like others already said, this obviously requires enough interest and
content; so other performance related topics could fit as well.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] self nomination
@ 2016-07-29 22:45 Mimi Zohar
  0 siblings, 0 replies; 81+ messages in thread
From: Mimi Zohar @ 2016-07-29 22:45 UTC (permalink / raw)
  To: ksummit-discuss

Hi,

I would like to be included on any discussions relating to:
- key management/trust
- signature verification

I would be willing to give a status update on closing file
measurement/appraisal (signature verification) gaps, and hopefully
impress upon kernel developers not to introduce new gaps.

Mimi

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-29 15:13 Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 81+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2016-07-29 15:13 UTC (permalink / raw)
  To: ksummit-discuss

Hello,

I would like to self-nominate for the Kernel Summit 2016.

I'm the leader of the Tizen Kernel team in Samsung R&D Institute
Poland (SRPOL) which (together with Korean HQ Kernel team) develops
Linux Kernel for Tizen.org reference devices and works on support
for ARM Samsung Exynos SoCs in the mainline kernel.  I'm also
currently co-maintainer of mainline libata PATA drivers.

I'm particularly interested in the following topics proposed this
year:

- Kernel unit testing and generally all topics related to kernel
  testing.

  We have developed an internal Tizen kernel testing system in SRPOL
  and are still working on enhancing it.  If possible, in the future
  we would like to make it (or at least some parts of it) available
  to external developers (e.g. using kernelci.org infrastructure).

  I would also like to discuss ways to make it easier for developers
  to test kernels on old and/or rare devices (especially in the
  context of the IDE subsystem to libata PATA migration).

- stable workflow - we are basing Tizen.org kernels on LTS kernels
  so any ways to improve the stable/LTS kernels quality are in
  our area of interest.

- Bus IPC.  I can act as a bridge to developers from Tizen System
  team in SRPOL who are working on IPC in Tizen (dbus/kdbus).  I can
  provide input to the discussion from Tizen's requirements POV (i.e.
  we need our IPC to be container-aware).

Thank you for considering.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] self nomination
  2016-07-27 23:20 Davidlohr Bueso
  2016-07-28  7:18 ` Jan Kara
  2016-07-28 14:37 ` Rik van Riel
@ 2016-07-29  6:17 ` Wangnan (F)
  2016-07-29 23:53   ` Davidlohr Bueso
  2 siblings, 1 reply; 81+ messages in thread
From: Wangnan (F) @ 2016-07-29  6:17 UTC (permalink / raw)
  To: dave; +Cc: ksummit-discuss



On 2016/7/28 7:20, Davidlohr Bueso wrote:
> Hi,
>
> I would like to nominate myself for kernel summit this year.
>
> I hack on various core subsystems mainly in the name of performance and
> am particularly interested in the following topics/discussions:
>
> - C++11 atomics/kernel memory model.
> - Upstreaming PREEMPT_RT.
> - Regression tracking.
> - (and to a lesser degree unit testing)
>
> In addition, I'm wondering if there's any interest in a performance
> session (getting to know some of the bottlenecks that are currently
> making folks cry), like Chris Mason has been doing in past kernel
> summits for fb specific workloads.
>

Yes, we need a performance-related session. I posted a proposal but
it got little response.

I can talk about tracing and tools/perf, especially the progress of
BPF. We can discuss performance profiling based on these tracing
results.

Thank you.

> Thanks,
> Davidlohr
> _______________________________________________
> Ksummit-discuss mailing list
> Ksummit-discuss@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] self nomination
  2016-07-28 17:29 [Ksummit-discuss] self nomination James Bottomley
@ 2016-07-28 17:31 ` James Bottomley
  0 siblings, 0 replies; 81+ messages in thread
From: James Bottomley @ 2016-07-28 17:31 UTC (permalink / raw)
  To: ksummit-discuss

On Thu, 2016-07-28 at 13:29 -0400, James Bottomley wrote:
> I have been known to be OK with the occasional storage or obscure
> non-x86 architecture or even the occasional mm issue.
> 
> I'm principally interested in
> 
>  * Storage issues (not that these ever come up at the kernel summit)
>  * In-kernel container technologies (cgroups, namespaces and how we
>    apply the granular virtualizations) since this has been one of my
>    foci for five years now.
>  * Security, keys and the TPM.  I'm mostly working on this above the
>    kernel for a more secure cloud environment, but this means I have a
>    lot of TSS and TPM knowledge and I'd love to share the pain ...

I suppose I should add that running a top level tree, I have an
interest in process issues.  I'd be somewhat interested in a stable
tree discussion but only really to make sure it doesn't increase my
maintainer workload ...

James

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] self nomination
@ 2016-07-28 17:29 James Bottomley
  2016-07-28 17:31 ` James Bottomley
  0 siblings, 1 reply; 81+ messages in thread
From: James Bottomley @ 2016-07-28 17:29 UTC (permalink / raw)
  To: ksummit-discuss

I have been known to be OK with the occasional storage or obscure
non-x86 architecture or even the occasional mm issue.

I'm principally interested in

 * Storage issues (not that these ever come up at the kernel summit)
 * In-kernel container technologies (cgroups, namespaces and how we
   apply the granular virtualizations) since this has been one of my
   foci for five years now.
 * Security, keys and the TPM.  I'm mostly working on this above the
   kernel for a more secure cloud environment, but this means I have a
   lot of TSS and TPM knowledge and I'd love to share the pain ...

James

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] self nomination
  2016-07-27 23:20 Davidlohr Bueso
  2016-07-28  7:18 ` Jan Kara
@ 2016-07-28 14:37 ` Rik van Riel
  2016-07-29  6:17 ` Wangnan (F)
  2 siblings, 0 replies; 81+ messages in thread
From: Rik van Riel @ 2016-07-28 14:37 UTC (permalink / raw)
  To: Davidlohr Bueso, ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

On Wed, 2016-07-27 at 16:20 -0700, Davidlohr Bueso wrote:
> 
> - Upstreaming PREEMPT_RT.

> In addition, I'm wondering if there's any interest in a performance
> session (getting to know some of the bottlenecks that are currently
> making folks cry), like Chris Mason has been doing in past kernel
> summits for fb specific workloads.

I would be interested in these topics, too.

I guess I should nominate myself, in case either Johannes's
MM topic or the above topics come up.

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-26 23:59 Stephen Rothwell
@ 2016-07-28 12:23 ` Luis de Bethencourt
  0 siblings, 0 replies; 81+ messages in thread
From: Luis de Bethencourt @ 2016-07-28 12:23 UTC (permalink / raw)
  To: ksummit-discuss

On 27/07/16 00:59, Stephen Rothwell wrote:
> I'd like to self nominate for the kernel summit this year.
> 
> As the linux-next maintainer, I have interest in most process topics in
> particular the maintainership processes.
> 
> Also, being a bit isolated down under, meeting in person allows me to
> remember that you are all people really :-)
> 

I am also interested in the maintainership processes. I think these need
to be better defined and documented.

Linux is a project that would benefit from having a bigger revolving door
of new joiners, at all levels.

Thanks Stephen,
Luis

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-26 22:30 Dmitry Torokhov
@ 2016-07-28 10:14 ` Marc Zyngier
  2016-08-02  8:09   ` Linus Walleij
  2016-08-08 11:07   ` Lorenzo Pieralisi
  0 siblings, 2 replies; 81+ messages in thread
From: Marc Zyngier @ 2016-07-28 10:14 UTC (permalink / raw)
  To: Dmitry Torokhov, ksummit-discuss

On 26/07/16 23:30, Dmitry Torokhov wrote:
> I'd like to nominate myself for the kernel summit this year. I am part
> of Chrome OS kernel team and I also maintain drivers/input in mainline.

[...]

> - I would like to sync up with people and discuss [lack of] progress
>   on topic of device probe ordering (including handling of deferred
>   probes, asynchronous probes, etc).

I'm extremely interested in discussing this.

It has wide reaching consequences as (with my irqchip maintainer hat on)
we've had to pretend that some bits of HW (timers, interrupt
controllers) are not "devices". Not a massive issue for most, except
when your interrupt controller has requirements that are very similar to
the DMA mapping API (which you cannot use because "not a device"). Other
problems are introduced by things like wire-MSI bridges, and most people
end up resorting to hacks like ad-hoc initcalls and sprinkling deferred
probes in specific drivers.
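The pattern described above, where a probe backs off until its dependencies
have probed, can be sketched as a retry loop. A minimal user-space simulation
(illustrative names and dependency model, not the kernel API):

```python
# Hypothetical simulation of the deferred-probe retry loop: a probe
# "defers" while any of its dependencies has not probed yet, and the
# deferred list is retried after every pass that makes progress.
def run_probes(probes):
    """probes: list of (name, set of dependency names).
    Returns (set of probed names, list of names still deferred)."""
    ready = set()
    pending = list(probes)
    deferred = []
    while pending:
        deferred = []
        for name, needs in pending:
            if needs <= ready:       # all dependencies already probed
                ready.add(name)
            else:                    # would be -EPROBE_DEFER in the kernel
                deferred.append((name, needs))
        if len(deferred) == len(pending):
            break                    # no progress: missing or circular dependency
        pending = deferred
    return ready, [name for name, _ in deferred]

# A device that needs an interrupt controller, which in turn needs a clock:
ready, stuck = run_probes([("dev", {"clk", "irqchip"}),
                           ("irqchip", {"clk"}),
                           ("clk", set())])
```

Note the quadratic behaviour: every pass re-runs every still-deferred probe,
which is one reason an explicit dependency graph keeps coming up as the
alternative.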

I've seen a number of proposals so far, but the subject seems to have
gone quiet (well, not really, but hardly any progress has been made).

Happy to make this a tech discussion or a hallway track.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] self nomination
  2016-07-27 23:20 Davidlohr Bueso
@ 2016-07-28  7:18 ` Jan Kara
  2016-07-28 14:37 ` Rik van Riel
  2016-07-29  6:17 ` Wangnan (F)
  2 siblings, 0 replies; 81+ messages in thread
From: Jan Kara @ 2016-07-28  7:18 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: ksummit-discuss

Hi!

On Wed 27-07-16 16:20:54, Davidlohr Bueso wrote:
> In addition, I'm wondering if there's any interest in a performance
> session (getting to know some of the bottlenecks that are currently
> making folks cry), like Chris Mason has been doing in past kernel
> summits for fb specific workloads.

I'd be interested in that session. But we need to find enough content. We
can certainly talk about our experiences with updating the kernel for SLE12
SP2 (to 4.4), but I don't think it would fill the whole session. So it would
be good if other "vendors" shared their experiences as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] self nomination
@ 2016-07-27 23:20 Davidlohr Bueso
  2016-07-28  7:18 ` Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Davidlohr Bueso @ 2016-07-27 23:20 UTC (permalink / raw)
  To: ksummit-discuss

Hi,

I would like to nominate myself for kernel summit this year.

I hack on various core subsystems mainly in the name of performance and
am particularly interested in the following topics/discussions:

- C++11 atomics/kernel memory model.
- Upstreaming PREEMPT_RT.
- Regression tracking.
- (and to a lesser degree unit testing)

In addition, I'm wondering if there's any interest in a performance
session (getting to know some of the bottlenecks that are currently
making folks cry), like Chris Mason has been doing in past kernel
summits for fb specific workloads.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-27  9:25 ` Linus Walleij
@ 2016-07-27 17:02   ` Darren Hart
  2016-08-04 12:30   ` Geert Uytterhoeven
  1 sibling, 0 replies; 81+ messages in thread
From: Darren Hart @ 2016-07-27 17:02 UTC (permalink / raw)
  To: Linus Walleij; +Cc: ksummit-discuss, Nicolas Pitre

On Wed, Jul 27, 2016 at 11:25:52AM +0200, Linus Walleij wrote:
> On Wed, Jul 27, 2016 at 6:46 AM, Darren Hart <dvhart@infradead.org> wrote:
> 
> >   - Developing a "safety culture" and any overlap that may have with security
> 
> Do you mean safety as in "Linux in airbags and smoke detectors"?

It's more like Linux in consolidated automotive ECUs, industrial robotics
and control, medical devices, avionics, etc., when the computational
requirements of safety-critical systems exceed the capabilities of the
traditional MCUs. Automotive and industrial control are two of the driving
forces currently.

> 
> This area is interesting for GPIO and IIO as well for natural reasons:
> these systems all tend to use GPIO and sensors. (Albeit more often
> than not with some horrific userspace hodge-podge but this is not the
> time to be grumpy about that.)
> 
> This presentation appeared at LinuxCon Japan (got the link from
> my colleague Takahiro Akashi):
> 
> Qualifying Linux for Functional Safety
> http://events.linuxfoundation.org/sites/events/files/slides/20160713_SIL2LinuxMP_Min_ALS_0.9_pub.pdf
> 

There are others, but this one is the core of a successful compliance
route, and is based on the most rigorous and experienced approach.
Nicholas is a well-respected leader in this area. It may be worth inviting him
to participate in this tech topic. I'll see what his interest level is.

> I have been in contact with a Swedish company working in the fire alarm
> business as well.
> 
> My overall feeling is that world regulations (standards) on safety-critical
> software seem to be centered around code inspection.
> 
> These regulations' approach is not to trust any third party but to have all
> code inspected by independent reviewers for functional safety. All the
> time. For every new deployment. Kernel, libc, busybox userspace.
> (Or whatever they use.)

This is true, and all changes (fixes, features, security patches, etc.) are
treated as modifications. The compliance route, however, for complex software
and complex hardware is an active area of development (see SIL2LinuxMP above),
as these standards (IEC 61508) were not developed with such software or hardware
in mind.

> 
> Thus Hitachi developed this minimization, stripping out all non-compiled
> code, #ifdefs, etc. to get the code size they have to manually inspect
> down to a minimum. (It easily translates into work hours.)
>
> My loose thoughts on the issue are threefold:
> 
> - We will have an influx of professional safety reviewers that do not
>   share their review comments with us, instead apply fixes locally and
>   not upstream. This is potentially dangerous if the next reviewer for
>   a safety-critical system misses the same bug. (Not to mention unethical
>   vs. the community, but I have come to understand that some people
>   out there do not care about that.) So we need to send a message to
>   the safety-critical industry that any issues found in such safety
>   inspections need to go upstream pronto. No vendor tree:ing of this.

SIL2LinuxMP will help significantly with this, and the TUeV is happy to see open
source entering the Functional Safety space as the developer culture rewards
finding and fixing issues. Bugs can be found and fixed early, rather than
waiting for accidents and deaths to drive changes.

> 
> - Can we record external inspection-only code reviews done by these
>   independent code reviewers (post-minimization) into the kernel (etc) git?
>   That I guess is pretty useful for building formal trust for the code,
>   but I never heard of git annotations to some random code lines like
>   that.
> 
> - Should minimization be a part of the kernel standard tooling for use
>   cases like this?
> 
> Incidentally that may overlap with the footprint minimizing goal: if you can
> configure code out (such as the modular syscalls things that Nico has
> been working on), that makes this kind of code minimization easier and
> may employ similar tooling.

The SIL2LinuxMP project works with a minimized Linux kernel of ~400k loc.

> 
> Yours,
> Linus Walleij
> 

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-27 14:54 Mark Rutland
  0 siblings, 0 replies; 81+ messages in thread
From: Mark Rutland @ 2016-07-27 14:54 UTC (permalink / raw)
  To: ksummit-discuss

Hi,

I would like to self nominate for the kernel summit this year.

I co-maintain device tree bindings, and the ARM PSCI driver. I also
contribute to the low-level architecture code for arm and arm64.

I'm interested in hardening efforts that are applicable either
generically or to arm/arm64, and have been involved with some of the
work so far.

Having been bitten by a few issues in the past, I'm also interested in
what we can do to ensure that changes to common infrastructure don't
leave architectures broken (or in a sub-optimal state). I'm interested
in unit testing, regression tracking, and stable workflow.

I'm interested in discussions regarding:

- device tree bindings / subsystem divisions
- kernel hardening / self protection
- kernel unit testing
- regression tracking
- stable workflow

I will also be attending Linux Plumbers Conference 2016.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-27 13:57 Lorenzo Pieralisi
  0 siblings, 0 replies; 81+ messages in thread
From: Lorenzo Pieralisi @ 2016-07-27 13:57 UTC (permalink / raw)
  To: ksummit-discuss

Hello,

I would like to self-nominate for the 2016 kernel summit.

I am currently co-maintaining the ARM PSCI firmware interface,
some ARM CPU idle drivers and ARM platforms code and I am a core
contributor of ARM power management, ACPI and PCI code.

I will attend the Linux Plumbers Conference 2016 as PCI
microconference co-leader and take part in the Power
Management microconference, among other topics.

For my maintainership role, I am quite interested in attending the
plenary KS sessions core tracks (that are currently under discussion
in this mailing list), in particular:

  - (group) maintainership models
  - Stable kernels process and workflow
  - kthread freezer clean-up

Since I am involved in power management and firmware interfaces I
am particularly interested in topics revolving around ACPI and
device tree integration in the corresponding workshops and breakout
sessions (other than the LPC microconferences already mentioned).

Thank you very much for considering.

Regards,
Lorenzo

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2016-07-27  4:46 Darren Hart
@ 2016-07-27  9:25 ` Linus Walleij
  2016-07-27 17:02   ` Darren Hart
  2016-08-04 12:30   ` Geert Uytterhoeven
  0 siblings, 2 replies; 81+ messages in thread
From: Linus Walleij @ 2016-07-27  9:25 UTC (permalink / raw)
  To: Darren Hart, Nicolas Pitre; +Cc: ksummit-discuss

On Wed, Jul 27, 2016 at 6:46 AM, Darren Hart <dvhart@infradead.org> wrote:

>   - Developing a "safety culture" and any overlap that may have with security

Do you mean safety as in "Linux in airbags and smoke detectors"?

This area is interesting for GPIO and IIO as well for natural reasons:
these systems all tend to use GPIO and sensors. (Albeit more often
than not with some horrific userspace hodge-podge but this is not the
time to be grumpy about that.)

This presentation appeared at LinuxCon Japan (got the link from
my colleague Takahiro Akashi):

Qualifying Linux for Functional Safety
http://events.linuxfoundation.org/sites/events/files/slides/20160713_SIL2LinuxMP_Min_ALS_0.9_pub.pdf

I have been in contact with a Swedish company working in the fire alarm
business as well.

My overall feeling is that world regulations (standards) on safety-critical
software seem to be centered around code inspection.

These regulations' approach is not to trust any third party but to have all
code inspected by independent reviewers for functional safety. All the
time. For every new deployment. Kernel, libc, busybox userspace.
(Or whatever they use.)

Thus Hitachi developed this minimization, stripping out all non-compiled
code, #ifdefs, etc. to get the code size they have to manually inspect
down to a minimum. (It easily translates into work hours.)
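The minimization described above amounts to evaluating preprocessor
conditionals against a fixed configuration, much as the real unifdef tool
does. A toy sketch (hypothetical helper; it handles nested #ifdef/#endif but
deliberately ignores #else, #elif and #if expressions):

```python
import re

def strip_unconfigured(source, defined):
    """Keep only lines whose enclosing #ifdef symbols are all in `defined`."""
    out, keep = [], [True]       # `keep` is a stack of active-region flags
    for line in source.splitlines():
        m = re.match(r"\s*#ifdef\s+(\w+)", line)
        if m:
            keep.append(keep[-1] and m.group(1) in defined)
        elif re.match(r"\s*#endif", line):
            keep.pop()
        elif keep[-1]:
            out.append(line)
    return "\n".join(out)

src = "a();\n#ifdef CONFIG_X\nb();\n#endif\nc();"
minimized = strip_unconfigured(src, {"CONFIG_X"})
```

The lines that survive are exactly the ones a reviewer of that particular
configuration would have to inspect.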

My loose thoughts on the issue are threefold:

- We will have an influx of professional safety reviewers that do not
  share their review comments with us, instead apply fixes locally and
  not upstream. This is potentially dangerous if the next reviewer for
  a safety-critical system misses the same bug. (Not to mention unethical
  vs. the community, but I have come to understand that some people
  out there do not care about that.) So we need to send a message to
  the safety-critical industry that any issues found in such safety
  inspections need to go upstream pronto. No vendor tree:ing of this.

- Can we record external inspection-only code reviews done by these
  independent reviewers (post-minimization) in the kernel (etc.) git?
  That, I guess, would be pretty useful for building formal trust in
  the code, but I have never heard of git annotations for arbitrary
  code lines like that.

- Should minimization be part of the standard kernel tooling for use
  cases like this?
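On the second point: git cannot annotate individual code lines, but it
can attach review records to commits without rewriting history, via git
notes. A minimal sketch of what recording such an inspection could look
like (the repository, notes ref, and review text below are all made up
for illustration):

```shell
# Hypothetical demo: attach an external safety-review record to a commit.
mkdir -p /tmp/review-demo && cd /tmp/review-demo
git init -q .
git -c user.name=Reviewer -c user.email=r@example.com \
    commit -q --allow-empty -m "driver: add foo"
# Record the inspection result in a dedicated notes ref; the reviewed
# commit itself (and its SHA) is left untouched.
git -c user.name=Reviewer -c user.email=r@example.com \
    notes --ref=safety-review add -f -m "SIL2 inspection: pass, reviewer X" HEAD
git notes --ref=safety-review show HEAD   # prints the recorded review
```

Per-line review records would still need some convention layered on top
(e.g. file:line references inside the note text), which is presumably
why nobody has seen it done.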

Incidentally, this may overlap with the footprint-minimization goal: if
you can configure code out (such as the modular syscalls that Nico has
been working on), that makes this kind of code minimization easier and
may employ similar tooling.
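The preprocessor side of such minimization can be sketched with plain
cpp on a self-contained snippet. This is only a toy: a real kernel tree
would need unifdef-style tooling that resolves just the config symbols
and leaves all other macros alone, and the config symbol below is
invented for the example.

```shell
# Toy minimization: resolve a config #ifdef so reviewers only see the
# branch that can actually be compiled in.
cat > /tmp/min-demo.c <<'EOF'
#ifdef CONFIG_SAFETY_WDT
int wdt_kick(void) { return 1; }
#else
int wdt_kick(void) { return 0; }
#endif
EOF
# With the symbol defined, only the first branch survives:
cpp -P -DCONFIG_SAFETY_WDT /tmp/min-demo.c
```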

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-27  4:46 Darren Hart
  2016-07-27  9:25 ` Linus Walleij
  0 siblings, 1 reply; 81+ messages in thread
From: Darren Hart @ 2016-07-27  4:46 UTC (permalink / raw)
  To: ksummit-discuss

I'd like to self-nominate for the kernel summit this year.

I'm the platform-drivers-x86 maintainer, and am interested in discussions
regarding:

  - maintainer group process and tooling
  - easing the non-kernel-code barriers to contribution
    (eliminating "mechanical failures")

I'm involved with various efforts around Real-Time and Safety Critical Linux and
would be interested in discussions around:

  - Upstreaming PREEMPT_RT
  - Developing a "safety culture" and any overlap that may have with security

Finally, as part of my platform enabling responsibilities, I'm interested in any
developments in the stable process.

Regards,

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-27  0:50 Sergey Senozhatsky
  0 siblings, 0 replies; 81+ messages in thread
From: Sergey Senozhatsky @ 2016-07-27  0:50 UTC (permalink / raw)
  To: ksummit-discuss

Hello,

I'd like to self nominate for the kernel summit this year.

Interested in printk-related discussions (async printk, KERN_CONT printing,
async console_unlock(), panic printk(), etc.), stable workflow and some
other topics.

	-ss

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-26 23:59 Stephen Rothwell
  2016-07-28 12:23 ` Luis de Bethencourt
  0 siblings, 1 reply; 81+ messages in thread
From: Stephen Rothwell @ 2016-07-26 23:59 UTC (permalink / raw)
  To: ksummit-discuss

I'd like to self nominate for the kernel summit this year.

As the linux-next maintainer, I have an interest in most process topics,
in particular the maintainership processes.

Also, being a bit isolated down under, meeting in person allows me to
remember that you are all people really :-)

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-26 22:30 Dmitry Torokhov
  2016-07-28 10:14 ` Marc Zyngier
  0 siblings, 1 reply; 81+ messages in thread
From: Dmitry Torokhov @ 2016-07-26 22:30 UTC (permalink / raw)
  To: ksummit-discuss

I'd like to nominate myself for the kernel summit this year. I am part
of Chrome OS kernel team and I also maintain drivers/input in mainline.

I am interested in the following topics/discussions:

- stable process and regression tracking;

- maintainership process. I am still struggling to figure out how I
  can spread the load and would love to hear some ideas;

- discussions regarding subsystem boundaries (iio, input, hwmon, etc)
  and how we can provide better guidance to driver authors to find a
  home for their drivers;

- I would like to sync up with people and discuss [lack of] progress
  on the topic of device probe ordering (including handling of deferred
  probes, asynchronous probes, etc).

Thanks!

-- 
Dmitry

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2016-07-26 15:44 David Woodhouse
  0 siblings, 0 replies; 81+ messages in thread
From: David Woodhouse @ 2016-07-26 15:44 UTC (permalink / raw)
  To: ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]

I'd like to nominate myself for the kernel summit this year.

I am very interested in toolchain-related discussion, looking at what
the compiler(s) can do better and how we can make better use of them.
Including types and static analysis, and potentially making use of
LLVM.

And also the memory model — like the byteswapping that I've worked on
before, it makes a lot of sense to let the compiler *see* what's going
on rather than hiding it in opaque inline assembly which precludes
optimisations. If we have to make an effort to ensure the ordering
models are compatible, that may well be a price worth paying. It's not
like we didn't already have work to do on ensuring that the coherency
models we offer to (e.g.) device drivers through things like
readl_relaxed() are well-specified in a way that can be implemented
efficiently across architectures.

I'm interested in testing topics, and how we can make it easier for
contributors (and 'janitors') to build simple test cases which run in a
consistent environment, rather than doing their own ad-hoc testing.

I would very much like to be involved with any discussion on
asynchronous and early console output.

I'm also interested in the signature/key management topic if that
happens.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] self nomination
@ 2016-07-25 21:46 Kevin Hilman
  0 siblings, 0 replies; 81+ messages in thread
From: Kevin Hilman @ 2016-07-25 21:46 UTC (permalink / raw)
  To: ksummit-discuss

Hello,

I would like to self-nominate for the 2016 kernel summit.

I'm the lead kernel developer working on kernelci.org,
backup/assistant/3rd-string/sick-leave co-maintainer of the arm-soc
tree, as well as co-maintainer of a couple of the many ARM
sub-architectures.

I'm most interested in discussions around kernel testing infrastructure,
and in particular, how to improve testing for the various stable and
LTS-based trees.

Thank you for considering,

Kevin

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
  2015-08-26 21:10         ` Matthew Garrett
@ 2015-08-30  0:41           ` Matthew Garrett
  0 siblings, 0 replies; 81+ messages in thread
From: Matthew Garrett @ 2015-08-30  0:41 UTC (permalink / raw)
  To: Kees Cook; +Cc: ksummit-discuss

Unsure whether Kees suggesting me is sufficient, so:

I'd like to be involved if we're going to have a meaningful discussion 
about more proactively mitigating attacks on the kernel. I've been 
working full-time in the security field for about the past three years, 
and it's certainly resulted in a fairly strong shift in attitude towards 
how well we're doing here. At this point even Microsoft are showing more 
aggressive security development than we are, and when the only person 
who appears to be doing meaningful work in adding mitigation features to 
the kernel is Kees, that's kind of a bad sign.

I've been doing development in this field for some time now - the secure 
boot patchset is an attempt to avoid allowing userspace privilege 
escalations to turn into persistent kernel compromises, for instance. As 
a member of the SFC's kernel copyright enforcement group, I'm also 
interested in being involved in any legal discussions that come up. Even 
if I don't make the invite list, I'd hope that these both end up as high 
priority topics.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [Ksummit-discuss] Self nomination
@ 2015-08-11  5:05 Haggai Eran
  0 siblings, 0 replies; 81+ messages in thread
From: Haggai Eran @ 2015-08-11  5:05 UTC (permalink / raw)
  To: ksummit-discuss

Hi,

Excuse me for the late email, but I would like to nominate myself for
the core discussion. I'm interested in participating in the
compute-offload devices discussions, as well as the FPGA programming topic.

I've been part of the team that developed the RDMA subsystem support for
IO paging, and the mlx5 driver implementation.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-08-01 20:30         ` Dave Jones
@ 2015-08-03  5:17           ` Sasha Levin
  0 siblings, 0 replies; 81+ messages in thread
From: Sasha Levin @ 2015-08-03  5:17 UTC (permalink / raw)
  To: Dave Jones; +Cc: ksummit-discuss, Dan Carpenter

On 08/01/2015 04:30 PM, Dave Jones wrote:
> On Sat, Aug 01, 2015 at 11:26:17AM -0400, Sasha Levin wrote:
>  > On 08/01/2015 09:45 AM, Dan Carpenter wrote:
>  > > On Fri, Jul 31, 2015 at 01:49:08PM -0400, Sasha Levin wrote:
>  > >>  - Encouraging folks who add new sysctls (or features to existing sysctls) to contribute
>  > >> testing code to the various testing projects around (trinity and such).
>  > > 
>  > > Which other fuzzers are people using besides trinity?
>  > 
>  > I'm not sure, I think trinity sort of took over :)
>  > 
>  > My point was that people should be submitting a more significant amount
>  > of test code along with their new ABI - none of the recent syscalls added
>  > has any mention in LTP for example.
>  > 
>  > > Also why don't we merge trinity under tools/testing/?  I bet people
>  > > would keep it in sync better if we did that.
>  > 
>  > No objections on my end. I believe that Dave previously objected because
>  > it's hard to make it work on sync with the kernel's release cycle.
> 
> A big factor is the same reason I'm not a huge fan of tools/testing/ in general.
> People want to do things like "run latest testing tools on old LTS/enterprise kernels".
> 
> Having to suck a subtree out of the latest kernel tarball, and then spend
> time trying to make it even compile/run against headers from an old kernel
> is something I've already lost way too many weeks of my life to, so I'm
> not really a fan of making that problem even worse by adding other tools to it.

An interesting model we can try here is to let development happen both in the kernel
tree and outside of it, with patches exchanged between the two development trees
(possibly automatically). Think of it as a "fork".

On one hand it'll provide the kernel with a proven testing framework, and on
the other it'll give trinity a wider user/developer base.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-08-01 16:22         ` Greg KH
@ 2015-08-03  5:14           ` Sasha Levin
  0 siblings, 0 replies; 81+ messages in thread
From: Sasha Levin @ 2015-08-03  5:14 UTC (permalink / raw)
  To: Greg KH; +Cc: Dave Jones, ksummit-discuss, Dan Carpenter

On 08/01/2015 12:22 PM, Greg KH wrote:
>> > My point was that people should be submitting a more significant amount
>> > of test code along with their new ABI - none of the recent syscalls added
>> > has any mention in LTP for example.
> Because most of us hate LTP.
> 
> Seriously, I've lost days of my life many years ago tracing down "kernel
> bugs" that in the end were just bugs in LTP due to the horrid code in
> it.
> 
> Now yes, this was a long time ago, and hopefully it has gotten a lot
> better, but lots of people have this attitude which is why we are
> encouraging people to use the tests in the kernel tree itself, and is
> why new syscalls are all adding their tests there.
> 
> I've encouraged people to take the "good bits" of LTP and add them to
> the kernel test framework, but no one seems to want to do that...

I didn't really mean to focus on LTP here: pick any project that has
a critical mass of users and send them ways to test your code now and
for future changes.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-08-01 15:26       ` Sasha Levin
  2015-08-01 16:22         ` Greg KH
@ 2015-08-01 20:30         ` Dave Jones
  2015-08-03  5:17           ` Sasha Levin
  1 sibling, 1 reply; 81+ messages in thread
From: Dave Jones @ 2015-08-01 20:30 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit-discuss, Dan Carpenter

On Sat, Aug 01, 2015 at 11:26:17AM -0400, Sasha Levin wrote:
 > On 08/01/2015 09:45 AM, Dan Carpenter wrote:
 > > On Fri, Jul 31, 2015 at 01:49:08PM -0400, Sasha Levin wrote:
 > >>  - Encouraging folks who add new sysctls (or features to existing sysctls) to contribute
 > >> testing code to the various testing projects around (trinity and such).
 > > 
 > > Which other fuzzers are people using besides trinity?
 > 
 > I'm not sure, I think trinity sort of took over :)
 > 
 > My point was that people should be submitting a more significant amount
 > of test code along with their new ABI - none of the recent syscalls added
 > has any mention in LTP for example.
 > 
 > > Also why don't we merge trinity under tools/testing/?  I bet people
 > > would keep it in sync better if we did that.
 > 
 > No objections on my end. I believe that Dave previously objected because
 > it's hard to make it work on sync with the kernel's release cycle.

A big factor is the same reason I'm not a huge fan of tools/testing/ in general.
People want to do things like "run latest testing tools on old LTS/enterprise kernels".

Having to suck a subtree out of the latest kernel tarball, and then spend
time trying to make it even compile/run against headers from an old kernel
is something I've already lost way too many weeks of my life to, so I'm
not really a fan of making that problem even worse by adding other tools to it.

	Dave

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-08-01 15:26       ` Sasha Levin
@ 2015-08-01 16:22         ` Greg KH
  2015-08-03  5:14           ` Sasha Levin
  2015-08-01 20:30         ` Dave Jones
  1 sibling, 1 reply; 81+ messages in thread
From: Greg KH @ 2015-08-01 16:22 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Dave Jones, ksummit-discuss, Dan Carpenter

On Sat, Aug 01, 2015 at 11:26:17AM -0400, Sasha Levin wrote:
> On 08/01/2015 09:45 AM, Dan Carpenter wrote:
> > On Fri, Jul 31, 2015 at 01:49:08PM -0400, Sasha Levin wrote:
> >>  - Encouraging folks who add new sysctls (or features to existing sysctls) to contribute
> >> testing code to the various testing projects around (trinity and such).
> > 
> > Which other fuzzers are people using besides trinity?
> 
> I'm not sure, I think trinity sort of took over :)
> 
> My point was that people should be submitting a more significant amount
> of test code along with their new ABI - none of the recent syscalls added
> has any mention in LTP for example.

Because most of us hate LTP.

Seriously, I've lost days of my life many years ago tracing down "kernel
bugs" that in the end were just bugs in LTP due to the horrid code in
it.

Now yes, this was a long time ago, and hopefully it has gotten a lot
better, but lots of people have this attitude which is why we are
encouraging people to use the tests in the kernel tree itself, and is
why new syscalls are all adding their tests there.

I've encouraged people to take the "good bits" of LTP and add them to
the kernel test framework, but no one seems to want to do that...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-08-01 13:45     ` Dan Carpenter
@ 2015-08-01 15:26       ` Sasha Levin
  2015-08-01 16:22         ` Greg KH
  2015-08-01 20:30         ` Dave Jones
  0 siblings, 2 replies; 81+ messages in thread
From: Sasha Levin @ 2015-08-01 15:26 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: ksummit-discuss, Dave Jones

On 08/01/2015 09:45 AM, Dan Carpenter wrote:
> On Fri, Jul 31, 2015 at 01:49:08PM -0400, Sasha Levin wrote:
>>  - Encouraging folks who add new sysctls (or features to existing sysctls) to contribute
>> testing code to the various testing projects around (trinity and such).
> 
> Which other fuzzers are people using besides trinity?

I'm not sure, I think trinity sort of took over :)

My point was that people should be submitting a more significant amount
of test code along with their new ABI - none of the recent syscalls added
has any mention in LTP for example.

> Also why don't we merge trinity under tools/testing/?  I bet people
> would keep it in sync better if we did that.

No objections on my end. I believe that Dave previously objected because
it's hard to keep it in sync with the kernel's release cycle.

I actually presented a proposal at LinuxCon back in 2013 for unifying
trinity into the kernel, using common headers for syscalls. This would
force people who add/modify syscalls to update trinity, and it would also
make the ABI self-documenting, as one would need to be very specific
about which parameters are expected for each syscall.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 17:49   ` Sasha Levin
@ 2015-08-01 13:45     ` Dan Carpenter
  2015-08-01 15:26       ` Sasha Levin
  0 siblings, 1 reply; 81+ messages in thread
From: Dan Carpenter @ 2015-08-01 13:45 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit-discuss

On Fri, Jul 31, 2015 at 01:49:08PM -0400, Sasha Levin wrote:
>  - Encouraging folks who add new sysctls (or features to existing sysctls) to contribute
> testing code to the various testing projects around (trinity and such).

Which other fuzzers are people using besides trinity?

Also why don't we merge trinity under tools/testing/?  I bet people
would keep it in sync better if we did that.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 18:59                 ` Dan Williams
@ 2015-08-01 13:03                   ` Jiri Kosina
  0 siblings, 0 replies; 81+ messages in thread
From: Jiri Kosina @ 2015-08-01 13:03 UTC (permalink / raw)
  To: Dan Williams; +Cc: ksummit-discuss, Sasha Levin

On Fri, 31 Jul 2015, Dan Williams wrote:

> One more thought on this tangent... I've had a task stuck at the bottom 
> of my backlog to look at extending dynamic_debug call sites to 
> optionally be tracepoints so that you could feasibly have debug on all 
> the time with less run time impact. 

Making dynamic debug use jump labels should be enough to achieve this.

That used to actually be the case, but got reverted later, as it didn't 
properly handle gccs that didn't support asm goto (see 2d75af2f2a7).

Once that is fixed to be more general and handle older gccs as well, your 
task is done :)

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 18:22               ` Greg KH
@ 2015-07-31 18:59                 ` Dan Williams
  2015-08-01 13:03                   ` Jiri Kosina
  0 siblings, 1 reply; 81+ messages in thread
From: Dan Williams @ 2015-07-31 18:59 UTC (permalink / raw)
  To: Greg KH; +Cc: ksummit-discuss, Sasha Levin

On Fri, Jul 31, 2015 at 11:22 AM, Greg KH <greg@kroah.com> wrote:
> On Fri, Jul 31, 2015 at 02:17:58PM -0400, Dave Jones wrote:
>> On Fri, Jul 31, 2015 at 11:12:45AM -0700, Greg Kroah-Hartman wrote:
>>  > On Fri, Jul 31, 2015 at 12:51:53PM -0500, Bjorn Helgaas wrote:
>>  > > I prefer that a dmesg collected in the simplest possible way, with no
>>  > > special config or boot flags, be as useful as possible.  So converting
>>  > > to dynamic debug requires much more thought about which messages
>>  > > should be always printed and which should become dynamic.
>>  >
>>  > Why would a debugging message ever not be dynamic?  They are there for
>>  > you to use, and turn off when you are done.  If you want a user to
>>  > report the output of them, then of course they should be dynamic so they
>>  > can just write a line to a debugfs file and then start seeing them
>>
>> This implies a user knows ahead of time what bugs they are going to hit,
>> and which messages they need to enable.  For hard-to-reproduce bugs,
>> or bugs that exhibit non-obvious symptoms, this isn't workable.
>
> Fair enough, but wouldn't those messages be "errors"?
>
> Anyway, this is way off-topic from the original thread, it all comes
> down to specifics of the message that is being written and the
> surrounding issues of why it would be written.

One more thought on this tangent... I've had a task stuck at the
bottom of my backlog to look at extending dynamic_debug call sites to
optionally be tracepoints so that you could feasibly have debug on all
the time with less run time impact.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 18:17             ` Dave Jones
@ 2015-07-31 18:22               ` Greg KH
  2015-07-31 18:59                 ` Dan Williams
  0 siblings, 1 reply; 81+ messages in thread
From: Greg KH @ 2015-07-31 18:22 UTC (permalink / raw)
  To: Dave Jones; +Cc: Sasha Levin, ksummit-discuss

On Fri, Jul 31, 2015 at 02:17:58PM -0400, Dave Jones wrote:
> On Fri, Jul 31, 2015 at 11:12:45AM -0700, Greg Kroah-Hartman wrote:
>  > On Fri, Jul 31, 2015 at 12:51:53PM -0500, Bjorn Helgaas wrote:
>  > > I prefer that a dmesg collected in the simplest possible way, with no
>  > > special config or boot flags, be as useful as possible.  So converting
>  > > to dynamic debug requires much more thought about which messages
>  > > should be always printed and which should become dynamic.
>  > 
>  > Why would a debugging message ever not be dynamic?  They are there for
>  > you to use, and turn off when you are done.  If you want a user to
>  > report the output of them, then of course they should be dynamic so they
>  > can just write a line to a debugfs file and then start seeing them
> 
> This implies a user knows ahead of time what bugs they are going to hit,
> and which messages they need to enable.  For hard-to-reproduce bugs,
> or bugs that exhibit non-obvious symptoms, this isn't workable.

Fair enough, but wouldn't those messages be "errors"?

Anyway, this is way off-topic from the original thread, it all comes
down to specifics of the message that is being written and the
surrounding issues of why it would be written.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 18:12           ` Greg KH
@ 2015-07-31 18:17             ` Dave Jones
  2015-07-31 18:22               ` Greg KH
  0 siblings, 1 reply; 81+ messages in thread
From: Dave Jones @ 2015-07-31 18:17 UTC (permalink / raw)
  To: Greg KH; +Cc: Sasha Levin, ksummit-discuss

On Fri, Jul 31, 2015 at 11:12:45AM -0700, Greg Kroah-Hartman wrote:
 > On Fri, Jul 31, 2015 at 12:51:53PM -0500, Bjorn Helgaas wrote:
 > > I prefer that a dmesg collected in the simplest possible way, with no
 > > special config or boot flags, be as useful as possible.  So converting
 > > to dynamic debug requires much more thought about which messages
 > > should be always printed and which should become dynamic.
 > 
 > Why would a debugging message ever not be dynamic?  They are there for
 > you to use, and turn off when you are done.  If you want a user to
 > report the output of them, then of course they should be dynamic so they
 > can just write a line to a debugfs file and then start seeing them

This implies a user knows ahead of time what bugs they are going to hit,
and which messages they need to enable.  For hard-to-reproduce bugs,
or bugs that exhibit non-obvious symptoms, this isn't workable.

	Dave

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 17:51         ` Bjorn Helgaas
@ 2015-07-31 18:12           ` Greg KH
  2015-07-31 18:17             ` Dave Jones
  0 siblings, 1 reply; 81+ messages in thread
From: Greg KH @ 2015-07-31 18:12 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Sasha Levin, ksummit-discuss

On Fri, Jul 31, 2015 at 12:51:53PM -0500, Bjorn Helgaas wrote:
> I prefer that a dmesg collected in the simplest possible way, with no
> special config or boot flags, be as useful as possible.  So converting
> to dynamic debug requires much more thought about which messages
> should be always printed and which should become dynamic.

Why would a debugging message ever not be dynamic?  They are there for
you to use, and turn off when you are done.  If you want a user to
report the output of them, then of course they should be dynamic so they
can just write a line to a debugfs file and then start seeing them, they
shouldn't be seeing them all the time, and they should never require a
rebuild of the kernel to see them (which is why we deleted
CONFIG_USB_DEBUG).
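For reference, the run-time workflow being described looks roughly like
this (it requires CONFIG_DYNAMIC_DEBUG, a mounted debugfs, and root;
"usbcore" is just an example module, not a recommendation):

```shell
# Enable every dev_dbg() callsite in one module at run time:
echo 'module usbcore +p' > /sys/kernel/debug/dynamic_debug/control
# ... reproduce the problem, collect dmesg ...
# Turn the messages back off when done:
echo 'module usbcore -p' > /sys/kernel/debug/dynamic_debug/control
```

The control file also accepts file/function/line match specs, so output
can be narrowed well below module granularity.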

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 17:15       ` Guenter Roeck
@ 2015-07-31 17:51         ` Bjorn Helgaas
  2015-07-31 18:12           ` Greg KH
  0 siblings, 1 reply; 81+ messages in thread
From: Bjorn Helgaas @ 2015-07-31 17:51 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: ksummit-discuss, Sasha Levin

On Fri, Jul 31, 2015 at 12:15 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> On Fri, Jul 31, 2015 at 10:08:25AM -0700, Greg KH wrote:
>> On Fri, Jul 31, 2015 at 09:59:15AM -0700, Guenter Roeck wrote:
>> > Hi Bjorn,
>> >
>> > On Fri, Jul 31, 2015 at 11:27:38AM -0500, Bjorn Helgaas wrote:
>> > > On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> > > > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
>> > > > and by improving the quality of debug output that comes out of the kernel.
>> > >
>> > > What sort of debug output improvements are you interested in?  I spend
>> > > a fair amount of time converting to dev_printk and %pR.  They make the
>> >
>> > I have been wondering about that, especially since dev_dbg() and
>> > 'dev_printk(KERN_DEBUG, ...)' are semantically different. Any reason
>> > for preferring dev_printk() over dev_dbg() ?
>>
>> The opposite is true, please always use dev_dbg() as it properly ties
>
> Hi Greg,
>
> maybe my semantics wasn't perfect - my question was why Bjorn prefers
> dev_printk(KERN_DEBUG, ..) over dev_dbg(), which he answered.
>
> I did not (want to) make the claim or even suggest that dev_printk()
> would be preferred over dev_dbg() in general.

Lest I give the wrong impression, I'm not opposed to dev_dbg().  I
just think there are two separate changes that don't need to be made
at the same time: (1) convert to dev_printk style, and (2) convert to
the dynamic debug stuff.  As a consumer of dmesg logs, I'm most
interested in the former.

I prefer that a dmesg collected in the simplest possible way, with no
special config or boot flags, be as useful as possible.  So converting
to dynamic debug requires much more thought about which messages
should be always printed and which should become dynamic.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 17:26       ` James Bottomley
  2015-07-31 17:43         ` Greg KH
@ 2015-07-31 17:49         ` Sasha Levin
  1 sibling, 0 replies; 81+ messages in thread
From: Sasha Levin @ 2015-07-31 17:49 UTC (permalink / raw)
  To: James Bottomley, Greg KH; +Cc: ksummit-discuss

On 07/31/2015 01:26 PM, James Bottomley wrote:
> Other than to confuse us all over minute details, is there a reason for
> this difference to exist?  I realise right at the moment dev_dbg can be
> configured off by a variety of symbols and dev_printk(KERN_DEBUG) can't,
> conversely dev_printk doesn't pick up the location information and
> dev_dbg does, but there's no reason all that couldn't be harmonised so
> we didn't have to have arguments about which to use (and endless patches
> switching from one to the other)?

I suppose it's a good sub-topic: deciding which one of the 3.5 methods
to print debug information we should be using across the kernel and
remove the other 2.5.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
       [not found] ` <alpine.DEB.2.02.1507310650220.2218@localhost6.localdomain6>
@ 2015-07-31 17:49   ` Sasha Levin
  2015-08-01 13:45     ` Dan Carpenter
  0 siblings, 1 reply; 81+ messages in thread
From: Sasha Levin @ 2015-07-31 17:49 UTC (permalink / raw)
  To: Julia Lawall, ksummit-discuss

On 07/31/2015 12:51 AM, Julia Lawall wrote:
>> I'd like to nominate myself to this year's kernel summit.
>> > 
>> > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
>> > and by improving the quality of debug output that comes out of the kernel.
> Improving it in what way?  We are doing some research related to logging 
> at the moment, and so I would be interested to hear what you think are the 
> problems.

From my perspective the biggest issue is making a given error state more reproducible. Quite
often I am able to trigger a bug on my testing box, but the information the kernel spits out
(backtrace + registers) aren't too useful if the problem is complicated.

The result is that finding out what happened depends on the few people who are able to "magically"
trigger bugs, and leads to a long and inefficient back and forth while the bug manages to
sneak upstream and into users' hands.

To sum it up, I think we need to figure out a way to produce enough information to make bugs
more reproducible, but not so much that we end up with a needle-in-a-haystack situation.

In this regard, the new Intel PT technology is interesting and it's also worth figuring out
a good way to integrate it with our current infrastructure/tools.


Among other, less urgent ideas I'd like to discuss are:

 - Using KASan for poison fields. Quite a few structures have embedded poison fields. Rather
than detecting a corruption after it happens we are able to detect it as it happens - making
bugs more obvious.

 - Encouraging folks who add new sysctls (or features to existing sysctls) to contribute
testing code to the various testing projects around (trinity and such).

 - Improving userspace tooling to make transfer of information simpler (along the lines of
scripts/decode_stacktrace.sh that no one seems to be using).
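As a concrete example of the tooling meant here, the in-tree helper
mentioned above turns raw backtrace addresses into file:line form. It
needs the matching vmlinux built with debug info, so this is usage only,
not something runnable standalone (oops.txt is a hypothetical captured
log):

```shell
# Decode a captured oops so the trace shows source locations:
./scripts/decode_stacktrace.sh vmlinux < oops.txt > decoded.txt
```

Getting reporters to attach decoded.txt instead of the raw trace is
exactly the kind of information-transfer improvement meant above.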

I suppose it'll be interesting to discuss the idea of making the kernel dump a "black box"
on panic. Not just the core memory, but also various parameters regarding configuration and
such, to enable folks to provide us with more than just backtraces.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [Ksummit-discuss] Self nomination
  2015-07-31 17:26       ` James Bottomley
@ 2015-07-31 17:43         ` Greg KH
  2015-07-31 17:49         ` Sasha Levin
  1 sibling, 0 replies; 81+ messages in thread
From: Greg KH @ 2015-07-31 17:43 UTC (permalink / raw)
  To: James Bottomley; +Cc: Sasha Levin, ksummit-discuss

On Fri, Jul 31, 2015 at 10:26:41AM -0700, James Bottomley wrote:
> On Fri, 2015-07-31 at 10:08 -0700, Greg KH wrote:
> > On Fri, Jul 31, 2015 at 09:59:15AM -0700, Guenter Roeck wrote:
> > > Hi Bjorn,
> > > 
> > > On Fri, Jul 31, 2015 at 11:27:38AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> > > > > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
> > > > > and by improving the quality of debug output that comes out of the kernel.
> > > > 
> > > > What sort of debug output improvements are you interested in?  I spend
> > > > a fair amount of time converting to dev_printk and %pR.  They make the
> > > 
> > > I have been wondering about that, especially since dev_dbg() and
> > > 'dev_printk(KERN_DEBUG, ...)' are semantically different. Any reason
> > > for preferring dev_printk() over dev_dbg() ?
> > 
> > The opposite is true, please always use dev_dbg() as it properly ties
> > into the overall kernel-wide dynamic debug infrastructure, providing a
> > unified way to enable/disable debug messages, or even, compiling them
> > out if none are wanted due to size constraints.
> 
> Other than to confuse us all over minute details, is there a reason for
> this difference to exist?

dev_dbg() builds on dev_printk(KERN_DEBUG); we can't delete
dev_printk() from the tree, so we are stuck with both.  If you can
figure out a way to prevent this, that would be great.

thanks,

greg k-h


* Re: [Ksummit-discuss] Self nomination
  2015-07-31 17:08     ` Greg KH
  2015-07-31 17:15       ` Guenter Roeck
@ 2015-07-31 17:26       ` James Bottomley
  2015-07-31 17:43         ` Greg KH
  2015-07-31 17:49         ` Sasha Levin
  1 sibling, 2 replies; 81+ messages in thread
From: James Bottomley @ 2015-07-31 17:26 UTC (permalink / raw)
  To: Greg KH; +Cc: Sasha Levin, ksummit-discuss

On Fri, 2015-07-31 at 10:08 -0700, Greg KH wrote:
> On Fri, Jul 31, 2015 at 09:59:15AM -0700, Guenter Roeck wrote:
> > Hi Bjorn,
> > 
> > On Fri, Jul 31, 2015 at 11:27:38AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> > > > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
> > > > and by improving the quality of debug output that comes out of the kernel.
> > > 
> > > What sort of debug output improvements are you interested in?  I spend
> > > a fair amount of time converting to dev_printk and %pR.  They make the
> > 
> > I have been wondering about that, especially since dev_dbg() and
> > 'dev_printk(KERN_DEBUG, ...)' are semantically different. Any reason
> > for preferring dev_printk() over dev_dbg() ?
> 
> The opposite is true, please always use dev_dbg() as it properly ties
> into the overall kernel-wide dynamic debug infrastructure, providing a
> unified way to enable/disable debug messages, or even, compiling them
> out if none are wanted due to size constraints.

Other than to confuse us all over minute details, is there a reason for
this difference to exist?  I realise that right at the moment dev_dbg()
can be configured off by a variety of symbols and dev_printk(KERN_DEBUG)
can't; conversely, dev_printk() doesn't pick up the location information
and dev_dbg() does.  But there's no reason all that couldn't be
harmonised so we didn't have to have arguments about which to use (and
endless patches switching from one to the other)?

James


* Re: [Ksummit-discuss] Self nomination
  2015-07-31 17:08     ` Greg KH
@ 2015-07-31 17:15       ` Guenter Roeck
  2015-07-31 17:51         ` Bjorn Helgaas
  2015-07-31 17:26       ` James Bottomley
  1 sibling, 1 reply; 81+ messages in thread
From: Guenter Roeck @ 2015-07-31 17:15 UTC (permalink / raw)
  To: Greg KH; +Cc: Sasha Levin, ksummit-discuss

On Fri, Jul 31, 2015 at 10:08:25AM -0700, Greg KH wrote:
> On Fri, Jul 31, 2015 at 09:59:15AM -0700, Guenter Roeck wrote:
> > Hi Bjorn,
> > 
> > On Fri, Jul 31, 2015 at 11:27:38AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> > > > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
> > > > and by improving the quality of debug output that comes out of the kernel.
> > > 
> > > What sort of debug output improvements are you interested in?  I spend
> > > a fair amount of time converting to dev_printk and %pR.  They make the
> > 
> > I have been wondering about that, especially since dev_dbg() and
> > 'dev_printk(KERN_DEBUG, ...)' are semantically different. Any reason
> > for preferring dev_printk() over dev_dbg() ?
> 
> The opposite is true, please always use dev_dbg() as it properly ties

Hi Greg,

maybe my phrasing wasn't perfect - my question was why Bjorn prefers
dev_printk(KERN_DEBUG, ..) over dev_dbg(), which he answered.

I did not (want to) make the claim or even suggest that dev_printk()
would be preferred over dev_dbg() in general.

Thanks,
Guenter


* Re: [Ksummit-discuss] Self nomination
  2015-07-31 16:59   ` Guenter Roeck
  2015-07-31 17:03     ` Bjorn Helgaas
@ 2015-07-31 17:08     ` Greg KH
  2015-07-31 17:15       ` Guenter Roeck
  2015-07-31 17:26       ` James Bottomley
  1 sibling, 2 replies; 81+ messages in thread
From: Greg KH @ 2015-07-31 17:08 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Sasha Levin, ksummit-discuss

On Fri, Jul 31, 2015 at 09:59:15AM -0700, Guenter Roeck wrote:
> Hi Bjorn,
> 
> On Fri, Jul 31, 2015 at 11:27:38AM -0500, Bjorn Helgaas wrote:
> > On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> > > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
> > > and by improving the quality of debug output that comes out of the kernel.
> > 
> > What sort of debug output improvements are you interested in?  I spend
> > a fair amount of time converting to dev_printk and %pR.  They make the
> 
> I have been wondering about that, especially since dev_dbg() and
> 'dev_printk(KERN_DEBUG, ...)' are semantically different. Any reason
> for preferring dev_printk() over dev_dbg() ?

The opposite is true, please always use dev_dbg() as it properly ties
into the overall kernel-wide dynamic debug infrastructure, providing a
unified way to enable/disable debug messages, or even to compile them
out if none are wanted due to size constraints.

thanks,

greg k-h


* Re: [Ksummit-discuss] Self nomination
  2015-07-31 16:59   ` Guenter Roeck
@ 2015-07-31 17:03     ` Bjorn Helgaas
  2015-07-31 17:08     ` Greg KH
  1 sibling, 0 replies; 81+ messages in thread
From: Bjorn Helgaas @ 2015-07-31 17:03 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Sasha Levin, ksummit-discuss

On Fri, Jul 31, 2015 at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> Hi Bjorn,
>
> On Fri, Jul 31, 2015 at 11:27:38AM -0500, Bjorn Helgaas wrote:
>> On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
>> > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
>> > and by improving the quality of debug output that comes out of the kernel.
>>
>> What sort of debug output improvements are you interested in?  I spend
>> a fair amount of time converting to dev_printk and %pR.  They make the
>
> I have been wondering about that, especially since dev_dbg() and
> 'dev_printk(KERN_DEBUG, ...)' are semantically different. Any reason
> for preferring dev_printk() over dev_dbg() ?

Personally I think that difference is a hassle.  When I convert
printk(KERN_DEBUG) to use dev_printk, I use dev_printk(KERN_DEBUG) to
preserve the fact that the printk happens unconditionally.  If I were
smarter, I would figure out how to use dev_dbg() effectively, but
usually I want my patch to be "convert to dev_printk", not "convert to
dev_dbg and only print when some magic incantation is given".


* Re: [Ksummit-discuss] Self nomination
  2015-07-31 16:27 ` Bjorn Helgaas
@ 2015-07-31 16:59   ` Guenter Roeck
  2015-07-31 17:03     ` Bjorn Helgaas
  2015-07-31 17:08     ` Greg KH
  0 siblings, 2 replies; 81+ messages in thread
From: Guenter Roeck @ 2015-07-31 16:59 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Sasha Levin, ksummit-discuss

Hi Bjorn,

On Fri, Jul 31, 2015 at 11:27:38AM -0500, Bjorn Helgaas wrote:
> On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> > Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
> > and by improving the quality of debug output that comes out of the kernel.
> 
> What sort of debug output improvements are you interested in?  I spend
> a fair amount of time converting to dev_printk and %pR.  They make the

I have been wondering about that, especially since dev_dbg() and
'dev_printk(KERN_DEBUG, ...)' are semantically different. Any reason
for preferring dev_printk() over dev_dbg() ?

Thanks,
Guenter


* Re: [Ksummit-discuss] Self nomination
  2015-07-31  2:55 Sasha Levin
@ 2015-07-31 16:27 ` Bjorn Helgaas
  2015-07-31 16:59   ` Guenter Roeck
       [not found] ` <alpine.DEB.2.02.1507310650220.2218@localhost6.localdomain6>
  1 sibling, 1 reply; 81+ messages in thread
From: Bjorn Helgaas @ 2015-07-31 16:27 UTC (permalink / raw)
  To: Sasha Levin; +Cc: ksummit-discuss

On Thu, Jul 30, 2015 at 9:55 PM, Sasha Levin <sasha.levin@oracle.com> wrote:
> Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
> and by improving the quality of debug output that comes out of the kernel.

What sort of debug output improvements are you interested in?  I spend
a fair amount of time converting to dev_printk and %pR.  They make the
kernel more consistent and approachable, but they're minor and I'm
sure you have more substantive things in mind.


* [Ksummit-discuss] Self nomination
@ 2015-07-31  9:15 David Howells
  0 siblings, 0 replies; 81+ messages in thread
From: David Howells @ 2015-07-31  9:15 UTC (permalink / raw)
  To: ksummit-discuss

I would like to nominate myself for this year's KS.  I'm interested in:

 (*) Compiler features and cross-compilers, having maintained a set of
     cross-compilers for the kernel.  I've been watching gcc-5 come out
     and find new things to warn about in the kernel.

 (*) Filesystem unioning - overlayfs, unionmount, the way LSMs interact,
     and the way unioning metadata is stored in the fs.

 (*) Signature management - keys, modules, firmware.

 (*) Media drivers - I've been working to improve a couple of these.

David


* [Ksummit-discuss] Self nomination
@ 2015-07-31  2:55 Sasha Levin
  2015-07-31 16:27 ` Bjorn Helgaas
       [not found] ` <alpine.DEB.2.02.1507310650220.2218@localhost6.localdomain6>
  0 siblings, 2 replies; 81+ messages in thread
From: Sasha Levin @ 2015-07-31  2:55 UTC (permalink / raw)
  To: ksummit-discuss

Hi folks,

I'd like to nominate myself to this year's kernel summit.

Mainly I'd like to talk about improving testing around the kernel, both by catching bugs
and by improving the quality of debug output that comes out of the kernel.

As the maintainer of the 3.18 LTS line I'd also like to discuss various -stable kernel
topics that came up previously on the mailing list.


Thanks,
Sasha


end of thread, other threads:[~2016-09-23 10:42 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-25 17:11 [Ksummit-discuss] Self nomination Johannes Weiner
2016-07-25 18:15 ` Rik van Riel
2016-07-26 10:56   ` Jan Kara
2016-07-26 13:10     ` Vlastimil Babka
2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
2016-07-28 21:41   ` James Bottomley
2016-08-01 15:46     ` Johannes Weiner
2016-08-01 16:06       ` James Bottomley
2016-08-01 16:11         ` Dave Hansen
2016-08-01 16:33           ` James Bottomley
2016-08-01 18:13             ` Rik van Riel
2016-08-01 19:51             ` Dave Hansen
2016-08-01 17:08           ` Johannes Weiner
2016-08-01 18:19             ` Johannes Weiner
2016-07-29  0:25   ` Rik van Riel
2016-07-29 11:07   ` Mel Gorman
2016-07-29 16:26     ` Luck, Tony
2016-08-01 15:17       ` Rik van Riel
2016-08-01 16:55     ` Johannes Weiner
2016-08-02  9:18   ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2016-07-31  6:57 [Ksummit-discuss] " Olof Johansson
2016-08-02 19:56 ` Mark Brown
2016-07-30  0:32 Ben Hutchings
2016-07-29 22:45 [Ksummit-discuss] self nomination Mimi Zohar
2016-07-29 15:13 [Ksummit-discuss] Self nomination Bartlomiej Zolnierkiewicz
2016-07-28 17:29 [Ksummit-discuss] self nomination James Bottomley
2016-07-28 17:31 ` James Bottomley
2016-07-27 23:20 Davidlohr Bueso
2016-07-28  7:18 ` Jan Kara
2016-07-28 14:37 ` Rik van Riel
2016-07-29  6:17 ` Wangnan (F)
2016-07-29 23:53   ` Davidlohr Bueso
2016-07-27 14:54 [Ksummit-discuss] Self nomination Mark Rutland
2016-07-27 13:57 Lorenzo Pieralisi
2016-07-27  4:46 Darren Hart
2016-07-27  9:25 ` Linus Walleij
2016-07-27 17:02   ` Darren Hart
2016-08-04 12:30   ` Geert Uytterhoeven
2016-07-27  0:50 Sergey Senozhatsky
2016-07-26 23:59 Stephen Rothwell
2016-07-28 12:23 ` Luis de Bethencourt
2016-07-26 22:30 Dmitry Torokhov
2016-07-28 10:14 ` Marc Zyngier
2016-08-02  8:09   ` Linus Walleij
2016-08-02 23:00     ` Rafael J. Wysocki
2016-08-03  8:12       ` Marek Szyprowski
2016-08-06  0:20         ` Rafael J. Wysocki
2016-08-06  0:20           ` Rafael J. Wysocki
2016-08-24 12:12           ` Marek Szyprowski
2016-08-24 12:12             ` Marek Szyprowski
2016-08-24 17:32             ` Rafael J. Wysocki
2016-08-24 17:32               ` Rafael J. Wysocki
2016-08-08 11:07   ` Lorenzo Pieralisi
2016-09-23 10:42     ` Grant Likely
2016-07-26 15:44 David Woodhouse
2016-07-25 21:46 [Ksummit-discuss] self nomination Kevin Hilman
2015-08-24  4:20 [Ksummit-discuss] [TECH TOPIC] Kernel Hardening James Morris
2015-08-24 11:46 ` Jiri Kosina
2015-08-24 11:56   ` James Morris
2015-08-24 17:17     ` Kees Cook
2015-08-26 20:51       ` Kees Cook
2015-08-26 21:10         ` Matthew Garrett
2015-08-30  0:41           ` [Ksummit-discuss] Self nomination Matthew Garrett
2015-08-11  5:05 Haggai Eran
2015-07-31  9:15 David Howells
2015-07-31  2:55 Sasha Levin
2015-07-31 16:27 ` Bjorn Helgaas
2015-07-31 16:59   ` Guenter Roeck
2015-07-31 17:03     ` Bjorn Helgaas
2015-07-31 17:08     ` Greg KH
2015-07-31 17:15       ` Guenter Roeck
2015-07-31 17:51         ` Bjorn Helgaas
2015-07-31 18:12           ` Greg KH
2015-07-31 18:17             ` Dave Jones
2015-07-31 18:22               ` Greg KH
2015-07-31 18:59                 ` Dan Williams
2015-08-01 13:03                   ` Jiri Kosina
2015-07-31 17:26       ` James Bottomley
2015-07-31 17:43         ` Greg KH
2015-07-31 17:49         ` Sasha Levin
     [not found] ` <alpine.DEB.2.02.1507310650220.2218@localhost6.localdomain6>
2015-07-31 17:49   ` Sasha Levin
2015-08-01 13:45     ` Dan Carpenter
2015-08-01 15:26       ` Sasha Levin
2015-08-01 16:22         ` Greg KH
2015-08-03  5:14           ` Sasha Levin
2015-08-01 20:30         ` Dave Jones
2015-08-03  5:17           ` Sasha Levin
