Discussion:
[PATCH 45/49] mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
Mel Gorman
2012-12-07 10:30:01 UTC
Commit "Add THP migration for the NUMA working set scanning fault case"
breaks the build because HPAGE_PMD_SHIFT and HPAGE_PMD_MASK are defined to
explode without CONFIG_TRANSPARENT_HUGEPAGE:

mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put':
mm/migrate.c:1549: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
mm/migrate.c:1564: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
mm/migrate.c:1566: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
mm/migrate.c:1573: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
mm/migrate.c:1606: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed
mm/migrate.c:1648: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed

CONFIG_NUMA_BALANCING allows compilation without enabling transparent
hugepages, so define the dummy function for such a configuration and only
define migrate_misplaced_transhuge_page_put() when transparent hugepages
are enabled.

Signed-off-by: David Rientjes <***@google.com>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/migrate.h | 16 +++++++++-------
mm/migrate.c | 2 ++
2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ed5a6c5..6c15d4a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -79,12 +79,6 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
extern int migrate_misplaced_page(struct page *page, int node);
extern int migrate_misplaced_page(struct page *page, int node);
extern bool migrate_ratelimited(int node);
-extern int migrate_misplaced_transhuge_page(struct mm_struct *mm,
- struct vm_area_struct *vma,
- pmd_t *pmd, pmd_t entry,
- unsigned long address,
- struct page *page, int node);
-
#else
static inline int migrate_misplaced_page(struct page *page, int node)
{
@@ -94,7 +88,15 @@ static inline bool migrate_ratelimited(int node)
{
return false;
}
+#endif /* CONFIG_BALANCE_NUMA */

+#if defined(CONFIG_BALANCE_NUMA) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+extern int migrate_misplaced_transhuge_page(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ pmd_t *pmd, pmd_t entry,
+ unsigned long address,
+ struct page *page, int node);
+#else
static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
struct vm_area_struct *vma,
pmd_t *pmd, pmd_t entry,
@@ -103,6 +105,6 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
{
return -EAGAIN;
}
-#endif /* CONFIG_BALANCE_NUMA */
+#endif /* CONFIG_BALANCE_NUMA && CONFIG_TRANSPARENT_HUGEPAGE*/

#endif /* _LINUX_MIGRATE_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 4b1b239..b6fe2d2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1602,7 +1602,9 @@ int migrate_misplaced_page(struct page *page, int node)
out:
return isolated;
}
+#endif /* CONFIG_BALANCE_NUMA */

+#if defined(CONFIG_BALANCE_NUMA) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
int migrate_misplaced_transhuge_page(struct mm_struct *mm,
struct vm_area_struct *vma,
pmd_t *pmd, pmd_t entry,
--
1.7.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
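
The stub pattern used by the build fix above can be sketched in user space as follows. This is only an illustration: DEMO_BALANCE_NUMA, DEMO_TRANSPARENT_HUGEPAGE, demo_migrate_transhuge() and demo_fault_path() are hypothetical names, not kernel symbols. The point is that the real function is compiled only when both options are set, and every other configuration gets a static inline stub that returns -EAGAIN, so callers need no #ifdef of their own.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-ins for the Kconfig symbols. Only the first one is
 * defined, which models NUMA balancing built without transparent hugepages.
 */
#define DEMO_BALANCE_NUMA 1
/* #define DEMO_TRANSPARENT_HUGEPAGE 1 */

#if defined(DEMO_BALANCE_NUMA) && defined(DEMO_TRANSPARENT_HUGEPAGE)
/* "Real" implementation: only compiled when both options are set. */
int demo_migrate_transhuge(int node)
{
	return node >= 0 ? 0 : -EINVAL;
}
#else
/* Stub: callers still compile and are simply told nothing was migrated. */
static inline int demo_migrate_transhuge(int node)
{
	(void)node;
	return -EAGAIN;
}
#endif

/* Caller code is identical in every configuration. */
static int demo_fault_path(int node)
{
	int ret;

	assert(node >= 0);
	ret = demo_migrate_transhuge(node);
	/* -EAGAIN means "not migrated"; the fault path just carries on. */
	return ret == -EAGAIN ? 0 : ret;
}
```

With only DEMO_BALANCE_NUMA defined, demo_migrate_transhuge() resolves to the stub, so nothing that depends on hugepage-only macros is ever compiled.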
Mel Gorman
2012-12-07 10:30:01 UTC
From: Peter Zijlstra <***@chello.nl>

Previously, to probe the working set of a task, we'd use
a very simple and crude method: mark all of its address
space PROT_NONE.

That method has various (obvious) disadvantages:

- it samples the working set at dissimilar rates,
giving some tasks a sampling quality advantage
over others.

- creates performance problems for tasks with very
large working sets

- over-samples processes with large address spaces but
which only very rarely execute

Improve that method by keeping a rotating offset into the
address space that marks the current position of the scan,
and advance it at a constant rate (in proportion to the CPU
cycles the task executes). If the offset reaches the last mapped
address of the mm then it starts over at the first
address.

The per-task nature of the working set sampling functionality in this tree
allows such constant rate, per task, execution-weight proportional sampling
of the working set, with an adaptive sampling interval/frequency that
goes from once per 100ms up to just once per 8 seconds. The current
sampling volume is 256 MB per interval.

As tasks mature and their working sets converge, the
sampling rate slows down to just a trickle, 256 MB per 8
seconds of CPU time executed.

This, beyond being adaptive, also rate-limits rarely
executing systems and does not over-sample on overloaded
systems.

[ In AutoNUMA speak, this patch deals with the effective sampling
rate of the 'hinting page fault'. AutoNUMA's scanning is
currently rate-limited, but it is also fundamentally
single-threaded, executing in the knuma_scand kernel thread,
so the limit in AutoNUMA is global and does not scale up with
the number of CPUs, nor does it scan tasks in an execution
proportional manner.

So the idea of rate-limiting the scanning was first implemented
in the AutoNUMA tree via a global rate limit. This patch goes
beyond that by implementing an execution rate proportional
working set sampling rate that is not implemented via a single
global scanning daemon. ]

[ Dan Carpenter pointed out a possible NULL pointer dereference in the
first version of this patch. ]

Based-on-idea-by: Andrea Arcangeli <***@redhat.com>
Bug-Found-By: Dan Carpenter <***@oracle.com>
Signed-off-by: Peter Zijlstra <***@chello.nl>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Rik van Riel <***@redhat.com>
[ Wrote changelog and fixed bug. ]
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/mm_types.h | 3 +++
include/linux/sched.h | 1 +
kernel/sched/fair.c | 65 ++++++++++++++++++++++++++++++++++++----------
kernel/sysctl.c | 7 +++++
4 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d82accb..b40f4ef 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -406,6 +406,9 @@ struct mm_struct {
*/
unsigned long numa_next_scan;

+ /* Restart point for scanning and setting pte_numa */
+ unsigned long numa_scan_offset;
+
/* numa_scan_seq prevents two threads setting pte_numa */
int numa_scan_seq;
#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ac71181..abb1c70 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2008,6 +2008,7 @@ extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;

extern unsigned int sysctl_balance_numa_scan_period_min;
extern unsigned int sysctl_balance_numa_scan_period_max;
+extern unsigned int sysctl_balance_numa_scan_size;
extern unsigned int sysctl_balance_numa_settle_count;

#ifdef CONFIG_SCHED_DEBUG
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b6d3ed7..66d8bd2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -780,10 +780,13 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)

#ifdef CONFIG_BALANCE_NUMA
/*
- * numa task sample period in ms: 5s
+ * numa task sample period in ms
*/
-unsigned int sysctl_balance_numa_scan_period_min = 5000;
-unsigned int sysctl_balance_numa_scan_period_max = 5000*16;
+unsigned int sysctl_balance_numa_scan_period_min = 100;
+unsigned int sysctl_balance_numa_scan_period_max = 100*16;
+
+/* Portion of address space to scan in MB */
+unsigned int sysctl_balance_numa_scan_size = 256;

static void task_numa_placement(struct task_struct *p)
{
@@ -808,6 +811,12 @@ void task_numa_fault(int node, int pages)
task_numa_placement(p);
}

+static void reset_ptenuma_scan(struct task_struct *p)
+{
+ ACCESS_ONCE(p->mm->numa_scan_seq)++;
+ p->mm->numa_scan_offset = 0;
+}
+
/*
* The expensive part of numa migration is done from task_work context.
* Triggered from task_tick_numa().
@@ -817,6 +826,9 @@ void task_numa_work(struct callback_head *work)
unsigned long migrate, next_scan, now = jiffies;
struct task_struct *p = current;
struct mm_struct *mm = p->mm;
+ struct vm_area_struct *vma;
+ unsigned long offset, end;
+ long length;

WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));

@@ -846,18 +858,45 @@ void task_numa_work(struct callback_head *work)
if (cmpxchg(&mm->numa_next_scan, migrate, next_scan) != migrate)
return;

- ACCESS_ONCE(mm->numa_scan_seq)++;
- {
- struct vm_area_struct *vma;
+ offset = mm->numa_scan_offset;
+ length = sysctl_balance_numa_scan_size;
+ length <<= 20;

- down_read(&mm->mmap_sem);
- for (vma = mm->mmap; vma; vma = vma->vm_next) {
- if (!vma_migratable(vma))
- continue;
- change_prot_numa(vma, vma->vm_start, vma->vm_end);
- }
- up_read(&mm->mmap_sem);
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, offset);
+ if (!vma) {
+ reset_ptenuma_scan(p);
+ offset = 0;
+ vma = mm->mmap;
+ }
+ for (; vma && length > 0; vma = vma->vm_next) {
+ if (!vma_migratable(vma))
+ continue;
+
+ /* Skip small VMAs. They are not likely to be of relevance */
+ if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) < HPAGE_PMD_NR)
+ continue;
+
+ offset = max(offset, vma->vm_start);
+ end = min(ALIGN(offset + length, HPAGE_SIZE), vma->vm_end);
+ length -= end - offset;
+
+ change_prot_numa(vma, offset, end);
+
+ offset = end;
}
+
+ /*
+ * It is possible to reach the end of the VMA list but the last few VMAs are
+ * not guaranteed to be vma_migratable. If they are not, we would find the
+ * !migratable VMA on the next scan but not reset the scanner to the start
+ * so check it now.
+ */
+ if (vma)
+ mm->numa_scan_offset = offset;
+ else
+ reset_ptenuma_scan(p);
+ up_read(&mm->mmap_sem);
}

/*
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1359f51..d191203 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -366,6 +366,13 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
+ {
+ .procname = "balance_numa_scan_size_mb",
+ .data = &sysctl_balance_numa_scan_size,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
#endif /* CONFIG_BALANCE_NUMA */
#endif /* CONFIG_SCHED_DEBUG */
{
--
1.7.9.2
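
The rotating scan offset described in the changelog above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the kernel code: struct demo_vma, struct demo_mm and demo_scan() are hypothetical simplifications, the small-VMA skip and hugepage alignment from the patch are left out, and where the patch calls change_prot_numa() the sketch only counts bytes. Unlike the loop in the patch, it also keeps its position when the budget runs out inside the last VMA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for vm_area_struct and mm_struct. */
struct demo_vma {
	unsigned long start, end;	/* covers [start, end) */
	int migratable;
};

struct demo_mm {
	const struct demo_vma *vmas;	/* sorted by address */
	size_t nr_vmas;
	unsigned long scan_offset;	/* where the next pass resumes */
	int scan_seq;			/* bumped on every wrap-around */
};

/* Like find_vma(): index of the first VMA that ends above addr. */
static size_t demo_find_vma(const struct demo_mm *mm, unsigned long addr)
{
	size_t i;

	for (i = 0; i < mm->nr_vmas; i++)
		if (mm->vmas[i].end > addr)
			break;
	return i;
}

static void demo_reset_scan(struct demo_mm *mm)
{
	mm->scan_seq++;
	mm->scan_offset = 0;
}

/* Cover at most `budget` bytes per call; returns the bytes covered. */
static unsigned long demo_scan(struct demo_mm *mm, unsigned long budget)
{
	unsigned long offset = mm->scan_offset, scanned = 0;
	size_t i = demo_find_vma(mm, offset);

	assert(budget > 0);
	if (i == mm->nr_vmas) {		/* offset is past the last mapping */
		demo_reset_scan(mm);
		offset = 0;
		i = 0;
	}
	while (i < mm->nr_vmas && scanned < budget) {
		const struct demo_vma *v = &mm->vmas[i];

		if (v->migratable) {
			unsigned long start = offset > v->start ? offset : v->start;
			unsigned long end = start + (budget - scanned);

			if (end > v->end)
				end = v->end;
			/* The kernel would call change_prot_numa() on this range. */
			scanned += end - start;
			offset = end;
			if (end < v->end)
				break;	/* budget ran out inside this VMA */
		}
		i++;
	}
	/* Trailing non-migratable VMAs must not hide the end of the list. */
	while (i < mm->nr_vmas && !mm->vmas[i].migratable)
		i++;
	if (i < mm->nr_vmas)
		mm->scan_offset = offset;
	else
		demo_reset_scan(mm);
	return scanned;
}
```

Each call covers a bounded amount of address space regardless of how large the mm is, which is what keeps the cost per scan interval constant.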

Mel Gorman
2012-12-07 10:30:01 UTC
Currently the rate of scanning for an address space is controlled
by the individual tasks. The next scan is simply determined by
2*p->numa_scan_period.

The 2*p->numa_scan_period is arbitrary and never changes. At this point
there is still no proper policy that decides if a task or process is
properly placed. It just scans and assumes the next NUMA fault will
place it properly. As it is assumed that pages will get properly placed
over time, increase the scan window each time a fault is incurred. This
is a big assumption as noted in the comments.

It should be noted that changing to p->numa_scan_period will increase
system CPU usage because now the scanning rate has effectively doubled.
If that is a problem then the min_rate should be made 200ms instead of
restoring the 2* logic.

Signed-off-by: Mel Gorman <***@suse.de>
---
kernel/sched/fair.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 357057c..3c632448 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -812,6 +812,15 @@ void task_numa_fault(int node, int pages)

/* FIXME: Allocate task-specific structure for placement policy here */

+ /*
+ * Assume that as faults occur that pages are getting properly placed
+ * and fewer NUMA hints are required. Note that this is a big
+ * assumption, it assumes processes reach a steady state with no
+ * further phase changes.
+ */
+ p->numa_scan_period = min(sysctl_balance_numa_scan_period_max,
+ p->numa_scan_period + jiffies_to_msecs(2));
+
task_numa_placement(p);
}

@@ -858,7 +867,7 @@ void task_numa_work(struct callback_head *work)
if (p->numa_scan_period == 0)
p->numa_scan_period = sysctl_balance_numa_scan_period_min;

- next_scan = now + 2*msecs_to_jiffies(p->numa_scan_period);
+ next_scan = now + msecs_to_jiffies(p->numa_scan_period);
if (cmpxchg(&mm->numa_next_scan, migrate, next_scan) != migrate)
return;
--
1.7.9.2
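
The back-off in the patch above amounts to a capped additive increase of the scan period on every hinting fault. A minimal sketch, assuming a 2 ms step (what jiffies_to_msecs(2) yields with HZ=1000; other HZ values give a larger step) and the 100 ms / 1600 ms defaults set earlier in this series. demo_backoff() and demo_faults_to_max() are hypothetical helper names.

```c
#include <assert.h>

/* Lengthen the scan period by step_ms on a fault, capped at max_ms. */
static unsigned int demo_backoff(unsigned int period_ms, unsigned int step_ms,
				 unsigned int max_ms)
{
	unsigned int next = period_ms + step_ms;

	return next < max_ms ? next : max_ms;
}

/* Number of hinting faults needed to slow down from min_ms to max_ms. */
static unsigned int demo_faults_to_max(unsigned int min_ms, unsigned int step_ms,
				       unsigned int max_ms)
{
	unsigned int period = min_ms, faults = 0;

	assert(step_ms > 0);	/* otherwise the loop would never terminate */
	while (period < max_ms) {
		period = demo_backoff(period, step_ms, max_ms);
		faults++;
	}
	return faults;
}
```

Under those assumptions it takes (1600 - 100) / 2 = 750 faults for a task to back off from the minimum to the maximum period.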

Mel Gorman
2012-12-07 10:30:02 UTC
From: Lee Schermerhorn <***@hp.com>

This patch provides a new function to test whether a page resides
on a node that is appropriate for the mempolicy for the vma and
address where the page is supposed to be mapped. This involves
looking up the node where the page belongs. So, the function
returns that node so that it may be used to allocate the page
without consulting the policy again.

A subsequent patch will call this function from the fault path.
Because of this, I don't want to go ahead and allocate the page, e.g.,
via alloc_page_vma() only to have to free it if it has the correct
policy. So, I just mimic the alloc_page_vma() node computation
logic--sort of.

Note: we could use this function to implement a MPOL_MF_STRICT
behavior when migrating pages to match mbind() mempolicy--e.g.,
to ensure that pages in an interleaved range are reinterleaved
rather than left where they are when they reside on any node in
the interleave nodemask.

Signed-off-by: Lee Schermerhorn <***@hp.com>
Reviewed-by: Rik van Riel <***@redhat.com>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Linus Torvalds <***@linux-foundation.org>
[ Added MPOL_F_LAZY to trigger migrate-on-fault;
simplified code now that we don't have to bother
with special crap for interleaved ]
Signed-off-by: Peter Zijlstra <***@chello.nl>
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/mempolicy.h | 8 +++++
include/uapi/linux/mempolicy.h | 1 +
mm/mempolicy.c | 76 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 85 insertions(+)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index e5ccb9d..c511e25 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -198,6 +198,8 @@ static inline int vma_migratable(struct vm_area_struct *vma)
return 1;
}

+extern int mpol_misplaced(struct page *, struct vm_area_struct *, unsigned long);
+
#else

struct mempolicy {};
@@ -323,5 +325,11 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol,
return 0;
}

+static inline int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
+ unsigned long address)
+{
+ return -1; /* no node preference */
+}
+
#endif /* CONFIG_NUMA */
#endif
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index d23dca8..472de8a 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -61,6 +61,7 @@ enum mpol_rebind_step {
#define MPOL_F_SHARED (1 << 0) /* identify shared policies */
#define MPOL_F_LOCAL (1 << 1) /* preferred local allocation */
#define MPOL_F_REBINDING (1 << 2) /* identify policies in rebinding */
+#define MPOL_F_MOF (1 << 3) /* this policy wants migrate on fault */


#endif /* _UAPI_LINUX_MEMPOLICY_H */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c21e914..df1466d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2181,6 +2181,82 @@ static void sp_free(struct sp_node *n)
kmem_cache_free(sn_cache, n);
}

+/**
+ * mpol_misplaced - check whether current page node is valid in policy
+ *
+ * @page - page to be checked
+ * @vma - vm area where page mapped
+ * @addr - virtual address where page mapped
+ *
+ * Lookup current policy node id for vma,addr and "compare to" page's
+ * node id.
+ *
+ * Returns:
+ * -1 - not misplaced, page is in the right node
+ * node - node id where the page should be
+ *
+ * Policy determination "mimics" alloc_page_vma().
+ * Called from fault path where we know the vma and faulting address.
+ */
+int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long addr)
+{
+ struct mempolicy *pol;
+ struct zone *zone;
+ int curnid = page_to_nid(page);
+ unsigned long pgoff;
+ int polnid = -1;
+ int ret = -1;
+
+ BUG_ON(!vma);
+
+ pol = get_vma_policy(current, vma, addr);
+ if (!(pol->flags & MPOL_F_MOF))
+ goto out;
+
+ switch (pol->mode) {
+ case MPOL_INTERLEAVE:
+ BUG_ON(addr >= vma->vm_end);
+ BUG_ON(addr < vma->vm_start);
+
+ pgoff = vma->vm_pgoff;
+ pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
+ polnid = offset_il_node(pol, vma, pgoff);
+ break;
+
+ case MPOL_PREFERRED:
+ if (pol->flags & MPOL_F_LOCAL)
+ polnid = numa_node_id();
+ else
+ polnid = pol->v.preferred_node;
+ break;
+
+ case MPOL_BIND:
+ /*
+ * allows binding to multiple nodes.
+ * use current page if in policy nodemask,
+ * else select nearest allowed node, if any.
+ * If no allowed nodes, use current [!misplaced].
+ */
+ if (node_isset(curnid, pol->v.nodes))
+ goto out;
+ (void)first_zones_zonelist(
+ node_zonelist(numa_node_id(), GFP_HIGHUSER),
+ gfp_zone(GFP_HIGHUSER),
+ &pol->v.nodes, &zone);
+ polnid = zone->node;
+ break;
+
+ default:
+ BUG();
+ }
+ if (curnid != polnid)
+ ret = polnid;
+out:
+ mpol_cond_put(pol);
+
+ return ret;
+}
+
static void sp_delete(struct shared_policy *sp, struct sp_node *n)
{
pr_debug("deleting %lx-l%lx\n", n->start, n->end);
--
1.7.9.2
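
The return convention of mpol_misplaced() (-1 for "leave the page alone", otherwise the node it should move to) can be illustrated with a user-space model. Everything here is a hypothetical simplification: the nodemask is a plain bitmask, demo_misplaced() takes the page offset directly, interleaving is modelled as "offset modulo number of allowed nodes" (my understanding of offset_il_node()), and the bind case picks the lowest allowed node where the kernel walks the zonelist to find the nearest one.

```c
#include <assert.h>

enum demo_mode { DEMO_PREFERRED, DEMO_BIND, DEMO_INTERLEAVE };

/* Hypothetical, simplified mempolicy. */
struct demo_policy {
	enum demo_mode mode;
	unsigned int nodes;	/* bit n set => node n is allowed */
	int preferred_node;
	int migrate_on_fault;	/* models the MPOL_F_MOF flag */
};

static unsigned int demo_weight(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Node id of the n-th (0-based) set bit in mask. */
static int demo_nth_node(unsigned int mask, unsigned int n)
{
	int nid;

	assert(n < demo_weight(mask));
	for (nid = 0; ; nid++) {
		if (!(mask & (1u << nid)))
			continue;
		if (n-- == 0)
			return nid;
	}
}

/* Returns -1 if the page may stay on curnid, else the node it belongs on. */
static int demo_misplaced(const struct demo_policy *pol, int curnid,
			  unsigned long pgoff)
{
	int polnid;

	if (!pol->migrate_on_fault)
		return -1;

	switch (pol->mode) {
	case DEMO_INTERLEAVE:
		polnid = demo_nth_node(pol->nodes,
				       (unsigned int)(pgoff % demo_weight(pol->nodes)));
		break;
	case DEMO_PREFERRED:
		polnid = pol->preferred_node;
		break;
	case DEMO_BIND:
	default:
		/* Any node in the mask is acceptable for a bound page. */
		if (pol->nodes & (1u << curnid))
			return -1;
		/* Simplification: the kernel picks the nearest allowed node. */
		polnid = demo_nth_node(pol->nodes, 0);
		break;
	}
	return curnid != polnid ? polnid : -1;
}
```

Returning the target node rather than a boolean lets the caller allocate the destination page without consulting the policy a second time, which is the design point made in the changelog.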

Mel Gorman
2012-12-07 10:30:01 UTC
From: Rik van Riel <***@redhat.com>

The function ptep_set_access_flags() is only ever invoked to set access
flags or add write permission on a PTE. The write bit is only ever set
together with the dirty bit.

Because we only ever upgrade a PTE, it is safe to skip flushing entries on
remote TLBs. The worst that can happen is a spurious page fault on other
CPUs, which would flush that TLB entry.

Lazily letting another CPU incur a spurious page fault occasionally is
(much!) cheaper than aggressively flushing everybody else's TLB.

Signed-off-by: Rik van Riel <***@redhat.com>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Michel Lespinasse <***@google.com>
Cc: Ingo Molnar <***@kernel.org>
---
arch/x86/mm/pgtable.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 8573b83..be3bb46 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -301,6 +301,13 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
free_page((unsigned long)pgd);
}

+/*
+ * Used to set accessed or dirty bits in the page table entries
+ * on other architectures. On x86, the accessed and dirty bits
+ * are tracked by hardware. However, do_wp_page calls this function
+ * to also make the pte writeable at the same time the dirty bit is
+ * set. In that case we do actually need to write the PTE.
+ */
int ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep,
pte_t entry, int dirty)
@@ -310,7 +317,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
if (changed && dirty) {
*ptep = entry;
pte_update_defer(vma->vm_mm, address, ptep);
- flush_tlb_page(vma, address);
+ __flush_tlb_one(address);
}

return changed;
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:01 UTC
Note: This two-stage filter was taken directly from the sched/numa patch
"sched, numa, mm: Add the scanning page fault machinery" but is
only a partial extraction. As the end result is not necessarily
recognisable, the signed-offs-by had to be removed. Will be added
back if requested.

While it is desirable that all threads in a process run on its home
node, this is not always possible or necessary. There may be more
threads than CPUs within the node, or the node might be over-subscribed
with unrelated processes.

This can cause a situation whereby a page gets migrated off its home
node because the threads clearing pte_numa were running off-node. This
patch uses page->last_nid to build a two-stage filter before pages get
migrated to avoid problems with short or unlikely task<->node
relationships.

Signed-off-by: Mel Gorman <***@suse.de>
---
mm/mempolicy.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4c1c8d8..fd20e28 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2317,9 +2317,37 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
}

/* Migrate the page towards the node whose CPU is referencing it */
- if (pol->flags & MPOL_F_MORON)
+ if (pol->flags & MPOL_F_MORON) {
+ int last_nid;
+
polnid = numa_node_id();

+ /*
+ * Multi-stage node selection is used in conjunction
+ * with a periodic migration fault to build a temporal
+ * task<->page relation. By using a two-stage filter we
+ * remove short/unlikely relations.
+ *
+ * Using P(p) ~ n_p / n_t as per frequentist
+ * probability, we can equate a task's usage of a
+ * particular page (n_p) per total usage of this
+ * page (n_t) (in a given time-span) to a probability.
+ *
+ * Our periodic faults will sample this probability and
+ * getting the same result twice in a row, given these
+ * samples are fully independent, is then given by
+ * P(p)^2, provided our sample period is sufficiently
+ * short compared to the usage pattern.
+ *
+ * This quadratic squishes small probabilities, making
+ * it less likely we act on an unlikely task<->page
+ * relation.
+ */
+ last_nid = page_xchg_last_nid(page, polnid);
+ if (last_nid != polnid)
+ goto out;
+ }
+
if (curnid != polnid)
ret = polnid;
out:
--
1.7.9.2
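
The two-stage filter above boils down to "only migrate when two consecutive hinting faults on the page came from the same node". A minimal user-space sketch, with struct demo_page and demo_two_stage() as hypothetical stand-ins for struct page and the page_xchg_last_nid() logic:

```c
#include <assert.h>

/* Hypothetical stand-in for the per-page state used by the filter. */
struct demo_page {
	int nid;	/* node the page currently lives on */
	int last_nid;	/* node that took the previous hinting fault, -1 if none */
};

/*
 * Returns the node to migrate to, or -1 to leave the page in place.
 * this_nid is the node of the CPU taking the current hinting fault.
 */
static int demo_two_stage(struct demo_page *page, int this_nid)
{
	int last_nid;

	assert(this_nid >= 0);

	/* Record this fault and fetch the previous one, like an xchg. */
	last_nid = page->last_nid;
	page->last_nid = this_nid;

	/* First sighting from this node: remember it, do not migrate yet. */
	if (last_nid != this_nid)
		return -1;

	/* Second consecutive fault from the same node: migrate if remote. */
	return page->nid != this_nid ? this_nid : -1;
}
```

As a worked number for the probability argument in the comment: a task that accounts for 10% of the accesses to a page passes a single check with probability 0.1, but passes two independent checks in a row with probability of only 0.01.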

Mel Gorman
2012-12-07 10:30:01 UTC
Because migrations are driven by the CPU a task is running on, there is
no point tracking NUMA faults until a task runs on a new
node. This patch tracks the first node used by an address space. Until
it changes, PTE scanning is disabled and no NUMA hinting faults are
trapped. This should help workloads that are short-lived, do not care
about NUMA placement or have bound themselves to a single node.

This takes advantage of the logic in "mm: sched: numa: Implement slow
start for working set sampling" to delay when the checks are made. This
benefits processes that set their CPU and node bindings early in their
lifetime. It also potentially allows any initial load balancing to take
place.

Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/mm_types.h | 10 ++++++++++
kernel/fork.c | 3 +++
kernel/sched/fair.c | 18 ++++++++++++++++++
kernel/sched/features.h | 4 +++-
4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 62d18a9..e4551c1 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -418,10 +418,20 @@ struct mm_struct {

/* numa_scan_seq prevents two threads setting pte_numa */
int numa_scan_seq;
+
+ /*
+ * The first node a task was scheduled on. If a task runs on
+ * a different node then Make PTE Scan Go Now.
+ */
+ int first_nid;
#endif
struct uprobes_state uprobes_state;
};

+/* first nid will either be a valid NID or one of these values */
+#define NUMA_PTE_SCAN_INIT -1
+#define NUMA_PTE_SCAN_ACTIVE -2
+
static inline void mm_init_cpumask(struct mm_struct *mm)
{
#ifdef CONFIG_CPUMASK_OFFSTACK
diff --git a/kernel/fork.c b/kernel/fork.c
index 8b20ab7..e39111a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -821,6 +821,9 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
mm->pmd_huge_pte = NULL;
#endif
+#ifdef CONFIG_BALANCE_NUMA
+ mm->first_nid = NUMA_PTE_SCAN_INIT;
+#endif
if (!mm_init(mm, tsk))
goto fail_nomem;

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b4bc459..fd9c78c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -861,6 +861,24 @@ void task_numa_work(struct callback_head *work)
return;

/*
+ * We do not care about task placement until a task runs on a node
+ * other than the first one used by the address space. This is
+ * largely because migrations are driven by what CPU the task
+ * is running on. If it's never scheduled on another node, it'll
+ * not migrate so why bother trapping the fault.
+ */
+ if (mm->first_nid == NUMA_PTE_SCAN_INIT)
+ mm->first_nid = numa_node_id();
+ if (mm->first_nid != NUMA_PTE_SCAN_ACTIVE) {
+ /* Are we running on a new node yet? */
+ if (numa_node_id() == mm->first_nid &&
+ !sched_feat_numa(NUMA_FORCE))
+ return;
+
+ mm->first_nid = NUMA_PTE_SCAN_ACTIVE;
+ }
+
+ /*
* Reset the scan period if enough time has gone by. Objective is that
* scanning will be reduced if pages are properly placed. As tasks
* can enter different phases this needs to be re-examined. Lacking
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index d402368..c3c86fd 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -65,8 +65,10 @@ SCHED_FEAT(LB_MIN, false)
/*
* Apply the automatic NUMA scheduling policy. Enabled automatically
* at runtime if running on a NUMA machine. Can be controlled via
- * balancenuma=
+ * balancenuma=. Allow PTE scanning to be forced on UMA machines
+ * for debugging the core machinery.
*/
#ifdef CONFIG_BALANCE_NUMA
SCHED_FEAT(NUMA, false)
+SCHED_FEAT(NUMA_FORCE, false)
#endif
--
1.7.9.2
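
The first_nid logic in the patch above is a small three-state machine: not yet seen, pinned to the first node, and active. A user-space sketch of the same transitions, with hypothetical DEMO_* names and a plain int in place of mm->first_nid:

```c
#include <assert.h>
#include <stdbool.h>

/* Same encoding as the patch: valid node ids are >= 0. */
#define DEMO_SCAN_INIT   (-1)
#define DEMO_SCAN_ACTIVE (-2)

/*
 * Decide whether PTE scanning should run for an address space.
 * first_nid models mm->first_nid, this_nid the node of the current CPU,
 * and force the NUMA_FORCE scheduler feature.
 */
static bool demo_should_scan(int *first_nid, int this_nid, bool force)
{
	assert(this_nid >= 0);

	/* First call: remember where the address space started out. */
	if (*first_nid == DEMO_SCAN_INIT)
		*first_nid = this_nid;

	if (*first_nid != DEMO_SCAN_ACTIVE) {
		/* Still on the first node and not forced: nothing to do. */
		if (this_nid == *first_nid && !force)
			return false;

		/* Ran on a second node (or forced): scan from now on. */
		*first_nid = DEMO_SCAN_ACTIVE;
	}
	return true;
}
```

Once the state reaches DEMO_SCAN_ACTIVE it never goes back, so a task that later returns to its first node keeps being scanned.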

Mel Gorman
2012-12-07 10:30:01 UTC
If there are a large number of NUMA hinting faults and all of them
are resulting in migrations it may indicate that memory is just
bouncing uselessly around. NUMA balancing cost is likely exceeding
any benefit from locality. Rate limit the PTE updates if the node
is migration rate-limited. As noted in the comments, this distorts
the NUMA faulting statistics.

Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/migrate.h | 6 ++++++
kernel/sched/fair.c | 9 +++++++++
mm/migrate.c | 20 ++++++++++++++++++++
3 files changed, 35 insertions(+)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 2923135..6229177 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -77,11 +77,17 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,

#ifdef CONFIG_BALANCE_NUMA
extern int migrate_misplaced_page(struct page *page, int node);
+extern int migrate_misplaced_page(struct page *page, int node);
+extern bool migrate_ratelimited(int node);
#else
static inline int migrate_misplaced_page(struct page *page, int node)
{
return -EAGAIN; /* can't migrate now */
}
+static inline bool migrate_ratelimited(int node)
+{
+ return false;
+}
#endif /* CONFIG_BALANCE_NUMA */

#endif /* _LINUX_MIGRATE_H */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2e65f44..357057c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -27,6 +27,7 @@
#include <linux/profile.h>
#include <linux/interrupt.h>
#include <linux/mempolicy.h>
+#include <linux/migrate.h>
#include <linux/task_work.h>

#include <trace/events/sched.h>
@@ -861,6 +862,14 @@ void task_numa_work(struct callback_head *work)
if (cmpxchg(&mm->numa_next_scan, migrate, next_scan) != migrate)
return;

+ /*
+ * Do not set pte_numa if the current running node is rate-limited.
+ * This loses statistics on the fault but if we are unwilling to
+ * migrate to this node, it is less likely we can do useful work
+ */
+ if (migrate_ratelimited(numa_node_id()))
+ return;
+
start = mm->numa_scan_offset;
pages = sysctl_balance_numa_scan_size;
pages <<= 20 - PAGE_SHIFT; /* MB in pages */
diff --git a/mm/migrate.c b/mm/migrate.c
index b2e6d4c..2c8310c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1464,10 +1464,30 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
* page migration rate limiting control.
* Do not migrate more than @pages_to_migrate in a @migrate_interval_millisecs
* window of time. Default here says do not migrate more than 1280M per second.
+ * If a node is rate-limited then PTE NUMA updates are also rate-limited. However
+ * as it is faults that reset the window, pte updates will happen unconditionally
+ * if there has not been a fault since @pteupdate_interval_millisecs after the
+ * throttle window closed.
*/
static unsigned int migrate_interval_millisecs __read_mostly = 100;
+static unsigned int pteupdate_interval_millisecs __read_mostly = 1000;
static unsigned int ratelimit_pages __read_mostly = 128 << (20 - PAGE_SHIFT);

+/* Returns true if NUMA migration is currently rate limited */
+bool migrate_ratelimited(int node)
+{
+ pg_data_t *pgdat = NODE_DATA(node);
+
+ if (time_after(jiffies, pgdat->balancenuma_migrate_next_window +
+ msecs_to_jiffies(pteupdate_interval_millisecs)))
+ return false;
+
+ if (pgdat->balancenuma_migrate_nr_pages < ratelimit_pages)
+ return false;
+
+ return true;
+}
+
/*
* Attempt to migrate a misplaced page to the specified destination
* node. Caller is expected to have an elevated reference count on
--
1.7.9.2
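
The check in migrate_ratelimited() can be modelled in user space as below. This is a sketch with assumptions spelled out: struct demo_node stands in for the two pgdat fields, time is a plain millisecond counter compared with ">" (the kernel uses jiffies and time_after(), which also copes with counter wrap-around), and DEMO_PAGE_SHIFT assumes 4 KB pages. demo_mb_per_second() is a hypothetical helper that only reproduces the "1280M per second" arithmetic from the comment.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KB base pages */

/* Hypothetical stand-in for the per-node fields used by the patch. */
struct demo_node {
	unsigned long next_window_ms;	/* when the current throttle window closes */
	unsigned long nr_pages;		/* pages migrated in the current window */
};

/* 128 MB worth of pages per window, as in the patch. */
static const unsigned long demo_ratelimit_pages = 128UL << (20 - DEMO_PAGE_SHIFT);
static const unsigned long demo_pteupdate_interval_ms = 1000;

/* True if PTE updates for this node should be skipped right now. */
static bool demo_ratelimited(const struct demo_node *node, unsigned long now_ms)
{
	/* No fault has reset the window for a long time: stop throttling. */
	if (now_ms > node->next_window_ms + demo_pteupdate_interval_ms)
		return false;

	/* Throttle only if the window's migration budget has been used up. */
	return node->nr_pages >= demo_ratelimit_pages;
}

/* Migration ceiling implied by a per-window budget. */
static unsigned long demo_mb_per_second(unsigned long mb_per_window,
					unsigned long window_ms)
{
	assert(window_ms > 0);
	return mb_per_window * 1000 / window_ms;
}
```

With the defaults in the patch, 128 MB per 100 ms window works out to 1280 MB per second, matching the figure quoted in the comment.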

Mel Gorman
2012-12-07 10:30:02 UTC
Subject says it all. Allocation failures and a failure to isolate should
be accounted as a migration failure. This is partially another
difference between base page and transhuge page migration. A base page
migration makes multiple attempts for these conditions before it would
be accounted for as a failure.

Signed-off-by: Mel Gorman <***@suse.de>
---
mm/migrate.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index b6fe2d2..eb155c9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1635,12 +1635,15 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,

new_page = alloc_pages_node(node,
(GFP_TRANSHUGE | GFP_THISNODE) & ~__GFP_WAIT, HPAGE_PMD_ORDER);
- if (!new_page)
+ if (!new_page) {
+ count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR);
goto out_dropref;
+ }
page_xchg_last_nid(new_page, page_last_nid(page));

isolated = numamigrate_isolate_page(pgdat, page);
if (!isolated) {
+ count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR);
put_page(new_page);
goto out_keep_locked;
}
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:01 UTC
From: Ingo Molnar <***@kernel.org>

rmap_walk_anon() and try_to_unmap_anon() appear to be too
careful about locking the anon vma: while it needs protection
against anon vma list modifications, it does not need exclusive
access to the list itself.

Transforming this exclusive lock to a read-locked rwsem removes
a global lock from the hot path of page-migration intense
threaded workloads which can cause pathological performance like
this:

96.43% process 0 [kernel.kallsyms] [k] perf_trace_sched_switch
|
--- perf_trace_sched_switch
__schedule
schedule
schedule_preempt_disabled
__mutex_lock_common.isra.6
__mutex_lock_slowpath
mutex_lock
|
|--50.61%-- rmap_walk
| move_to_new_page
| migrate_pages
| migrate_misplaced_page
| __do_numa_page.isra.69
| handle_pte_fault
| handle_mm_fault
| __do_page_fault
| do_page_fault
| page_fault
| __memset_sse2
| |
| --100.00%-- worker_thread
| |
| --100.00%-- start_thread
|
--49.39%-- page_lock_anon_vma
try_to_unmap_anon
try_to_unmap
migrate_pages
migrate_misplaced_page
__do_numa_page.isra.69
handle_pte_fault
handle_mm_fault
__do_page_fault
do_page_fault
page_fault
__memset_sse2
|
--100.00%-- worker_thread
start_thread

With this change applied the profile is now nicely flat
and there's no anon-vma related scheduling/blocking.

Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(),
to make it clearer that it's an exclusive write-lock in
that case - suggested by Rik van Riel.

Suggested-by: Linus Torvalds <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Paul Turner <***@google.com>
Cc: Lee Schermerhorn <***@hp.com>
Cc: Christoph Lameter <***@linux.com>
Cc: Rik van Riel <***@redhat.com>
Cc: Mel Gorman <***@suse.de>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Johannes Weiner <***@cmpxchg.org>
Cc: Hugh Dickins <***@google.com>
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/huge_mm.h | 2 +-
include/linux/rmap.h | 17 ++++++++++++++---
mm/huge_memory.c | 6 +++---
mm/ksm.c | 6 +++---
mm/memory-failure.c | 4 ++--
mm/migrate.c | 2 +-
mm/mmap.c | 2 +-
mm/mremap.c | 2 +-
mm/rmap.c | 48 +++++++++++++++++++++++------------------------
9 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 027ad04..0d1208c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -102,7 +102,7 @@ extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
#define wait_split_huge_page(__anon_vma, __pmd) \
do { \
pmd_t *____pmd = (__pmd); \
- anon_vma_lock(__anon_vma); \
+ anon_vma_lock_write(__anon_vma); \
anon_vma_unlock(__anon_vma); \
BUG_ON(pmd_trans_splitting(*____pmd) || \
pmd_trans_huge(*____pmd)); \
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index f3f41d2..c20635c 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -118,7 +118,7 @@ static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
up_write(&anon_vma->root->rwsem);
}

-static inline void anon_vma_lock(struct anon_vma *anon_vma)
+static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
{
down_write(&anon_vma->root->rwsem);
}
@@ -128,6 +128,17 @@ static inline void anon_vma_unlock(struct anon_vma *anon_vma)
up_write(&anon_vma->root->rwsem);
}

+static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
+{
+ down_read(&anon_vma->root->rwsem);
+}
+
+static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
+{
+ up_read(&anon_vma->root->rwsem);
+}
+
+
/*
* anon_vma helper functions.
*/
@@ -220,8 +231,8 @@ int try_to_munlock(struct page *);
/*
* Called by memory-failure.c to kill processes.
*/
-struct anon_vma *page_lock_anon_vma(struct page *page);
-void page_unlock_anon_vma(struct anon_vma *anon_vma);
+struct anon_vma *page_lock_anon_vma_read(struct page *page);
+void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);

/*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f0c4928..409b2f3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1548,7 +1548,7 @@ int split_huge_page(struct page *page)
int ret = 1;

BUG_ON(!PageAnon(page));
- anon_vma = page_lock_anon_vma(page);
+ anon_vma = page_lock_anon_vma_read(page);
if (!anon_vma)
goto out;
ret = 0;
@@ -1561,7 +1561,7 @@ int split_huge_page(struct page *page)

BUG_ON(PageCompound(page));
out_unlock:
- page_unlock_anon_vma(anon_vma);
+ page_unlock_anon_vma_read(anon_vma);
out:
return ret;
}
@@ -2073,7 +2073,7 @@ static void collapse_huge_page(struct mm_struct *mm,
if (!pmd_present(*pmd) || pmd_trans_huge(*pmd))
goto out;

- anon_vma_lock(vma->anon_vma);
+ anon_vma_lock_write(vma->anon_vma);

pte = pte_offset_map(pmd, address);
ptl = pte_lockptr(mm, pmd);
diff --git a/mm/ksm.c b/mm/ksm.c
index ae539f0..7fa37de 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1634,7 +1634,7 @@ again:
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;

- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
0, ULONG_MAX) {
vma = vmac->vma;
@@ -1688,7 +1688,7 @@ again:
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;

- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
0, ULONG_MAX) {
vma = vmac->vma;
@@ -1741,7 +1741,7 @@ again:
struct anon_vma_chain *vmac;
struct vm_area_struct *vma;

- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
0, ULONG_MAX) {
vma = vmac->vma;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ddb68a1..f2cd830 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -402,7 +402,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
struct anon_vma *av;
pgoff_t pgoff;

- av = page_lock_anon_vma(page);
+ av = page_lock_anon_vma_read(page);
if (av == NULL) /* Not actually mapped anymore */
return;

@@ -423,7 +423,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
}
}
read_unlock(&tasklist_lock);
- page_unlock_anon_vma(av);
+ page_unlock_anon_vma_read(av);
}

/*
diff --git a/mm/migrate.c b/mm/migrate.c
index 6b6567f..da2001b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -754,7 +754,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
*/
if (PageAnon(page)) {
/*
- * Only page_lock_anon_vma() understands the subtleties of
+ * Only page_lock_anon_vma_read() understands the subtleties of
* getting a hold on an anon_vma from outside one of its mms.
*/
anon_vma = page_get_anon_vma(page);
diff --git a/mm/mmap.c b/mm/mmap.c
index 8840863..68a16b4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -602,7 +602,7 @@ again: remove_next = 1 + (end > next->vm_end);
if (anon_vma) {
VM_BUG_ON(adjust_next && next->anon_vma &&
anon_vma != next->anon_vma);
- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
anon_vma_interval_tree_pre_update_vma(vma);
if (adjust_next)
anon_vma_interval_tree_pre_update_vma(next);
diff --git a/mm/mremap.c b/mm/mremap.c
index 1b61c2d..3dabd17 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -104,7 +104,7 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
}
if (vma->anon_vma) {
anon_vma = vma->anon_vma;
- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
}
}

diff --git a/mm/rmap.c b/mm/rmap.c
index 6e3ee3b..b0f612d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -87,24 +87,24 @@ static inline void anon_vma_free(struct anon_vma *anon_vma)
VM_BUG_ON(atomic_read(&anon_vma->refcount));

/*
- * Synchronize against page_lock_anon_vma() such that
+ * Synchronize against page_lock_anon_vma_read() such that
* we can safely hold the lock without the anon_vma getting
* freed.
*
* Relies on the full mb implied by the atomic_dec_and_test() from
* put_anon_vma() against the acquire barrier implied by
- * mutex_trylock() from page_lock_anon_vma(). This orders:
+ * down_read_trylock() from page_lock_anon_vma_read(). This orders:
*
- * page_lock_anon_vma() VS put_anon_vma()
- * mutex_trylock() atomic_dec_and_test()
+ * page_lock_anon_vma_read() VS put_anon_vma()
+ * down_read_trylock() atomic_dec_and_test()
* LOCK MB
- * atomic_read() mutex_is_locked()
+ * atomic_read() rwsem_is_locked()
*
* LOCK should suffice since the actual taking of the lock must
* happen _before_ what follows.
*/
if (rwsem_is_locked(&anon_vma->root->rwsem)) {
- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
anon_vma_unlock(anon_vma);
}

@@ -146,7 +146,7 @@ static void anon_vma_chain_link(struct vm_area_struct *vma,
* allocate a new one.
*
* Anon-vma allocations are very subtle, because we may have
- * optimistically looked up an anon_vma in page_lock_anon_vma()
+ * optimistically looked up an anon_vma in page_lock_anon_vma_read()
* and that may actually touch the spinlock even in the newly
* allocated vma (it depends on RCU to make sure that the
* anon_vma isn't actually destroyed).
@@ -181,7 +181,7 @@ int anon_vma_prepare(struct vm_area_struct *vma)
allocated = anon_vma;
}

- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
@@ -306,7 +306,7 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
get_anon_vma(anon_vma->root);
/* Mark this anon_vma as the one where our new (COWed) pages go. */
vma->anon_vma = anon_vma;
- anon_vma_lock(anon_vma);
+ anon_vma_lock_write(anon_vma);
anon_vma_chain_link(vma, avc, anon_vma);
anon_vma_unlock(anon_vma);

@@ -442,7 +442,7 @@ out:
* atomic op -- the trylock. If we fail the trylock, we fall back to getting a
* reference like with page_get_anon_vma() and then block on the mutex.
*/
-struct anon_vma *page_lock_anon_vma(struct page *page)
+struct anon_vma *page_lock_anon_vma_read(struct page *page)
{
struct anon_vma *anon_vma = NULL;
struct anon_vma *root_anon_vma;
@@ -457,14 +457,14 @@ struct anon_vma *page_lock_anon_vma(struct page *page)

anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
root_anon_vma = ACCESS_ONCE(anon_vma->root);
- if (down_write_trylock(&root_anon_vma->rwsem)) {
+ if (down_read_trylock(&root_anon_vma->rwsem)) {
/*
* If the page is still mapped, then this anon_vma is still
* its anon_vma, and holding the mutex ensures that it will
* not go away, see anon_vma_free().
*/
if (!page_mapped(page)) {
- up_write(&root_anon_vma->rwsem);
+ up_read(&root_anon_vma->rwsem);
anon_vma = NULL;
}
goto out;
@@ -484,15 +484,15 @@ struct anon_vma *page_lock_anon_vma(struct page *page)

/* we pinned the anon_vma, its safe to sleep */
rcu_read_unlock();
- anon_vma_lock(anon_vma);
+ anon_vma_lock_read(anon_vma);

if (atomic_dec_and_test(&anon_vma->refcount)) {
/*
* Oops, we held the last refcount, release the lock
* and bail -- can't simply use put_anon_vma() because
- * we'll deadlock on the anon_vma_lock() recursion.
+ * we'll deadlock on the anon_vma_lock_write() recursion.
*/
- anon_vma_unlock(anon_vma);
+ anon_vma_unlock_read(anon_vma);
__put_anon_vma(anon_vma);
anon_vma = NULL;
}
@@ -504,9 +504,9 @@ out:
return anon_vma;
}

-void page_unlock_anon_vma(struct anon_vma *anon_vma)
+void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
{
- anon_vma_unlock(anon_vma);
+ anon_vma_unlock_read(anon_vma);
}

/*
@@ -732,7 +732,7 @@ static int page_referenced_anon(struct page *page,
struct anon_vma_chain *avc;
int referenced = 0;

- anon_vma = page_lock_anon_vma(page);
+ anon_vma = page_lock_anon_vma_read(page);
if (!anon_vma)
return referenced;

@@ -754,7 +754,7 @@ static int page_referenced_anon(struct page *page,
break;
}

- page_unlock_anon_vma(anon_vma);
+ page_unlock_anon_vma_read(anon_vma);
return referenced;
}

@@ -1474,7 +1474,7 @@ static int try_to_unmap_anon(struct page *page, enum ttu_flags flags)
struct anon_vma_chain *avc;
int ret = SWAP_AGAIN;

- anon_vma = page_lock_anon_vma(page);
+ anon_vma = page_lock_anon_vma_read(page);
if (!anon_vma)
return ret;

@@ -1501,7 +1501,7 @@ static int try_to_unmap_anon(struct page *page, enum ttu_flags flags)
break;
}

- page_unlock_anon_vma(anon_vma);
+ page_unlock_anon_vma_read(anon_vma);
return ret;
}

@@ -1696,7 +1696,7 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
int ret = SWAP_AGAIN;

/*
- * Note: remove_migration_ptes() cannot use page_lock_anon_vma()
+ * Note: remove_migration_ptes() cannot use page_lock_anon_vma_read()
* because that depends on page_mapped(); but not all its usages
* are holding mmap_sem. Users without mmap_sem are required to
* take a reference count to prevent the anon_vma disappearing
@@ -1704,7 +1704,7 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
anon_vma = page_anon_vma(page);
if (!anon_vma)
return ret;
- anon_vma_lock(anon_vma);
+ anon_vma_lock_read(anon_vma);
anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(page, vma);
@@ -1712,7 +1712,7 @@ static int rmap_walk_anon(struct page *page, int (*rmap_one)(struct page *,
if (ret != SWAP_AGAIN)
break;
}
- anon_vma_unlock(anon_vma);
+ anon_vma_unlock_read(anon_vma);
return ret;
}
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:02 UTC
From: Rik van Riel <***@redhat.com>

Intel has an architectural guarantee that the TLB entry causing
a page fault gets invalidated automatically. This means
we should be able to drop the local TLB invalidation.

Because of the way other areas of the page fault code work,
chances are good that all x86 CPUs do this. However, if
someone somewhere has an x86 CPU that does not invalidate
the TLB entry causing a page fault, this one-liner should
be easy to revert.

Signed-off-by: Rik van Riel <***@redhat.com>
Cc: Linus Torvalds <***@kernel.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Michel Lespinasse <***@google.com>
Cc: Peter Zijlstra <***@infradead.org>
Cc: Ingo Molnar <***@redhat.com>
---
arch/x86/mm/pgtable.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index be3bb46..7353de3 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -317,7 +317,6 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
if (changed && dirty) {
*ptep = entry;
pte_update_defer(vma->vm_mm, address, ptep);
- __flush_tlb_one(address);
}

return changed;
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:02 UTC
This patch adds Kconfig options and kernel parameters to allow the
enabling and disabling of automatic NUMA balancing. The existence
of such a switch was and is very important when debugging problems
related to transparent hugepages and we should have the same for
automatic NUMA placement.
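
The interaction between the boot parameter and the Kconfig default can be sketched in userspace roughly as follows. This is an illustration only: struct toy_numa_cfg and the toy_* functions are hypothetical, and the node count and default are passed in as parameters. It mirrors one detail of the patch as posted, namely that the override flag is set as soon as a value is supplied, even if that value does not parse.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_numa_cfg {
	bool enabled;		/* current state of the feature */
	bool overridden;	/* true once balancenuma= was seen */
};

/* Returns 1 if the value was understood, 0 otherwise. */
static int toy_setup_balancenuma(struct toy_numa_cfg *cfg, const char *str)
{
	assert(cfg != NULL);
	if (!str)
		return 0;

	/* As in the patch, the override is recorded before validation. */
	cfg->overridden = true;

	if (!strcmp(str, "enable")) {
		cfg->enabled = true;
		return 1;
	}
	if (!strcmp(str, "disable")) {
		cfg->enabled = false;
		return 1;
	}
	return 0;
}

/* Apply the build-time default only on NUMA machines with no override. */
static void toy_check_enable(struct toy_numa_cfg *cfg, unsigned int nr_nodes,
			     bool default_on)
{
	assert(cfg != NULL);
	if (nr_nodes > 1 && !cfg->overridden)
		cfg->enabled = default_on;
}
```

With this ordering, "balancenuma=disable" on the command line wins over a default of enabled, and a single-node machine never has the feature switched on by the default path.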

Signed-off-by: Mel Gorman <***@suse.de>
---
Documentation/kernel-parameters.txt | 3 +++
include/linux/sched.h | 4 +++
init/Kconfig | 8 ++++++
kernel/sched/core.c | 48 ++++++++++++++++++++++++-----------
kernel/sched/fair.c | 3 +++
kernel/sched/features.h | 6 +++--
mm/mempolicy.c | 46 +++++++++++++++++++++++++++++++++
7 files changed, 101 insertions(+), 17 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 9776f06..d984acb 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -403,6 +403,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
atkbd.softrepeat= [HW]
Use software keyboard repeat

+ balancenuma= [KNL,X86] Enable or disable automatic NUMA balancing.
+ Allowed values are enable and disable
+
baycom_epp= [HW,AX25]
Format: <io>,<mode>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1068afd..2669bdd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1563,10 +1563,14 @@ struct task_struct {

#ifdef CONFIG_BALANCE_NUMA
extern void task_numa_fault(int node, int pages, bool migrated);
+extern void set_balancenuma_state(bool enabled);
#else
static inline void task_numa_fault(int node, int pages, bool migrated)
{
}
+static inline void set_balancenuma_state(bool enabled)
+{
+}
#endif

/*
diff --git a/init/Kconfig b/init/Kconfig
index 6897a05..4cccc00f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -720,6 +720,14 @@ config ARCH_USES_NUMA_PROT_NONE
depends on ARCH_WANTS_PROT_NUMA_PROT_NONE
depends on BALANCE_NUMA

+config BALANCE_NUMA_DEFAULT_ENABLED
+ bool "Automatically enable NUMA aware memory/task placement"
+ default y
+ depends on BALANCE_NUMA
+ help
+ If set, automatic NUMA balancing will be enabled if running on a NUMA
+ machine.
+
config BALANCE_NUMA
bool "Memory placement aware NUMA scheduler"
default n
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a59d869..4841f4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -192,23 +192,10 @@ static void sched_feat_disable(int i) { };
static void sched_feat_enable(int i) { };
#endif /* HAVE_JUMP_LABEL */

-static ssize_t
-sched_feat_write(struct file *filp, const char __user *ubuf,
- size_t cnt, loff_t *ppos)
+static int sched_feat_set(char *cmp)
{
- char buf[64];
- char *cmp;
- int neg = 0;
int i;
-
- if (cnt > 63)
- cnt = 63;
-
- if (copy_from_user(&buf, ubuf, cnt))
- return -EFAULT;
-
- buf[cnt] = 0;
- cmp = strstrip(buf);
+ int neg = 0;

if (strncmp(cmp, "NO_", 3) == 0) {
neg = 1;
@@ -228,6 +215,27 @@ sched_feat_write(struct file *filp, const char __user *ubuf,
}
}

+ return i;
+}
+
+static ssize_t
+sched_feat_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ char buf[64];
+ char *cmp;
+ int i;
+
+ if (cnt > 63)
+ cnt = 63;
+
+ if (copy_from_user(&buf, ubuf, cnt))
+ return -EFAULT;
+
+ buf[cnt] = 0;
+ cmp = strstrip(buf);
+
+ i = sched_feat_set(cmp);
if (i == __SCHED_FEAT_NR)
return -EINVAL;

@@ -1549,6 +1557,16 @@ static void __sched_fork(struct task_struct *p)
#endif /* CONFIG_BALANCE_NUMA */
}

+#ifdef CONFIG_BALANCE_NUMA
+void set_balancenuma_state(bool enabled)
+{
+ if (enabled)
+ sched_feat_set("NUMA");
+ else
+ sched_feat_set("NO_NUMA");
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
/*
* fork()/clone()-time setup:
*/
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c1be907..b4bc459 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -811,6 +811,9 @@ void task_numa_fault(int node, int pages, bool migrated)
{
struct task_struct *p = current;

+ if (!sched_feat_numa(NUMA))
+ return;
+
/* FIXME: Allocate task-specific structure for placement policy here */

/*
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 7cfd289..d402368 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -63,8 +63,10 @@ SCHED_FEAT(RT_RUNTIME_SHARE, true)
SCHED_FEAT(LB_MIN, false)

/*
- * Apply the automatic NUMA scheduling policy
+ * Apply the automatic NUMA scheduling policy. Enabled automatically
+ * at runtime if running on a NUMA machine. Can be controlled via
+ * balancenuma=
*/
#ifdef CONFIG_BALANCE_NUMA
-SCHED_FEAT(NUMA, true)
+SCHED_FEAT(NUMA, false)
#endif
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index fd20e28..56ad9bf 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2521,6 +2521,50 @@ void mpol_free_shared_policy(struct shared_policy *p)
mutex_unlock(&p->mutex);
}

+#ifdef CONFIG_BALANCE_NUMA
+static bool __initdata balancenuma_override;
+
+static void __init check_balancenuma_enable(void)
+{
+ bool balancenuma_default = false;
+
+ if (IS_ENABLED(CONFIG_BALANCE_NUMA_DEFAULT_ENABLED))
+ balancenuma_default = true;
+
+ if (nr_node_ids > 1 && !balancenuma_override) {
+ printk(KERN_INFO "Enabling automatic NUMA balancing. "
+ "Configure with balancenuma= or sysctl");
+ set_balancenuma_state(balancenuma_default);
+ }
+}
+
+static int __init setup_balancenuma(char *str)
+{
+ int ret = 0;
+ if (!str)
+ goto out;
+ balancenuma_override = true;
+
+ if (!strcmp(str, "enable")) {
+ set_balancenuma_state(true);
+ ret = 1;
+ } else if (!strcmp(str, "disable")) {
+ set_balancenuma_state(false);
+ ret = 1;
+ }
+out:
+ if (!ret)
+ printk(KERN_WARNING "Unable to parse balancenuma=\n");
+
+ return ret;
+}
+__setup("balancenuma=", setup_balancenuma);
+#else
+static inline void __init check_balancenuma_enable(void)
+{
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
/* assumes fs == KERNEL_DS */
void __init numa_policy_init(void)
{
@@ -2571,6 +2615,8 @@ void __init numa_policy_init(void)

if (do_set_mempolicy(MPOL_INTERLEAVE, 0, &interleave_nodes))
printk("numa_policy_init: interleaving failed\n");
+
+ check_balancenuma_enable();
}

/* Reset policy of current process to default */
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:02 UTC
Note: This is very heavily based on a patch from Peter Zijlstra with
fixes from Ingo Molnar, Hugh Dickins and Johannes Weiner. That patch
put a lot of migration logic into mm/huge_memory.c where it does
not belong. This version tries to share some of the migration
logic with migrate_misplaced_page. However, it should be noted
that now migrate.c is doing more with the pagetable manipulation
than is preferred. The end result is barely recognisable so as
before, the signed-offs had to be removed but will be re-added if
the original authors are ok with it.

Add THP migration for the NUMA working set scanning fault case.

It uses the page lock to serialize. No migration pte dance is
necessary because the pte is already unmapped when we decide
to migrate.

[***@gmail.com: Fix memory leak on isolation failure]
[***@gmail.com: Fix transfer of last_nid information]
Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/migrate.h | 15 +++
mm/huge_memory.c | 59 ++++++++----
mm/internal.h | 7 +-
mm/memcontrol.c | 7 +-
mm/migrate.c | 231 ++++++++++++++++++++++++++++++++++++++---------
5 files changed, 255 insertions(+), 64 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 6229177..ed5a6c5 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -79,6 +79,12 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
extern int migrate_misplaced_page(struct page *page, int node);
extern int migrate_misplaced_page(struct page *page, int node);
extern bool migrate_ratelimited(int node);
+extern int migrate_misplaced_transhuge_page(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ pmd_t *pmd, pmd_t entry,
+ unsigned long address,
+ struct page *page, int node);
+
#else
static inline int migrate_misplaced_page(struct page *page, int node)
{
@@ -88,6 +94,15 @@ static inline bool migrate_ratelimited(int node)
{
return false;
}
+
+static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ pmd_t *pmd, pmd_t entry,
+ unsigned long address,
+ struct page *page, int node)
+{
+ return -EAGAIN;
+}
#endif /* CONFIG_BALANCE_NUMA */

#endif /* _LINUX_MIGRATE_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1327a03..61b66f8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -600,7 +600,7 @@ out:
}
__setup("transparent_hugepage=", setup_transparent_hugepage);

-static inline pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
{
if (likely(vma->vm_flags & VM_WRITE))
pmd = pmd_mkwrite(pmd);
@@ -1022,10 +1022,12 @@ out:
int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t pmd, pmd_t *pmdp)
{
- struct page *page = NULL;
+ struct page *page;
unsigned long haddr = addr & HPAGE_PMD_MASK;
int target_nid;
int current_nid = -1;
+ bool migrated;
+ bool page_locked = false;

spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp)))
@@ -1033,42 +1035,61 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,

page = pmd_page(pmd);
get_page(page);
- spin_unlock(&mm->page_table_lock);
current_nid = page_to_nid(page);
count_vm_numa_event(NUMA_HINT_FAULTS);
if (current_nid == numa_node_id())
count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);

target_nid = mpol_misplaced(page, vma, haddr);
- if (target_nid == -1)
+ if (target_nid == -1) {
+ put_page(page);
goto clear_pmdnuma;
+ }

- /*
- * Due to lacking code to migrate thp pages, we'll split
- * (which preserves the special PROT_NONE) and re-take the
- * fault on the normal pages.
- */
- split_huge_page(page);
- put_page(page);
-
- return 0;
+ /* Acquire the page lock to serialise THP migrations */
+ spin_unlock(&mm->page_table_lock);
+ lock_page(page);
+ page_locked = true;

-clear_pmdnuma:
+ /* Confirm the PTE did not change while locked */
spin_lock(&mm->page_table_lock);
- if (unlikely(!pmd_same(pmd, *pmdp)))
+ if (unlikely(!pmd_same(pmd, *pmdp))) {
+ unlock_page(page);
+ put_page(page);
goto out_unlock;
+ }
+ spin_unlock(&mm->page_table_lock);
+
+ /* Migrate the THP to the requested node */
+ migrated = migrate_misplaced_transhuge_page(mm, vma,
+ pmdp, pmd, addr,
+ page, target_nid);
+ if (migrated)
+ current_nid = target_nid;
+ else {
+ spin_lock(&mm->page_table_lock);
+ if (unlikely(!pmd_same(pmd, *pmdp))) {
+ unlock_page(page);
+ goto out_unlock;
+ }
+ goto clear_pmdnuma;
+ }
+
+ task_numa_fault(current_nid, HPAGE_PMD_NR, migrated);
+ return 0;

+clear_pmdnuma:
pmd = pmd_mknonnuma(pmd);
set_pmd_at(mm, haddr, pmdp, pmd);
VM_BUG_ON(pmd_numa(*pmdp));
update_mmu_cache_pmd(vma, addr, pmdp);
+ if (page_locked)
+ unlock_page(page);

out_unlock:
spin_unlock(&mm->page_table_lock);
- if (page) {
- put_page(page);
- task_numa_fault(numa_node_id(), HPAGE_PMD_NR, false);
- }
+ if (current_nid != -1)
+ task_numa_fault(current_nid, HPAGE_PMD_NR, migrated);
return 0;
}

diff --git a/mm/internal.h b/mm/internal.h
index a4fa284..7e60ac8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -212,15 +212,18 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page)
{
if (TestClearPageMlocked(page)) {
unsigned long flags;
+ int nr_pages = hpage_nr_pages(page);

local_irq_save(flags);
- __dec_zone_page_state(page, NR_MLOCK);
+ __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
SetPageMlocked(newpage);
- __inc_zone_page_state(newpage, NR_MLOCK);
+ __mod_zone_page_state(page_zone(newpage), NR_MLOCK, nr_pages);
local_irq_restore(flags);
}
}

+extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
+
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern unsigned long vma_address(struct page *page,
struct vm_area_struct *vma);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dd39ba0..d97af96 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3288,15 +3288,18 @@ void mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
struct mem_cgroup **memcgp)
{
struct mem_cgroup *memcg = NULL;
+ unsigned int nr_pages = 1;
struct page_cgroup *pc;
enum charge_type ctype;

*memcgp = NULL;

- VM_BUG_ON(PageTransHuge(page));
if (mem_cgroup_disabled())
return;

+ if (PageTransHuge(page))
+ nr_pages <<= compound_order(page);
+
pc = lookup_page_cgroup(page);
lock_page_cgroup(pc);
if (PageCgroupUsed(pc)) {
@@ -3358,7 +3361,7 @@ void mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
* charged to the res_counter since we plan on replacing the
* old one and only one page is going to be left afterwards.
*/
- __mem_cgroup_commit_charge(memcg, newpage, 1, ctype, false);
+ __mem_cgroup_commit_charge(memcg, newpage, nr_pages, ctype, false);
}

/* remove redundant charge if migration failed*/
diff --git a/mm/migrate.c b/mm/migrate.c
index 6bc9745..4b1b239 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -410,7 +410,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
*/
void migrate_page_copy(struct page *newpage, struct page *page)
{
- if (PageHuge(page))
+ if (PageHuge(page) || PageTransHuge(page))
copy_huge_page(newpage, page);
else
copy_highpage(newpage, page);
@@ -1491,25 +1491,10 @@ bool migrate_ratelimited(int node)
return true;
}

-/*
- * Attempt to migrate a misplaced page to the specified destination
- * node. Caller is expected to have an elevated reference count on
- * the page that will be dropped by this function before returning.
- */
-int migrate_misplaced_page(struct page *page, int node)
+/* Returns true if the node is migrate rate-limited after the update */
+bool numamigrate_update_ratelimit(pg_data_t *pgdat)
{
- pg_data_t *pgdat = NODE_DATA(node);
- int isolated = 0;
- LIST_HEAD(migratepages);
-
- /*
- * Don't migrate pages that are mapped in multiple processes.
- * TODO: Handle false sharing detection instead of this hammer
- */
- if (page_mapcount(page) != 1) {
- put_page(page);
- goto out;
- }
+ bool rate_limited = false;

/*
* Rate-limit the amount of data that is being migrated to a node.
@@ -1522,13 +1507,18 @@ int migrate_misplaced_page(struct page *page, int node)
pgdat->balancenuma_migrate_next_window = jiffies +
msecs_to_jiffies(migrate_interval_millisecs);
}
- if (pgdat->balancenuma_migrate_nr_pages > ratelimit_pages) {
- spin_unlock(&pgdat->balancenuma_migrate_lock);
- put_page(page);
- goto out;
- }
- pgdat->balancenuma_migrate_nr_pages++;
+ if (pgdat->balancenuma_migrate_nr_pages > ratelimit_pages)
+ rate_limited = true;
+ else
+ pgdat->balancenuma_migrate_nr_pages++;
spin_unlock(&pgdat->balancenuma_migrate_lock);
+
+ return rate_limited;
+}
+
+int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
+{
+ int ret = 0;

/* Avoid migrating to a node that is nearly full */
if (migrate_balanced_pgdat(pgdat, 1)) {
@@ -1536,13 +1526,18 @@ int migrate_misplaced_page(struct page *page, int node)

if (isolate_lru_page(page)) {
put_page(page);
- goto out;
+ return 0;
}
- isolated = 1;

+ /* Page is isolated */
+ ret = 1;
page_lru = page_is_file_cache(page);
- inc_zone_page_state(page, NR_ISOLATED_ANON + page_lru);
- list_add(&page->lru, &migratepages);
+ if (!PageTransHuge(page))
+ inc_zone_page_state(page, NR_ISOLATED_ANON + page_lru);
+ else
+ mod_zone_page_state(page_zone(page),
+ NR_ISOLATED_ANON + page_lru,
+ HPAGE_PMD_NR);
}

/*
@@ -1555,23 +1550,177 @@ int migrate_misplaced_page(struct page *page, int node)
*/
put_page(page);

- if (isolated) {
- int nr_remaining;
-
- nr_remaining = migrate_pages(&migratepages,
- alloc_misplaced_dst_page,
- node, false, MIGRATE_ASYNC,
- MR_NUMA_MISPLACED);
- if (nr_remaining) {
- putback_lru_pages(&migratepages);
- isolated = 0;
- } else
- count_vm_numa_event(NUMA_PAGE_MIGRATE);
+ return ret;
+}
+
+/*
+ * Attempt to migrate a misplaced page to the specified destination
+ * node. Caller is expected to have an elevated reference count on
+ * the page that will be dropped by this function before returning.
+ */
+int migrate_misplaced_page(struct page *page, int node)
+{
+ pg_data_t *pgdat = NODE_DATA(node);
+ int isolated = 0;
+ int nr_remaining;
+ LIST_HEAD(migratepages);
+
+ /*
+ * Don't migrate pages that are mapped in multiple processes.
+ * TODO: Handle false sharing detection instead of this hammer
+ */
+ if (page_mapcount(page) != 1) {
+ put_page(page);
+ goto out;
}
+
+ /*
+ * Rate-limit the amount of data that is being migrated to a node.
+ * Optimal placement is no good if the memory bus is saturated and
+ * all the time is being spent migrating!
+ */
+ if (numamigrate_update_ratelimit(pgdat)) {
+ put_page(page);
+ goto out;
+ }
+
+ isolated = numamigrate_isolate_page(pgdat, page);
+ if (!isolated)
+ goto out;
+
+ list_add(&page->lru, &migratepages);
+ nr_remaining = migrate_pages(&migratepages,
+ alloc_misplaced_dst_page,
+ node, false, MIGRATE_ASYNC,
+ MR_NUMA_MISPLACED);
+ if (nr_remaining) {
+ putback_lru_pages(&migratepages);
+ isolated = 0;
+ } else
+ count_vm_numa_event(NUMA_PAGE_MIGRATE);
BUG_ON(!list_empty(&migratepages));
out:
return isolated;
}
+
+int migrate_misplaced_transhuge_page(struct mm_struct *mm,
+ struct vm_area_struct *vma,
+ pmd_t *pmd, pmd_t entry,
+ unsigned long address,
+ struct page *page, int node)
+{
+ unsigned long haddr = address & HPAGE_PMD_MASK;
+ pg_data_t *pgdat = NODE_DATA(node);
+ int isolated = 0;
+ struct page *new_page = NULL;
+ struct mem_cgroup *memcg = NULL;
+ int page_lru = page_is_file_cache(page);
+
+ /*
+ * Don't migrate pages that are mapped in multiple processes.
+ * TODO: Handle false sharing detection instead of this hammer
+ */
+ if (page_mapcount(page) != 1)
+ goto out_dropref;
+
+ /*
+ * Rate-limit the amount of data that is being migrated to a node.
+ * Optimal placement is no good if the memory bus is saturated and
+ * all the time is being spent migrating!
+ */
+ if (numamigrate_update_ratelimit(pgdat))
+ goto out_dropref;
+
+ new_page = alloc_pages_node(node,
+ (GFP_TRANSHUGE | GFP_THISNODE) & ~__GFP_WAIT, HPAGE_PMD_ORDER);
+ if (!new_page)
+ goto out_dropref;
+ page_xchg_last_nid(new_page, page_last_nid(page));
+
+ isolated = numamigrate_isolate_page(pgdat, page);
+ if (!isolated) {
+ put_page(new_page);
+ goto out_keep_locked;
+ }
+
+ /* Prepare a page as a migration target */
+ __set_page_locked(new_page);
+ SetPageSwapBacked(new_page);
+
+ /* anon mapping, we can simply copy page->mapping to the new page: */
+ new_page->mapping = page->mapping;
+ new_page->index = page->index;
+ migrate_page_copy(new_page, page);
+ WARN_ON(PageLRU(new_page));
+
+ /* Recheck the target PMD */
+ spin_lock(&mm->page_table_lock);
+ if (unlikely(!pmd_same(*pmd, entry))) {
+ spin_unlock(&mm->page_table_lock);
+
+ /* Reverse changes made by migrate_page_copy() */
+ if (TestClearPageActive(new_page))
+ SetPageActive(page);
+ if (TestClearPageUnevictable(new_page))
+ SetPageUnevictable(page);
+ mlock_migrate_page(page, new_page);
+
+ unlock_page(new_page);
+ put_page(new_page); /* Free it */
+
+ unlock_page(page);
+ putback_lru_page(page);
+
+ count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR);
+ goto out;
+ }
+
+ /*
+ * Traditional migration needs to prepare the memcg charge
+ * transaction early to prevent the old page from being
+ * uncharged when installing migration entries. Here we can
+ * save the potential rollback and start the charge transfer
+ * only when migration is already known to end successfully.
+ */
+ mem_cgroup_prepare_migration(page, new_page, &memcg);
+
+ entry = mk_pmd(new_page, vma->vm_page_prot);
+ entry = pmd_mknonnuma(entry);
+ entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+ entry = pmd_mkhuge(entry);
+
+ page_add_new_anon_rmap(new_page, vma, haddr);
+
+ set_pmd_at(mm, haddr, pmd, entry);
+ update_mmu_cache_pmd(vma, address, entry);
+ page_remove_rmap(page);
+ /*
+ * Finish the charge transaction under the page table lock to
+ * prevent split_huge_page() from dividing up the charge
+ * before it's fully transferred to the new page.
+ */
+ mem_cgroup_end_migration(memcg, page, new_page, true);
+ spin_unlock(&mm->page_table_lock);
+
+ unlock_page(new_page);
+ unlock_page(page);
+ put_page(page); /* Drop the rmap reference */
+ put_page(page); /* Drop the LRU isolation reference */
+
+ count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
+ count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
+
+out:
+ mod_zone_page_state(page_zone(page),
+ NR_ISOLATED_ANON + page_lru,
+ -HPAGE_PMD_NR);
+ return isolated;
+
+out_dropref:
+ put_page(page);
+out_keep_locked:
+ return 0;
+}
#endif /* CONFIG_BALANCE_NUMA */

#endif /* CONFIG_NUMA */
--
1.7.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Mel Gorman
2013-01-07 15:40:02 UTC
Post by Mel Gorman
+int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
+{
+ int ret = 0;
/* Avoid migrating to a node that is nearly full */
if (migrate_balanced_pgdat(pgdat, 1)) {
Hi Mel Gorman,
The parameter nr_migrate_pages = 1 is not correct. Since balancenuma also
supports THP in this patchset, the parameter should be 1 << compound_order(page).
True. The impact is marginal because it only applies when a node is almost
full but it does mean that we do some unnecessary work before migration
fails anyway. I've added a TODO item to fix it when I next revisit NUMA
balancing. Thanks.
--
Mel Gorman
SUSE Labs
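
Reading the reviewer's suggestion as "pass the number of base pages in the compound page", that count is 1 shifted left by the compound order. A minimal userspace sketch follows; `nr_base_pages()` is a hypothetical helper and not a kernel function, and the order-9 figure assumes 4K base pages with 2M huge pages.

```c
#include <assert.h>

/* Hypothetical helper: number of base pages in a compound page of 'order'. */
static unsigned long nr_base_pages(unsigned int order)
{
	/* Guard against shifting past the width of unsigned long. */
	assert(order < sizeof(unsigned long) * 8);
	return 1UL << order;
}
```

Under those assumptions an order-0 page counts as 1 and an order-9 THP counts as 512, which is the value the nearly-full check would need to see for a huge page.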
Mel Gorman
2012-12-07 10:30:02 UTC
From: Andrea Arcangeli <***@redhat.com>

When we split a transparent hugepage, transfer the NUMA type from the
pmd to the pte if needed.

Signed-off-by: Andrea Arcangeli <***@redhat.com>
Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40f17c3..3aaf242 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1363,6 +1363,8 @@ static int __split_huge_page_map(struct page *page,
BUG_ON(page_mapcount(page) != 1);
if (!pmd_young(*pmd))
entry = pte_mkold(entry);
+ if (pmd_numa(*pmd))
+ entry = pte_mknuma(entry);
pte = pte_offset_map(&_pmd, haddr);
BUG_ON(!pte_none(*pte));
set_pte_at(mm, haddr, pte, entry);
--
1.7.9.2
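
The idea of carrying state from the huge entry down to each small entry during a split can be sketched in userspace as below. The flag bits and the `demo_split_flags()` name are hypothetical; the real bit layout is architecture-specific and the kernel uses the pmd_*/pte_* helpers shown in the diff.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real layout is architecture-specific. */
#define DEMO_YOUNG (1u << 0)
#define DEMO_NUMA  (1u << 1)

/*
 * Derive the flags of one small-page entry from the huge entry being
 * split: drop "young" if the huge entry was old, and carry over the
 * NUMA hinting bit if the huge entry had it set.
 */
static uint32_t demo_split_flags(uint32_t huge_flags, uint32_t small_flags)
{
	if (!(huge_flags & DEMO_YOUNG))
		small_flags &= ~DEMO_YOUNG;	/* plays the role of pte_mkold() */
	if (huge_flags & DEMO_NUMA)
		small_flags |= DEMO_NUMA;	/* plays the role of pte_mknuma() */
	return small_flags;
}
```

Without the second test, a split would silently discard the pending NUMA hinting fault for the whole range.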

Mel Gorman
2012-12-07 10:30:02 UTC
NOTE: This is very heavily based on similar logic in autonuma. It should
be signed off by Andrea, but because there was no standalone patch and
it is sufficiently different from what he did, the Signed-off-by is
omitted. It will be added back if requested.

If a large number of pages are misplaced then the memory bus can be
saturated just migrating pages between nodes. This patch rate-limits
the amount of memory that can be migrated between nodes.

Signed-off-by: Mel Gorman <***@suse.de>
---
mm/migrate.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 4f55694..b2e6d4c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1461,12 +1461,21 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
}

/*
+ * page migration rate limiting control.
+ * Do not migrate more than @pages_to_migrate in a @migrate_interval_millisecs
+ * window of time. Default here says do not migrate more than 1280M per second.
+ */
+static unsigned int migrate_interval_millisecs __read_mostly = 100;
+static unsigned int ratelimit_pages __read_mostly = 128 << (20 - PAGE_SHIFT);
+
+/*
* Attempt to migrate a misplaced page to the specified destination
* node. Caller is expected to have an elevated reference count on
* the page that will be dropped by this function before returning.
*/
int migrate_misplaced_page(struct page *page, int node)
{
+ pg_data_t *pgdat = NODE_DATA(node);
int isolated = 0;
LIST_HEAD(migratepages);

@@ -1479,8 +1488,27 @@ int migrate_misplaced_page(struct page *page, int node)
goto out;
}

+ /*
+ * Rate-limit the amount of data that is being migrated to a node.
+ * Optimal placement is no good if the memory bus is saturated and
+ * all the time is being spent migrating!
+ */
+ spin_lock(&pgdat->balancenuma_migrate_lock);
+ if (time_after(jiffies, pgdat->balancenuma_migrate_next_window)) {
+ pgdat->balancenuma_migrate_nr_pages = 0;
+ pgdat->balancenuma_migrate_next_window = jiffies +
+ msecs_to_jiffies(migrate_interval_millisecs);
+ }
+ if (pgdat->balancenuma_migrate_nr_pages > ratelimit_pages) {
+ spin_unlock(&pgdat->balancenuma_migrate_lock);
+ put_page(page);
+ goto out;
+ }
+ pgdat->balancenuma_migrate_nr_pages++;
+ spin_unlock(&pgdat->balancenuma_migrate_lock);
+
/* Avoid migrating to a node that is nearly full */
- if (migrate_balanced_pgdat(NODE_DATA(node), 1)) {
+ if (migrate_balanced_pgdat(pgdat, 1)) {
int page_lru;

if (isolate_lru_page(page)) {
--
1.7.9.2
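
The windowed counter in this patch can be modelled in userspace as follows. This is only a sketch under stated assumptions: the `demo_*` names are hypothetical, time is passed in as plain milliseconds instead of jiffies (so there is no wraparound handling as with time_after()), there is no locking, and 4K pages are assumed. With those defaults, 128 << (20 - 12) is 32768 pages, i.e. 128M per 100ms window, which is where the 1280M per second figure comes from.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-node counters in the patch. */
struct demo_ratelimit {
	unsigned long next_window_ms;	/* end of the current window */
	unsigned long nr_pages;		/* pages accounted in this window */
};

#define DEMO_INTERVAL_MS	100UL
#define DEMO_PAGE_SHIFT		12	/* assumes 4K pages */
#define DEMO_RATELIMIT_PAGES	(128UL << (20 - DEMO_PAGE_SHIFT))

/* Returns true if the caller should skip migration for now. */
static bool demo_ratelimited(struct demo_ratelimit *rl, unsigned long now_ms)
{
	/* Start a fresh window once the previous one has expired. */
	if (now_ms > rl->next_window_ms) {
		rl->nr_pages = 0;
		rl->next_window_ms = now_ms + DEMO_INTERVAL_MS;
	}
	/* Same comparison as the patch: refuse once the count exceeds the limit. */
	if (rl->nr_pages > DEMO_RATELIMIT_PAGES)
		return true;
	rl->nr_pages++;
	return false;
}
```

Because the check uses a strict greater-than before incrementing, the model admits one page more than the nominal limit per window, mirroring the comparison in the diff.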

Mel Gorman
2012-12-07 10:30:03 UTC
The patch "mm: sched: numa: Control enabling and disabling of NUMA balancing"
depends on scheduling debug being enabled, but it is perfectly legitimate to
disable automatic NUMA balancing even without this option. This should
take care of it.

Signed-off-by: Mel Gorman <***@suse.de>
---
kernel/sched/core.c | 9 +++++++++
kernel/sched/sched.h | 8 +++++++-
2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4841f4f..161079c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1558,6 +1558,7 @@ static void __sched_fork(struct task_struct *p)
}

#ifdef CONFIG_BALANCE_NUMA
+#ifdef CONFIG_SCHED_DEBUG
void set_balancenuma_state(bool enabled)
{
if (enabled)
@@ -1565,6 +1566,14 @@ void set_balancenuma_state(bool enabled)
else
sched_feat_set("NO_NUMA");
}
+#else
+__read_mostly bool balancenuma_enabled;
+
+void set_balancenuma_state(bool enabled)
+{
+ balancenuma_enabled = enabled;
+}
+#endif /* CONFIG_SCHED_DEBUG */
#endif /* CONFIG_BALANCE_NUMA */

/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9a43241..03dce73 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -650,9 +650,15 @@ extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];

#ifdef CONFIG_BALANCE_NUMA
#define sched_feat_numa(x) sched_feat(x)
+#ifdef CONFIG_SCHED_DEBUG
+#define balancenuma_enabled sched_feat_numa(NUMA)
+#else
+extern bool balancenuma_enabled;
+#endif /* CONFIG_SCHED_DEBUG */
#else
#define sched_feat_numa(x) (0)
-#endif
+#define balancenuma_enabled (0)
+#endif /* CONFIG_BALANCE_NUMA */

static inline u64 global_rt_period(void)
{
--
1.7.9.2
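
The shape of the fix, one name that resolves either to a debug-only feature bit or to a plain flag, can be sketched in userspace like this. `DEMO_SCHED_DEBUG` and the `demo_*` identifiers are hypothetical stand-ins; the real patch goes through sched_feat_set() in the debug case rather than assigning a variable.

```c
#include <stdbool.h>

/* Hypothetical switch standing in for CONFIG_SCHED_DEBUG. */
/* #define DEMO_SCHED_DEBUG 1 */

#ifdef DEMO_SCHED_DEBUG
static bool demo_feature_numa;		/* stand-in for a scheduler feature bit */
#define demo_balancenuma_enabled demo_feature_numa
#else
static bool demo_balancenuma_enabled;	/* plain flag when debug is off */
#endif

/* Callers use the same setter regardless of which backing store is built. */
static void demo_set_balancenuma_state(bool enabled)
{
	demo_balancenuma_enabled = enabled;
}
```

Code that only reads `demo_balancenuma_enabled` does not need to know which configuration it was compiled for, which is the property the patch wants for the real symbol.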

Mel Gorman
2012-12-07 10:30:03 UTC
From: Hillf Danton <***@gmail.com>

Pass last_nid from misplaced page to newly allocated migration target page.

Signed-off-by: Hillf Danton <***@gmail.com>
Signed-off-by: Mel Gorman <***@suse.de>
---
mm/migrate.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 2c8310c..6bc9745 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1457,6 +1457,9 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
__GFP_NOMEMALLOC | __GFP_NORETRY |
__GFP_NOWARN) &
~GFP_IOFS, 0);
+ if (newpage)
+ page_xchg_last_nid(newpage, page_last_nid(page));
+
return newpage;
}
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:02 UTC
From: Ingo Molnar <***@kernel.org>

Convert the struct anon_vma::mutex to an rwsem, which will help
in solving a page-migration scalability problem. (Addressed in
a separate patch.)

The conversion is simple and straightforward: in every case
where we mutex_lock()ed we'll now down_write().

Suggested-by: Linus Torvalds <***@linux-foundation.org>
Reviewed-by: Rik van Riel <***@redhat.com>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Paul Turner <***@google.com>
Cc: Lee Schermerhorn <***@hp.com>
Cc: Christoph Lameter <***@linux.com>
Cc: Mel Gorman <***@suse.de>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Johannes Weiner <***@cmpxchg.org>
Cc: Hugh Dickins <***@google.com>
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/rmap.h | 16 ++++++++--------
mm/huge_memory.c | 4 ++--
mm/mmap.c | 8 ++++----
mm/rmap.c | 22 +++++++++++-----------
4 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bfe1f47..f3f41d2 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -7,7 +7,7 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/mm.h>
-#include <linux/mutex.h>
+#include <linux/rwsem.h>
#include <linux/memcontrol.h>

/*
@@ -25,8 +25,8 @@
* pointing to this anon_vma once its vma list is empty.
*/
struct anon_vma {
- struct anon_vma *root; /* Root of this anon_vma tree */
- struct mutex mutex; /* Serialize access to vma list */
+ struct anon_vma *root; /* Root of this anon_vma tree */
+ struct rw_semaphore rwsem; /* W: modification, R: walking the list */
/*
* The refcount is taken on an anon_vma when there is no
* guarantee that the vma of page tables will exist for
@@ -64,7 +64,7 @@ struct anon_vma_chain {
struct vm_area_struct *vma;
struct anon_vma *anon_vma;
struct list_head same_vma; /* locked by mmap_sem & page_table_lock */
- struct rb_node rb; /* locked by anon_vma->mutex */
+ struct rb_node rb; /* locked by anon_vma->rwsem */
unsigned long rb_subtree_last;
#ifdef CONFIG_DEBUG_VM_RB
unsigned long cached_vma_start, cached_vma_last;
@@ -108,24 +108,24 @@ static inline void vma_lock_anon_vma(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- mutex_lock(&anon_vma->root->mutex);
+ down_write(&anon_vma->root->rwsem);
}

static inline void vma_unlock_anon_vma(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- mutex_unlock(&anon_vma->root->mutex);
+ up_write(&anon_vma->root->rwsem);
}

static inline void anon_vma_lock(struct anon_vma *anon_vma)
{
- mutex_lock(&anon_vma->root->mutex);
+ down_write(&anon_vma->root->rwsem);
}

static inline void anon_vma_unlock(struct anon_vma *anon_vma)
{
- mutex_unlock(&anon_vma->root->mutex);
+ up_write(&anon_vma->root->rwsem);
}

/*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 61b66f8..f0c4928 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1291,7 +1291,7 @@ static int __split_huge_page_splitting(struct page *page,
* We can't temporarily set the pmd to null in order
* to split it, the pmd must remain marked huge at all
* times or the VM won't take the pmd_trans_huge paths
- * and it won't wait on the anon_vma->root->mutex to
+ * and it won't wait on the anon_vma->root->rwsem to
* serialize against split_huge_page*.
*/
pmdp_splitting_flush(vma, address, pmd);
@@ -1494,7 +1494,7 @@ static int __split_huge_page_map(struct page *page,
return ret;
}

-/* must be called with anon_vma->root->mutex hold */
+/* must be called with anon_vma->root->rwsem held */
static void __split_huge_page(struct page *page,
struct anon_vma *anon_vma)
{
diff --git a/mm/mmap.c b/mm/mmap.c
index 9a796c4..8840863 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2561,15 +2561,15 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
* The LSB of head.next can't change from under us
* because we hold the mm_all_locks_mutex.
*/
- mutex_lock_nest_lock(&anon_vma->root->mutex, &mm->mmap_sem);
+ down_write(&anon_vma->root->rwsem);
/*
* We can safely modify head.next after taking the
- * anon_vma->root->mutex. If some other vma in this mm shares
+ * anon_vma->root->rwsem. If some other vma in this mm shares
* the same anon_vma we won't take it again.
*
* No need of atomic instructions here, head.next
* can't change from under us thanks to the
- * anon_vma->root->mutex.
+ * anon_vma->root->rwsem.
*/
if (__test_and_set_bit(0, (unsigned long *)
&anon_vma->root->rb_root.rb_node))
@@ -2671,7 +2671,7 @@ static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
*
* No need of atomic instructions here, head.next
* can't change from under us until we release the
- * anon_vma->root->mutex.
+ * anon_vma->root->rwsem.
*/
if (!__test_and_clear_bit(0, (unsigned long *)
&anon_vma->root->rb_root.rb_node))
diff --git a/mm/rmap.c b/mm/rmap.c
index 2ee1ef0..6e3ee3b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -24,7 +24,7 @@
* mm->mmap_sem
* page->flags PG_locked (lock_page)
* mapping->i_mmap_mutex
- * anon_vma->mutex
+ * anon_vma->rwsem
* mm->page_table_lock or pte_lock
* zone->lru_lock (in mark_page_accessed, isolate_lru_page)
* swap_lock (in swap_duplicate, swap_info_get)
@@ -37,7 +37,7 @@
* in arch-dependent flush_dcache_mmap_lock,
* within bdi.wb->list_lock in __sync_single_inode)
*
- * anon_vma->mutex,mapping->i_mutex (memory_failure, collect_procs_anon)
+ * anon_vma->rwsem,mapping->i_mutex (memory_failure, collect_procs_anon)
* ->tasklist_lock
* pte map lock
*/
@@ -103,7 +103,7 @@ static inline void anon_vma_free(struct anon_vma *anon_vma)
* LOCK should suffice since the actual taking of the lock must
* happen _before_ what follows.
*/
- if (mutex_is_locked(&anon_vma->root->mutex)) {
+ if (rwsem_is_locked(&anon_vma->root->rwsem)) {
anon_vma_lock(anon_vma);
anon_vma_unlock(anon_vma);
}
@@ -219,9 +219,9 @@ static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct
struct anon_vma *new_root = anon_vma->root;
if (new_root != root) {
if (WARN_ON_ONCE(root))
- mutex_unlock(&root->mutex);
+ up_write(&root->rwsem);
root = new_root;
- mutex_lock(&root->mutex);
+ down_write(&root->rwsem);
}
return root;
}
@@ -229,7 +229,7 @@ static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct
static inline void unlock_anon_vma_root(struct anon_vma *root)
{
if (root)
- mutex_unlock(&root->mutex);
+ up_write(&root->rwsem);
}

/*
@@ -349,7 +349,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
/*
* Iterate the list once more, it now only contains empty and unlinked
* anon_vmas, destroy them. Could not do before due to __put_anon_vma()
- * needing to acquire the anon_vma->root->mutex.
+ * needing to write-acquire the anon_vma->root->rwsem.
*/
list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma = avc->anon_vma;
@@ -365,7 +365,7 @@ static void anon_vma_ctor(void *data)
{
struct anon_vma *anon_vma = data;

- mutex_init(&anon_vma->mutex);
+ init_rwsem(&anon_vma->rwsem);
atomic_set(&anon_vma->refcount, 0);
anon_vma->rb_root = RB_ROOT;
}
@@ -457,14 +457,14 @@ struct anon_vma *page_lock_anon_vma(struct page *page)

anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
root_anon_vma = ACCESS_ONCE(anon_vma->root);
- if (mutex_trylock(&root_anon_vma->mutex)) {
+ if (down_write_trylock(&root_anon_vma->rwsem)) {
/*
* If the page is still mapped, then this anon_vma is still
* its anon_vma, and holding the mutex ensures that it will
* not go away, see anon_vma_free().
*/
if (!page_mapped(page)) {
- mutex_unlock(&root_anon_vma->mutex);
+ up_write(&root_anon_vma->rwsem);
anon_vma = NULL;
}
goto out;
@@ -1299,7 +1299,7 @@ out_mlock:
/*
* We need mmap_sem locking, Otherwise VM_LOCKED check makes
* unstable result and race. Plus, We can't wait here because
- * we now hold anon_vma->mutex or mapping->i_mmap_mutex.
+ * we now hold anon_vma->rwsem or mapping->i_mmap_mutex.
* if trylock failed, the page remain in evictable lru and later
* vmscan could retry to move the page to unevictable lru if the
* page is actually mlocked.
--
1.7.9.2
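
Why a one-for-one swap of mutex_lock() for down_write() keeps the existing exclusion, while leaving room for shared readers later, can be seen in a toy state model. This is a single-threaded illustration only: the `demo_*` names are hypothetical, nothing here is atomic, and it is not a usable lock.

```c
#include <stdbool.h>

/* Toy single-threaded model of reader/writer lock state; not a real lock. */
struct demo_rwsem {
	unsigned int readers;	/* number of current shared holders */
	bool writer;		/* true while held exclusively */
};

/* Exclusive acquire: only succeeds when nobody holds the lock at all. */
static bool demo_down_write_trylock(struct demo_rwsem *s)
{
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}

static void demo_up_write(struct demo_rwsem *s)
{
	s->writer = false;
}

/* Shared acquire: succeeds alongside other readers, never with a writer. */
static bool demo_down_read_trylock(struct demo_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void demo_up_read(struct demo_rwsem *s)
{
	s->readers--;
}
```

As long as every caller takes the write side, as in this patch, the model behaves exactly like a mutex; the read side only matters once a later change starts using it.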

Mel Gorman
2012-12-07 10:30:02 UTC
To say that the PMD handling code was incorrectly transferred from autonuma
is an understatement. The intention was to handle a PMD's worth of pages
in the same fault and effectively batch the taking of the PTL and page
migration. The copied version instead has the effect of clearing a number
of pte_numa PTE entries, and whether any page migration takes place depends
on racing. This just happens to work in some cases.

This patch handles pte_numa faults in batch when a pmd_numa fault is
handled. The pages are migrated if they are currently misplaced.
Essentially this assumes that NUMA locality falls on PMD boundaries.
If necessary, that could be addressed by only setting pmd_numa when
all the pages within that PMD are on the same node.

Signed-off-by: Mel Gorman <***@suse.de>
---
mm/memory.c | 51 ++++++++++++++++++++++++++++++++++-----------------
mm/mprotect.c | 25 ++++++++++++++++++++-----
2 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 47f5dd1..6a1e534 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3449,6 +3449,18 @@ static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
}

+int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
+ unsigned long addr, int current_nid)
+{
+ get_page(page);
+
+ count_vm_numa_event(NUMA_HINT_FAULTS);
+ if (current_nid == numa_node_id())
+ count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
+
+ return mpol_misplaced(page, vma, addr);
+}
+
int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd)
{
@@ -3477,18 +3489,14 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
update_mmu_cache(vma, addr, ptep);

- count_vm_numa_event(NUMA_HINT_FAULTS);
page = vm_normal_page(vma, addr, pte);
if (!page) {
pte_unmap_unlock(ptep, ptl);
return 0;
}

- get_page(page);
current_nid = page_to_nid(page);
- if (current_nid == numa_node_id())
- count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
- target_nid = mpol_misplaced(page, vma, addr);
+ target_nid = numa_migrate_prep(page, vma, addr, current_nid);
pte_unmap_unlock(ptep, ptl);
if (target_nid == -1) {
/*
@@ -3505,7 +3513,8 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
current_nid = target_nid;

out:
- task_numa_fault(current_nid, 1);
+ if (current_nid != -1)
+ task_numa_fault(current_nid, 1);
return 0;
}

@@ -3521,8 +3530,6 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
spinlock_t *ptl;
bool numa = false;
int local_nid = numa_node_id();
- unsigned long nr_faults = 0;
- unsigned long nr_faults_local = 0;

spin_lock(&mm->page_table_lock);
pmd = *pmdp;
@@ -3545,7 +3552,8 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
for (addr = _addr + offset; addr < _addr + PMD_SIZE; pte++, addr += PAGE_SIZE) {
pte_t pteval = *pte;
struct page *page;
- int curr_nid;
+ int curr_nid = local_nid;
+ int target_nid;
if (!pte_present(pteval))
continue;
if (!pte_numa(pteval))
@@ -3566,21 +3574,30 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
/* only check non-shared pages */
if (unlikely(page_mapcount(page) != 1))
continue;
- pte_unmap_unlock(pte, ptl);

- curr_nid = page_to_nid(page);
- task_numa_fault(curr_nid, 1);
+ /*
+ * Note that the NUMA fault is later accounted to either
+ * the node that is currently running or where the page is
+ * migrated to.
+ */
+ curr_nid = local_nid;
+ target_nid = numa_migrate_prep(page, vma, addr,
+ page_to_nid(page));
+ if (target_nid == -1) {
+ put_page(page);
+ continue;
+ }

- nr_faults++;
- if (curr_nid == local_nid)
- nr_faults_local++;
+ /* Migrate to the requested node */
+ pte_unmap_unlock(pte, ptl);
+ if (migrate_misplaced_page(page, target_nid))
+ curr_nid = target_nid;
+ task_numa_fault(curr_nid, 1);

pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
}
pte_unmap_unlock(orig_pte, ptl);

- count_vm_numa_events(NUMA_HINT_FAULTS, nr_faults);
- count_vm_numa_events(NUMA_HINT_FAULTS_LOCAL, nr_faults_local);
return 0;
}
#else
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 8abf7c6..629dba1 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -37,12 +37,14 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)

static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable, int prot_numa)
+ int dirty_accountable, int prot_numa, bool *ret_all_same_node)
{
struct mm_struct *mm = vma->vm_mm;
pte_t *pte, oldpte;
spinlock_t *ptl;
unsigned long pages = 0;
+ bool all_same_node = true;
+ int last_nid = -1;

pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
arch_enter_lazy_mmu_mode();
@@ -61,6 +63,12 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,

page = vm_normal_page(vma, addr, oldpte);
if (page) {
+ int this_nid = page_to_nid(page);
+ if (last_nid == -1)
+ last_nid = this_nid;
+ if (last_nid != this_nid)
+ all_same_node = false;
+
/* only check non-shared pages */
if (!pte_numa(oldpte) &&
page_mapcount(page) == 1) {
@@ -81,7 +89,6 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,

if (updated)
pages++;
-
ptep_modify_prot_commit(mm, addr, pte, ptent);
} else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
@@ -101,6 +108,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);

+ *ret_all_same_node = all_same_node;
return pages;
}

@@ -127,6 +135,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
pmd_t *pmd;
unsigned long next;
unsigned long pages = 0;
+ bool all_same_node;

pmd = pmd_offset(pud, addr);
do {
@@ -143,9 +152,15 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
if (pmd_none_or_clear_bad(pmd))
continue;
pages += change_pte_range(vma, pmd, addr, next, newprot,
- dirty_accountable, prot_numa);
-
- if (prot_numa)
+ dirty_accountable, prot_numa, &all_same_node);
+
+ /*
+ * If we are changing protections for NUMA hinting faults then
+ * set pmd_numa if the examined pages were all on the same
+ * node. This allows a regular PMD to be handled as one fault
+ * and effectively batches the taking of the PTL
+ */
+ if (prot_numa && all_same_node)
change_pmd_protnuma(vma->vm_mm, addr, pmd);
} while (pmd++, addr = next, addr != end);
--
1.7.9.2
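
The "all pages on the same node" test that the mprotect.c hunk threads through change_pte_range() reduces to a single pass that remembers the first node seen. A userspace sketch, assuming node IDs are non-negative ints held in an array; `demo_all_same_node()` is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if every entry in 'nids' names the same node. An empty
 * range also returns true, matching the initial value used in the patch.
 */
static bool demo_all_same_node(const int *nids, size_t n)
{
	int last_nid = -1;	/* -1 means no page has been seen yet */
	size_t i;

	for (i = 0; i < n; i++) {
		if (last_nid == -1)
			last_nid = nids[i];
		if (last_nid != nids[i])
			return false;
	}
	return true;
}
```

The patch keeps scanning after a mismatch because it still has PTEs to update; the sketch returns early since it only computes the flag.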

Mel Gorman
2012-12-07 10:30:03 UTC
This patch converts change_prot_numa() to use change_protection(). As
pte_numa and friends check the PTE bits directly it is necessary for
change_protection() to use pmd_mknuma(). Hence the required
modifications to change_protection() are a little clumsy but the
end result is that most of the numa page table helpers are just one or
two instructions.

Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/huge_mm.h | 3 +-
include/linux/mm.h | 4 +-
mm/huge_memory.c | 14 ++++-
mm/mempolicy.c | 137 +++++------------------------------------------
mm/mprotect.c | 72 +++++++++++++++++++------
5 files changed, 85 insertions(+), 145 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index dabb510..027ad04 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -27,7 +27,8 @@ extern int move_huge_pmd(struct vm_area_struct *vma,
unsigned long new_addr, unsigned long old_end,
pmd_t *old_pmd, pmd_t *new_pmd);
extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, pgprot_t newprot);
+ unsigned long addr, pgprot_t newprot,
+ int prot_numa);

enum transparent_hugepage_flag {
TRANSPARENT_HUGEPAGE_FLAG,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 471185e..d04c2f0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1080,7 +1080,7 @@ extern unsigned long do_mremap(unsigned long addr,
unsigned long flags, unsigned long new_addr);
extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot,
- int dirty_accountable);
+ int dirty_accountable, int prot_numa);
extern int mprotect_fixup(struct vm_area_struct *vma,
struct vm_area_struct **pprev, unsigned long start,
unsigned long end, unsigned long newflags);
@@ -1552,7 +1552,7 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
#endif

#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
-void change_prot_numa(struct vm_area_struct *vma,
+unsigned long change_prot_numa(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
#endif

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index df1af09..68e0412 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1146,7 +1146,7 @@ out:
}

int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr, pgprot_t newprot)
+ unsigned long addr, pgprot_t newprot, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
int ret = 0;
@@ -1154,7 +1154,17 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
if (__pmd_trans_huge_lock(pmd, vma) == 1) {
pmd_t entry;
entry = pmdp_get_and_clear(mm, addr, pmd);
- entry = pmd_modify(entry, newprot);
+ if (!prot_numa)
+ entry = pmd_modify(entry, newprot);
+ else {
+ struct page *page = pmd_page(*pmd);
+
+ /* only check non-shared pages */
+ if (page_mapcount(page) == 1 &&
+ !pmd_numa(*pmd)) {
+ entry = pmd_mknuma(entry);
+ }
+ }
set_pmd_at(mm, addr, pmd, entry);
spin_unlock(&vma->vm_mm->page_table_lock);
ret = 1;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 51d3ebd..75d4600 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -568,134 +568,23 @@ static inline int check_pgd_range(struct vm_area_struct *vma,

#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
/*
- * Here we search for not shared page mappings (mapcount == 1) and we
- * set up the pmd/pte_numa on those mappings so the very next access
- * will fire a NUMA hinting page fault.
+ * This is used to mark a range of virtual addresses to be inaccessible.
+ * These are later cleared by a NUMA hinting fault. Depending on these
+ * faults, pages may be migrated for better NUMA placement.
+ *
+ * This is assuming that NUMA faults are handled using PROT_NONE. If
+ * an architecture makes a different choice, it will need further
+ * changes to the core.
*/
-static int
-change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long address)
-{
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte, *_pte;
- struct page *page;
- unsigned long _address, end;
- spinlock_t *ptl;
- int ret = 0;
-
- VM_BUG_ON(address & ~PAGE_MASK);
-
- pgd = pgd_offset(mm, address);
- if (!pgd_present(*pgd))
- goto out;
-
- pud = pud_offset(pgd, address);
- if (!pud_present(*pud))
- goto out;
-
- pmd = pmd_offset(pud, address);
- if (pmd_none(*pmd))
- goto out;
-
- if (pmd_trans_huge_lock(pmd, vma) == 1) {
- int page_nid;
- ret = HPAGE_PMD_NR;
-
- VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-
- if (pmd_numa(*pmd)) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- page = pmd_page(*pmd);
-
- /* only check non-shared pages */
- if (page_mapcount(page) != 1) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- page_nid = page_to_nid(page);
-
- if (pmd_numa(*pmd)) {
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
- ret += HPAGE_PMD_NR;
- /* defer TLB flush to lower the overhead */
- spin_unlock(&mm->page_table_lock);
- goto out;
- }
-
- if (pmd_trans_unstable(pmd))
- goto out;
- VM_BUG_ON(!pmd_present(*pmd));
-
- end = min(vma->vm_end, (address + PMD_SIZE) & PMD_MASK);
- pte = pte_offset_map_lock(mm, pmd, address, &ptl);
- for (_address = address, _pte = pte; _address < end;
- _pte++, _address += PAGE_SIZE) {
- pte_t pteval = *_pte;
- if (!pte_present(pteval))
- continue;
- if (pte_numa(pteval))
- continue;
- page = vm_normal_page(vma, _address, pteval);
- if (unlikely(!page))
- continue;
- /* only check non-shared pages */
- if (page_mapcount(page) != 1)
- continue;
-
- set_pte_at(mm, _address, _pte, pte_mknuma(pteval));
-
- /* defer TLB flush to lower the overhead */
- ret++;
- }
- pte_unmap_unlock(pte, ptl);
-
- if (ret && !pmd_numa(*pmd)) {
- spin_lock(&mm->page_table_lock);
- set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
- spin_unlock(&mm->page_table_lock);
- /* defer TLB flush to lower the overhead */
- }
-
-out:
- return ret;
-}
-
-/* Assumes mmap_sem is held */
-void
-change_prot_numa(struct vm_area_struct *vma,
- unsigned long address, unsigned long end)
+unsigned long change_prot_numa(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
{
- struct mm_struct *mm = vma->vm_mm;
- int progress = 0;
-
- while (address < end) {
- VM_BUG_ON(address < vma->vm_start ||
- address + PAGE_SIZE > vma->vm_end);
+ int nr_updated;
+ BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);

- progress += change_prot_numa_range(mm, vma, address);
- address = (address + PMD_SIZE) & PMD_MASK;
- }
+ nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);

- /*
- * Flush the TLB for the mm to start the NUMA hinting
- * page faults after we finish scanning this vma part
- * if there were any PTE updates
- */
- if (progress) {
- mmu_notifier_invalidate_range_start(vma->vm_mm, address, end);
- flush_tlb_range(vma, address, end);
- mmu_notifier_invalidate_range_end(vma->vm_mm, address, end);
- }
+ return nr_updated;
}
#else
static unsigned long change_prot_numa(struct vm_area_struct *vma,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 7c3628a..8abf7c6 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,10 +35,11 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
}
#endif

-static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
+ struct mm_struct *mm = vma->vm_mm;
pte_t *pte, oldpte;
spinlock_t *ptl;
unsigned long pages = 0;
@@ -49,19 +50,39 @@ static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
oldpte = *pte;
if (pte_present(oldpte)) {
pte_t ptent;
+ bool updated = false;

ptent = ptep_modify_prot_start(mm, addr, pte);
- ptent = pte_modify(ptent, newprot);
+ if (!prot_numa) {
+ ptent = pte_modify(ptent, newprot);
+ updated = true;
+ } else {
+ struct page *page;
+
+ page = vm_normal_page(vma, addr, oldpte);
+ if (page) {
+ /* only check non-shared pages */
+ if (!pte_numa(oldpte) &&
+ page_mapcount(page) == 1) {
+ ptent = pte_mknuma(ptent);
+ updated = true;
+ }
+ }
+ }

/*
* Avoid taking write faults for pages we know to be
* dirty.
*/
- if (dirty_accountable && pte_dirty(ptent))
+ if (dirty_accountable && pte_dirty(ptent)) {
ptent = pte_mkwrite(ptent);
+ updated = true;
+ }
+
+ if (updated)
+ pages++;

ptep_modify_prot_commit(mm, addr, pte, ptent);
- pages++;
} else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);

@@ -83,9 +104,25 @@ static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
return pages;
}

+#ifdef CONFIG_BALANCE_NUMA
+static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmd)
+{
+ spin_lock(&mm->page_table_lock);
+ set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
+ spin_unlock(&mm->page_table_lock);
+}
+#else
+static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
+ pmd_t *pmd)
+{
+ BUG();
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
pmd_t *pmd;
unsigned long next;
@@ -97,7 +134,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
if (pmd_trans_huge(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE)
split_huge_page_pmd(vma->vm_mm, pmd);
- else if (change_huge_pmd(vma, pmd, addr, newprot)) {
+ else if (change_huge_pmd(vma, pmd, addr, newprot, prot_numa)) {
pages += HPAGE_PMD_NR;
continue;
}
@@ -105,8 +142,11 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *
}
if (pmd_none_or_clear_bad(pmd))
continue;
- pages += change_pte_range(vma->vm_mm, pmd, addr, next, newprot,
- dirty_accountable);
+ pages += change_pte_range(vma, pmd, addr, next, newprot,
+ dirty_accountable, prot_numa);
+
+ if (prot_numa)
+ change_pmd_protnuma(vma->vm_mm, addr, pmd);
} while (pmd++, addr = next, addr != end);

return pages;
@@ -114,7 +154,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *

static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
pud_t *pud;
unsigned long next;
@@ -126,7 +166,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *
if (pud_none_or_clear_bad(pud))
continue;
pages += change_pmd_range(vma, pud, addr, next, newprot,
- dirty_accountable);
+ dirty_accountable, prot_numa);
} while (pud++, addr = next, addr != end);

return pages;
@@ -134,7 +174,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *

static unsigned long change_protection_range(struct vm_area_struct *vma,
unsigned long addr, unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
pgd_t *pgd;
@@ -150,7 +190,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
if (pgd_none_or_clear_bad(pgd))
continue;
pages += change_pud_range(vma, pgd, addr, next, newprot,
- dirty_accountable);
+ dirty_accountable, prot_numa);
} while (pgd++, addr = next, addr != end);

/* Only flush the TLB if we actually modified any entries: */
@@ -162,7 +202,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,

unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
unsigned long end, pgprot_t newprot,
- int dirty_accountable)
+ int dirty_accountable, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long pages;
@@ -171,7 +211,7 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
if (is_vm_hugetlb_page(vma))
pages = hugetlb_change_protection(vma, start, end, newprot);
else
- pages = change_protection_range(vma, start, end, newprot, dirty_accountable);
+ pages = change_protection_range(vma, start, end, newprot, dirty_accountable, prot_numa);
mmu_notifier_invalidate_range_end(mm, start, end);

return pages;
@@ -249,7 +289,7 @@ success:
dirty_accountable = 1;
}

- change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable);
+ change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable, 0);

vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);
vm_stat_account(mm, newflags, vma->vm_file, nrpages);
--
1.7.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Mel Gorman
2012-12-07 10:30:03 UTC
From: Peter Zijlstra <***@chello.nl>

Make change_protection() and its helpers return the number of pages they
affected. The count will be used for three purposes:

- to optimize mprotect()

- to speed up working set scanning for working set areas that
  have not been touched

- to scan more accurately according to the real working set

No change in functionality from this patch.
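
The counting scheme can be illustrated outside the kernel. The sketch below
is a userspace toy, not the kernel code: toy_pte, toy_change_pte_range() and
toy_change_pmd_range() are hypothetical names, and a two-level array stands
in for the page tables. Each level returns the number of entries it changed
and its caller sums the results, which is roughly how the patch threads a
"pages" count up from change_pte_range().

```c
#include <stddef.h>

#define TOY_PTRS_PER_PMD 4   /* entries per leaf table; arbitrary for the toy */

/* Hypothetical stand-in for a page table entry. */
struct toy_pte {
	int present;   /* non-zero if the entry maps a page */
	int prot;      /* current protection bits */
};

/* Update every present entry in one leaf table; return how many were updated. */
unsigned long toy_change_pte_range(struct toy_pte *pte, size_t n, int newprot)
{
	unsigned long pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!pte[i].present)
			continue;
		pte[i].prot = newprot;
		pages++;
	}
	return pages;
}

/* Walk all leaf tables and accumulate the per-table counts. */
unsigned long toy_change_pmd_range(struct toy_pte tables[][TOY_PTRS_PER_PMD],
				   size_t ntables, int newprot)
{
	unsigned long pages = 0;
	size_t i;

	for (i = 0; i < ntables; i++)
		pages += toy_change_pte_range(tables[i], TOY_PTRS_PER_PMD, newprot);
	return pages;
}
```

A caller that gets zero back knows nothing was modified and could, for
example, skip follow-up work for that range.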

Suggested-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Peter Zijlstra <***@chello.nl>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Rik van Riel <***@redhat.com>
Cc: Mel Gorman <***@suse.de>
Cc: Hugh Dickins <***@google.com>
Cc: Thomas Gleixner <***@linutronix.de>
Signed-off-by: Ingo Molnar <***@kernel.org>
---
include/linux/hugetlb.h | 8 +++++--
include/linux/mm.h | 3 +++
mm/hugetlb.c | 10 ++++++--
mm/mprotect.c | 58 +++++++++++++++++++++++++++++++++++------------
4 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2251648..06e691b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -87,7 +87,7 @@ struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int write);
int pmd_huge(pmd_t pmd);
int pud_huge(pud_t pmd);
-void hugetlb_change_protection(struct vm_area_struct *vma,
+unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end, pgprot_t newprot);

#else /* !CONFIG_HUGETLB_PAGE */
@@ -132,7 +132,11 @@ static inline void copy_huge_page(struct page *dst, struct page *src)
{
}

-#define hugetlb_change_protection(vma, address, end, newprot)
+static inline unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
+ unsigned long address, unsigned long end, pgprot_t newprot)
+{
+ return 0;
+}

static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcaab4e..1856c62 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1078,6 +1078,9 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
extern unsigned long do_mremap(unsigned long addr,
unsigned long old_len, unsigned long new_len,
unsigned long flags, unsigned long new_addr);
+extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, pgprot_t newprot,
+ int dirty_accountable);
extern int mprotect_fixup(struct vm_area_struct *vma,
struct vm_area_struct **pprev, unsigned long start,
unsigned long end, unsigned long newflags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59a0059..712895e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3014,7 +3014,7 @@ same_page:
return i ? i : -EFAULT;
}

-void hugetlb_change_protection(struct vm_area_struct *vma,
+unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end, pgprot_t newprot)
{
struct mm_struct *mm = vma->vm_mm;
@@ -3022,6 +3022,7 @@ void hugetlb_change_protection(struct vm_area_struct *vma,
pte_t *ptep;
pte_t pte;
struct hstate *h = hstate_vma(vma);
+ unsigned long pages = 0;

BUG_ON(address >= end);
flush_cache_range(vma, address, end);
@@ -3032,12 +3033,15 @@ void hugetlb_change_protection(struct vm_area_struct *vma,
ptep = huge_pte_offset(mm, address);
if (!ptep)
continue;
- if (huge_pmd_unshare(mm, &address, ptep))
+ if (huge_pmd_unshare(mm, &address, ptep)) {
+ pages++;
continue;
+ }
if (!huge_pte_none(huge_ptep_get(ptep))) {
pte = huge_ptep_get_and_clear(mm, address, ptep);
pte = pte_mkhuge(pte_modify(pte, newprot));
set_huge_pte_at(mm, address, ptep, pte);
+ pages++;
}
}
spin_unlock(&mm->page_table_lock);
@@ -3049,6 +3053,8 @@ void hugetlb_change_protection(struct vm_area_struct *vma,
*/
flush_tlb_range(vma, start, end);
mutex_unlock(&vma->vm_file->f_mapping->i_mmap_mutex);
+
+ return pages << h->order;
}

int hugetlb_reserve_pages(struct inode *inode,
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a409926..1e265be 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -35,12 +35,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
}
#endif

-static void change_pte_range(struct mm_struct *mm, pmd_t *pmd,
+static unsigned long change_pte_range(struct mm_struct *mm, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
int dirty_accountable)
{
pte_t *pte, oldpte;
spinlock_t *ptl;
+ unsigned long pages = 0;

pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
arch_enter_lazy_mmu_mode();
@@ -60,6 +61,7 @@ static void change_pte_range(struct mm_struct *mm, pmd_t *pmd,
ptent = pte_mkwrite(ptent);

ptep_modify_prot_commit(mm, addr, pte, ptent);
+ pages++;
} else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);

@@ -72,18 +74,22 @@ static void change_pte_range(struct mm_struct *mm, pmd_t *pmd,
set_pte_at(mm, addr, pte,
swp_entry_to_pte(entry));
}
+ pages++;
}
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
+
+ return pages;
}

-static inline void change_pmd_range(struct vm_area_struct *vma, pud_t *pud,
+static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud,
unsigned long addr, unsigned long end, pgprot_t newprot,
int dirty_accountable)
{
pmd_t *pmd;
unsigned long next;
+ unsigned long pages = 0;

pmd = pmd_offset(pud, addr);
do {
@@ -91,35 +97,42 @@ static inline void change_pmd_range(struct vm_area_struct *vma, pud_t *pud,
if (pmd_trans_huge(*pmd)) {
if (next - addr != HPAGE_PMD_SIZE)
split_huge_page_pmd(vma->vm_mm, pmd);
- else if (change_huge_pmd(vma, pmd, addr, newprot))
+ else if (change_huge_pmd(vma, pmd, addr, newprot)) {
+ pages += HPAGE_PMD_NR;
continue;
+ }
/* fall through */
}
if (pmd_none_or_clear_bad(pmd))
continue;
- change_pte_range(vma->vm_mm, pmd, addr, next, newprot,
+ pages += change_pte_range(vma->vm_mm, pmd, addr, next, newprot,
dirty_accountable);
} while (pmd++, addr = next, addr != end);
+
+ return pages;
}

-static inline void change_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
+static inline unsigned long change_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
unsigned long addr, unsigned long end, pgprot_t newprot,
int dirty_accountable)
{
pud_t *pud;
unsigned long next;
+ unsigned long pages = 0;

pud = pud_offset(pgd, addr);
do {
next = pud_addr_end(addr, end);
if (pud_none_or_clear_bad(pud))
continue;
- change_pmd_range(vma, pud, addr, next, newprot,
+ pages += change_pmd_range(vma, pud, addr, next, newprot,
dirty_accountable);
} while (pud++, addr = next, addr != end);
+
+ return pages;
}

-static void change_protection(struct vm_area_struct *vma,
+static unsigned long change_protection_range(struct vm_area_struct *vma,
unsigned long addr, unsigned long end, pgprot_t newprot,
int dirty_accountable)
{
@@ -127,6 +140,7 @@ static void change_protection(struct vm_area_struct *vma,
pgd_t *pgd;
unsigned long next;
unsigned long start = addr;
+ unsigned long pages = 0;

BUG_ON(addr >= end);
pgd = pgd_offset(mm, addr);
@@ -135,10 +149,30 @@ static void change_protection(struct vm_area_struct *vma,
next = pgd_addr_end(addr, end);
if (pgd_none_or_clear_bad(pgd))
continue;
- change_pud_range(vma, pgd, addr, next, newprot,
+ pages += change_pud_range(vma, pgd, addr, next, newprot,
dirty_accountable);
} while (pgd++, addr = next, addr != end);
+
flush_tlb_range(vma, start, end);
+
+ return pages;
+}
+
+unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, pgprot_t newprot,
+ int dirty_accountable)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ unsigned long pages;
+
+ mmu_notifier_invalidate_range_start(mm, start, end);
+ if (is_vm_hugetlb_page(vma))
+ pages = hugetlb_change_protection(vma, start, end, newprot);
+ else
+ pages = change_protection_range(vma, start, end, newprot, dirty_accountable);
+ mmu_notifier_invalidate_range_end(mm, start, end);
+
+ return pages;
}

int
@@ -213,12 +247,8 @@ success:
dirty_accountable = 1;
}

- mmu_notifier_invalidate_range_start(mm, start, end);
- if (is_vm_hugetlb_page(vma))
- hugetlb_change_protection(vma, start, end, vma->vm_page_prot);
- else
- change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable);
- mmu_notifier_invalidate_range_end(mm, start, end);
+ change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable);
+
vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);
vm_stat_account(mm, newflags, vma->vm_file, nrpages);
perf_event_mmap(vma);
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:02 UTC
From: Andrea Arcangeli <***@redhat.com>

This defines the per-node data used by Migrate On Fault in order to
rate limit the migration. The rate limiting is applied independently
to each destination node.
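
As a rough illustration of how such per-node data can drive a rate limiter,
here is a userspace sketch. It is not the kernel implementation: toy_node,
toy_node_init() and toy_update_ratelimit() are hypothetical names, the window
length and page limit are assumed values, time is passed in by the caller
instead of being read from jiffies, and the spinlock is omitted because the
toy is single-threaded.

```c
#include <stdbool.h>

/* Hypothetical per-node state, modelled on the fields added to pg_data_t. */
struct toy_node {
	unsigned long migrate_next_window; /* time at which the current window ends */
	unsigned long migrate_nr_pages;    /* pages migrated in the current window */
};

/* Assumed values for illustration only; not taken from the patch. */
#define TOY_WINDOW_TICKS    100UL
#define TOY_RATELIMIT_PAGES 1024UL

void toy_node_init(struct toy_node *node, unsigned long now)
{
	node->migrate_nr_pages = 0;
	node->migrate_next_window = now;
}

/*
 * Account one migrated page against the destination node.
 * Returns true if the node is over its limit for the current window.
 */
bool toy_update_ratelimit(struct toy_node *node, unsigned long now)
{
	if (now > node->migrate_next_window) {
		/* A new window has started: forget the old count. */
		node->migrate_nr_pages = 0;
		node->migrate_next_window = now + TOY_WINDOW_TICKS;
	}
	if (node->migrate_nr_pages > TOY_RATELIMIT_PAGES)
		return true;
	node->migrate_nr_pages++;
	return false;
}
```

Because each node carries its own counter and window, exhausting the budget
of one destination node leaves migrations to other nodes unaffected.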

Signed-off-by: Andrea Arcangeli <***@redhat.com>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/mmzone.h | 13 +++++++++++++
mm/page_alloc.c | 5 +++++
2 files changed, 18 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a23923b..1ed16e5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -717,6 +717,19 @@ typedef struct pglist_data {
struct task_struct *kswapd; /* Protected by lock_memory_hotplug() */
int kswapd_max_order;
enum zone_type classzone_idx;
+#ifdef CONFIG_BALANCE_NUMA
+ /*
+ * Lock serializing the per destination node AutoNUMA memory
+ * migration rate limiting data.
+ */
+ spinlock_t balancenuma_migrate_lock;
+
+ /* Rate limiting time interval */
+ unsigned long balancenuma_migrate_next_window;
+
+ /* Number of pages migrated during the rate limiting time interval */
+ unsigned long balancenuma_migrate_nr_pages;
+#endif
} pg_data_t;

#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5953dc2..df58654 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4449,6 +4449,11 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
int ret;

pgdat_resize_init(pgdat);
+#ifdef CONFIG_BALANCE_NUMA
+ spin_lock_init(&pgdat->balancenuma_migrate_lock);
+ pgdat->balancenuma_migrate_nr_pages = 0;
+ pgdat->balancenuma_migrate_next_window = jiffies;
+#endif
init_waitqueue_head(&pgdat->kswapd_wait);
init_waitqueue_head(&pgdat->pfmemalloc_wait);
pgdat_page_cgroup_init(pgdat);
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:03 UTC
From: Hillf Danton <***@gmail.com>

When a transparent huge page is split, copy last_nid from the head page
to each tail page.

Signed-off-by: Hillf Danton <***@gmail.com>
Signed-off-by: Mel Gorman <***@suse.de>
---
mm/huge_memory.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 66e73cc..4c6efa8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1361,6 +1361,7 @@ static void __split_huge_page_refcount(struct page *page)
page_tail->mapping = page->mapping;

page_tail->index = page->index + i;
+ page_xchg_last_nid(page_tail, page_last_nid(page));

BUG_ON(!PageAnon(page_tail));
BUG_ON(!PageUptodate(page_tail));
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:03 UTC
NUMA balancing rate limits migration when it becomes excessive. It does
this by counting the number of pages migrated recently, but it counts a
transhuge page as 1 page. Account for a transhuge page as HPAGE_PMD_NR
pages instead.
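
To see why the charge matters, consider the following userspace sketch. The
names toy_window, toy_charge() and toy_admitted_hugepages() are hypothetical,
and TOY_HPAGE_PMD_NR is assumed to be 512 (2 MB huge pages built from 4 KB
base pages); the real value depends on the architecture.

```c
#include <stdbool.h>

/* Assumption: 2 MB huge pages built from 4 KB base pages. */
#define TOY_HPAGE_PMD_NR 512UL

/* Hypothetical counter for one rate-limit window on one node. */
struct toy_window {
	unsigned long nr_pages; /* base pages charged so far */
	unsigned long limit;    /* base pages allowed per window */
};

/* Returns true if the window is already over its limit, else charges nr_pages. */
bool toy_charge(struct toy_window *w, unsigned long nr_pages)
{
	if (w->nr_pages > w->limit)
		return true;
	w->nr_pages += nr_pages;
	return false;
}

/* Count how many huge-page migrations are admitted before the limiter trips. */
unsigned long toy_admitted_hugepages(unsigned long limit, unsigned long charge)
{
	struct toy_window w = { 0, limit };
	unsigned long admitted = 0;

	while (!toy_charge(&w, charge))
		admitted++;
	return admitted;
}
```

With a limit of 1024 base pages, charging each huge page as 1 admits 1025
huge-page migrations in a window, while charging TOY_HPAGE_PMD_NR admits 3.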

Signed-off-by: Mel Gorman <***@suse.de>
---
mm/migrate.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index eb155c9..6b6567f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1492,7 +1492,7 @@ bool migrate_ratelimited(int node)
}

/* Returns true if the node is migrate rate-limited after the update */
-bool numamigrate_update_ratelimit(pg_data_t *pgdat)
+bool numamigrate_update_ratelimit(pg_data_t *pgdat, unsigned long nr_pages)
{
bool rate_limited = false;

@@ -1510,7 +1510,7 @@ bool numamigrate_update_ratelimit(pg_data_t *pgdat)
if (pgdat->balancenuma_migrate_nr_pages > ratelimit_pages)
rate_limited = true;
else
- pgdat->balancenuma_migrate_nr_pages++;
+ pgdat->balancenuma_migrate_nr_pages += nr_pages;
spin_unlock(&pgdat->balancenuma_migrate_lock);

return rate_limited;
@@ -1579,7 +1579,7 @@ int migrate_misplaced_page(struct page *page, int node)
* Optimal placement is no good if the memory bus is saturated and
* all the time is being spent migrating!
*/
- if (numamigrate_update_ratelimit(pgdat)) {
+ if (numamigrate_update_ratelimit(pgdat, 1)) {
put_page(page);
goto out;
}
@@ -1630,7 +1630,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
* Optimal placement is no good if the memory bus is saturated and
* all the time is being spent migrating!
*/
- if (numamigrate_update_ratelimit(pgdat))
+ if (numamigrate_update_ratelimit(pgdat, HPAGE_PMD_NR))
goto out_dropref;

new_page = alloc_pages_node(node,
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:03 UTC
From: Peter Zijlstra <***@chello.nl>

Add a 1 second delay before starting to scan the working set of
a task and starting to balance it amongst nodes.

[ note that before the constant per task WSS sampling rate patch
the initial scan would happen much later still; in effect, that
patch caused this regression. ]

The theory is that short-run tasks benefit very little from NUMA
placement: they come and go, and they had better stick to the node
they were started on. As tasks mature and rebalance to other CPUs
and nodes, their NUMA placement has to change too, and it starts
to matter more and more.

In practice this change fixes an observable kbuild regression:

# [ a perf stat --null --repeat 10 test of ten bzImage builds to /dev/shm ]

!NUMA:
45.291088843 seconds time elapsed ( +- 0.40% )
45.154231752 seconds time elapsed ( +- 0.36% )

+NUMA, no slow start:
46.172308123 seconds time elapsed ( +- 0.30% )
46.343168745 seconds time elapsed ( +- 0.25% )

+NUMA, 1 sec slow start:
45.224189155 seconds time elapsed ( +- 0.25% )
45.160866532 seconds time elapsed ( +- 0.17% )

and it also fixes an observable perf bench (hackbench) regression:

# perf stat --null --repeat 10 perf bench sched messaging

-NUMA:               0.246225691 seconds time elapsed ( +- 1.31% )
+NUMA no slow start: 0.252620063 seconds time elapsed ( +- 1.13% )
+NUMA 1sec delay:    0.248076230 seconds time elapsed ( +- 1.35% )

The implementation is simple and straightforward; most of the patch
deals with adding the /proc/sys/kernel/balance_numa_scan_delay_ms tunable
knob.
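
The slow-start logic can be sketched in userspace as follows. This is a toy
model under stated assumptions: toy_task, toy_fork() and toy_tick() are
hypothetical names, the caller supplies a nanosecond timestamp, and the two
defaults (1000 ms delay, 100 ms minimum period) mirror the sysctl defaults in
the series.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NSEC_PER_MSEC 1000000ULL

/* Defaults mirroring the series: 1000 ms initial delay, 100 ms minimum period. */
static unsigned int toy_scan_delay_ms = 1000;
static unsigned int toy_scan_period_min_ms = 100;

/* Hypothetical subset of the per-task NUMA scanning state. */
struct toy_task {
	uint64_t node_stamp;           /* timestamp (ns) of the last scan; 0 = never */
	unsigned int numa_scan_period; /* ms between scans */
};

void toy_fork(struct toy_task *p)
{
	p->node_stamp = 0;
	p->numa_scan_period = toy_scan_delay_ms;
}

/* Called periodically with the current timestamp; returns true when a scan is due. */
bool toy_tick(struct toy_task *p, uint64_t now_ns)
{
	uint64_t period = (uint64_t)p->numa_scan_period * TOY_NSEC_PER_MSEC;

	if (now_ns - p->node_stamp <= period)
		return false;

	/* First scan ever: leave the slow-start delay and use the normal period. */
	if (!p->node_stamp)
		p->numa_scan_period = toy_scan_period_min_ms;
	p->node_stamp = now_ns;
	return true;
}
```

A task that exits within the first second never triggers a scan in this
model; once the first scan has happened, scans recur at the minimum period.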

Signed-off-by: Peter Zijlstra <***@chello.nl>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Rik van Riel <***@redhat.com>
[ Wrote the changelog, ran measurements, tuned the default. ]
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 5 +++++
kernel/sysctl.c | 7 +++++++
4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index abb1c70..a2b06ea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2006,6 +2006,7 @@ enum sched_tunable_scaling {
};
extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;

+extern unsigned int sysctl_balance_numa_scan_delay;
extern unsigned int sysctl_balance_numa_scan_period_min;
extern unsigned int sysctl_balance_numa_scan_period_max;
extern unsigned int sysctl_balance_numa_scan_size;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81fa185..047e3c7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1543,7 +1543,7 @@ static void __sched_fork(struct task_struct *p)
p->node_stamp = 0ULL;
p->numa_scan_seq = p->mm ? p->mm->numa_scan_seq : 0;
p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0;
- p->numa_scan_period = sysctl_balance_numa_scan_period_min;
+ p->numa_scan_period = sysctl_balance_numa_scan_delay;
p->numa_work.next = &p->numa_work;
#endif /* CONFIG_BALANCE_NUMA */
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 773ef97..2e65f44 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -788,6 +788,9 @@ unsigned int sysctl_balance_numa_scan_period_max = 100*16;
/* Portion of address space to scan in MB */
unsigned int sysctl_balance_numa_scan_size = 256;

+/* Scan @scan_size MB every @scan_period after an initial @scan_delay in ms */
+unsigned int sysctl_balance_numa_scan_delay = 1000;
+
static void task_numa_placement(struct task_struct *p)
{
int seq = ACCESS_ONCE(p->mm->numa_scan_seq);
@@ -929,6 +932,8 @@ void task_tick_numa(struct rq *rq, struct task_struct *curr)
period = (u64)curr->numa_scan_period * NSEC_PER_MSEC;

if (now - curr->node_stamp > period) {
+ if (!curr->node_stamp)
+ curr->numa_scan_period = sysctl_balance_numa_scan_period_min;
curr->node_stamp = now;

if (!time_before(jiffies, curr->mm->numa_next_scan)) {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d191203..5ee587d 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -353,6 +353,13 @@ static struct ctl_table kern_table[] = {
#endif /* CONFIG_SMP */
#ifdef CONFIG_BALANCE_NUMA
{
+ .procname = "balance_numa_scan_delay_ms",
+ .data = &sysctl_balance_numa_scan_delay,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
.procname = "balance_numa_scan_period_min_ms",
.data = &sysctl_balance_numa_scan_period_min,
.maxlen = sizeof(unsigned int),
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:03 UTC
If we have to avoid migrating to a node that is nearly full, drop the
caller's page reference with put_page() and return zero.
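
The reference-counting rule can be modelled with a small userspace toy. All
names here (toy_page, toy_get_page(), toy_put_page(),
toy_migrate_misplaced_page()) are hypothetical, and the migration itself is
reduced to a comment; only the balance of references is shown.

```c
#include <stdbool.h>

/* Hypothetical page with a bare reference count. */
struct toy_page {
	int refcount;
};

static void toy_get_page(struct toy_page *page) { page->refcount++; }
static void toy_put_page(struct toy_page *page) { page->refcount--; }

/*
 * The caller holds one reference on entry. If the target node has room, the
 * page is isolated, which takes a reference of its own. Either way the
 * caller's reference is dropped exactly once.
 * Returns 1 if the page was isolated for migration, 0 otherwise.
 */
int toy_migrate_misplaced_page(struct toy_page *page, bool target_has_room)
{
	int isolated = 0;

	if (target_has_room) {
		toy_get_page(page);   /* isolation pins the page */
		isolated = 1;
	}

	/* Drop the caller's reference on both paths so it cannot leak. */
	toy_put_page(page);

	if (isolated) {
		/* ... migration would happen here; then release the isolation pin ... */
		toy_put_page(page);
	}
	return isolated;
}
```

Starting from a count of 1, both the isolated and the not-isolated path end
with a count of 0 in this model, so no reference is leaked.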

Signed-off-by: Hillf Danton <***@gmail.com>
Signed-off-by: Mel Gorman <***@suse.de>
---
mm/migrate.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index a2c4567..49878d7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1489,18 +1489,21 @@ int migrate_misplaced_page(struct page *page, int node)
}
isolated = 1;

- /*
- * Page is isolated which takes a reference count so now the
- * callers reference can be safely dropped without the page
- * disappearing underneath us during migration
- */
- put_page(page);
-
page_lru = page_is_file_cache(page);
inc_zone_page_state(page, NR_ISOLATED_ANON + page_lru);
list_add(&page->lru, &migratepages);
}

+ /*
+ * Page is either isolated or there is not enough space on the target
+ * node. If isolated, then it has taken a reference count and the
+ * callers reference can be safely dropped without the page
+ * disappearing underneath us during migration. Otherwise the page is
+ * not to be migrated but the callers reference should still be
+ * dropped so it does not leak.
+ */
+ put_page(page);
+
if (isolated) {
int nr_remaining;
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:03 UTC
This patch introduces a last_nid field to the page struct. This is used
to build a two-stage filter in the next patch that is aimed at
mitigating a problem whereby pages migrate to the wrong node when
referenced by a process that was running off its home node.
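
One way such a field could support a two-stage filter is sketched below in
userspace C11. This is an assumption about how the follow-up patch might use
it, not a copy of that patch: toy_page, toy_page_xchg_last_nid() and
toy_should_migrate() are hypothetical names, and atomic_exchange() stands in
for the kernel's xchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page carrying only the field this patch adds. */
struct toy_page {
	atomic_int last_nid; /* node of the last recorded access, -1 = none */
};

void toy_reset_page_last_nid(struct toy_page *page)
{
	atomic_store(&page->last_nid, -1);
}

/* Record nid and return the previously recorded node in one atomic step. */
int toy_page_xchg_last_nid(struct toy_page *page, int nid)
{
	return atomic_exchange(&page->last_nid, nid);
}

/*
 * Assumed filter: only treat the page as a migration candidate when two
 * consecutive faults were recorded from the same node.
 */
bool toy_should_migrate(struct toy_page *page, int this_nid)
{
	return toy_page_xchg_last_nid(page, this_nid) == this_nid;
}
```

Under this model a single stray access from a task running off its home node
only updates last_nid; the page becomes a candidate if the next fault comes
from that same node again.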

Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/mm.h | 30 ++++++++++++++++++++++++++++++
include/linux/mm_types.h | 4 ++++
mm/page_alloc.c | 2 ++
3 files changed, 36 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d04c2f0..a0834e1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -693,6 +693,36 @@ static inline int page_to_nid(const struct page *page)
}
#endif

+#ifdef CONFIG_BALANCE_NUMA
+static inline int page_xchg_last_nid(struct page *page, int nid)
+{
+ return xchg(&page->_last_nid, nid);
+}
+
+static inline int page_last_nid(struct page *page)
+{
+ return page->_last_nid;
+}
+static inline void reset_page_last_nid(struct page *page)
+{
+ page->_last_nid = -1;
+}
+#else
+static inline int page_xchg_last_nid(struct page *page, int nid)
+{
+ return page_to_nid(page);
+}
+
+static inline int page_last_nid(struct page *page)
+{
+ return page_to_nid(page);
+}
+
+static inline void reset_page_last_nid(struct page *page)
+{
+}
+#endif
+
static inline struct zone *page_zone(const struct page *page)
{
return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index b40f4ef..6b478ff 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -175,6 +175,10 @@ struct page {
*/
void *shadow;
#endif
+
+#ifdef CONFIG_BALANCE_NUMA
+ int _last_nid;
+#endif
}
/*
* The struct page can be forced to be double word aligned so that atomic ops
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df58654..fd6a073 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -608,6 +608,7 @@ static inline int free_pages_check(struct page *page)
bad_page(page);
return 1;
}
+ reset_page_last_nid(page);
if (page->flags & PAGE_FLAGS_CHECK_AT_PREP)
page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
return 0;
@@ -3826,6 +3827,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
mminit_verify_page_links(page, zone, nid, pfn);
init_page_count(page);
reset_page_mapcount(page);
+ reset_page_last_nid(page);
SetPageReserved(page);
/*
* Mark the block movable so that blocks are reserved for
--
1.7.9.2

Mel Gorman
2012-12-07 10:30:03 UTC
The PTE scanning rate and fault rates are two of the biggest sources of
system CPU overhead with automatic NUMA placement. Ideally a proper policy
would detect if a workload was properly placed, schedule and adjust the
PTE scanning rate accordingly. We do not track the necessary information
to do that but we at least know if we migrated or not.

This patch scans more slowly if a page was not migrated as the result of
a NUMA hinting fault, up to sysctl_balance_numa_scan_period_max, which is
now higher than the previous default. Once every minute it resets the
scanner in case of phase changes.

This is hilariously crude and the numbers are arbitrary. Workloads will
converge quite slowly in comparison to what a proper policy should be able
to do. On the plus side, we will chew up less CPU for workloads that have
no need for automatic balancing.
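
The back-off and periodic reset can be sketched in userspace as follows. The
names toy_task, toy_numa_fault() and toy_maybe_reset() are hypothetical, time
is passed in as milliseconds by the caller, and the 10 ms back-off step is an
assumption made for the example. The three period values mirror the sysctl
defaults set in kernel/sched/fair.c by this patch.

```c
#include <stdbool.h>

/* Defaults mirroring the patch, in ms: 100, 100*50 and 100*600. */
static unsigned int toy_scan_period_min = 100;
static unsigned int toy_scan_period_max = 100 * 50;
static unsigned int toy_scan_period_reset = 100 * 600;

/* Assumed back-off step in ms, chosen for illustration. */
#define TOY_BACKOFF_STEP_MS 10U

struct toy_task {
	unsigned int numa_scan_period; /* ms between scans */
	unsigned long next_reset_ms;   /* when to fall back to the minimum period */
};

/* A hinting fault that did not migrate the page slows the scanner down. */
void toy_numa_fault(struct toy_task *p, bool migrated)
{
	unsigned int next;

	if (migrated)
		return;
	next = p->numa_scan_period + TOY_BACKOFF_STEP_MS;
	p->numa_scan_period = next < toy_scan_period_max ? next : toy_scan_period_max;
}

/* Periodically restart from the fastest rate in case the workload changed phase. */
void toy_maybe_reset(struct toy_task *p, unsigned long now_ms)
{
	if (now_ms < p->next_reset_ms)
		return;
	p->numa_scan_period = toy_scan_period_min;
	p->next_reset_ms = now_ms + toy_scan_period_reset;
}
```

A well-placed workload drifts toward the maximum period and so spends less
CPU on scanning, while the reset bounds how long a phase change can go
unnoticed.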

Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/mm_types.h | 3 +++
include/linux/sched.h | 5 +++--
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 29 +++++++++++++++++++++--------
kernel/sysctl.c | 7 +++++++
mm/huge_memory.c | 2 +-
mm/memory.c | 12 ++++++++----
7 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6b478ff..62d18a9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -410,6 +410,9 @@ struct mm_struct {
*/
unsigned long numa_next_scan;

+ /* numa_next_reset is when the PTE scanner period will be reset */
+ unsigned long numa_next_reset;
+
/* Restart point for scanning and setting pte_numa */
unsigned long numa_scan_offset;

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a2b06ea..1068afd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1562,9 +1562,9 @@ struct task_struct {
#define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)

#ifdef CONFIG_BALANCE_NUMA
-extern void task_numa_fault(int node, int pages);
+extern void task_numa_fault(int node, int pages, bool migrated);
#else
-static inline void task_numa_fault(int node, int pages)
+static inline void task_numa_fault(int node, int pages, bool migrated)
{
}
#endif
@@ -2009,6 +2009,7 @@ extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
extern unsigned int sysctl_balance_numa_scan_delay;
extern unsigned int sysctl_balance_numa_scan_period_min;
extern unsigned int sysctl_balance_numa_scan_period_max;
+extern unsigned int sysctl_balance_numa_scan_period_reset;
extern unsigned int sysctl_balance_numa_scan_size;
extern unsigned int sysctl_balance_numa_settle_count;

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 047e3c7..a59d869 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1537,6 +1537,7 @@ static void __sched_fork(struct task_struct *p)
#ifdef CONFIG_BALANCE_NUMA
if (p->mm && atomic_read(&p->mm->mm_users) == 1) {
p->mm->numa_next_scan = jiffies;
+ p->mm->numa_next_reset = jiffies;
p->mm->numa_scan_seq = 0;
}

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c632448..c1be907 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -784,7 +784,8 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
* numa task sample period in ms
*/
unsigned int sysctl_balance_numa_scan_period_min = 100;
-unsigned int sysctl_balance_numa_scan_period_max = 100*16;
+unsigned int sysctl_balance_numa_scan_period_max = 100*50;
+unsigned int sysctl_balance_numa_scan_period_reset = 100*600;

/* Portion of address space to scan in MB */
unsigned int sysctl_balance_numa_scan_size = 256;
@@ -806,20 +807,19 @@ static void task_numa_placement(struct task_struct *p)
/*
* Got a PROT_NONE fault for a page on @node.
*/
-void task_numa_fault(int node, int pages)
+void task_numa_fault(int node, int pages, bool migrated)
{
struct task_struct *p = current;

/* FIXME: Allocate task-specific structure for placement policy here */

/*
- * Assume that as faults occur that pages are getting properly placed
- * and fewer NUMA hints are required. Note that this is a big
- * assumption, it assumes processes reach a steady steady with no
- * further phase changes.
+ * If pages are properly placed (did not migrate) then scan slower.
+ * This is reset periodically in case of phase changes
*/
- p->numa_scan_period = min(sysctl_balance_numa_scan_period_max,
- p->numa_scan_period + jiffies_to_msecs(2));
+ if (!migrated)
+ p->numa_scan_period = min(sysctl_balance_numa_scan_period_max,
+ p->numa_scan_period + jiffies_to_msecs(10));

task_numa_placement(p);
}
@@ -858,6 +858,19 @@ void task_numa_work(struct callback_head *work)
return;

/*
+ * Reset the scan period if enough time has gone by. Objective is that
+ * scanning will be reduced if pages are properly placed. As tasks
+ * can enter different phases this needs to be re-examined. Lacking
+ * proper tracking of reference behaviour, this blunt hammer is used.
+ */
+ migrate = mm->numa_next_reset;
+ if (time_after(now, migrate)) {
+ p->numa_scan_period = sysctl_balance_numa_scan_period_min;
+ next_scan = now + msecs_to_jiffies(sysctl_balance_numa_scan_period_reset);
+ xchg(&mm->numa_next_reset, next_scan);
+ }
+
+ /*
* Enforce maximal scan/migration frequency..
*/
migrate = mm->numa_next_scan;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5ee587d..c335f426 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -367,6 +367,13 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec,
},
{
+ .procname = "balance_numa_scan_period_reset",
+ .data = &sysctl_balance_numa_scan_period_reset,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
.procname = "balance_numa_scan_period_max_ms",
.data = &sysctl_balance_numa_scan_period_max,
.maxlen = sizeof(unsigned int),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4c6efa8..1327a03 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1067,7 +1067,7 @@ out_unlock:
spin_unlock(&mm->page_table_lock);
if (page) {
put_page(page);
- task_numa_fault(numa_node_id(), HPAGE_PMD_NR);
+ task_numa_fault(numa_node_id(), HPAGE_PMD_NR, false);
}
return 0;
}
diff --git a/mm/memory.c b/mm/memory.c
index 6a1e534..30e1335 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3468,6 +3468,7 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
spinlock_t *ptl;
int current_nid = -1;
int target_nid;
+ bool migrated = false;

/*
* The "pte" at this point cannot be used safely without
@@ -3509,12 +3510,13 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
}

/* Migrate to the requested node */
- if (migrate_misplaced_page(page, target_nid))
+ migrated = migrate_misplaced_page(page, target_nid);
+ if (migrated)
current_nid = target_nid;

out:
if (current_nid != -1)
- task_numa_fault(current_nid, 1);
+ task_numa_fault(current_nid, 1, migrated);
return 0;
}

@@ -3554,6 +3556,7 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *page;
int curr_nid = local_nid;
int target_nid;
+ bool migrated;
if (!pte_present(pteval))
continue;
if (!pte_numa(pteval))
@@ -3590,9 +3593,10 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,

/* Migrate to the requested node */
pte_unmap_unlock(pte, ptl);
- if (migrate_misplaced_page(page, target_nid))
+ migrated = migrate_misplaced_page(page, target_nid);
+ if (migrated)
curr_nid = target_nid;
- task_numa_fault(curr_nid, 1);
+ task_numa_fault(curr_nid, 1, migrated);

pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
}
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:01 UTC
From: Rik van Riel <***@redhat.com>

If ptep_clear_flush() is called to clear a page table entry that is not
accessible by the CPU anyway, e.g. a _PAGE_PROTNONE page table entry,
there is no need to flush the TLB on remote CPUs.
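
The idea can be modelled in a few lines of userspace C. This is only a
sketch: struct model_pte, its two fields and the flush counter are
hypothetical stand-ins, whereas the kernel encodes these states as bits in
pte_t and issues a real flush_tlb_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PTE. */
struct model_pte {
	bool hw_present;	/* hardware can walk and cache this entry */
	bool protnone;		/* software marker for a PROT_NONE entry */
};

/* An entry is accessible only if the hardware may have cached it. */
bool model_pte_accessible(struct model_pte pte)
{
	return pte.hw_present;
}

/*
 * Clear the entry and count a TLB flush only when the old entry was
 * accessible. *flushes stands in for the remote flush.
 */
struct model_pte model_ptep_clear_flush(struct model_pte *ptep,
					unsigned int *flushes)
{
	struct model_pte old;

	assert(ptep && flushes);
	old = *ptep;
	ptep->hw_present = false;
	ptep->protnone = false;

	if (model_pte_accessible(old))
		(*flushes)++;

	return old;
}
```

Clearing an entry with hw_present set bumps the counter; clearing a
protnone-only entry leaves it untouched.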

Signed-off-by: Rik van Riel <***@redhat.com>
Signed-off-by: Peter Zijlstra <***@chello.nl>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-***@git.kernel.org
Signed-off-by: Ingo Molnar <***@kernel.org>
---
mm/pgtable-generic.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index d8397da..0c8323f 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -88,7 +88,8 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address,
{
pte_t pte;
pte = ptep_get_and_clear((vma)->vm_mm, address, ptep);
- flush_tlb_page(vma, address);
+ if (pte_accessible(pte))
+ flush_tlb_page(vma, address);
return pte;
}
#endif
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:01 UTC
Note: Based on "mm/mpol: Use special PROT_NONE to migrate pages" but
sufficiently different that the signed-off-bys were dropped

Combine our previous _PAGE_NUMA, mpol_misplaced and migrate_misplaced_page()
pieces into an effective migrate on fault scheme.

Note that (on x86) we rely on PROT_NONE pages being !present and avoid
the TLB flush from try_to_unmap(TTU_MIGRATION). This greatly improves the
page-migration performance.

Based-on-work-by: Peter Zijlstra <***@chello.nl>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/linux/huge_mm.h | 9 +++++----
mm/huge_memory.c | 31 ++++++++++++++++++++++++++++---
mm/memory.c | 32 +++++++++++++++++++++++++++-----
3 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a1d26a9..dabb510 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -160,8 +160,8 @@ static inline struct page *compound_trans_head(struct page *page)
return page;
}

-extern int do_huge_pmd_numa_page(struct mm_struct *mm, unsigned long addr,
- pmd_t pmd, pmd_t *pmdp);
+extern int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pmd_t pmd, pmd_t *pmdp);

#else /* CONFIG_TRANSPARENT_HUGEPAGE */
#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
@@ -200,9 +200,10 @@ static inline int pmd_trans_huge_lock(pmd_t *pmd,
return 0;
}

-static inline int do_huge_pmd_numa_page(struct mm_struct *mm, unsigned long addr,
- pmd_t pmd, pmd_t *pmdp)
+static inline int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pmd_t pmd, pmd_t *pmdp)
{
+ return 0;
}

#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f1b2d63..df1af09 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -18,6 +18,7 @@
#include <linux/freezer.h>
#include <linux/mman.h>
#include <linux/pagemap.h>
+#include <linux/migrate.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
#include "internal.h"
@@ -1018,17 +1019,39 @@ out:
}

/* NUMA hinting page fault entry point for trans huge pmds */
-int do_huge_pmd_numa_page(struct mm_struct *mm, unsigned long addr,
- pmd_t pmd, pmd_t *pmdp)
+int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pmd_t pmd, pmd_t *pmdp)
{
- struct page *page;
+ struct page *page = NULL;
unsigned long haddr = addr & HPAGE_PMD_MASK;
+ int target_nid;

spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;

page = pmd_page(pmd);
+ get_page(page);
+ spin_unlock(&mm->page_table_lock);
+
+ target_nid = mpol_misplaced(page, vma, haddr);
+ if (target_nid == -1)
+ goto clear_pmdnuma;
+
+ /*
+ * Due to lacking code to migrate thp pages, we'll split
+ * (which preserves the special PROT_NONE) and re-take the
+ * fault on the normal pages.
+ */
+ split_huge_page(page);
+ put_page(page);
+ return 0;
+
+clear_pmdnuma:
+ spin_lock(&mm->page_table_lock);
+ if (unlikely(!pmd_same(pmd, *pmdp)))
+ goto out_unlock;
+
pmd = pmd_mknonnuma(pmd);
set_pmd_at(mm, haddr, pmdp, pmd);
VM_BUG_ON(pmd_numa(*pmdp));
@@ -1036,6 +1059,8 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, unsigned long addr,

out_unlock:
spin_unlock(&mm->page_table_lock);
+ if (page)
+ put_page(page);
return 0;
}

diff --git a/mm/memory.c b/mm/memory.c
index 4d005a3..1757ad8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -57,6 +57,7 @@
#include <linux/swapops.h>
#include <linux/elf.h>
#include <linux/gfp.h>
+#include <linux/migrate.h>

#include <asm/io.h>
#include <asm/pgalloc.h>
@@ -3451,8 +3452,9 @@ static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd)
{
- struct page *page;
+ struct page *page = NULL;
spinlock_t *ptl;
+ int current_nid, target_nid;

/*
* The "pte" at this point cannot be used safely without
@@ -3465,8 +3467,11 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
ptl = pte_lockptr(mm, pmd);
spin_lock(ptl);
- if (unlikely(!pte_same(*ptep, pte)))
- goto out_unlock;
+ if (unlikely(!pte_same(*ptep, pte))) {
+ pte_unmap_unlock(ptep, ptl);
+ goto out;
+ }
+
pte = pte_mknonnuma(pte);
set_pte_at(mm, addr, ptep, pte);
update_mmu_cache(vma, addr, ptep);
@@ -3477,8 +3482,25 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
}

-out_unlock:
+ get_page(page);
+ current_nid = page_to_nid(page);
+ target_nid = mpol_misplaced(page, vma, addr);
pte_unmap_unlock(ptep, ptl);
+ if (target_nid == -1) {
+ /*
+ * Account for the fault against the current node if it not
+ * being replaced regardless of where the page is located.
+ */
+ current_nid = numa_node_id();
+ put_page(page);
+ goto out;
+ }
+
+ /* Migrate to the requested node */
+ if (migrate_misplaced_page(page, target_nid))
+ current_nid = target_nid;
+
+out:
return 0;
}

@@ -3655,7 +3677,7 @@ retry:
barrier();
if (pmd_trans_huge(orig_pmd)) {
if (pmd_numa(*pmd))
- return do_huge_pmd_numa_page(mm, address,
+ return do_huge_pmd_numa_page(mm, vma, address,
orig_pmd, pmd);

if ((flags & FAULT_FLAG_WRITE) && !pmd_write(orig_pmd)) {
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:01 UTC
Note: This patch started as "mm/mpol: Create special PROT_NONE
infrastructure" and preserves the basic idea but steals *very*
heavily from "autonuma: numa hinting page faults entry points" for
the actual fault handlers without the migration parts. The end
result is barely recognisable as either patch so all Signed-off
and Reviewed-bys are dropped. If Peter, Ingo and Andrea are ok with
this version, I will re-add the signed-offs-by to reflect the history.

In order to facilitate a lazy -- fault driven -- migration of pages, create
a special transient PAGE_NUMA variant. We can then use the 'spurious'
protection faults to drive our migrations.

The meaning of PAGE_NUMA depends on the architecture but on x86 it is
effectively PROT_NONE. Actual PROT_NONE mappings will not generate these
NUMA faults for the reason that the page fault code checks the permission on
the VMA (and will throw a segmentation fault on actual PROT_NONE mappings),
before it ever calls handle_mm_fault.
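
The ordering argument above can be written down as a small decision
function. This is a simplified sketch, not kernel code: the enum, both
function names and the two boolean inputs are assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a protection fault. */
enum fault_outcome {
	FAULT_SEGV,		/* VMA forbids the access: real PROT_NONE */
	FAULT_NUMA_HINT,	/* VMA allows it, PTE carries the NUMA marker */
	FAULT_NORMAL		/* anything else is handled as usual */
};

/*
 * The VMA permission check comes first, so a genuine PROT_NONE mapping
 * never reaches the NUMA hinting path.
 */
enum fault_outcome classify_fault(bool vma_allows_access, bool pte_is_numa)
{
	if (!vma_allows_access)
		return FAULT_SEGV;
	if (pte_is_numa)
		return FAULT_NUMA_HINT;
	return FAULT_NORMAL;
}

/* Printable name for an outcome, for logging in a test harness. */
const char *fault_outcome_name(enum fault_outcome o)
{
	static const char *const names[] = { "segv", "numa_hint", "normal" };

	assert((unsigned int)o <= FAULT_NORMAL);
	return names[o];
}
```

Note that the first branch ignores pte_is_numa entirely, which is the
point: the marker only matters once the VMA check has passed.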

[***@gmail.com: Fix typo]
Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/huge_mm.h | 10 +++++
mm/huge_memory.c | 22 ++++++++++
mm/memory.c | 112 +++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 141 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b31cb7d..a1d26a9 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -159,6 +159,10 @@ static inline struct page *compound_trans_head(struct page *page)
}
return page;
}
+
+extern int do_huge_pmd_numa_page(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd, pmd_t *pmdp);
+
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
#define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
#define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -195,6 +199,12 @@ static inline int pmd_trans_huge_lock(pmd_t *pmd,
{
return 0;
}
+
+static inline int do_huge_pmd_numa_page(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd, pmd_t *pmdp)
+{
+}
+
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

#endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3aaf242..f1b2d63 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1017,6 +1017,28 @@ out:
return page;
}

+/* NUMA hinting page fault entry point for trans huge pmds */
+int do_huge_pmd_numa_page(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd, pmd_t *pmdp)
+{
+ struct page *page;
+ unsigned long haddr = addr & HPAGE_PMD_MASK;
+
+ spin_lock(&mm->page_table_lock);
+ if (unlikely(!pmd_same(pmd, *pmdp)))
+ goto out_unlock;
+
+ page = pmd_page(pmd);
+ pmd = pmd_mknonnuma(pmd);
+ set_pmd_at(mm, haddr, pmdp, pmd);
+ VM_BUG_ON(pmd_numa(*pmdp));
+ update_mmu_cache_pmd(vma, addr, pmdp);
+
+out_unlock:
+ spin_unlock(&mm->page_table_lock);
+ return 0;
+}
+
int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr)
{
diff --git a/mm/memory.c b/mm/memory.c
index 73834e7..4d005a3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3448,6 +3448,103 @@ static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
}

+int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd)
+{
+ struct page *page;
+ spinlock_t *ptl;
+
+ /*
+ * The "pte" at this point cannot be used safely without
+ * validation through pte_unmap_same(). It's of NUMA type but
+ * the pfn may be screwed if the read is non atomic.
+ *
+ * ptep_modify_prot_start is not called as this is clearing
+ * the _PAGE_NUMA bit and it is not really expected that there
+ * would be concurrent hardware modifications to the PTE.
+ */
+ ptl = pte_lockptr(mm, pmd);
+ spin_lock(ptl);
+ if (unlikely(!pte_same(*ptep, pte)))
+ goto out_unlock;
+ pte = pte_mknonnuma(pte);
+ set_pte_at(mm, addr, ptep, pte);
+ update_mmu_cache(vma, addr, ptep);
+
+ page = vm_normal_page(vma, addr, pte);
+ if (!page) {
+ pte_unmap_unlock(ptep, ptl);
+ return 0;
+ }
+
+out_unlock:
+ pte_unmap_unlock(ptep, ptl);
+ return 0;
+}
+
+/* NUMA hinting page fault entry point for regular pmds */
+#ifdef CONFIG_BALANCE_NUMA
+static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pmd_t *pmdp)
+{
+ pmd_t pmd;
+ pte_t *pte, *orig_pte;
+ unsigned long _addr = addr & PMD_MASK;
+ unsigned long offset;
+ spinlock_t *ptl;
+ bool numa = false;
+
+ spin_lock(&mm->page_table_lock);
+ pmd = *pmdp;
+ if (pmd_numa(pmd)) {
+ set_pmd_at(mm, _addr, pmdp, pmd_mknonnuma(pmd));
+ numa = true;
+ }
+ spin_unlock(&mm->page_table_lock);
+
+ if (!numa)
+ return 0;
+
+ /* we're in a page fault so some vma must be in the range */
+ BUG_ON(!vma);
+ BUG_ON(vma->vm_start >= _addr + PMD_SIZE);
+ offset = max(_addr, vma->vm_start) & ~PMD_MASK;
+ VM_BUG_ON(offset >= PMD_SIZE);
+ orig_pte = pte = pte_offset_map_lock(mm, pmdp, _addr, &ptl);
+ pte += offset >> PAGE_SHIFT;
+ for (addr = _addr + offset; addr < _addr + PMD_SIZE; pte++, addr += PAGE_SIZE) {
+ pte_t pteval = *pte;
+ struct page *page;
+ if (!pte_present(pteval))
+ continue;
+ if (!pte_numa(pteval))
+ continue;
+ if (addr >= vma->vm_end) {
+ vma = find_vma(mm, addr);
+ /* there's a pte present so there must be a vma */
+ BUG_ON(!vma);
+ BUG_ON(addr < vma->vm_start);
+ }
+ if (pte_numa(pteval)) {
+ pteval = pte_mknonnuma(pteval);
+ set_pte_at(mm, addr, pte, pteval);
+ }
+ page = vm_normal_page(vma, addr, pteval);
+ if (unlikely(!page))
+ continue;
+ }
+ pte_unmap_unlock(orig_pte, ptl);
+
+ return 0;
+}
+#else
+static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pmd_t *pmdp)
+{
+ BUG();
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
/*
* These routines also need to handle stuff like marking pages dirty
* and/or accessed for architectures that don't do it in hardware (most
@@ -3486,6 +3583,9 @@ int handle_pte_fault(struct mm_struct *mm,
pte, pmd, flags, entry);
}

+ if (pte_numa(entry))
+ return do_numa_page(mm, vma, address, entry, pte, pmd);
+
ptl = pte_lockptr(mm, pmd);
spin_lock(ptl);
if (unlikely(!pte_same(*pte, entry)))
@@ -3554,9 +3654,11 @@ retry:

barrier();
if (pmd_trans_huge(orig_pmd)) {
- if (flags & FAULT_FLAG_WRITE &&
- !pmd_write(orig_pmd) &&
- !pmd_trans_splitting(orig_pmd)) {
+ if (pmd_numa(*pmd))
+ return do_huge_pmd_numa_page(mm, address,
+ orig_pmd, pmd);
+
+ if ((flags & FAULT_FLAG_WRITE) && !pmd_write(orig_pmd)) {
ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
orig_pmd);
/*
@@ -3568,10 +3670,14 @@ retry:
goto retry;
return ret;
}
+
return 0;
}
}

+ if (pmd_numa(*pmd))
+ return do_pmd_numa_page(mm, vma, address, pmd);
+
/*
* Use __pte_alloc instead of pte_alloc_map, because we can't
* run pte_offset_map on the pmd, if an huge pmd could
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:01 UTC
By accounting against the present PTEs, scanning speed reflects the
actual present (mapped) memory.
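
A worked example of the budget arithmetic may help. The sketch below
assumes 4 KiB pages (a page shift of 12); the helper names are invented for
illustration, and the second helper only mirrors the idea of charging the
value returned by change_prot_numa() against the remaining budget.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Convert the scan size sysctl (in MB) to a budget in pages. */
long scan_size_mb_to_pages(unsigned int scan_size_mb)
{
	long pages = scan_size_mb;

	pages <<= 20 - MODEL_PAGE_SHIFT;	/* MB in pages */
	return pages;
}

/*
 * Charge only the PTEs that were actually updated, so sparse or unmapped
 * ranges do not eat into the budget.
 */
long charge_scan_budget(long pages_left, long ptes_updated)
{
	assert(ptes_updated >= 0);
	return pages_left - ptes_updated;
}
```

With the default of 256 MB this gives a budget of 65536 pages per scan. A
range in which only 512 PTEs are present costs 512 pages of that budget,
however large the range is.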

Suggested-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Peter Zijlstra <***@chello.nl>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Rik van Riel <***@redhat.com>
Cc: Mel Gorman <***@suse.de>
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
---
kernel/sched/fair.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 66d8bd2..773ef97 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -827,8 +827,8 @@ void task_numa_work(struct callback_head *work)
struct task_struct *p = current;
struct mm_struct *mm = p->mm;
struct vm_area_struct *vma;
- unsigned long offset, end;
- long length;
+ unsigned long start, end;
+ long pages;

WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));

@@ -858,18 +858,20 @@ void task_numa_work(struct callback_head *work)
if (cmpxchg(&mm->numa_next_scan, migrate, next_scan) != migrate)
return;

- offset = mm->numa_scan_offset;
- length = sysctl_balance_numa_scan_size;
- length <<= 20;
+ start = mm->numa_scan_offset;
+ pages = sysctl_balance_numa_scan_size;
+ pages <<= 20 - PAGE_SHIFT; /* MB in pages */
+ if (!pages)
+ return;

down_read(&mm->mmap_sem);
- vma = find_vma(mm, offset);
+ vma = find_vma(mm, start);
if (!vma) {
reset_ptenuma_scan(p);
- offset = 0;
+ start = 0;
vma = mm->mmap;
}
- for (; vma && length > 0; vma = vma->vm_next) {
+ for (; vma; vma = vma->vm_next) {
if (!vma_migratable(vma))
continue;

@@ -877,15 +879,19 @@ void task_numa_work(struct callback_head *work)
if (((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) < HPAGE_PMD_NR)
continue;

- offset = max(offset, vma->vm_start);
- end = min(ALIGN(offset + length, HPAGE_SIZE), vma->vm_end);
- length -= end - offset;
-
- change_prot_numa(vma, offset, end);
+ do {
+ start = max(start, vma->vm_start);
+ end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
+ end = min(end, vma->vm_end);
+ pages -= change_prot_numa(vma, start, end);

- offset = end;
+ start = end;
+ if (pages <= 0)
+ goto out;
+ } while (end != vma->vm_end);
}

+out:
/*
* It is possible to reach the end of the VMA list but the last few VMAs are
* not guaranteed to the vma_migratable. If they are not, we would find the
@@ -893,7 +899,7 @@ void task_numa_work(struct callback_head *work)
* so check it now.
*/
if (vma)
- mm->numa_scan_offset = offset;
+ mm->numa_scan_offset = start;
else
reset_ptenuma_scan(p);
up_read(&mm->mmap_sem);
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:01 UTC
The pgmigrate_success and pgmigrate_fail vmstat counters tell the user
about migration activity but not the type or the reason. This patch adds
a tracepoint to identify the type of page migration and why the page is
being migrated.

Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/migrate.h | 13 ++++++++--
include/trace/events/migrate.h | 51 ++++++++++++++++++++++++++++++++++++++++
mm/compaction.c | 3 ++-
mm/memory-failure.c | 3 ++-
mm/memory_hotplug.c | 3 ++-
mm/mempolicy.c | 6 +++--
mm/migrate.c | 10 ++++++--
mm/page_alloc.c | 3 ++-
8 files changed, 82 insertions(+), 10 deletions(-)
create mode 100644 include/trace/events/migrate.h

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..9d1c159 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -7,6 +7,15 @@

typedef struct page *new_page_t(struct page *, unsigned long private, int **);

+enum migrate_reason {
+ MR_COMPACTION,
+ MR_MEMORY_FAILURE,
+ MR_MEMORY_HOTPLUG,
+ MR_SYSCALL, /* also applies to cpusets */
+ MR_MEMPOLICY_MBIND,
+ MR_CMA
+};
+
#ifdef CONFIG_MIGRATION

extern void putback_lru_pages(struct list_head *l);
@@ -14,7 +23,7 @@ extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- enum migrate_mode mode);
+ enum migrate_mode mode, int reason);
extern int migrate_huge_page(struct page *, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode);
@@ -35,7 +44,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
- enum migrate_mode mode) { return -ENOSYS; }
+ enum migrate_mode mode, int reason) { return -ENOSYS; }
static inline int migrate_huge_page(struct page *page, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
new file mode 100644
index 0000000..ec2a6cc
--- /dev/null
+++ b/include/trace/events/migrate.h
@@ -0,0 +1,51 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM migrate
+
+#if !defined(_TRACE_MIGRATE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MIGRATE_H
+
+#define MIGRATE_MODE \
+ {MIGRATE_ASYNC, "MIGRATE_ASYNC"}, \
+ {MIGRATE_SYNC_LIGHT, "MIGRATE_SYNC_LIGHT"}, \
+ {MIGRATE_SYNC, "MIGRATE_SYNC"}
+
+#define MIGRATE_REASON \
+ {MR_COMPACTION, "compaction"}, \
+ {MR_MEMORY_FAILURE, "memory_failure"}, \
+ {MR_MEMORY_HOTPLUG, "memory_hotplug"}, \
+ {MR_SYSCALL, "syscall_or_cpuset"}, \
+ {MR_MEMPOLICY_MBIND, "mempolicy_mbind"}, \
+ {MR_CMA, "cma"}
+
+TRACE_EVENT(mm_migrate_pages,
+
+ TP_PROTO(unsigned long succeeded, unsigned long failed,
+ enum migrate_mode mode, int reason),
+
+ TP_ARGS(succeeded, failed, mode, reason),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, succeeded)
+ __field( unsigned long, failed)
+ __field( enum migrate_mode, mode)
+ __field( int, reason)
+ ),
+
+ TP_fast_assign(
+ __entry->succeeded = succeeded;
+ __entry->failed = failed;
+ __entry->mode = mode;
+ __entry->reason = reason;
+ ),
+
+ TP_printk("nr_succeeded=%lu nr_failed=%lu mode=%s reason=%s",
+ __entry->succeeded,
+ __entry->failed,
+ __print_symbolic(__entry->mode, MIGRATE_MODE),
+ __print_symbolic(__entry->reason, MIGRATE_REASON))
+);
+
+#endif /* _TRACE_MIGRATE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/compaction.c b/mm/compaction.c
index 00ad883..2c077a7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -990,7 +990,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
nr_migrate = cc->nr_migratepages;
err = migrate_pages(&cc->migratepages, compaction_alloc,
(unsigned long)cc, false,
- cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+ cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC,
+ MR_COMPACTION);
update_nr_listpages(cc);
nr_remaining = cc->nr_migratepages;

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 6c5899b..ddb68a1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1558,7 +1558,8 @@ int soft_offline_page(struct page *page, int flags)
page_is_file_cache(page));
list_add(&page->lru, &pagelist);
ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
- false, MIGRATE_SYNC);
+ false, MIGRATE_SYNC,
+ MR_MEMORY_FAILURE);
if (ret) {
putback_lru_pages(&pagelist);
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e4eeaca..e598bd1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -812,7 +812,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
* migrate_pages returns # of failed pages.
*/
ret = migrate_pages(&source, alloc_migrate_target, 0,
- true, MIGRATE_SYNC);
+ true, MIGRATE_SYNC,
+ MR_MEMORY_HOTPLUG);
if (ret)
putback_lru_pages(&source);
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..66e90ec 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -961,7 +961,8 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,

if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, new_node_page, dest,
- false, MIGRATE_SYNC);
+ false, MIGRATE_SYNC,
+ MR_SYSCALL);
if (err)
putback_lru_pages(&pagelist);
}
@@ -1202,7 +1203,8 @@ static long do_mbind(unsigned long start, unsigned long len,
if (!list_empty(&pagelist)) {
nr_failed = migrate_pages(&pagelist, new_vma_page,
(unsigned long)vma,
- false, MIGRATE_SYNC);
+ false, MIGRATE_SYNC,
+ MR_MEMPOLICY_MBIND);
if (nr_failed)
putback_lru_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 04687f6..27be9c9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -38,6 +38,9 @@

#include <asm/tlbflush.h>

+#define CREATE_TRACE_POINTS
+#include <trace/events/migrate.h>
+
#include "internal.h"

/*
@@ -958,7 +961,7 @@ out:
*/
int migrate_pages(struct list_head *from,
new_page_t get_new_page, unsigned long private, bool offlining,
- enum migrate_mode mode)
+ enum migrate_mode mode, int reason)
{
int retry = 1;
int nr_failed = 0;
@@ -1004,6 +1007,8 @@ out:
count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
if (nr_failed)
count_vm_events(PGMIGRATE_FAIL, nr_failed);
+ trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);
+
if (!swapwrite)
current->flags &= ~PF_SWAPWRITE;

@@ -1145,7 +1150,8 @@ set_status:
err = 0;
if (!list_empty(&pagelist)) {
err = migrate_pages(&pagelist, new_page_node,
- (unsigned long)pm, 0, MIGRATE_SYNC);
+ (unsigned long)pm, 0, MIGRATE_SYNC,
+ MR_SYSCALL);
if (err)
putback_lru_pages(&pagelist);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7bb35ac..5953dc2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5707,7 +5707,8 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,

ret = migrate_pages(&cc->migratepages,
alloc_migrate_target,
- 0, false, MIGRATE_SYNC);
+ 0, false, MIGRATE_SYNC,
+ MR_CMA);
}

putback_lru_pages(&cc->migratepages);
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:01 UTC
From: Lee Schermerhorn <***@hp.com>

NOTE: Once again there is a lot of patch stealing and the end result
is sufficiently different that I had to drop the signed-offs.
Will re-add if the original authors are ok with that.

This patch adds another mbind() flag to request "lazy migration". The
flag, MPOL_MF_LAZY, modifies MPOL_MF_MOVE* such that the selected
pages are marked PROT_NONE. The pages will be migrated in the fault
path on "first touch", if the policy dictates at that time.

"Lazy Migration" will allow testing of migrate-on-fault via mbind().
It also allows applications to specify that only subsequently touched
pages be migrated to obey the new policy, instead of all pages in the
range. This can be useful for multi-threaded applications working on a
large shared data area that is initialized by an initial thread,
resulting in all pages being on one [or a few, if overflowed] nodes.
Once the pages are marked PROT_NONE, those in regions assigned to the
worker threads will be automatically migrated local to the threads on
first touch.
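
To make the flag relationship concrete, here is a small self-contained
sketch. The flag values and the MPOL_MF_VALID mask are copied from this
patch; the two helper functions are hypothetical and only illustrate that
MPOL_MF_LAZY acts as a modifier of the MOVE flags, not how do_mbind()
implements it.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined by this patch in include/uapi/linux/mempolicy.h. */
#define MPOL_MF_STRICT   (1u << 0)
#define MPOL_MF_MOVE     (1u << 1)
#define MPOL_MF_MOVE_ALL (1u << 2)
#define MPOL_MF_LAZY     (1u << 3)

#define MPOL_MF_VALID (MPOL_MF_STRICT | \
		       MPOL_MF_MOVE | \
		       MPOL_MF_MOVE_ALL | \
		       MPOL_MF_LAZY)

/* Hypothetical helper: reject anything outside the user-visible flags. */
bool mbind_flags_valid(unsigned int flags)
{
	return (flags & ~MPOL_MF_VALID) == 0;
}

/*
 * Hypothetical helper: lazy migration is requested only when
 * MPOL_MF_LAZY accompanies one of the MOVE flags.
 */
bool mbind_wants_lazy_move(unsigned int flags)
{
	assert(mbind_flags_valid(flags));
	return (flags & MPOL_MF_LAZY) &&
	       (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL));
}
```

An application wanting first-touch placement for its worker threads would
pass MPOL_MF_MOVE | MPOL_MF_LAZY; the bit that used to be the first
internal flag, (1<<3), is now user-visible and the internal flags start at
(1<<4).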

Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/mm.h | 5 ++
include/uapi/linux/mempolicy.h | 13 ++-
mm/mempolicy.c | 185 ++++++++++++++++++++++++++++++++++++----
3 files changed, 185 insertions(+), 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa16152..471185e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1551,6 +1551,11 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
}
#endif

+#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+void change_prot_numa(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end);
+#endif
+
struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 472de8a..6a1baae 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -49,9 +49,16 @@ enum mpol_rebind_step {

/* Flags for mbind */
#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
-#define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform to mapping */
-#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */
-#define MPOL_MF_INTERNAL (1<<3) /* Internal flags start here */
+#define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform
+ to policy */
+#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to policy */
+#define MPOL_MF_LAZY (1<<3) /* Modifies '_MOVE: lazy migrate on fault */
+#define MPOL_MF_INTERNAL (1<<4) /* Internal flags start here */
+
+#define MPOL_MF_VALID (MPOL_MF_STRICT | \
+ MPOL_MF_MOVE | \
+ MPOL_MF_MOVE_ALL | \
+ MPOL_MF_LAZY)

/*
* Internal flags that share the struct mempolicy flags word with
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index df1466d..51d3ebd 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -90,6 +90,7 @@
#include <linux/syscalls.h>
#include <linux/ctype.h>
#include <linux/mm_inline.h>
+#include <linux/mmu_notifier.h>

#include <asm/tlbflush.h>
#include <asm/uaccess.h>
@@ -565,6 +566,145 @@ static inline int check_pgd_range(struct vm_area_struct *vma,
return 0;
}

+#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+/*
+ * Here we search for not shared page mappings (mapcount == 1) and we
+ * set up the pmd/pte_numa on those mappings so the very next access
+ * will fire a NUMA hinting page fault.
+ */
+static int
+change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte, *_pte;
+ struct page *page;
+ unsigned long _address, end;
+ spinlock_t *ptl;
+ int ret = 0;
+
+ VM_BUG_ON(address & ~PAGE_MASK);
+
+ pgd = pgd_offset(mm, address);
+ if (!pgd_present(*pgd))
+ goto out;
+
+ pud = pud_offset(pgd, address);
+ if (!pud_present(*pud))
+ goto out;
+
+ pmd = pmd_offset(pud, address);
+ if (pmd_none(*pmd))
+ goto out;
+
+ if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ int page_nid;
+ ret = HPAGE_PMD_NR;
+
+ VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+ if (pmd_numa(*pmd)) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ page = pmd_page(*pmd);
+
+ /* only check non-shared pages */
+ if (page_mapcount(page) != 1) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ page_nid = page_to_nid(page);
+
+ if (pmd_numa(*pmd)) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
+ ret += HPAGE_PMD_NR;
+ /* defer TLB flush to lower the overhead */
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ if (pmd_trans_unstable(pmd))
+ goto out;
+ VM_BUG_ON(!pmd_present(*pmd));
+
+ end = min(vma->vm_end, (address + PMD_SIZE) & PMD_MASK);
+ pte = pte_offset_map_lock(mm, pmd, address, &ptl);
+ for (_address = address, _pte = pte; _address < end;
+ _pte++, _address += PAGE_SIZE) {
+ pte_t pteval = *_pte;
+ if (!pte_present(pteval))
+ continue;
+ if (pte_numa(pteval))
+ continue;
+ page = vm_normal_page(vma, _address, pteval);
+ if (unlikely(!page))
+ continue;
+ /* only check non-shared pages */
+ if (page_mapcount(page) != 1)
+ continue;
+
+ set_pte_at(mm, _address, _pte, pte_mknuma(pteval));
+
+ /* defer TLB flush to lower the overhead */
+ ret++;
+ }
+ pte_unmap_unlock(pte, ptl);
+
+ if (ret && !pmd_numa(*pmd)) {
+ spin_lock(&mm->page_table_lock);
+ set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
+ spin_unlock(&mm->page_table_lock);
+ /* defer TLB flush to lower the overhead */
+ }
+
+out:
+ return ret;
+}
+
+/* Assumes mmap_sem is held */
+void
+change_prot_numa(struct vm_area_struct *vma,
+ unsigned long address, unsigned long end)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ int progress = 0;
+
+ while (address < end) {
+ VM_BUG_ON(address < vma->vm_start ||
+ address + PAGE_SIZE > vma->vm_end);
+
+ progress += change_prot_numa_range(mm, vma, address);
+ address = (address + PMD_SIZE) & PMD_MASK;
+ }
+
+ /*
+ * Flush the TLB for the mm to start the NUMA hinting
+ * page faults after we finish scanning this vma part
+ * if there were any PTE updates
+ */
+ if (progress) {
+ mmu_notifier_invalidate_range_start(vma->vm_mm, address, end);
+ flush_tlb_range(vma, address, end);
+ mmu_notifier_invalidate_range_end(vma->vm_mm, address, end);
+ }
+}
+#else
+static unsigned long change_prot_numa(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
+{
+ return 0;
+}
+#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
+
/*
* Check if all pages in a range are on a set of nodes.
* If pagelist != NULL then isolate pages from the LRU and
@@ -583,22 +723,32 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
return ERR_PTR(-EFAULT);
prev = NULL;
for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
+ unsigned long endvma = vma->vm_end;
+
+ if (endvma > end)
+ endvma = end;
+ if (vma->vm_start > start)
+ start = vma->vm_start;
+
if (!(flags & MPOL_MF_DISCONTIG_OK)) {
if (!vma->vm_next && vma->vm_end < end)
return ERR_PTR(-EFAULT);
if (prev && prev->vm_end < vma->vm_start)
return ERR_PTR(-EFAULT);
}
- if (!is_vm_hugetlb_page(vma) &&
- ((flags & MPOL_MF_STRICT) ||
+
+ if (is_vm_hugetlb_page(vma))
+ goto next;
+
+ if (flags & MPOL_MF_LAZY) {
+ change_prot_numa(vma, start, endvma);
+ goto next;
+ }
+
+ if ((flags & MPOL_MF_STRICT) ||
((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
- vma_migratable(vma)))) {
- unsigned long endvma = vma->vm_end;
+ vma_migratable(vma))) {

- if (endvma > end)
- endvma = end;
- if (vma->vm_start > start)
- start = vma->vm_start;
err = check_pgd_range(vma, start, endvma, nodes,
flags, private);
if (err) {
@@ -606,6 +756,7 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
break;
}
}
+next:
prev = vma;
}
return first;
@@ -1138,8 +1289,7 @@ static long do_mbind(unsigned long start, unsigned long len,
int err;
LIST_HEAD(pagelist);

- if (flags & ~(unsigned long)(MPOL_MF_STRICT |
- MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+ if (flags & ~(unsigned long)MPOL_MF_VALID)
return -EINVAL;
if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
return -EPERM;
@@ -1162,6 +1312,9 @@ static long do_mbind(unsigned long start, unsigned long len,
if (IS_ERR(new))
return PTR_ERR(new);

+ if (flags & MPOL_MF_LAZY)
+ new->flags |= MPOL_F_MOF;
+
/*
* If we are using the default policy then operation
* on discontinuous address spaces is okay after all
@@ -1198,13 +1351,15 @@ static long do_mbind(unsigned long start, unsigned long len,
vma = check_range(mm, start, end, nmask,
flags | MPOL_MF_INVERT, &pagelist);

- err = PTR_ERR(vma);
- if (!IS_ERR(vma)) {
- int nr_failed = 0;
-
+ err = PTR_ERR(vma); /* maybe ... */
+ if (!IS_ERR(vma) && mode != MPOL_NOOP)
err = mbind_range(mm, start, end, new);

+ if (!err) {
+ int nr_failed = 0;
+
if (!list_empty(&pagelist)) {
+ WARN_ON_ONCE(flags & MPOL_MF_LAZY);
nr_failed = migrate_pages(&pagelist, new_vma_page,
(unsigned long)vma,
false, MIGRATE_SYNC,
@@ -1213,7 +1368,7 @@ static long do_mbind(unsigned long start, unsigned long len,
putback_lru_pages(&pagelist);
}

- if (!err && nr_failed && (flags & MPOL_MF_STRICT))
+ if (nr_failed && (flags & MPOL_MF_STRICT))
err = -EIO;
} else
putback_lru_pages(&pagelist);
--
1.7.9.2

Simon Jeons
2013-01-05 05:20:01 UTC
Post by Mel Gorman
NOTE: Once again there is a lot of patch stealing and the end result
is sufficiently different that I had to drop the signed-offs.
Will re-add if the original authors are ok with that.
This patch adds another mbind() flag to request "lazy migration". The
flag, MPOL_MF_LAZY, modifies MPOL_MF_MOVE* such that the selected
pages are marked PROT_NONE. The pages will be migrated in the fault
path on "first touch", if the policy dictates at that time.
"Lazy Migration" will allow testing of migrate-on-fault via mbind().
Also allows applications to specify that only subsequently touched
pages be migrated to obey new policy, instead of all pages in range.
This can be useful for multi-threaded applications working on a
large shared data area that is initialized by an initial thread
resulting in all pages on one [or a few, if overflowed] nodes.
After PROT_NONE, the pages in regions assigned to the worker threads
will be automatically migrated local to the threads on 1st touch.
---
include/linux/mm.h | 5 ++
include/uapi/linux/mempolicy.h | 13 ++-
mm/mempolicy.c | 185 ++++++++++++++++++++++++++++++++++++----
3 files changed, 185 insertions(+), 18 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa16152..471185e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1551,6 +1551,11 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
}
#endif
+#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+void change_prot_numa(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end);
+#endif
+
struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 472de8a..6a1baae 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -49,9 +49,16 @@ enum mpol_rebind_step {
/* Flags for mbind */
#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
-#define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform to mapping */
-#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */
-#define MPOL_MF_INTERNAL (1<<3) /* Internal flags start here */
+#define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform
+ to policy */
+#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to policy */
+#define MPOL_MF_LAZY (1<<3) /* Modifies '_MOVE: lazy migrate on fault */
+#define MPOL_MF_INTERNAL (1<<4) /* Internal flags start here */
+
+#define MPOL_MF_VALID (MPOL_MF_STRICT | \
+ MPOL_MF_MOVE | \
+ MPOL_MF_MOVE_ALL | \
+ MPOL_MF_LAZY)
/*
* Internal flags that share the struct mempolicy flags word with
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index df1466d..51d3ebd 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -90,6 +90,7 @@
#include <linux/syscalls.h>
#include <linux/ctype.h>
#include <linux/mm_inline.h>
+#include <linux/mmu_notifier.h>
#include <asm/tlbflush.h>
#include <asm/uaccess.h>
@@ -565,6 +566,145 @@ static inline int check_pgd_range(struct vm_area_struct *vma,
return 0;
}
+#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+/*
+ * Here we search for not shared page mappings (mapcount == 1) and we
+ * set up the pmd/pte_numa on those mappings so the very next access
+ * will fire a NUMA hinting page fault.
+ */
+static int
+change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte, *_pte;
+ struct page *page;
+ unsigned long _address, end;
+ spinlock_t *ptl;
+ int ret = 0;
+
+ VM_BUG_ON(address & ~PAGE_MASK);
+
+ pgd = pgd_offset(mm, address);
+ if (!pgd_present(*pgd))
+ goto out;
+
+ pud = pud_offset(pgd, address);
+ if (!pud_present(*pud))
+ goto out;
+
+ pmd = pmd_offset(pud, address);
+ if (pmd_none(*pmd))
+ goto out;
+
+ if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ int page_nid;
+ ret = HPAGE_PMD_NR;
+
+ VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+ if (pmd_numa(*pmd)) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ page = pmd_page(*pmd);
+
+ /* only check non-shared pages */
+ if (page_mapcount(page) != 1) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ page_nid = page_to_nid(page);
+
+ if (pmd_numa(*pmd)) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
Hi Gorman,

Since pmd_trans_huge_lock() has already taken &mm->page_table_lock, why
check pmd_numa(*pmd) again?
Post by Mel Gorman
+ set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
+ ret += HPAGE_PMD_NR;
+ /* defer TLB flush to lower the overhead */
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ if (pmd_trans_unstable(pmd))
+ goto out;
+ VM_BUG_ON(!pmd_present(*pmd));
+
+ end = min(vma->vm_end, (address + PMD_SIZE) & PMD_MASK);
+ pte = pte_offset_map_lock(mm, pmd, address, &ptl);
+ for (_address = address, _pte = pte; _address < end;
+ _pte++, _address += PAGE_SIZE) {
+ pte_t pteval = *_pte;
+ if (!pte_present(pteval))
+ continue;
+ if (pte_numa(pteval))
+ continue;
+ page = vm_normal_page(vma, _address, pteval);
+ if (unlikely(!page))
+ continue;
+ /* only check non-shared pages */
+ if (page_mapcount(page) != 1)
+ continue;
+
+ set_pte_at(mm, _address, _pte, pte_mknuma(pteval));
+
+ /* defer TLB flush to lower the overhead */
+ ret++;
+ }
+ pte_unmap_unlock(pte, ptl);
+
+ if (ret && !pmd_numa(*pmd)) {
+ spin_lock(&mm->page_table_lock);
+ set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
+ spin_unlock(&mm->page_table_lock);
+ /* defer TLB flush to lower the overhead */
+ }
+
+ return ret;
+}
+
+/* Assumes mmap_sem is held */
+void
+change_prot_numa(struct vm_area_struct *vma,
+ unsigned long address, unsigned long end)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ int progress = 0;
+
+ while (address < end) {
+ VM_BUG_ON(address < vma->vm_start ||
+ address + PAGE_SIZE > vma->vm_end);
+
+ progress += change_prot_numa_range(mm, vma, address);
+ address = (address + PMD_SIZE) & PMD_MASK;
+ }
+
+ /*
+ * Flush the TLB for the mm to start the NUMA hinting
+ * page faults after we finish scanning this vma part
+ * if there were any PTE updates
+ */
+ if (progress) {
+ mmu_notifier_invalidate_range_start(vma->vm_mm, address, end);
+ flush_tlb_range(vma, address, end);
+ mmu_notifier_invalidate_range_end(vma->vm_mm, address, end);
+ }
+}
+#else
+static unsigned long change_prot_numa(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end)
+{
+ return 0;
+}
+#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
+
/*
* Check if all pages in a range are on a set of nodes.
* If pagelist != NULL then isolate pages from the LRU and
@@ -583,22 +723,32 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
return ERR_PTR(-EFAULT);
prev = NULL;
for (vma = first; vma && vma->vm_start < end; vma = vma->vm_next) {
+ unsigned long endvma = vma->vm_end;
+
+ if (endvma > end)
+ endvma = end;
+ if (vma->vm_start > start)
+ start = vma->vm_start;
+
if (!(flags & MPOL_MF_DISCONTIG_OK)) {
if (!vma->vm_next && vma->vm_end < end)
return ERR_PTR(-EFAULT);
if (prev && prev->vm_end < vma->vm_start)
return ERR_PTR(-EFAULT);
}
- if (!is_vm_hugetlb_page(vma) &&
- ((flags & MPOL_MF_STRICT) ||
+
+ if (is_vm_hugetlb_page(vma))
+ goto next;
+
+ if (flags & MPOL_MF_LAZY) {
+ change_prot_numa(vma, start, endvma);
+ goto next;
+ }
+
+ if ((flags & MPOL_MF_STRICT) ||
((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
- vma_migratable(vma)))) {
- unsigned long endvma = vma->vm_end;
+ vma_migratable(vma))) {
- if (endvma > end)
- endvma = end;
- if (vma->vm_start > start)
- start = vma->vm_start;
err = check_pgd_range(vma, start, endvma, nodes,
flags, private);
if (err) {
@@ -606,6 +756,7 @@ check_range(struct mm_struct *mm, unsigned long start, unsigned long end,
break;
}
}
prev = vma;
}
return first;
@@ -1138,8 +1289,7 @@ static long do_mbind(unsigned long start, unsigned long len,
int err;
LIST_HEAD(pagelist);
- if (flags & ~(unsigned long)(MPOL_MF_STRICT |
- MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+ if (flags & ~(unsigned long)MPOL_MF_VALID)
return -EINVAL;
if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
return -EPERM;
@@ -1162,6 +1312,9 @@ static long do_mbind(unsigned long start, unsigned long len,
if (IS_ERR(new))
return PTR_ERR(new);
+ if (flags & MPOL_MF_LAZY)
+ new->flags |= MPOL_F_MOF;
+
/*
* If we are using the default policy then operation
* on discontinuous address spaces is okay after all
@@ -1198,13 +1351,15 @@ static long do_mbind(unsigned long start, unsigned long len,
vma = check_range(mm, start, end, nmask,
flags | MPOL_MF_INVERT, &pagelist);
- err = PTR_ERR(vma);
- if (!IS_ERR(vma)) {
- int nr_failed = 0;
-
+ err = PTR_ERR(vma); /* maybe ... */
+ if (!IS_ERR(vma) && mode != MPOL_NOOP)
err = mbind_range(mm, start, end, new);
+ if (!err) {
+ int nr_failed = 0;
+
if (!list_empty(&pagelist)) {
+ WARN_ON_ONCE(flags & MPOL_MF_LAZY);
nr_failed = migrate_pages(&pagelist, new_vma_page,
(unsigned long)vma,
false, MIGRATE_SYNC,
@@ -1213,7 +1368,7 @@ static long do_mbind(unsigned long start, unsigned long len,
putback_lru_pages(&pagelist);
}
- if (!err && nr_failed && (flags & MPOL_MF_STRICT))
+ if (nr_failed && (flags & MPOL_MF_STRICT))
err = -EIO;
} else
putback_lru_pages(&pagelist);
Mel Gorman
2013-01-07 15:20:02 UTC
Post by Simon Jeons
Post by Mel Gorman
+static int
+change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte, *_pte;
+ struct page *page;
+ unsigned long _address, end;
+ spinlock_t *ptl;
+ int ret = 0;
+
+ VM_BUG_ON(address & ~PAGE_MASK);
+
+ pgd = pgd_offset(mm, address);
+ if (!pgd_present(*pgd))
+ goto out;
+
+ pud = pud_offset(pgd, address);
+ if (!pud_present(*pud))
+ goto out;
+
+ pmd = pmd_offset(pud, address);
+ if (pmd_none(*pmd))
+ goto out;
+
+ if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ int page_nid;
+ ret = HPAGE_PMD_NR;
+
+ VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+ if (pmd_numa(*pmd)) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ page = pmd_page(*pmd);
+
+ /* only check non-shared pages */
+ if (page_mapcount(page) != 1) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
+ page_nid = page_to_nid(page);
+
+ if (pmd_numa(*pmd)) {
+ spin_unlock(&mm->page_table_lock);
+ goto out;
+ }
+
Hi Gorman,
Since pmd_trans_huge_lock() has already taken &mm->page_table_lock, why
check pmd_numa(*pmd) again?

It looks like an oversight. I've added a TODO item to clean it up when I
revisit NUMA balancing some time soon.

Thanks.
--
Mel Gorman
SUSE Labs
Mel Gorman
2012-12-07 10:40:02 UTC
From: Andrea Arcangeli <***@redhat.com>

The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page
faults to identify the per NUMA node working set of the thread at
runtime.

Arming the NUMA hinting page fault mechanism works similarly to
setting up a mprotect(PROT_NONE) virtual range: the present bit is
cleared at the same time that _PAGE_NUMA is set, so when the fault
triggers we can identify it as a NUMA hinting page fault.

_PAGE_NUMA on x86 shares the same bit number as _PAGE_PROTNONE (but it
could also use a different bitflag, it's up to the architecture to
decide).

It would be confusing to refer to "NUMA hinting page faults" as
"do_prot_none faults". They're different events and _PAGE_NUMA doesn't
alter the semantics of mprotect(PROT_NONE) in any way.

Sharing the same bitflag with _PAGE_PROTNONE in fact complicates
things: it requires us to ensure that the code paths executed for
_PAGE_PROTNONE remain mutually exclusive with the code paths executed
for _PAGE_NUMA at all times, so that _PAGE_NUMA and _PAGE_PROTNONE
never step on each other's toes.

Because we want to be able to set this bitflag in any established pte
or pmd (while clearing the present bit at the same time) without
losing information, this bitflag must never be set when the pte or
pmd is present and, for the same reason, the bitflag picked for
_PAGE_NUMA must not be used by the swap entry format.
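
To make the mutual-exclusion requirement concrete, here is a small userspace
model, not kernel code. It assumes the x86 layout mentioned in this patch
(present is bit 0, _PAGE_PROTNONE and _PAGE_NUMA share bit 8); the TOY_*
names and toy_classify() are hypothetical. Since the bit alone cannot tell
the two cases apart, the sketch uses the protection of the VMA to decide,
in the spirit of the access_error() remark in the code comment of this
patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_PRESENT  (UINT64_C(1) << 0)
#define TOY_PAGE_PROTNONE (UINT64_C(1) << 8)
#define TOY_PAGE_NUMA     TOY_PAGE_PROTNONE /* shared bit, as on x86 here */

enum toy_fault {
    TOY_FAULT_OTHER,     /* present pte, or not-present for another reason */
    TOY_FAULT_PROT_NONE, /* genuine mprotect(PROT_NONE) mapping */
    TOY_FAULT_NUMA_HINT  /* NUMA hinting fault */
};

/*
 * Classify an access to a pte with the given flags. The shared bit is
 * read as _PAGE_NUMA only when the VMA itself is not PROT_NONE.
 */
static enum toy_fault toy_classify(uint64_t pte_flags, bool vma_is_prot_none)
{
    if (pte_flags & TOY_PAGE_PRESENT)
        return TOY_FAULT_OTHER;
    if (!(pte_flags & TOY_PAGE_NUMA))
        return TOY_FAULT_OTHER; /* e.g. a swap entry or an empty pte */
    return vma_is_prot_none ? TOY_FAULT_PROT_NONE : TOY_FAULT_NUMA_HINT;
}
```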

Signed-off-by: Andrea Arcangeli <***@redhat.com>
Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
arch/x86/include/asm/pgtable_types.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index ec8a1fc..3c32db8 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -64,6 +64,26 @@
#define _PAGE_FILE (_AT(pteval_t, 1) << _PAGE_BIT_FILE)
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)

+/*
+ * _PAGE_NUMA indicates that this page will trigger a numa hinting
+ * minor page fault to gather numa placement statistics (see
+ * pte_numa()). The bit picked (8) is within the range between
+ * _PAGE_FILE (6) and _PAGE_PROTNONE (8) bits. Therefore, it doesn't
+ * require changes to the swp entry format because that bit is always
+ * zero when the pte is not present.
+ *
+ * The bit picked must be always zero when the pmd is present and not
+ * present, so that we don't lose information when we set it while
+ * atomically clearing the present bit.
+ *
+ * Because we shared the same bit (8) with _PAGE_PROTNONE this can be
+ * interpreted as _PAGE_NUMA only in places that _PAGE_PROTNONE
+ * couldn't reach, like handle_mm_fault() (see access_error in
+ * arch/x86/mm/fault.c, the vma protection must not be PROT_NONE for
+ * handle_mm_fault() to be invoked).
+ */
+#define _PAGE_NUMA _PAGE_PROTNONE
+
#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_ACCESSED | _PAGE_DIRTY)
#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:02 UTC
From: Peter Zijlstra <***@chello.nl>

Note: This was originally based on Peter's patch "mm/migrate: Introduce
migrate_misplaced_page()" but borrows extremely heavily from Andrea's
"autonuma: memory follows CPU algorithm and task/mm_autonuma stats
collection". The end result is barely recognisable so signed-offs
had to be dropped. If original authors are ok with it, I'll
re-add the signed-off-bys.

Add migrate_misplaced_page() which deals with migrating pages from
faults.
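
Before isolating the page, the patch checks that the target node is not
nearly full (migrate_balanced_pgdat()). The following userspace sketch
models only that gate, heavily simplified: struct toy_zone and
toy_node_can_accept() are hypothetical, and the real zone_watermark_ok()
also accounts for allocation order and lowmem reserves, which are ignored
here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the few struct zone fields the check cares about. */
struct toy_zone {
    unsigned long free_pages;
    unsigned long high_wmark;
    bool populated;
};

/*
 * Returns true if at least one populated zone on the node would still be
 * above its high watermark after receiving nr_migrate_pages pages, so the
 * migration should not wake kswapd. Zones are walked from highest to
 * lowest, as in the patch.
 */
static bool toy_node_can_accept(const struct toy_zone *zones, size_t nr_zones,
                                unsigned long nr_migrate_pages)
{
    for (size_t z = nr_zones; z-- > 0; ) {
        const struct toy_zone *zone = &zones[z];

        if (!zone->populated)
            continue;
        if (zone->free_pages > zone->high_wmark + nr_migrate_pages)
            return true;
    }
    return false;
}
```

If the gate fails, the patch simply skips the migration and the page stays
on its current node.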

Based-on-work-by: Lee Schermerhorn <***@hp.com>
Based-on-work-by: Peter Zijlstra <***@chello.nl>
Based-on-work-by: Andrea Arcangeli <***@redhat.com>
Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/migrate.h | 11 +++++
mm/migrate.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9d1c159..2923135 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -13,6 +13,7 @@ enum migrate_reason {
MR_MEMORY_HOTPLUG,
MR_SYSCALL, /* also applies to cpusets */
MR_MEMPOLICY_MBIND,
+ MR_NUMA_MISPLACED,
MR_CMA
};

@@ -73,4 +74,14 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
#define fail_migrate_page NULL

#endif /* CONFIG_MIGRATION */
+
+#ifdef CONFIG_BALANCE_NUMA
+extern int migrate_misplaced_page(struct page *page, int node);
+#else
+static inline int migrate_misplaced_page(struct page *page, int node)
+{
+ return -EAGAIN; /* can't migrate now */
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
#endif /* _LINUX_MIGRATE_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 27be9c9..a2c4567 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -282,7 +282,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
struct page *newpage, struct page *page,
struct buffer_head *head, enum migrate_mode mode)
{
- int expected_count;
+ int expected_count = 0;
void **pslot;

if (!mapping) {
@@ -1415,4 +1415,108 @@ int migrate_vmas(struct mm_struct *mm, const nodemask_t *to,
}
return err;
}
-#endif
+
+#ifdef CONFIG_BALANCE_NUMA
+/*
+ * Returns true if this is a safe migration target node for misplaced NUMA
+ * pages. Currently it only checks the watermarks, which is crude
+ */
+static bool migrate_balanced_pgdat(struct pglist_data *pgdat,
+ int nr_migrate_pages)
+{
+ int z;
+ for (z = pgdat->nr_zones - 1; z >= 0; z--) {
+ struct zone *zone = pgdat->node_zones + z;
+
+ if (!populated_zone(zone))
+ continue;
+
+ if (zone->all_unreclaimable)
+ continue;
+
+ /* Avoid waking kswapd by allocating nr_migrate_pages pages. */
+ if (!zone_watermark_ok(zone, 0,
+ high_wmark_pages(zone) +
+ nr_migrate_pages,
+ 0, 0))
+ continue;
+ return true;
+ }
+ return false;
+}
+
+static struct page *alloc_misplaced_dst_page(struct page *page,
+ unsigned long data,
+ int **result)
+{
+ int nid = (int) data;
+ struct page *newpage;
+
+ newpage = alloc_pages_exact_node(nid,
+ (GFP_HIGHUSER_MOVABLE | GFP_THISNODE |
+ __GFP_NOMEMALLOC | __GFP_NORETRY |
+ __GFP_NOWARN) &
+ ~GFP_IOFS, 0);
+ return newpage;
+}
+
+/*
+ * Attempt to migrate a misplaced page to the specified destination
+ * node. Caller is expected to have an elevated reference count on
+ * the page that will be dropped by this function before returning.
+ */
+int migrate_misplaced_page(struct page *page, int node)
+{
+ int isolated = 0;
+ LIST_HEAD(migratepages);
+
+ /*
+ * Don't migrate pages that are mapped in multiple processes.
+ * TODO: Handle false sharing detection instead of this hammer
+ */
+ if (page_mapcount(page) != 1) {
+ put_page(page);
+ goto out;
+ }
+
+ /* Avoid migrating to a node that is nearly full */
+ if (migrate_balanced_pgdat(NODE_DATA(node), 1)) {
+ int page_lru;
+
+ if (isolate_lru_page(page)) {
+ put_page(page);
+ goto out;
+ }
+ isolated = 1;
+
+ /*
+ * Page is isolated which takes a reference count so now the
+ * callers reference can be safely dropped without the page
+ * disappearing underneath us during migration
+ */
+ put_page(page);
+
+ page_lru = page_is_file_cache(page);
+ inc_zone_page_state(page, NR_ISOLATED_ANON + page_lru);
+ list_add(&page->lru, &migratepages);
+ }
+
+ if (isolated) {
+ int nr_remaining;
+
+ nr_remaining = migrate_pages(&migratepages,
+ alloc_misplaced_dst_page,
+ node, false, MIGRATE_ASYNC,
+ MR_NUMA_MISPLACED);
+ if (nr_remaining) {
+ putback_lru_pages(&migratepages);
+ isolated = 0;
+ }
+ }
+ BUG_ON(!list_empty(&migratepages));
+out:
+ return isolated;
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
+#endif /* CONFIG_NUMA */
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:02 UTC
From: Andrea Arcangeli <***@redhat.com>

Implement pte_numa and pmd_numa.

We must atomically set the numa bit and clear the present bit to
define a pte_numa or pmd_numa.

Once a pte or pmd has been set as pte_numa or pmd_numa, the next time
a thread touches a virtual address in the corresponding virtual range,
a NUMA hinting page fault will trigger. The NUMA hinting page fault
will clear the NUMA bit and set the present bit again to resolve the
page fault.

The expectation is that a NUMA hinting page fault is used as part
of a placement policy that decides if a page should remain on the
current node or be migrated to a different node.
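
The bit manipulation is easy to try out in userspace. The sketch below is a
model only: toy_pte_t and the TOY_* bit positions are assumptions chosen to
match x86 (present is bit 0, accessed is bit 5, NUMA is bit 8), and the
toy_* helpers mirror the generic helpers added by this patch without using
any kernel types.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

#define TOY_PRESENT  (UINT64_C(1) << 0)
#define TOY_ACCESSED (UINT64_C(1) << 5)
#define TOY_NUMA     (UINT64_C(1) << 8)

/* NUMA-armed means: NUMA bit set and present bit clear. */
static bool toy_pte_numa(toy_pte_t pte)
{
    return (pte & (TOY_NUMA | TOY_PRESENT)) == TOY_NUMA;
}

/* Arm the entry: set the NUMA bit and clear the present bit. */
static toy_pte_t toy_pte_mknuma(toy_pte_t pte)
{
    pte |= TOY_NUMA;
    return pte & ~TOY_PRESENT;
}

/* Resolve the hinting fault: clear NUMA, restore present, mark accessed. */
static toy_pte_t toy_pte_mknonnuma(toy_pte_t pte)
{
    pte &= ~TOY_NUMA;
    return pte | TOY_PRESENT | TOY_ACCESSED;
}

/* An armed entry still counts as present for the page table walkers. */
static bool toy_pte_present(toy_pte_t pte)
{
    return (pte & (TOY_PRESENT | TOY_NUMA)) != 0;
}
```

Note that a real implementation has to write the final value to the page
table in a single store; the two-step computation here works on a local
copy only.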

Acked-by: Rik van Riel <***@redhat.com>
Signed-off-by: Andrea Arcangeli <***@redhat.com>
Signed-off-by: Mel Gorman <***@suse.de>
---
arch/x86/include/asm/pgtable.h | 11 ++++-
include/asm-generic/pgtable.h | 106 ++++++++++++++++++++++++++++++++++++++++
init/Kconfig | 33 +++++++++++++
3 files changed, 148 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5fe03aa..9cd7b72 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -404,7 +404,8 @@ static inline int pte_same(pte_t a, pte_t b)

static inline int pte_present(pte_t a)
{
- return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
+ return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
+ _PAGE_NUMA);
}

#define pte_accessible pte_accessible
@@ -426,7 +427,8 @@ static inline int pmd_present(pmd_t pmd)
* the _PAGE_PSE flag will remain set at all times while the
* _PAGE_PRESENT bit is clear).
*/
- return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
+ return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
+ _PAGE_NUMA);
}

static inline int pmd_none(pmd_t pmd)
@@ -485,6 +487,11 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)

static inline int pmd_bad(pmd_t pmd)
{
+#ifdef CONFIG_BALANCE_NUMA
+ /* pmd_numa check */
+ if ((pmd_flags(pmd) & (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA)
+ return 0;
+#endif
return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE;
}

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 48fc1dc..7ab6e63 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -558,6 +558,112 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
#endif
}

+#ifdef CONFIG_BALANCE_NUMA
+#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+/*
+ * _PAGE_NUMA works identically to _PAGE_PROTNONE (it's actually the
+ * same bit too). It's set only when _PAGE_PRESENT is not set and it's
+ * never set if _PAGE_PRESENT is set.
+ *
+ * pte/pmd_present() returns true if pte/pmd_numa returns true. Page
+ * fault triggers on those regions if pte/pmd_numa returns true
+ * (because _PAGE_PRESENT is not set).
+ */
+#ifndef pte_numa
+static inline int pte_numa(pte_t pte)
+{
+ return (pte_flags(pte) &
+ (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+}
+#endif
+
+#ifndef pmd_numa
+static inline int pmd_numa(pmd_t pmd)
+{
+ return (pmd_flags(pmd) &
+ (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+}
+#endif
+
+/*
+ * pte/pmd_mknonnuma sets the _PAGE_ACCESSED bitflag automatically
+ * because they're called by the NUMA hinting minor page fault. If we
+ * didn't set the _PAGE_ACCESSED bitflag here, the TLB miss handler
+ * would be forced to set it later while filling the TLB after we
+ * return to userland. That would trigger a second write to memory
+ * that we optimize away by setting _PAGE_ACCESSED here.
+ */
+#ifndef pte_mknonnuma
+static inline pte_t pte_mknonnuma(pte_t pte)
+{
+ pte = pte_clear_flags(pte, _PAGE_NUMA);
+ return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED);
+}
+#endif
+
+#ifndef pmd_mknonnuma
+static inline pmd_t pmd_mknonnuma(pmd_t pmd)
+{
+ pmd = pmd_clear_flags(pmd, _PAGE_NUMA);
+ return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED);
+}
+#endif
+
+#ifndef pte_mknuma
+static inline pte_t pte_mknuma(pte_t pte)
+{
+ pte = pte_set_flags(pte, _PAGE_NUMA);
+ return pte_clear_flags(pte, _PAGE_PRESENT);
+}
+#endif
+
+#ifndef pmd_mknuma
+static inline pmd_t pmd_mknuma(pmd_t pmd)
+{
+ pmd = pmd_set_flags(pmd, _PAGE_NUMA);
+ return pmd_clear_flags(pmd, _PAGE_PRESENT);
+}
+#endif
+#else
+extern int pte_numa(pte_t pte);
+extern int pmd_numa(pmd_t pmd);
+extern pte_t pte_mknonnuma(pte_t pte);
+extern pmd_t pmd_mknonnuma(pmd_t pmd);
+extern pte_t pte_mknuma(pte_t pte);
+extern pmd_t pmd_mknuma(pmd_t pmd);
+#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
+#else
+static inline int pmd_numa(pmd_t pmd)
+{
+ return 0;
+}
+
+static inline int pte_numa(pte_t pte)
+{
+ return 0;
+}
+
+static inline pte_t pte_mknonnuma(pte_t pte)
+{
+ return pte;
+}
+
+static inline pmd_t pmd_mknonnuma(pmd_t pmd)
+{
+ return pmd;
+}
+
+static inline pte_t pte_mknuma(pte_t pte)
+{
+ return pte;
+}
+
+static inline pmd_t pmd_mknuma(pmd_t pmd)
+{
+ return pmd;
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
#endif /* CONFIG_MMU */

#endif /* !__ASSEMBLY__ */
diff --git a/init/Kconfig b/init/Kconfig
index 6fdd6e3..6897a05 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -696,6 +696,39 @@ config LOG_BUF_SHIFT
config HAVE_UNSTABLE_SCHED_CLOCK
bool

+#
+# For architectures that want to enable the support for NUMA-affine scheduler
+# balancing logic:
+#
+config ARCH_SUPPORTS_NUMA_BALANCING
+ bool
+
+# For architectures that (ab)use NUMA to represent different memory regions
+# all cpu-local but of different latencies, such as SuperH.
+#
+config ARCH_WANT_NUMA_VARIABLE_LOCALITY
+ bool
+
+#
+# For architectures that are willing to define _PAGE_NUMA as _PAGE_PROTNONE
+config ARCH_WANTS_PROT_NUMA_PROT_NONE
+ bool
+
+config ARCH_USES_NUMA_PROT_NONE
+ bool
+ default y
+ depends on ARCH_WANTS_PROT_NUMA_PROT_NONE
+ depends on BALANCE_NUMA
+
+config BALANCE_NUMA
+ bool "Memory placement aware NUMA scheduler"
+ default n
+ depends on ARCH_SUPPORTS_NUMA_BALANCING
+ depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
+ depends on SMP && NUMA && MIGRATION
+ help
+ This option adds support for automatic NUMA aware memory/task placement.
+
menuconfig CGROUPS
boolean "Control Group support"
depends on EVENTFD
--
1.7.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Mel Gorman
2012-12-07 10:40:02 UTC
From: Peter Zijlstra <***@chello.nl>

Make MPOL_LOCAL a real and exposed policy such that applications that
relied on the previous default behaviour can explicitly request it.
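
A minimal userspace sketch of the mode check this patch adds to mpol_new(), assuming the node set is modelled as a plain 64-bit mask. The enum mirrors the uapi header after this patch; `resolve_mode()` and its return convention are hypothetical names for illustration and are not kernel API. The flag checks on MPOL_PREFERRED are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the enum in include/uapi/linux/mempolicy.h after this patch. */
enum {
	MPOL_DEFAULT,
	MPOL_PREFERRED,
	MPOL_BIND,
	MPOL_INTERLEAVE,
	MPOL_LOCAL,
	MPOL_MAX,	/* always last member of enum */
};

/*
 * Hypothetical helper modelling the nodemask check in mpol_new().
 * 'nodes' is a bitmask standing in for nodemask_t. Returns the mode
 * the policy would be stored with, or -EINVAL.
 * Assumption: the caller has already dealt with MPOL_DEFAULT.
 */
static int resolve_mode(int mode, uint64_t nodes)
{
	assert(mode > MPOL_DEFAULT && mode < MPOL_MAX);

	if (mode == MPOL_PREFERRED) {
		/* An empty mask is accepted here; flag checks are not modelled. */
		return MPOL_PREFERRED;
	} else if (mode == MPOL_LOCAL) {
		/* "local" takes no nodes and is stored as MPOL_PREFERRED. */
		if (nodes != 0)
			return -EINVAL;
		return MPOL_PREFERRED;
	} else if (nodes == 0) {
		/* The remaining modes need at least one node. */
		return -EINVAL;
	}
	return mode;
}
```

With this model, `resolve_mode(MPOL_LOCAL, 0)` yields MPOL_PREFERRED, while passing any node alongside MPOL_LOCAL is rejected.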

Requested-by: Christoph Lameter <***@linux.com>
Reviewed-by: Rik van Riel <***@redhat.com>
Cc: Lee Schermerhorn <***@hp.com>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Linus Torvalds <***@linux-foundation.org>
Signed-off-by: Peter Zijlstra <***@chello.nl>
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/uapi/linux/mempolicy.h | 1 +
mm/mempolicy.c | 9 ++++++---
2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 23e62e0..3e835c9 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -20,6 +20,7 @@ enum {
MPOL_PREFERRED,
MPOL_BIND,
MPOL_INTERLEAVE,
+ MPOL_LOCAL,
MPOL_MAX, /* always last member of enum */
};

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 66e90ec..54bd3e5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -269,6 +269,10 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
(flags & MPOL_F_RELATIVE_NODES)))
return ERR_PTR(-EINVAL);
}
+ } else if (mode == MPOL_LOCAL) {
+ if (!nodes_empty(*nodes))
+ return ERR_PTR(-EINVAL);
+ mode = MPOL_PREFERRED;
} else if (nodes_empty(*nodes))
return ERR_PTR(-EINVAL);
policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
@@ -2399,7 +2403,6 @@ void numa_default_policy(void)
* "local" is pseudo-policy: MPOL_PREFERRED with MPOL_F_LOCAL flag
* Used only for mpol_parse_str() and mpol_to_str()
*/
-#define MPOL_LOCAL MPOL_MAX
static const char * const policy_modes[] =
{
[MPOL_DEFAULT] = "default",
@@ -2452,12 +2455,12 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
if (flags)
*flags++ = '\0'; /* terminate mode string */

- for (mode = 0; mode <= MPOL_LOCAL; mode++) {
+ for (mode = 0; mode < MPOL_MAX; mode++) {
if (!strcmp(str, policy_modes[mode])) {
break;
}
}
- if (mode > MPOL_LOCAL)
+ if (mode >= MPOL_MAX)
goto out;

switch (mode) {
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:02 UTC
The compact_pages_moved and compact_pagemigrate_failed events are
convenient for determining if compaction is active and to what
degree migration is succeeding but they are at the wrong level. Other
users of migration may also want to know if migration is working
properly and this will be particularly true for any automated
NUMA migration. This patch moves the counters down to migration
with the new events called pgmigrate_success and pgmigrate_fail.
The compact_blocks_moved counter is removed because while it was
useful for debugging initially, it's worthless now as no meaningful
conclusions can be drawn from its value.
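
As a rough illustration of where the accounting now lives, the following userspace sketch counts successes and failures inside the migration loop itself, so every caller gets the statistics. `struct vm_events_model` and `migrate_pages_model()` are hypothetical stand-ins; the retry handling of the real migrate_pages() is not modelled and any non-zero result is treated as a permanent failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two new vmstat events. */
struct vm_events_model {
	unsigned long pgmigrate_success;
	unsigned long pgmigrate_fail;
};

/*
 * Model of the accounting added to migrate_pages(): 'results' holds
 * one per-page return code (0 means the page was migrated).
 * Returns the number of pages that failed to migrate.
 */
static int migrate_pages_model(const int *results, size_t n,
			       struct vm_events_model *ev)
{
	unsigned long nr_succeeded = 0;
	unsigned long nr_failed = 0;
	size_t i;

	assert(ev != NULL);

	for (i = 0; i < n; i++) {
		if (results[i] == 0)
			nr_succeeded++;
		else
			nr_failed++;
	}

	/* Update the counters once per call, as the patch does. */
	if (nr_succeeded)
		ev->pgmigrate_success += nr_succeeded;
	if (nr_failed)
		ev->pgmigrate_fail += nr_failed;

	return (int)nr_failed;
}
```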

Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/vm_event_item.h | 4 +++-
mm/compaction.c | 4 ----
mm/migrate.c | 6 ++++++
mm/vmstat.c | 7 ++++---
4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3d31145..8aa7cb9 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -38,8 +38,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
KSWAPD_SKIP_CONGESTION_WAIT,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+#ifdef CONFIG_MIGRATION
+ PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
+#endif
#ifdef CONFIG_COMPACTION
- COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
#endif
#ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..00ad883 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -994,10 +994,6 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
update_nr_listpages(cc);
nr_remaining = cc->nr_migratepages;

- count_vm_event(COMPACTBLOCKS);
- count_vm_events(COMPACTPAGES, nr_migrate - nr_remaining);
- if (nr_remaining)
- count_vm_events(COMPACTPAGEFAILED, nr_remaining);
trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
nr_remaining);

diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..04687f6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -962,6 +962,7 @@ int migrate_pages(struct list_head *from,
{
int retry = 1;
int nr_failed = 0;
+ int nr_succeeded = 0;
int pass = 0;
struct page *page;
struct page *page2;
@@ -988,6 +989,7 @@ int migrate_pages(struct list_head *from,
retry++;
break;
case 0:
+ nr_succeeded++;
break;
default:
/* Permanent failure */
@@ -998,6 +1000,10 @@ int migrate_pages(struct list_head *from,
}
rc = 0;
out:
+ if (nr_succeeded)
+ count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
+ if (nr_failed)
+ count_vm_events(PGMIGRATE_FAIL, nr_failed);
if (!swapwrite)
current->flags &= ~PF_SWAPWRITE;

diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..89a7fd6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -774,10 +774,11 @@ const char * const vmstat_text[] = {

"pgrotated",

+#ifdef CONFIG_MIGRATION
+ "pgmigrate_success",
+ "pgmigrate_fail",
+#endif
#ifdef CONFIG_COMPACTION
- "compact_blocks_moved",
- "compact_pages_moved",
- "compact_pagemigrate_failed",
"compact_stall",
"compact_fail",
"compact_success",
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:02 UTC
It is tricky to quantify the basic cost of automatic NUMA placement in a
meaningful manner. This patch adds some vmstats that can be used as part
of a basic costing model.

u = basic unit = sizeof(void *)
Ca = cost of struct page access = sizeof(struct page) / u
Cpte = Cost PTE access = Ca
Cupdate = Cost PTE update = (2 * Cpte) + (2 * Wlock)
where Cpte is incurred twice for a read and a write and Wlock
is a constant representing the cost of taking or releasing a
lock
Cnumahint = Cost of a minor page fault = some high constant e.g. 1000
Cpagerw = Cost to read or write a full page = Ca + PAGE_SIZE/u
Ci = Cost of page isolation = Ca + Wi
where Wi is a constant that should reflect the approximate cost
of the locking operation
Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma)
where Wnuma is the approximate NUMA factor. 1 is local. 1.2
would imply that remote accesses are 20% more expensive

Balancing cost = Cpte * numa_pte_updates +
Cnumahint * numa_hint_faults +
Ci * numa_pages_migrated +
Cpagecopy * numa_pages_migrated

Note that numa_pages_migrated is used as a measure of how many pages
were isolated even though it would miss pages that failed to migrate. A
vmstat counter could have been added for it but the isolation cost is
pretty marginal in comparison to the overall cost so it seemed overkill.

The ideal way to measure automatic placement benefit would be to count
the number of remote accesses versus local accesses and do something like

benefit = (remote_accesses_before - remote_accesses_after) * Wnuma

but the information is not readily available. As a workload converges, the
expectation would be that the number of remote numa hints would reduce to 0.

convergence = numa_hint_faults_local / numa_hint_faults
where this is measured for the last N number of
numa hints recorded. When the workload is fully
converged the value is 1.

This can measure if the placement policy is converging and how fast it is
doing it.
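
The costing model can be sketched as a small userspace calculation over the four counters. All struct and function names here are hypothetical, and the weights (Wi, Wnuma, Cnumahint) are inputs the user has to pick. Because Cpagecopy already contains Ci, this sketch leaves out the separate Ci term so that isolation is counted once. The convergence helper works on whatever counter deltas it is handed, which only approximates "the last N hints"; it returns 0 when no hinting faults were seen, which is a choice made for the sketch.

```c
#include <assert.h>

/* Hypothetical snapshot (or delta) of the new vmstat counters. */
struct numa_counters {
	unsigned long numa_pte_updates;
	unsigned long numa_hint_faults;
	unsigned long numa_hint_faults_local;
	unsigned long numa_pages_migrated;
};

/* Hypothetical model inputs; none of these are exported by the kernel. */
struct cost_params {
	double u;		/* basic unit, sizeof(void *) */
	double page_struct;	/* sizeof(struct page) */
	double page_size;	/* PAGE_SIZE */
	double w_isolate;	/* Wi */
	double w_numa;		/* Wnuma, 1.0 means local */
	double c_numahint;	/* Cnumahint */
};

static double balancing_cost(const struct cost_params *p,
			     const struct numa_counters *c)
{
	assert(p->u > 0.0);

	double ca = p->page_struct / p->u;		/* Ca */
	double cpte = ca;				/* Cpte */
	double cpagerw = ca + p->page_size / p->u;	/* Cpagerw */
	double ci = ca + p->w_isolate;			/* Ci */
	double cpagecopy = cpagerw + cpagerw * p->w_numa +
			   ci + ci * p->w_numa;		/* Cpagecopy */

	return cpte * (double)c->numa_pte_updates +
	       p->c_numahint * (double)c->numa_hint_faults +
	       cpagecopy * (double)c->numa_pages_migrated;
}

/* 1.0 means every recorded hinting fault was node-local. */
static double convergence(const struct numa_counters *c)
{
	if (c->numa_hint_faults == 0)
		return 0.0;
	return (double)c->numa_hint_faults_local /
	       (double)c->numa_hint_faults;
}
```

Sampling /proc/vmstat twice and feeding the differences into these helpers gives a cost and convergence estimate for the interval between the samples.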

Signed-off-by: Mel Gorman <***@suse.de>
Acked-by: Rik van Riel <***@redhat.com>
---
include/linux/vm_event_item.h | 6 ++++++
include/linux/vmstat.h | 8 ++++++++
mm/huge_memory.c | 5 +++++
mm/memory.c | 12 ++++++++++++
mm/mempolicy.c | 2 ++
mm/migrate.c | 3 ++-
mm/vmstat.c | 6 ++++++
7 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index a1f750b..dded0af 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -38,6 +38,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
KSWAPD_SKIP_CONGESTION_WAIT,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+#ifdef CONFIG_BALANCE_NUMA
+ NUMA_PTE_UPDATES,
+ NUMA_HINT_FAULTS,
+ NUMA_HINT_FAULTS_LOCAL,
+ NUMA_PAGE_MIGRATE,
+#endif
#ifdef CONFIG_MIGRATION
PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
#endif
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 92a86b2..dffccfa 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -80,6 +80,14 @@ static inline void vm_events_fold_cpu(int cpu)

#endif /* CONFIG_VM_EVENT_COUNTERS */

+#ifdef CONFIG_BALANCE_NUMA
+#define count_vm_numa_event(x) count_vm_event(x)
+#define count_vm_numa_events(x, y) count_vm_events(x, y)
+#else
+#define count_vm_numa_event(x) do {} while (0)
+#define count_vm_numa_events(x, y) do {} while (0)
+#endif /* CONFIG_BALANCE_NUMA */
+
#define __count_zone_vm_events(item, zone, delta) \
__count_vm_events(item##_NORMAL - ZONE_NORMAL + \
zone_idx(zone), delta)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b3d4c4b..66e73cc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1025,6 +1025,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page *page = NULL;
unsigned long haddr = addr & HPAGE_PMD_MASK;
int target_nid;
+ int current_nid = -1;

spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp)))
@@ -1033,6 +1034,10 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = pmd_page(pmd);
get_page(page);
spin_unlock(&mm->page_table_lock);
+ current_nid = page_to_nid(page);
+ count_vm_numa_event(NUMA_HINT_FAULTS);
+ if (current_nid == numa_node_id())
+ count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);

target_nid = mpol_misplaced(page, vma, haddr);
if (target_nid == -1)
diff --git a/mm/memory.c b/mm/memory.c
index 1d6f85a..47f5dd1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3477,6 +3477,7 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
update_mmu_cache(vma, addr, ptep);

+ count_vm_numa_event(NUMA_HINT_FAULTS);
page = vm_normal_page(vma, addr, pte);
if (!page) {
pte_unmap_unlock(ptep, ptl);
@@ -3485,6 +3486,8 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,

get_page(page);
current_nid = page_to_nid(page);
+ if (current_nid == numa_node_id())
+ count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
target_nid = mpol_misplaced(page, vma, addr);
pte_unmap_unlock(ptep, ptl);
if (target_nid == -1) {
@@ -3517,6 +3520,9 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long offset;
spinlock_t *ptl;
bool numa = false;
+ int local_nid = numa_node_id();
+ unsigned long nr_faults = 0;
+ unsigned long nr_faults_local = 0;

spin_lock(&mm->page_table_lock);
pmd = *pmdp;
@@ -3565,10 +3571,16 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
curr_nid = page_to_nid(page);
task_numa_fault(curr_nid, 1);

+ nr_faults++;
+ if (curr_nid == local_nid)
+ nr_faults_local++;
+
pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
}
pte_unmap_unlock(orig_pte, ptl);

+ count_vm_numa_events(NUMA_HINT_FAULTS, nr_faults);
+ count_vm_numa_events(NUMA_HINT_FAULTS_LOCAL, nr_faults_local);
return 0;
}
#else
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a7a62fe..516491f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -583,6 +583,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);

nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
+ if (nr_updated)
+ count_vm_numa_events(NUMA_PTE_UPDATES, nr_updated);

return nr_updated;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 49878d7..4f55694 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1514,7 +1514,8 @@ int migrate_misplaced_page(struct page *page, int node)
if (nr_remaining) {
putback_lru_pages(&migratepages);
isolated = 0;
- }
+ } else
+ count_vm_numa_event(NUMA_PAGE_MIGRATE);
}
BUG_ON(!list_empty(&migratepages));
out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3a067fa..cfa386da 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -774,6 +774,12 @@ const char * const vmstat_text[] = {

"pgrotated",

+#ifdef CONFIG_BALANCE_NUMA
+ "numa_pte_updates",
+ "numa_hint_faults",
+ "numa_hint_faults_local",
+ "numa_pages_migrated",
+#endif
#ifdef CONFIG_MIGRATION
"pgmigrate_success",
"pgmigrate_fail",
--
1.7.9.2

Simon Jeons
2013-01-04 11:50:02 UTC
Post by Mel Gorman
It is tricky to quantify the basic cost of automatic NUMA placement in a
meaningful manner. This patch adds some vmstats that can be used as part
of a basic costing model.
Hi Gorman,
Post by Mel Gorman
u = basic unit = sizeof(void *)
Ca = cost of struct page access = sizeof(struct page) / u
Cpte = Cost PTE access = Ca
Cupdate = Cost PTE update = (2 * Cpte) + (2 * Wlock)
where Cpte is incurred twice for a read and a write and Wlock
is a constant representing the cost of taking or releasing a
lock
Cnumahint = Cost of a minor page fault = some high constant e.g. 1000
Cpagerw = Cost to read or write a full page = Ca + PAGE_SIZE/u
Why is Cpagerw = Ca + PAGE_SIZE/u instead of Cpte + PAGE_SIZE/u?
Post by Mel Gorman
Ci = Cost of page isolation = Ca + Wi
where Wi is a constant that should reflect the approximate cost
of the locking operation
Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma)
where Wnuma is the approximate NUMA factor. 1 is local. 1.2
would imply that remote accesses are 20% more expensive
Balancing cost = Cpte * numa_pte_updates +
Cnumahint * numa_hint_faults +
Ci * numa_pages_migrated +
Cpagecopy * numa_pages_migrated
Since Cpagecopy already includes Ci, why count Ci twice?
Post by Mel Gorman
Note that numa_pages_migrated is used as a measure of how many pages
were isolated even though it would miss pages that failed to migrate. A
vmstat counter could have been added for it but the isolation cost is
pretty marginal in comparison to the overall cost so it seemed overkill.
The ideal way to measure automatic placement benefit would be to count
the number of remote accesses versus local accesses and do something like
benefit = (remote_accesses_before - remote_accesses_after) * Wnuma
but the information is not readily available. As a workload converges, the
expectation would be that the number of remote numa hints would reduce to 0.
convergence = numa_hint_faults_local / numa_hint_faults
where this is measured for the last N number of
numa hints recorded. When the workload is fully
converged the value is 1.
Is it better for convergence to tend to 0 or to 1? If it tends to 1, aren't
Cpte * numa_pte_updates + Cnumahint * numa_hint_faults just wasted? What am
I missing?
Post by Mel Gorman
This can measure if the placement policy is converging and how fast it is
doing it.
Mel Gorman
2013-01-07 15:30:02 UTC
Post by Simon Jeons
Post by Mel Gorman
It is tricky to quantify the basic cost of automatic NUMA placement in a
meaningful manner. This patch adds some vmstats that can be used as part
of a basic costing model.
Hi Gorman,
Post by Mel Gorman
u = basic unit = sizeof(void *)
Ca = cost of struct page access = sizeof(struct page) / u
Cpte = Cost PTE access = Ca
Cupdate = Cost PTE update = (2 * Cpte) + (2 * Wlock)
where Cpte is incurred twice for a read and a write and Wlock
is a constant representing the cost of taking or releasing a
lock
Cnumahint = Cost of a minor page fault = some high constant e.g. 1000
Cpagerw = Cost to read or write a full page = Ca + PAGE_SIZE/u
Why is Cpagerw = Ca + PAGE_SIZE/u instead of Cpte + PAGE_SIZE/u?
Because I was thinking of the cost of just accessing the struct page. Arguably
it would be both Ca and Cpte and if I wanted to be very comprehensive I
would also take into account the potential cost of kmapping the page in
the 32-bit case but it'd be overkill. The cost of the PTE and struct page
is negligible in comparison to the actual copy.
Post by Simon Jeons
Post by Mel Gorman
Ci = Cost of page isolation = Ca + Wi
where Wi is a constant that should reflect the approximate cost
of the locking operation
Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma)
where Wnuma is the approximate NUMA factor. 1 is local. 1.2
would imply that remote accesses are 20% more expensive
Balancing cost = Cpte * numa_pte_updates +
Cnumahint * numa_hint_faults +
Ci * numa_pages_migrated +
Cpagecopy * numa_pages_migrated
Since Cpagecopy already includes Ci, why count Ci twice?
Good point. Interestingly when I went to fix this in mmtests I found
that I accounted for Ci properly there but got it wrong in the
changelog.
Post by Simon Jeons
Post by Mel Gorman
Note that numa_pages_migrated is used as a measure of how many pages
were isolated even though it would miss pages that failed to migrate. A
vmstat counter could have been added for it but the isolation cost is
pretty marginal in comparison to the overall cost so it seemed overkill.
The ideal way to measure automatic placement benefit would be to count
the number of remote accesses versus local accesses and do something like
benefit = (remote_accesses_before - remote_accesses_after) * Wnuma
but the information is not readily available. As a workload converges, the
expectation would be that the number of remote numa hints would reduce to 0.
convergence = numa_hint_faults_local / numa_hint_faults
where this is measured for the last N number of
numa hints recorded. When the workload is fully
converged the value is 1.
Is it better for convergence to tend to 0 or to 1?
1 is better.
Post by Simon Jeons
If it tends to 1, aren't Cpte * numa_pte_updates +
Cnumahint * numa_hint_faults just wasted? What am
I missing?
I don't get the question, waste of what? None of these calculations are
used by the kernel. The kernel only maintains counters and the point of
the changelog was to illustrate how the counters can be used to do some
meaningful evaluation.
--
Mel Gorman
SUSE Labs
Mel Gorman
2012-12-07 10:40:02 UTC
From: Ingo Molnar <***@kernel.org>

Reuse the NUMA code's 'modified page protections' count that
change_protection() computes and skip the TLB flush if there are
no changes to a range that sys_mprotect() modifies.

Given that mprotect() already optimizes the same-flags case
I expected this optimization to dominantly trigger on
CONFIG_NUMA_BALANCING=y kernels - but even with that feature
disabled it triggers rather often.

There are two reasons for that:

1)

sys_mprotect() already optimizes the same-flag case:

if (newflags == oldflags) {
*pprev = vma;
return 0;
}

This test works in many cases, but it is too sharp in some
others, where it differentiates between protection values that the
underlying PTE format makes no distinction about, such as
PROT_EXEC == PROT_READ on x86.

2)

Even where the pte format means the vma flag change necessitates a
modification of the pagetables, there might be no pagetables
yet to modify: they might not be instantiated yet.

During a regular desktop bootup this optimization hits a couple
of hundred times. During a Java test I measured thousands of
hits.

So this optimization improves sys_mprotect() in general, not just
CONFIG_NUMA_BALANCING=y kernels.

[ We could further increase the efficiency of this optimization if
change_pte_range() and change_huge_pmd() were a bit smarter about
recognizing exact-same-value protection masks - when the hardware
can do that safely. This would probably further speed up mprotect(). ]
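
The effect of the change can be modelled in userspace roughly as follows. This is only a sketch of reason 2 above: page table entries are plain integers, PTE_VALID and PROT_BITS are made-up bit layouts, and `change_protection_range_model()` is a hypothetical stand-in that counts a page for every instantiated entry it rewrites, then flushes only when that count is non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up bit layout for the model; not the x86 PTE format. */
#define PROT_BITS	0x7UL	/* protection bits */
#define PTE_VALID	0x8UL	/* entry has been instantiated */

static unsigned long tlb_flushes;	/* number of modelled range flushes */

/*
 * Rewrite the protection bits of every instantiated entry and return
 * how many entries were touched. The TLB is flushed only if at least
 * one entry was modified, which is the optimisation in this patch.
 */
static unsigned long change_protection_range_model(unsigned long *ptes,
						   size_t n,
						   unsigned long newprot)
{
	unsigned long pages = 0;
	size_t i;

	assert((newprot & ~PROT_BITS) == 0);

	for (i = 0; i < n; i++) {
		if (!(ptes[i] & PTE_VALID))
			continue;	/* nothing instantiated here yet */
		ptes[i] = (ptes[i] & ~PROT_BITS) | newprot;
		pages++;
	}

	/* Only flush the TLB if we actually modified any entries. */
	if (pages)
		tlb_flushes++;

	return pages;
}
```

A range with no instantiated entries returns 0 and leaves `tlb_flushes` untouched, which is the case the commit message reports hitting hundreds of times during a desktop boot.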

Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Rik van Riel <***@redhat.com>
Cc: Mel Gorman <***@suse.de>
Cc: Hugh Dickins <***@google.com>
Cc: Thomas Gleixner <***@linutronix.de>
Signed-off-by: Ingo Molnar <***@kernel.org>
---
mm/mprotect.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 1e265be..7c3628a 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -153,7 +153,9 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
dirty_accountable);
} while (pgd++, addr = next, addr != end);

- flush_tlb_range(vma, start, end);
+ /* Only flush the TLB if we actually modified any entries: */
+ if (pages)
+ flush_tlb_range(vma, start, end);

return pages;
}
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:03 UTC
From: Rik van Riel <***@redhat.com>

We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that
the pte is associated with a page.

However, for TLB flushing purposes, we would like to know whether the pte
points to an actually accessible page. This allows us to skip remote TLB
flushes for pages that are not actually accessible.

Fill in this method for x86 and provide a safe (but slower) method
on other architectures.
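
The distinction between the two predicates can be sketched in userspace like this. The bit positions are illustrative values picked for the sketch, the `_model` names are hypothetical, and `needs_remote_flush()` is an assumed caller showing how the result might be used; it is not a function from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions chosen for the sketch. */
#define PAGE_PRESENT_MODEL	(1u << 0)
#define PAGE_PROTNONE_MODEL	(1u << 8)

typedef uint32_t pte_model_t;

/* True if the entry is associated with a page, accessible or not. */
static int pte_present_model(pte_model_t pte)
{
	return (pte & (PAGE_PRESENT_MODEL | PAGE_PROTNONE_MODEL)) != 0;
}

/* True only if the hardware could actually have used the entry. */
static int pte_accessible_model(pte_model_t pte)
{
	return (pte & PAGE_PRESENT_MODEL) != 0;
}

/*
 * Hypothetical caller: an entry that was never accessible cannot be
 * cached in another CPU's TLB, so the remote flush can be skipped.
 */
static int needs_remote_flush(pte_model_t old)
{
	assert(pte_present_model(old));
	return pte_accessible_model(old);
}
```

An architecture without its own definition falls back to the generic macro, which always reports the entry as accessible and therefore always flushes.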

Signed-off-by: Rik van Riel <***@redhat.com>
Signed-off-by: Peter Zijlstra <***@chello.nl>
Fixed-by: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Link: http://lkml.kernel.org/n/tip-***@git.kernel.org
[ Added Linus's review fixes. ]
Signed-off-by: Ingo Molnar <***@kernel.org>
---
arch/x86/include/asm/pgtable.h | 6 ++++++
include/asm-generic/pgtable.h | 4 ++++
2 files changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a1f780d..5fe03aa 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -407,6 +407,12 @@ static inline int pte_present(pte_t a)
return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
}

+#define pte_accessible pte_accessible
+static inline int pte_accessible(pte_t a)
+{
+ return pte_flags(a) & _PAGE_PRESENT;
+}
+
static inline int pte_hidden(pte_t pte)
{
return pte_flags(pte) & _PAGE_HIDDEN;
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index b36ce40..48fc1dc 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -219,6 +219,10 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
#define move_pte(pte, prot, old_addr, new_addr) (pte)
#endif

+#ifndef pte_accessible
+# define pte_accessible(pte) ((void)(pte),1)
+#endif
+
#ifndef flush_tlb_fix_spurious_fault
#define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address)
#endif
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:02 UTC
From: Rik van Riel <***@redhat.com>

The function ptep_set_access_flags is only ever used to upgrade
access permissions to a page. That means the only negative side
effect of not flushing remote TLBs is that other CPUs may incur
spurious page faults, if they happen to access the same address,
and still have a PTE with the old permissions cached in their
TLB.

Having another CPU maybe incur a spurious page fault is faster
than always incurring the cost of a remote TLB flush, so replace
the remote TLB flush with a purely local one.

This should be safe on every architecture that correctly
implements flush_tlb_fix_spurious_fault() to actually invalidate
the local TLB entry that caused a page fault, as well as on
architectures where the hardware invalidates TLB entries that
cause page faults.

In the unlikely event that you are hitting what appears to be
an infinite loop of page faults, and 'git bisect' took you to
this changeset, your architecture needs to implement
flush_tlb_fix_spurious_fault to actually flush the TLB entry.
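
The trade-off can be illustrated with a toy two-CPU model. Everything here is hypothetical (struct names, the single shared PTE, one cached translation per CPU); it only shows that a permission upgrade with a local-only flush costs the other CPU at most one spurious fault, provided the fault path drops the stale local entry.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 2

struct tlb_entry_model {
	bool valid;
	bool writable;
};

struct machine_model {
	bool pte_writable;				/* the shared PTE */
	struct tlb_entry_model tlb[NR_CPUS_MODEL];	/* per-CPU cached copy */
	unsigned long spurious_faults;
};

/* Permission upgrade: update the PTE, flush only the local TLB entry. */
static void set_access_flags_local(struct machine_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	m->pte_writable = true;
	m->tlb[cpu].valid = false;
}

/*
 * Write access from 'cpu'. Returns the number of faults taken before
 * the write succeeds, or -1 for a genuine fault (not modelled further).
 */
static int write_access(struct machine_model *m, int cpu)
{
	int faults = 0;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	for (;;) {
		struct tlb_entry_model *e = &m->tlb[cpu];

		if (!e->valid) {	/* TLB miss: refill from the PTE */
			e->valid = true;
			e->writable = m->pte_writable;
		}
		if (e->writable)
			return faults;

		faults++;
		if (!m->pte_writable)
			return -1;

		/* Spurious fault: drop the stale local entry and retry. */
		m->spurious_faults++;
		e->valid = false;
	}
}
```

If the fault path did not invalidate the stale entry, the loop in `write_access()` would never terminate, which mirrors the infinite page fault loop the commit message warns about.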

Signed-off-by: Rik van Riel <***@redhat.com>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Michel Lespinasse <***@google.com>
Cc: Ingo Molnar <***@kernel.org>
---
mm/pgtable-generic.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index e642627..d8397da 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -12,8 +12,8 @@

#ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
/*
- * Only sets the access flags (dirty, accessed, and
- * writable). Furthermore, we know it always gets set to a "more
+ * Only sets the access flags (dirty, accessed), as well as write
+ * permission. Furthermore, we know it always gets set to a "more
* permissive" setting, which allows most architectures to optimize
* this. We return whether the PTE actually changed, which in turn
* instructs the caller to do things like update__mmu_cache. This
@@ -27,7 +27,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
int changed = !pte_same(*ptep, entry);
if (changed) {
set_pte_at(vma->vm_mm, address, ptep, entry);
- flush_tlb_page(vma, address);
+ flush_tlb_fix_spurious_fault(vma, address);
}
return changed;
}
--
1.7.9.2
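
The trade-off described in the commit message above can be modelled in
user space. The following is only a toy sketch, not kernel code: the
struct and every toy_* name are hypothetical, and a "TLB" here is just
one cached copy of a write bit per CPU. It shows that after a
permission upgrade with a local-only flush, a second CPU holding a
stale entry takes exactly one spurious fault, drops its own entry, and
then proceeds.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Toy model: one PTE plus one cached copy of its write bit per CPU. */
struct toy_mmu {
	bool pte_writable;               /* authoritative page-table entry */
	bool tlb_valid[TOY_NR_CPUS];     /* does this CPU hold a cached entry? */
	bool tlb_writable[TOY_NR_CPUS];  /* cached write permission */
	int spurious_faults;             /* faults taken although the PTE allowed the write */
};

/* Drop only the calling CPU's cached entry (the "purely local" flush). */
void toy_flush_local(struct toy_mmu *m, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	m->tlb_valid[cpu] = false;
}

/* Attempt a write on @cpu; returns the number of faults the access took. */
int toy_write(struct toy_mmu *m, int cpu)
{
	int faults = 0;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	for (;;) {
		if (!m->tlb_valid[cpu]) {
			/* TLB miss: refill from the page table. */
			m->tlb_valid[cpu] = true;
			m->tlb_writable[cpu] = m->pte_writable;
		}
		if (m->tlb_writable[cpu])
			return faults;

		faults++;
		if (m->pte_writable)
			m->spurious_faults++;    /* stale entry, PTE was already upgraded */
		else
			m->pte_writable = true;  /* real fault: upgrade the PTE */

		/* Either way, only the faulting CPU's entry is invalidated. */
		toy_flush_local(m, cpu);
	}
}

/* Both CPUs start with a cached read-only entry. */
void toy_tlb_demo(void)
{
	struct toy_mmu m = {0};

	m.tlb_valid[0] = true;
	m.tlb_valid[1] = true;

	assert(toy_write(&m, 0) == 1);   /* real fault, PTE upgraded */
	assert(m.spurious_faults == 0);
	assert(toy_write(&m, 1) == 1);   /* stale entry: one spurious fault */
	assert(m.spurious_faults == 1);
	assert(toy_write(&m, 1) == 0);   /* entry refreshed, no more faults */
}
```

If the fault path did not invalidate the faulting CPU's own entry, the
loop in toy_write() would never terminate, which is the "infinite loop
of page faults" the commit message warns about.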

Mel Gorman
2012-12-07 10:40:03 UTC
From: Peter Zijlstra <***@chello.nl>

NOTE: This patch is based on "sched, numa, mm: Add fault driven
placement and migration policy", but as it throws away all the policy
to leave just a basic foundation, I had to drop the Signed-off-by tags.

This patch creates a bare-bones method for marking PTEs pte_numa from
scheduler context. When such a page is later touched, it is faulted onto
the node the CPU is running on. In itself this does nothing useful, but any
placement policy will fundamentally depend on receiving placement hints
from fault context and doing something intelligent about them.

Signed-off-by: Mel Gorman <***@suse.de>
Acked-by: Rik van Riel <***@redhat.com>
---
arch/sh/mm/Kconfig | 1 +
arch/x86/Kconfig | 2 +
include/linux/mm_types.h | 11 ++++
include/linux/sched.h | 20 ++++++++
kernel/sched/core.c | 13 +++++
kernel/sched/fair.c | 125 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/features.h | 7 +++
kernel/sched/sched.h | 6 +++
kernel/sysctl.c | 24 ++++++++-
mm/huge_memory.c | 5 +-
mm/memory.c | 14 +++++-
11 files changed, 224 insertions(+), 4 deletions(-)

diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig
index cb8f992..0f7c852 100644
--- a/arch/sh/mm/Kconfig
+++ b/arch/sh/mm/Kconfig
@@ -111,6 +111,7 @@ config VSYSCALL
config NUMA
bool "Non Uniform Memory Access (NUMA) Support"
depends on MMU && SYS_SUPPORTS_NUMA && EXPERIMENTAL
+ select ARCH_WANT_NUMA_VARIABLE_LOCALITY
default n
help
Some SH systems have many various memories scattered around
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 46c3bff..1137028 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -22,6 +22,8 @@ config X86
def_bool y
select HAVE_AOUT if X86_32
select HAVE_UNSTABLE_SCHED_CLOCK
+ select ARCH_SUPPORTS_NUMA_BALANCING
+ select ARCH_WANTS_PROT_NUMA_PROT_NONE
select HAVE_IDE
select HAVE_OPROFILE
select HAVE_PCSPKR_PLATFORM
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 31f8a3a..d82accb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -398,6 +398,17 @@ struct mm_struct {
#ifdef CONFIG_CPUMASK_OFFSTACK
struct cpumask cpumask_allocation;
#endif
+#ifdef CONFIG_BALANCE_NUMA
+ /*
+ * numa_next_scan is the next time when the PTEs will me marked
+ * pte_numa to gather statistics and migrate pages to new nodes
+ * if necessary
+ */
+ unsigned long numa_next_scan;
+
+ /* numa_scan_seq prevents two threads setting pte_numa */
+ int numa_scan_seq;
+#endif
struct uprobes_state uprobes_state;
};

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0dd42a0..ac71181 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1479,6 +1479,14 @@ struct task_struct {
short il_next;
short pref_node_fork;
#endif
+#ifdef CONFIG_BALANCE_NUMA
+ int numa_scan_seq;
+ int numa_migrate_seq;
+ unsigned int numa_scan_period;
+ u64 node_stamp; /* migration stamp */
+ struct callback_head numa_work;
+#endif /* CONFIG_BALANCE_NUMA */
+
struct rcu_head rcu;

/*
@@ -1553,6 +1561,14 @@ struct task_struct {
/* Future-safe accessor for struct task_struct's cpus_allowed. */
#define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)

+#ifdef CONFIG_BALANCE_NUMA
+extern void task_numa_fault(int node, int pages);
+#else
+static inline void task_numa_fault(int node, int pages)
+{
+}
+#endif
+
/*
* Priority of a process goes from 0..MAX_PRIO-1, valid RT
* priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
@@ -1990,6 +2006,10 @@ enum sched_tunable_scaling {
};
extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;

+extern unsigned int sysctl_balance_numa_scan_period_min;
+extern unsigned int sysctl_balance_numa_scan_period_max;
+extern unsigned int sysctl_balance_numa_settle_count;
+
#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_migration_cost;
extern unsigned int sysctl_sched_nr_migrate;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..81fa185 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1533,6 +1533,19 @@ static void __sched_fork(struct task_struct *p)
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
#endif
+
+#ifdef CONFIG_BALANCE_NUMA
+ if (p->mm && atomic_read(&p->mm->mm_users) == 1) {
+ p->mm->numa_next_scan = jiffies;
+ p->mm->numa_scan_seq = 0;
+ }
+
+ p->node_stamp = 0ULL;
+ p->numa_scan_seq = p->mm ? p->mm->numa_scan_seq : 0;
+ p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0;
+ p->numa_scan_period = sysctl_balance_numa_scan_period_min;
+ p->numa_work.next = &p->numa_work;
+#endif /* CONFIG_BALANCE_NUMA */
}

/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b800a1..b6d3ed7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -26,6 +26,8 @@
#include <linux/slab.h>
#include <linux/profile.h>
#include <linux/interrupt.h>
+#include <linux/mempolicy.h>
+#include <linux/task_work.h>

#include <trace/events/sched.h>

@@ -776,6 +778,126 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
* Scheduling class queueing methods:
*/

+#ifdef CONFIG_BALANCE_NUMA
+/*
+ * numa task sample period in ms: 5s
+ */
+unsigned int sysctl_balance_numa_scan_period_min = 5000;
+unsigned int sysctl_balance_numa_scan_period_max = 5000*16;
+
+static void task_numa_placement(struct task_struct *p)
+{
+ int seq = ACCESS_ONCE(p->mm->numa_scan_seq);
+
+ if (p->numa_scan_seq == seq)
+ return;
+ p->numa_scan_seq = seq;
+
+ /* FIXME: Scheduling placement policy hints go here */
+}
+
+/*
+ * Got a PROT_NONE fault for a page on @node.
+ */
+void task_numa_fault(int node, int pages)
+{
+ struct task_struct *p = current;
+
+ /* FIXME: Allocate task-specific structure for placement policy here */
+
+ task_numa_placement(p);
+}
+
+/*
+ * The expensive part of numa migration is done from task_work context.
+ * Triggered from task_tick_numa().
+ */
+void task_numa_work(struct callback_head *work)
+{
+ unsigned long migrate, next_scan, now = jiffies;
+ struct task_struct *p = current;
+ struct mm_struct *mm = p->mm;
+
+ WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));
+
+ work->next = work; /* protect against double add */
+ /*
+ * Who cares about NUMA placement when they're dying.
+ *
+ * NOTE: make sure not to dereference p->mm before this check,
+ * exit_task_work() happens _after_ exit_mm() so we could be called
+ * without p->mm even though we still had it when we enqueued this
+ * work.
+ */
+ if (p->flags & PF_EXITING)
+ return;
+
+ /*
+ * Enforce maximal scan/migration frequency..
+ */
+ migrate = mm->numa_next_scan;
+ if (time_before(now, migrate))
+ return;
+
+ if (p->numa_scan_period == 0)
+ p->numa_scan_period = sysctl_balance_numa_scan_period_min;
+
+ next_scan = now + 2*msecs_to_jiffies(p->numa_scan_period);
+ if (cmpxchg(&mm->numa_next_scan, migrate, next_scan) != migrate)
+ return;
+
+ ACCESS_ONCE(mm->numa_scan_seq)++;
+ {
+ struct vm_area_struct *vma;
+
+ down_read(&mm->mmap_sem);
+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
+ if (!vma_migratable(vma))
+ continue;
+ change_prot_numa(vma, vma->vm_start, vma->vm_end);
+ }
+ up_read(&mm->mmap_sem);
+ }
+}
+
+/*
+ * Drive the periodic memory faults..
+ */
+void task_tick_numa(struct rq *rq, struct task_struct *curr)
+{
+ struct callback_head *work = &curr->numa_work;
+ u64 period, now;
+
+ /*
+ * We don't care about NUMA placement if we don't have memory.
+ */
+ if (!curr->mm || (curr->flags & PF_EXITING) || work->next != work)
+ return;
+
+ /*
+ * Using runtime rather than walltime has the dual advantage that
+ * we (mostly) drive the selection from busy threads and that the
+ * task needs to have done some actual work before we bother with
+ * NUMA placement.
+ */
+ now = curr->se.sum_exec_runtime;
+ period = (u64)curr->numa_scan_period * NSEC_PER_MSEC;
+
+ if (now - curr->node_stamp > period) {
+ curr->node_stamp = now;
+
+ if (!time_before(jiffies, curr->mm->numa_next_scan)) {
+ init_task_work(work, task_numa_work); /* TODO: move this into sched_fork() */
+ task_work_add(curr, work, true);
+ }
+ }
+}
+#else
+static void task_tick_numa(struct rq *rq, struct task_struct *curr)
+{
+}
+#endif /* CONFIG_BALANCE_NUMA */
+
static void
account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
@@ -4954,6 +5076,9 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
cfs_rq = cfs_rq_of(se);
entity_tick(cfs_rq, se, queued);
}
+
+ if (sched_feat_numa(NUMA))
+ task_tick_numa(rq, curr);
}

/*
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index eebefca..7cfd289 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -61,3 +61,10 @@ SCHED_FEAT(TTWU_QUEUE, true)
SCHED_FEAT(FORCE_SD_OVERLAP, false)
SCHED_FEAT(RT_RUNTIME_SHARE, true)
SCHED_FEAT(LB_MIN, false)
+
+/*
+ * Apply the automatic NUMA scheduling policy
+ */
+#ifdef CONFIG_BALANCE_NUMA
+SCHED_FEAT(NUMA, true)
+#endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7a7db09..9a43241 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -648,6 +648,12 @@ extern struct static_key sched_feat_keys[__SCHED_FEAT_NR];
#define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
#endif /* SCHED_DEBUG && HAVE_JUMP_LABEL */

+#ifdef CONFIG_BALANCE_NUMA
+#define sched_feat_numa(x) sched_feat(x)
+#else
+#define sched_feat_numa(x) (0)
+#endif
+
static inline u64 global_rt_period(void)
{
return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 26f65ea..1359f51 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -256,9 +256,11 @@ static int min_sched_granularity_ns = 100000; /* 100 usecs */
static int max_sched_granularity_ns = NSEC_PER_SEC; /* 1 second */
static int min_wakeup_granularity_ns; /* 0 usecs */
static int max_wakeup_granularity_ns = NSEC_PER_SEC; /* 1 second */
+#ifdef CONFIG_SMP
static int min_sched_tunable_scaling = SCHED_TUNABLESCALING_NONE;
static int max_sched_tunable_scaling = SCHED_TUNABLESCALING_END-1;
-#endif
+#endif /* CONFIG_SMP */
+#endif /* CONFIG_SCHED_DEBUG */

#ifdef CONFIG_COMPACTION
static int min_extfrag_threshold;
@@ -301,6 +303,7 @@ static struct ctl_table kern_table[] = {
.extra1 = &min_wakeup_granularity_ns,
.extra2 = &max_wakeup_granularity_ns,
},
+#ifdef CONFIG_SMP
{
.procname = "sched_tunable_scaling",
.data = &sysctl_sched_tunable_scaling,
@@ -347,7 +350,24 @@ static struct ctl_table kern_table[] = {
.extra1 = &zero,
.extra2 = &one,
},
-#endif
+#endif /* CONFIG_SMP */
+#ifdef CONFIG_BALANCE_NUMA
+ {
+ .procname = "balance_numa_scan_period_min_ms",
+ .data = &sysctl_balance_numa_scan_period_min,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "balance_numa_scan_period_max_ms",
+ .data = &sysctl_balance_numa_scan_period_max,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#endif /* CONFIG_BALANCE_NUMA */
+#endif /* CONFIG_SCHED_DEBUG */
{
.procname = "sched_rt_period_us",
.data = &sysctl_sched_rt_period,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 68e0412..b3d4c4b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1045,6 +1045,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
split_huge_page(page);
put_page(page);
+
return 0;

clear_pmdnuma:
@@ -1059,8 +1060,10 @@ clear_pmdnuma:

out_unlock:
spin_unlock(&mm->page_table_lock);
- if (page)
+ if (page) {
put_page(page);
+ task_numa_fault(numa_node_id(), HPAGE_PMD_NR);
+ }
return 0;
}

diff --git a/mm/memory.c b/mm/memory.c
index 1757ad8..1d6f85a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3454,7 +3454,8 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
{
struct page *page = NULL;
spinlock_t *ptl;
- int current_nid, target_nid;
+ int current_nid = -1;
+ int target_nid;

/*
* The "pte" at this point cannot be used safely without
@@ -3501,6 +3502,7 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
current_nid = target_nid;

out:
+ task_numa_fault(current_nid, 1);
return 0;
}

@@ -3537,6 +3539,7 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
for (addr = _addr + offset; addr < _addr + PMD_SIZE; pte++, addr += PAGE_SIZE) {
pte_t pteval = *pte;
struct page *page;
+ int curr_nid;
if (!pte_present(pteval))
continue;
if (!pte_numa(pteval))
@@ -3554,6 +3557,15 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = vm_normal_page(vma, addr, pteval);
if (unlikely(!page))
continue;
+ /* only check non-shared pages */
+ if (unlikely(page_mapcount(page) != 1))
+ continue;
+ pte_unmap_unlock(pte, ptl);
+
+ curr_nid = page_to_nid(page);
+ task_numa_fault(curr_nid, 1);
+
+ pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
}
pte_unmap_unlock(orig_pte, ptl);
--
1.7.9.2
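
The two rate limits in the patch above (the runtime-based check in
task_tick_numa() and the cmpxchg on numa_next_scan in task_numa_work())
can be sketched in user space as follows. This is an illustration under
stated assumptions, not the kernel implementation: all toy_* names and
both structs are hypothetical stand-ins, C11 atomics replace the
kernel's cmpxchg(), the period is passed in ticks rather than converted
with msecs_to_jiffies(), and the demo is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_task {
	unsigned long long runtime_ns;   /* stands in for se.sum_exec_runtime */
	unsigned long long node_stamp;   /* runtime at the last period boundary */
	unsigned int scan_period_ms;     /* stands in for numa_scan_period */
};

struct toy_mm {
	_Atomic unsigned long next_scan; /* stands in for mm->numa_next_scan */
	int scan_seq;                    /* stands in for mm->numa_scan_seq */
};

/* Wraparound-safe "a is before b", the same idea as time_before(). */
bool toy_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Tick side: true once the task has run for more than one scan period. */
bool toy_tick_due(struct toy_task *t)
{
	unsigned long long period =
		(unsigned long long)t->scan_period_ms * 1000000ULL;

	if (t->runtime_ns - t->node_stamp <= period)
		return false;
	t->node_stamp = t->runtime_ns;
	return true;
}

/* Work side: true only for the one caller that wins the scan window. */
bool toy_claim_scan(struct toy_mm *mm, unsigned long now, unsigned long period)
{
	unsigned long seen = atomic_load(&mm->next_scan);

	if (toy_time_before(now, seen))
		return false;   /* too early, the window has not opened yet */

	/* Only one caller can move next_scan from 'seen' to the new deadline. */
	if (!atomic_compare_exchange_strong(&mm->next_scan, &seen,
					    now + 2 * period))
		return false;

	mm->scan_seq++;         /* the winner alone advances the sequence */
	return true;
}

void toy_scan_demo(void)
{
	struct toy_task t = { .runtime_ns = 0, .node_stamp = 0,
			      .scan_period_ms = 5000 };
	struct toy_mm mm = { .scan_seq = 0 };

	atomic_init(&mm.next_scan, 100UL);

	t.runtime_ns = 5000ULL * 1000000ULL;       /* exactly one period: not due */
	assert(!toy_tick_due(&t));
	t.runtime_ns += 1;                         /* just over one period: due */
	assert(toy_tick_due(&t));

	assert(!toy_claim_scan(&mm, 99UL, 10UL));  /* before the deadline */
	assert(toy_claim_scan(&mm, 100UL, 10UL));  /* first claimant wins */
	assert(!toy_claim_scan(&mm, 105UL, 10UL)); /* window now ends at 120 */
	assert(mm.scan_seq == 1);
}
```

Driving the first check from accumulated runtime rather than wall time
means an idle task never becomes due, which matches the reasoning given
in the comment inside task_tick_numa().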

Simon Jeons
2013-01-04 12:00:02 UTC
Post by Mel Gorman
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 31f8a3a..d82accb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -398,6 +398,17 @@ struct mm_struct {
#ifdef CONFIG_CPUMASK_OFFSTACK
struct cpumask cpumask_allocation;
#endif
+#ifdef CONFIG_BALANCE_NUMA
+ /*
+ * numa_next_scan is the next time when the PTEs will me marked
s/me/be
Mel Gorman
2012-12-07 10:40:01 UTC
The use of MPOL_NOOP and MPOL_MF_LAZY to allow an application to
explicitly request lazy migration is a good idea, but the actual
API has not been well reviewed, and once released we have to support it.
For now this patch prevents applications from using these interfaces.
This will need to be revisited.

Signed-off-by: Mel Gorman <***@suse.de>
---
include/uapi/linux/mempolicy.h | 4 +---
mm/mempolicy.c | 9 ++++-----
2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 6a1baae..16fb4e6 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -21,7 +21,6 @@ enum {
MPOL_BIND,
MPOL_INTERLEAVE,
MPOL_LOCAL,
- MPOL_NOOP, /* retain existing policy for range */
MPOL_MAX, /* always last member of enum */
};

@@ -57,8 +56,7 @@ enum mpol_rebind_step {

#define MPOL_MF_VALID (MPOL_MF_STRICT | \
MPOL_MF_MOVE | \
- MPOL_MF_MOVE_ALL | \
- MPOL_MF_LAZY)
+ MPOL_MF_MOVE_ALL)

/*
* Internal flags that share the struct mempolicy flags word with
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 75d4600..a7a62fe 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -252,7 +252,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
pr_debug("setting mode %d flags %d nodes[0] %lx\n",
mode, flags, nodes ? nodes_addr(*nodes)[0] : -1);

- if (mode == MPOL_DEFAULT || mode == MPOL_NOOP) {
+ if (mode == MPOL_DEFAULT) {
if (nodes && !nodes_empty(*nodes))
return ERR_PTR(-EINVAL);
return NULL;
@@ -1186,7 +1186,7 @@ static long do_mbind(unsigned long start, unsigned long len,
if (start & ~PAGE_MASK)
return -EINVAL;

- if (mode == MPOL_DEFAULT || mode == MPOL_NOOP)
+ if (mode == MPOL_DEFAULT)
flags &= ~MPOL_MF_STRICT;

len = (len + PAGE_SIZE - 1) & PAGE_MASK;
@@ -1241,7 +1241,7 @@ static long do_mbind(unsigned long start, unsigned long len,
flags | MPOL_MF_INVERT, &pagelist);

err = PTR_ERR(vma); /* maybe ... */
- if (!IS_ERR(vma) && mode != MPOL_NOOP)
+ if (!IS_ERR(vma))
err = mbind_range(mm, start, end, new);

if (!err) {
@@ -2530,7 +2530,6 @@ static const char * const policy_modes[] =
[MPOL_BIND] = "bind",
[MPOL_INTERLEAVE] = "interleave",
[MPOL_LOCAL] = "local",
- [MPOL_NOOP] = "noop", /* should not actually be used */
};


@@ -2581,7 +2580,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
break;
}
}
- if (mode >= MPOL_MAX || mode == MPOL_NOOP)
+ if (mode >= MPOL_MAX)
goto out;

switch (mode) {
--
1.7.9.2
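
The user-visible effect of the patch above can be sketched as a small
validation routine. This is a user-space model under stated
assumptions, not the kernel's mbind() path: the TOY_* constants and
toy_check_mbind() are hypothetical names, the flag bit positions are
illustrative, and the enum order follows the mode list visible in the
diff with the two leading modes assumed. With MPOL_MF_LAZY dropped from
the valid mask and MPOL_NOOP dropped from the enum, both requests are
treated as invalid arguments.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits; TOY_MF_LAZY is deliberately not in the valid mask. */
#define TOY_MF_STRICT   (1u << 0)
#define TOY_MF_MOVE     (1u << 1)
#define TOY_MF_MOVE_ALL (1u << 2)
#define TOY_MF_LAZY     (1u << 3)

#define TOY_MF_VALID (TOY_MF_STRICT | TOY_MF_MOVE | TOY_MF_MOVE_ALL)

/* Mode list after the patch: no NOOP entry before the terminator. */
enum toy_mode {
	TOY_DEFAULT,
	TOY_PREFERRED,
	TOY_BIND,
	TOY_INTERLEAVE,
	TOY_LOCAL,
	TOY_MAX,        /* always last; takes the value NOOP used to have */
};

/* Returns 0 if the request is acceptable, -EINVAL otherwise. */
int toy_check_mbind(unsigned int mode, unsigned int flags)
{
	if (flags & ~TOY_MF_VALID)
		return -EINVAL;   /* unknown or withdrawn flag, e.g. TOY_MF_LAZY */
	if (mode >= TOY_MAX)
		return -EINVAL;   /* includes the slot NOOP previously occupied */
	return 0;
}

void toy_mbind_demo(void)
{
	assert(toy_check_mbind(TOY_BIND, TOY_MF_MOVE) == 0);
	assert(toy_check_mbind(TOY_BIND, TOY_MF_MOVE | TOY_MF_LAZY) == -EINVAL);
	assert(toy_check_mbind(5u, TOY_MF_MOVE) == -EINVAL);
}
```

Removing the enum entry rather than leaving a reserved slot means the
numeric value is free to be reused once the API has been reviewed.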

Mel Gorman
2012-12-07 10:40:03 UTC
From: Andrea Arcangeli <***@redhat.com>

Introduce FOLL_NUMA to tell follow_page to check
pte/pmd_numa. get_user_pages must use FOLL_NUMA, and it's safe to do
so because it always invokes handle_mm_fault and retries the
follow_page later.

KVM secondary MMU page faults will trigger the NUMA hinting page
faults through gup_fast -> get_user_pages -> follow_page ->
handle_mm_fault.

Other follow_page callers like KSM should not use FOLL_NUMA, or they
would fail to get the pages if they use follow_page instead of
get_user_pages.

[ This patch was picked up from the AutoNUMA tree. ]

Originally-by: Andrea Arcangeli <***@redhat.com>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Rik van Riel <***@redhat.com>
[ ported to this tree. ]
Signed-off-by: Ingo Molnar <***@kernel.org>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/mm.h | 1 +
mm/memory.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1856c62..fa16152 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1572,6 +1572,7 @@ struct page *follow_page(struct vm_area_struct *, unsigned long address,
#define FOLL_MLOCK 0x40 /* mark page as mlocked */
#define FOLL_SPLIT 0x80 /* don't return transhuge pages, split them */
#define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */
+#define FOLL_NUMA 0x200 /* force NUMA hinting page fault */

typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
diff --git a/mm/memory.c b/mm/memory.c
index 221fc9f..73834e7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1517,6 +1517,8 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
goto out;
}
+ if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
+ goto no_page_table;
if (pmd_trans_huge(*pmd)) {
if (flags & FOLL_SPLIT) {
split_huge_page_pmd(mm, pmd);
@@ -1546,6 +1548,8 @@ split_fallthrough:
pte = *ptep;
if (!pte_present(pte))
goto no_page;
+ if ((flags & FOLL_NUMA) && pte_numa(pte))
+ goto no_page;
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;

@@ -1697,6 +1701,19 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
(VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
vm_flags &= (gup_flags & FOLL_FORCE) ?
(VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
+
+ /*
+ * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault
+ * would be called on PROT_NONE ranges. We must never invoke
+ * handle_mm_fault on PROT_NONE ranges or the NUMA hinting
+ * page faults would unprotect the PROT_NONE ranges if
+ * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd
+ * bitflag. So to avoid that, don't set FOLL_NUMA if
+ * FOLL_FORCE is set.
+ */
+ if (!(gup_flags & FOLL_FORCE))
+ gup_flags |= FOLL_NUMA;
+
i = 0;

do {
--
1.7.9.2
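
The flag handling in the two hunks above can be modelled in userspace.
This is only a sketch under stated assumptions: the sketch_* names are
hypothetical, SKETCH_FOLL_NUMA copies the 0x200 value the patch adds,
SKETCH_FOLL_FORCE uses an assumed value, and the FOLL_WRITE check that
follows in follow_page() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* SKETCH_FOLL_NUMA mirrors the 0x200 added by the patch.
 * SKETCH_FOLL_FORCE is an assumed value, for illustration only. */
#define SKETCH_FOLL_FORCE 0x10u
#define SKETCH_FOLL_NUMA  0x200u

/* Models the __get_user_pages() hunk: ask for NUMA hinting faults
 * unless the caller forces access to otherwise protected ranges. */
static inline unsigned int sketch_gup_flags(unsigned int gup_flags)
{
	if (!(gup_flags & SKETCH_FOLL_FORCE))
		gup_flags |= SKETCH_FOLL_NUMA;
	return gup_flags;
}

/* Models the pte check in follow_page(): true means the page is
 * handed back directly, false means the caller takes the fault path. */
static inline bool sketch_follow_pte(unsigned int flags, bool pte_present,
				     bool pte_numa)
{
	if (!pte_present)
		return false;
	if ((flags & SKETCH_FOLL_NUMA) && pte_numa)
		return false;
	return true;
}

/* Self-check: a pte_numa entry is only returned directly with FOLL_FORCE. */
static inline void sketch_selfcheck(void)
{
	assert(!sketch_follow_pte(sketch_gup_flags(0), true, true));
	assert(sketch_follow_pte(sketch_gup_flags(SKETCH_FOLL_FORCE), true, true));
}
```

In this model a normal get_user_pages() caller never receives a pte_numa
page directly and goes through the hinting fault instead, while a
FOLL_FORCE caller skips FOLL_NUMA so that PROT_NONE ranges are never
handed to the fault handler.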

Mel Gorman
2012-12-07 10:40:03 UTC
From: Lee Schermerhorn <***@hp.com>

This patch augments the MPOL_MF_LAZY feature by adding a "NOOP" policy
to mbind(). When the NOOP policy is used with the 'MOVE' and 'LAZY'
flags, mbind() will map the pages PROT_NONE so that they will be
migrated on the next touch.

This allows an application to prepare for a new phase of operation
where different regions of shared storage will be assigned to
worker threads, without changing policy. Note that we could just use
"default" policy in this case. However, this also allows an
application to request that pages be migrated, only if necessary,
to follow any arbitrary policy that might currently apply to a
range of pages, without knowing the policy, or without specifying
multiple mbind()s for ranges with different policies.

[ Bug in early version of mpol_parse_str() reported by Fengguang Wu. ]

Reported-by: Fengguang Wu <***@intel.com>
Signed-off-by: Lee Schermerhorn <***@hp.com>
Reviewed-by: Rik van Riel <***@redhat.com>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Linus Torvalds <***@linux-foundation.org>
Signed-off-by: Peter Zijlstra <***@chello.nl>
Signed-off-by: Ingo Molnar <***@kernel.org>
Signed-off-by: Mel Gorman <***@suse.de>
---
include/uapi/linux/mempolicy.h | 1 +
mm/mempolicy.c | 11 ++++++-----
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 3e835c9..d23dca8 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -21,6 +21,7 @@ enum {
MPOL_BIND,
MPOL_INTERLEAVE,
MPOL_LOCAL,
+ MPOL_NOOP, /* retain existing policy for range */
MPOL_MAX, /* always last member of enum */
};

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 54bd3e5..c21e914 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -251,10 +251,10 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
pr_debug("setting mode %d flags %d nodes[0] %lx\n",
mode, flags, nodes ? nodes_addr(*nodes)[0] : -1);

- if (mode == MPOL_DEFAULT) {
+ if (mode == MPOL_DEFAULT || mode == MPOL_NOOP) {
if (nodes && !nodes_empty(*nodes))
return ERR_PTR(-EINVAL);
- return NULL; /* simply delete any existing policy */
+ return NULL;
}
VM_BUG_ON(!nodes);

@@ -1147,7 +1147,7 @@ static long do_mbind(unsigned long start, unsigned long len,
if (start & ~PAGE_MASK)
return -EINVAL;

- if (mode == MPOL_DEFAULT)
+ if (mode == MPOL_DEFAULT || mode == MPOL_NOOP)
flags &= ~MPOL_MF_STRICT;

len = (len + PAGE_SIZE - 1) & PAGE_MASK;
@@ -2409,7 +2409,8 @@ static const char * const policy_modes[] =
[MPOL_PREFERRED] = "prefer",
[MPOL_BIND] = "bind",
[MPOL_INTERLEAVE] = "interleave",
- [MPOL_LOCAL] = "local"
+ [MPOL_LOCAL] = "local",
+ [MPOL_NOOP] = "noop", /* should not actually be used */
};


@@ -2460,7 +2461,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
break;
}
}
- if (mode >= MPOL_MAX)
+ if (mode >= MPOL_MAX || mode == MPOL_NOOP)
goto out;

switch (mode) {
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:03 UTC
This is the simplest possible policy that still does something of note.
When a pte_numa is faulted, it is moved immediately. Any replacement
policy must at least do better than this and in all likelihood this
policy regresses normal workloads.
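
A minimal userspace sketch of the placement decision this policy makes,
modelled on the mpol_misplaced() hunk below. The sketch_* names are
hypothetical, and the -1 "leave the page where it is" return value is an
assumption about the surrounding function, which the hunk does not show.

```c
#include <assert.h>

/* Same bit as the MPOL_F_MORON flag added by the patch */
#define SKETCH_MPOL_F_MORON (1u << 4)

/*
 * page_nid:   node the faulting page currently lives on
 * policy_nid: node the memory policy would normally pick
 * cpu_nid:    node of the CPU that took the pte_numa fault
 *
 * Returns the node to migrate to, or -1 if the page is already
 * on the node the policy wants.
 */
static inline int sketch_misplaced(int page_nid, int policy_nid, int cpu_nid,
				   unsigned int flags)
{
	int polnid = policy_nid;

	assert(page_nid >= 0 && policy_nid >= 0 && cpu_nid >= 0);

	/* Migrate the page towards the node whose CPU is referencing it */
	if (flags & SKETCH_MPOL_F_MORON)
		polnid = cpu_nid;

	return (page_nid != polnid) ? polnid : -1;
}
```

With the flag set, a page on node 0 that is touched from node 1 is always
reported as misplaced, whatever node the policy preferred. That is why
any replacement policy only has to beat "move on every remote touch".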

Signed-off-by: Mel Gorman <***@suse.de>
Acked-by: Rik van Riel <***@redhat.com>
---
include/uapi/linux/mempolicy.h | 1 +
mm/mempolicy.c | 38 ++++++++++++++++++++++++++++++++++++--
2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 16fb4e6..0d11c3d 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -67,6 +67,7 @@ enum mpol_rebind_step {
#define MPOL_F_LOCAL (1 << 1) /* preferred local allocation */
#define MPOL_F_REBINDING (1 << 2) /* identify policies in rebinding */
#define MPOL_F_MOF (1 << 3) /* this policy wants migrate on fault */
+#define MPOL_F_MORON (1 << 4) /* Migrate On pte_numa Reference On Node */


#endif /* _UAPI_LINUX_MEMPOLICY_H */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 516491f..4c1c8d8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -118,6 +118,26 @@ static struct mempolicy default_policy = {
.flags = MPOL_F_LOCAL,
};

+static struct mempolicy preferred_node_policy[MAX_NUMNODES];
+
+static struct mempolicy *get_task_policy(struct task_struct *p)
+{
+ struct mempolicy *pol = p->mempolicy;
+ int node;
+
+ if (!pol) {
+ node = numa_node_id();
+ if (node != -1)
+ pol = &preferred_node_policy[node];
+
+ /* preferred_node_policy is not initialised early in boot */
+ if (!pol->mode)
+ pol = NULL;
+ }
+
+ return pol;
+}
+
static const struct mempolicy_operations {
int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
/*
@@ -1598,7 +1618,7 @@ asmlinkage long compat_sys_mbind(compat_ulong_t start, compat_ulong_t len,
struct mempolicy *get_vma_policy(struct task_struct *task,
struct vm_area_struct *vma, unsigned long addr)
{
- struct mempolicy *pol = task->mempolicy;
+ struct mempolicy *pol = get_task_policy(task);

if (vma) {
if (vma->vm_ops && vma->vm_ops->get_policy) {
@@ -2021,7 +2041,7 @@ retry_cpuset:
*/
struct page *alloc_pages_current(gfp_t gfp, unsigned order)
{
- struct mempolicy *pol = current->mempolicy;
+ struct mempolicy *pol = get_task_policy(current);
struct page *page;
unsigned int cpuset_mems_cookie;

@@ -2295,6 +2315,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
default:
BUG();
}
+
+ /* Migrate the page towards the node whose CPU is referencing it */
+ if (pol->flags & MPOL_F_MORON)
+ polnid = numa_node_id();
+
if (curnid != polnid)
ret = polnid;
out:
@@ -2483,6 +2508,15 @@ void __init numa_policy_init(void)
sizeof(struct sp_node),
0, SLAB_PANIC, NULL);

+ for_each_node(nid) {
+ preferred_node_policy[nid] = (struct mempolicy) {
+ .refcnt = ATOMIC_INIT(1),
+ .mode = MPOL_PREFERRED,
+ .flags = MPOL_F_MOF | MPOL_F_MORON,
+ .v = { .preferred_node = nid, },
+ };
+ }
+
/*
* Set interleaving policy for system init. Interleaving is only
* enabled across suitably sized nodes (default is >= 16MB), or
--
1.7.9.2

Mel Gorman
2012-12-07 10:40:03 UTC
Compaction already has tracepoints to count scanned and isolated pages
but it requires that ftrace be enabled and if that information has to be
written to disk then it can be disruptive. This patch adds vmstat counters
for compaction called compact_migrate_scanned, compact_free_scanned and
compact_isolated.

With these counters, it is possible to define a basic cost model for
compaction. This approximates how much work compaction is doing and can
be compared with an oprofile showing TLB misses to see if the cost of
compaction is being offset by THP, for example. Minimally a compaction patch
can be evaluated in terms of whether it increases or decreases cost. The
basic cost model looks like this:

Fundamental unit u: a word sizeof(void *)

Ca = cost of struct page access = sizeof(struct page) / u

Cmc = Cost migrate page copy = (Ca + PAGE_SIZE/u) * 2
Cmf = Cost migrate failure = Ca * 2
Ci = Cost page isolation = (Ca + Wi)
where Wi is a constant that should reflect the approximate
cost of the locking operation.

Csm = Cost migrate scanning = Ca
Csf = Cost free scanning = Ca

Overall cost = (Csm * compact_migrate_scanned) +
(Csf * compact_free_scanned) +
(Ci * compact_isolated) +
(Cmc * pgmigrate_success) +
(Cmf * pgmigrate_fail)

Where the values are read from /proc/vmstat.

This is very basic and ignores certain costs such as the allocation cost
to do a migrate page copy but any improvement to the model would still
use the same vmstat counters.
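
As a rough illustration, the model above can be written as a small
userspace function. This is a sketch, not part of the patch: the struct
and function names are hypothetical, the counters are assumed to have
been read from /proc/vmstat already, and sizeof(struct page) and Wi are
inputs the caller has to guess.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the /proc/vmstat counters the model reads */
struct compact_counters {
	uint64_t compact_migrate_scanned;
	uint64_t compact_free_scanned;
	uint64_t compact_isolated;
	uint64_t pgmigrate_success;
	uint64_t pgmigrate_fail;
};

/* Model parameters: sizes in bytes, Wi in words */
struct cost_params {
	uint64_t word_size;        /* u = sizeof(void *) */
	uint64_t struct_page_size; /* sizeof(struct page), assumed */
	uint64_t page_size;        /* PAGE_SIZE */
	uint64_t lock_cost_words;  /* Wi, a tunable guess */
};

/* Overall cost in units of u, following the formulas in the changelog */
static inline uint64_t compaction_cost(const struct compact_counters *c,
				       const struct cost_params *p)
{
	assert(p->word_size > 0);

	uint64_t ca  = p->struct_page_size / p->word_size;      /* Ca  */
	uint64_t cmc = (ca + p->page_size / p->word_size) * 2;  /* Cmc */
	uint64_t cmf = ca * 2;                                  /* Cmf */
	uint64_t ci  = ca + p->lock_cost_words;                 /* Ci  */
	uint64_t csm = ca;                                      /* Csm */
	uint64_t csf = ca;                                      /* Csf */

	return csm * c->compact_migrate_scanned +
	       csf * c->compact_free_scanned +
	       ci  * c->compact_isolated +
	       cmc * c->pgmigrate_success +
	       cmf * c->pgmigrate_fail;
}
```

For example, with u = 8, a 4096-byte page and an assumed 56-byte struct
page, 51424061 successful migrations with every other counter at zero
cost 53378175318 units. That lines up with the "Compaction cost 53378"
row in the vmstat table later in this thread if that row is reported in
millions, although the thread does not state the unit.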

Signed-off-by: Mel Gorman <***@suse.de>
Reviewed-by: Rik van Riel <***@redhat.com>
---
include/linux/vm_event_item.h | 2 ++
mm/compaction.c | 8 ++++++++
mm/vmstat.c | 3 +++
3 files changed, 13 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 8aa7cb9..a1f750b 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -42,6 +42,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
#endif
#ifdef CONFIG_COMPACTION
+ COMPACTMIGRATE_SCANNED, COMPACTFREE_SCANNED,
+ COMPACTISOLATED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
#endif
#ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/compaction.c b/mm/compaction.c
index 2c077a7..aee7443 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -356,6 +356,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
if (blockpfn == end_pfn)
update_pageblock_skip(cc, valid_page, total_isolated, false);

+ count_vm_events(COMPACTFREE_SCANNED, nr_scanned);
+ if (total_isolated)
+ count_vm_events(COMPACTISOLATED, total_isolated);
+
return total_isolated;
}

@@ -646,6 +650,10 @@ next_pageblock:

trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);

+ count_vm_events(COMPACTMIGRATE_SCANNED, nr_scanned);
+ if (nr_isolated)
+ count_vm_events(COMPACTISOLATED, nr_isolated);
+
return low_pfn;
}

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 89a7fd6..3a067fa 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -779,6 +779,9 @@ const char * const vmstat_text[] = {
"pgmigrate_fail",
#endif
#ifdef CONFIG_COMPACTION
+ "compact_migrate_scanned",
+ "compact_free_scanned",
+ "compact_isolated",
"compact_stall",
"compact_fail",
"compact_success",
--
1.7.9.2

Ingo Molnar
2012-12-07 11:10:02 UTC
This is a full release of all the patches so apologies for the
flood. [...]
I have yet to process all your mails, but assuming I address all
your review feedback and the latest unified tree in tip:master
shows no regression in your testing, would you be willing to
start using it for ongoing work?

It would make it much easier for me to pick up your
enhancements, fixes, etc.
Changelog since V9
o Migration scalability (mingo)
To *really* see migration scalability bottlenecks you need to
remove the migration-bandwidth throttling kludge from your tree
(or configure it up very high if you want to do it simple).

Some (certainly not all) of the performance regressions you
reported were certainly due to numa/core code hitting the
migration codepaths as aggressively as the workload demanded -
and hitting scalability bottlenecks.

The right approach is to hit scalability bottlenecks and fix
them.

Thanks,

Ingo
Mel Gorman
2012-12-09 20:40:02 UTC
Post by Ingo Molnar
This is a full release of all the patches so apologies for the
flood. [...]
I have yet to process all your mails, but assuming I address all
your review feedback and the latest unified tree in tip:master
shows no regression in your testing, would you be willing to
start using it for ongoing work?
Ingo,

If you had read the second paragraph of the mail you just responded to or
the results at the end then you would have seen that I had problems with
the performance. You would also know that tip/master testing for the last
week was failing due to a boot problem (issue was in mainline not tip and
has been already fixed) and would have known that, since the -v18 release,
numacore was effectively disabled on my test machine.

Clearly you are not reading the bug reports you are receiving and you're not
seeing the small bit of review feedback or answering the review questions
you have received either. Why would I be more forthcoming when I feel that
it'll simply be ignored? You simply assume that each batch of patches
you place on top must be fixing all known regressions and ignoring any
evidence to the contrary.

If you had read my mail from last Tuesday you would even know which patch
was causing the problem that effectively disabled numacore although not
why. The comment about p->numa_faults was completely off the mark (long
journey, was tired, assumed numa_faults was a counter and not a pointer
which was careless). If you had called me on it then I would have spotted
the actual problem sooner. The problem was indeed with the nr_cpus_allowed
== num_online_cpus() check, which I had pointed out was a suspicious check
although for different reasons. As it turns out, a printk() bodge showed
that nr_cpus_allowed == 80 was set in sched_init_smp() while num_online_cpus()
== 48. This effectively disabled numacore. If you had responded to the
bug report, this would likely have been found last Wednesday.

As for my ongoing work, I have not actually changed much in the last
two weeks or so -- build fixes and your scalability patches. As I've
said multiple times, my primary objective was to build something minimal
that did something better than mainline, although not necessarily as good
as what the kernel could potentially achieve if either numacore or autonuma
were rebased on top. I left the tree so that other testing might validate it was
correct and avoid changing the tree too much prior to the merge window.
I deliberately avoided working on anything that would directly collide
with what numacore was trying to achieve.
Post by Ingo Molnar
It would make it much easier for me to pick up your
enhancements, fixes, etc.
Changelog since V9
o Migration scalability (mingo)
To *really* see migration scalability bottlenecks you need to
remove the migration-bandwidth throttling kludge from your tree
(or configure it up very high if you want to do it simple).
Why is it a kludge? I already explained what the rationale behind the rate
limiting was. It's not about scalability, it's about mitigating worst-case
behaviour and the amount of time the kernel spends moving data around, which
a deliberately adverse workload can trigger. It is unacceptable if, during a
phase change, a process would stall potentially for milliseconds (seconds
if the node is large enough I guess) while the data is being migrated. Here
it is again -- http://www.spinics.net/lists/linux-mm/msg47440.html . You
either ignored the mail or simply could not be bothered explaining why
you thought this was the incorrect decision or why the concerns about an
adverse workload were unimportant.

I have a vague suspicion actually that when you are modelling the task->data
relationship you make an implicit assumption that moving data has
zero or near-zero cost. In such a model it would always make sense to move
quickly and immediately but in practice the cost of moving can exceed the
performance benefit of accessing local data and lead to regressions. It
becomes more pronounced if the nodes are not fully connected.
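
The break-even argument above can be put as a small sketch. Everything
in it is hypothetical: the function name, the nanosecond units and the
figures in the example are illustrative and are not measurements from
this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Net effect, in nanoseconds, of migrating one page to the local node.
 * A negative result means the migration cost more than the local
 * accesses saved, which shows up as a regression.
 */
static inline int64_t sketch_migration_net_ns(int64_t migrate_cost_ns,
					      int64_t remote_access_ns,
					      int64_t local_access_ns,
					      int64_t future_accesses)
{
	assert(remote_access_ns >= local_access_ns);
	assert(future_accesses >= 0);

	return future_accesses * (remote_access_ns - local_access_ns) -
	       migrate_cost_ns;
}
```

With a made-up migration cost of 5000ns and a 50ns penalty per remote
access, 50 later accesses leave the task 2500ns worse off and the move
only pays for itself after 100 accesses. A model that sets the migration
cost to zero never sees the first case.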
Post by Ingo Molnar
Some (certainly not all) of the performance regressions you
reported were certainly due to numa/core code hitting the
migration codepaths as aggressively as the workload demanded -
and hitting scalability bottlenecks.
How are you so certain? How do you know it's not because your code is
migrating excessively for no good reason because the algorithm has a flaw
in it? Or that the cost of excessive migration is not being offset by
local data accesses? The critical point to note is that if it really was
only scalability problems then autonuma would suffer the same problems
and it would be impossible for autonuma's performance to exceed numacore's.
This isn't the case, making it unlikely that scalability is your only problem.

Either way, last night I applied a patch on top of latest tip/master to
remove the nr_cpus_allowed check so that numacore would be enabled again
and tested that. In some places it has indeed improved considerably. In others
it is still regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or multiple
JVMs. It is likely that a zero page is being inserted due to a race with
migration and causes the JVM to throw a null pointer exception. Here is
the comparison on the rough off-chance you actually read it this time.

stats-v8r6 Same collection of TLB flush fixes and stats
numacore-20121130 Roughly numacore v17
numafix-20121209 numacore as of December 9th with the nr_cpus_allowed check removed.
Note that this is a 3.7-rc8 based test because that's what tip/master
is.
autonuma-v28fastr4 Autonuma v28fast with THP patch on top
balancenuma-v9r2 Balance numa v9
balancenuma-v10r3 V9 + the migration scalability patches

AutoNUMA Benchmark
==================

3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
User NUMA01 65230.85 ( 0.00%) 24835.22 ( 61.93%) 21882.80 ( 66.45%) 30410.22 ( 53.38%) 52436.65 ( 19.61%) 59949.95 ( 8.10%)
User NUMA01_THEADLOCAL 60794.67 ( 0.00%) 17856.17 ( 70.63%) 18367.20 ( 69.79%) 17185.34 ( 71.73%) 17829.96 ( 70.67%) 17501.83 ( 71.21%)
User NUMA02 7031.50 ( 0.00%) 2084.38 ( 70.36%) 2391.47 ( 65.99%) 2238.73 ( 68.16%) 2079.48 ( 70.43%) 2094.68 ( 70.21%)
User NUMA02_SMT 2916.19 ( 0.00%) 1009.28 ( 65.39%) 1046.49 ( 64.11%) 1037.07 ( 64.44%) 997.57 ( 65.79%) 1010.15 ( 65.36%)
System NUMA01 39.66 ( 0.00%) 926.55 (-2236.23%) 134.00 (-237.87%) 236.83 (-497.15%) 275.09 (-593.62%) 265.02 (-568.23%)
System NUMA01_THEADLOCAL 42.33 ( 0.00%) 513.99 (-1114.25%) 201.65 (-376.38%) 70.90 (-67.49%) 110.82 (-161.80%) 130.30 (-207.82%)
System NUMA02 1.25 ( 0.00%) 18.57 (-1385.60%) 13.00 (-940.00%) 6.39 (-411.20%) 6.42 (-413.60%) 9.17 (-633.60%)
System NUMA02_SMT 16.66 ( 0.00%) 12.32 ( 26.05%) 7.26 ( 56.42%) 3.17 ( 80.97%) 3.58 ( 78.51%) 6.21 ( 62.73%)
Elapsed NUMA01 1511.76 ( 0.00%) 575.93 ( 61.90%) 475.26 ( 68.56%) 701.62 ( 53.59%) 1185.53 ( 21.58%) 1352.74 ( 10.52%)
Elapsed NUMA01_THEADLOCAL 1387.17 ( 0.00%) 398.55 ( 71.27%) 405.25 ( 70.79%) 378.47 ( 72.72%) 397.37 ( 71.35%) 387.93 ( 72.03%)
Elapsed NUMA02 176.81 ( 0.00%) 51.14 ( 71.08%) 62.08 ( 64.89%) 53.45 ( 69.77%) 49.51 ( 72.00%) 49.77 ( 71.85%)
Elapsed NUMA02_SMT 163.96 ( 0.00%) 48.92 ( 70.16%) 54.45 ( 66.79%) 48.17 ( 70.62%) 47.71 ( 70.90%) 48.63 ( 70.34%)
CPU NUMA01 4317.00 ( 0.00%) 4473.00 ( -3.61%) 4632.00 ( -7.30%) 4368.00 ( -1.18%) 4446.00 ( -2.99%) 4451.00 ( -3.10%)
CPU NUMA01_THEADLOCAL 4385.00 ( 0.00%) 4609.00 ( -5.11%) 4582.00 ( -4.49%) 4559.00 ( -3.97%) 4514.00 ( -2.94%) 4545.00 ( -3.65%)
CPU NUMA02 3977.00 ( 0.00%) 4111.00 ( -3.37%) 3873.00 ( 2.62%) 4200.00 ( -5.61%) 4212.00 ( -5.91%) 4226.00 ( -6.26%)
CPU NUMA02_SMT 1788.00 ( 0.00%) 2087.00 (-16.72%) 1935.00 ( -8.22%) 2159.00 (-20.75%) 2098.00 (-17.34%) 2089.00 (-16.83%)

Latest numacore has improved on the numa01 case quite a bit. However, this is
the adverse workload. For the workloads that actually do something sensible,
autonuma and balancenuma are both beating numacore by a good margin.

numacore's system CPU usage continues to be excessive -- over triple
balancenuma's in the numa01 case. Over quadruple in the numa01_threadlocal
case. Double in numa02 and over double in the numa02_smt case.

Duration and vmstats showed nothing interesting so I excluded them this time.

SpecJBB, Multiple JVMs, THP is enabled
======================================

There are no figures available for the latest numacore because the JVM
crashed in two separate tests with this report:

Input Properties:
per_jvm_warehouse_rampup = 3.0
per_jvm_warehouse_rampdown = 20.0
jvm_instances = 4
deterministic_random_seed = false
ramp_up_seconds = 30
measurement_seconds = 240
starting_number_warehouses = 1
increment_number_warehouses = 1
ending_number_warehouses = 24
expected_peak_warehouse = 12
Waiting on instance 1 pid 4028 to finish.
Accepted client /127.0.0.1:59130
Accepted client /127.0.0.1:58393
Accepted client /127.0.0.1:53374
Accepted client /127.0.0.1:40128
java.lang.NullPointerException: error
/root/git-private/autonuma-test/shellpacks/shellpack-bench-specjbb: line 203: 4028 Aborted java $USE_HUGEPAGE $SPECJBB_MAXHEAP spec.jbb.JBBmain -propfile SPECjbb.props -id $INSTANCE > $SHELLPACK_TEMP/jvm-instance-$INSTANCE.log
Waiting on instance 1 pid 4029 to finish.
Exception in thread "main" java.lang.NullPointerException
at spec.jbb.Company.displayResultTotals(Unknown Source)
at spec.jbb.JBBmain.DoARun(Unknown Source)
at spec.jbb.JBBmain.runWarehouse(Unknown Source)
at spec.jbb.JBBmain.doIt(Unknown Source)
at spec.jbb.JBBmain.main(Unknown Source)
Exception in thread "main" java.lang.NullPointerException
at spec.jbb.Company.displayResultTotals(Unknown Source)
at spec.jbb.JBBmain.DoARun(Unknown Source)
at spec.jbb.JBBmain.runWarehouse(Unknown Source)
at spec.jbb.JBBmain.doIt(Unknown Source)
at spec.jbb.JBBmain.main(Unknown Source)
Exception in thread "main" java.lang.NullPointerException
at spec.jbb.Company.displayResultTotals(Unknown Source)
at spec.jbb.JBBmain.DoARun(Unknown Source)
at spec.jbb.JBBmain.runWarehouse(Unknown Source)
at spec.jbb.JBBmain.doIt(Unknown Source)
at spec.jbb.JBBmain.main(Unknown Source)
Read from remote host compass: Connection reset by peer

Here are the results for the kernels that succeeded

3.7.0-rc7 3.7.0-rc6 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Mean 1 31311.75 ( 0.00%) 27938.00 (-10.77%) 31474.25 ( 0.52%) 31112.00 ( -0.64%) 31281.50 ( -0.10%)
Mean 2 62972.75 ( 0.00%) 51899.00 (-17.58%) 66654.00 ( 5.85%) 62937.50 ( -0.06%) 62483.50 ( -0.78%)
Mean 3 91292.00 ( 0.00%) 80908.00 (-11.37%) 97177.50 ( 6.45%) 90665.50 ( -0.69%) 90667.00 ( -0.68%)
Mean 4 115768.75 ( 0.00%) 99497.25 (-14.06%) 125596.00 ( 8.49%) 116812.50 ( 0.90%) 116193.50 ( 0.37%)
Mean 5 137248.50 ( 0.00%) 92837.75 (-32.36%) 152795.25 ( 11.33%) 139037.75 ( 1.30%) 139055.50 ( 1.32%)
Mean 6 155528.50 ( 0.00%) 105554.50 (-32.13%) 177455.25 ( 14.10%) 155769.25 ( 0.15%) 159129.50 ( 2.32%)
Mean 7 156747.50 ( 0.00%) 122582.25 (-21.80%) 184578.75 ( 17.76%) 157103.25 ( 0.23%) 163234.00 ( 4.14%)
Mean 8 152069.50 ( 0.00%) 122439.00 (-19.48%) 186619.25 ( 22.72%) 157631.00 ( 3.66%) 163077.75 ( 7.24%)
Mean 9 146609.75 ( 0.00%) 112410.00 (-23.33%) 186165.00 ( 26.98%) 152561.00 ( 4.06%) 159656.00 ( 8.90%)
Mean 10 142819.00 ( 0.00%) 111456.00 (-21.96%) 182569.75 ( 27.83%) 145320.00 ( 1.75%) 153414.25 ( 7.42%)
Mean 11 128292.25 ( 0.00%) 98027.00 (-23.59%) 176104.75 ( 37.27%) 138599.50 ( 8.03%) 147194.25 ( 14.73%)
Mean 12 128769.75 ( 0.00%) 129469.50 ( 0.54%) 169003.00 ( 31.24%) 131994.75 ( 2.50%) 140049.75 ( 8.76%)
Mean 13 126488.50 ( 0.00%) 110133.75 (-12.93%) 162725.75 ( 28.65%) 130005.25 ( 2.78%) 139109.75 ( 9.98%)
Mean 14 123400.00 ( 0.00%) 117929.75 ( -4.43%) 163781.25 ( 32.72%) 126340.75 ( 2.38%) 137883.00 ( 11.74%)
Mean 15 122139.50 ( 0.00%) 122404.25 ( 0.22%) 160800.25 ( 31.65%) 128612.75 ( 5.30%) 136624.00 ( 11.86%)
Mean 16 116413.50 ( 0.00%) 124573.50 ( 7.01%) 160882.75 ( 38.20%) 117793.75 ( 1.19%) 134005.75 ( 15.11%)
Mean 17 117263.25 ( 0.00%) 121937.25 ( 3.99%) 159069.75 ( 35.65%) 121991.75 ( 4.03%) 133444.50 ( 13.80%)
Mean 18 117277.00 ( 0.00%) 116633.75 ( -0.55%) 158694.75 ( 35.32%) 119089.75 ( 1.55%) 129650.75 ( 10.55%)
Mean 19 113231.00 ( 0.00%) 111035.75 ( -1.94%) 155563.25 ( 37.39%) 119699.75 ( 5.71%) 123403.25 ( 8.98%)
Mean 20 113628.75 ( 0.00%) 113451.25 ( -0.16%) 154779.75 ( 36.22%) 118400.75 ( 4.20%) 126041.25 ( 10.92%)
Mean 21 110982.50 ( 0.00%) 107660.50 ( -2.99%) 151147.25 ( 36.19%) 115663.25 ( 4.22%) 121906.50 ( 9.84%)
Mean 22 107660.25 ( 0.00%) 104771.50 ( -2.68%) 151180.50 ( 40.42%) 111038.00 ( 3.14%) 125519.00 ( 16.59%)
Mean 23 105320.50 ( 0.00%) 88275.25 (-16.18%) 147032.00 ( 39.60%) 112817.50 ( 7.12%) 124148.25 ( 17.88%)
Mean 24 110900.50 ( 0.00%) 85169.00 (-23.20%) 147407.00 ( 32.92%) 109556.50 ( -1.21%) 122544.00 ( 10.50%)
Stddev 1 720.83 ( 0.00%) 982.31 (-36.28%) 942.80 (-30.79%) 1170.23 (-62.35%) 539.84 ( 25.11%)
Stddev 2 466.00 ( 0.00%) 1770.75 (-279.99%) 1327.32 (-184.83%) 1368.51 (-193.67%) 2103.32 (-351.35%)
Stddev 3 509.61 ( 0.00%) 4849.62 (-851.63%) 1803.72 (-253.94%) 1088.04 (-113.50%) 410.73 ( 19.40%)
Stddev 4 1750.10 ( 0.00%) 10708.16 (-511.86%) 2010.11 (-14.86%) 1456.90 ( 16.75%) 1370.22 ( 21.71%)
Stddev 5 700.05 ( 0.00%) 16497.79 (-2256.66%) 2354.70 (-236.36%) 759.38 ( -8.48%) 1869.54 (-167.06%)
Stddev 6 2259.33 ( 0.00%) 24221.98 (-972.09%) 1516.32 ( 32.89%) 1032.39 ( 54.31%) 1720.87 ( 23.83%)
Stddev 7 3390.99 ( 0.00%) 4721.80 (-39.25%) 2398.34 ( 29.27%) 2487.08 ( 26.66%) 4327.85 (-27.63%)
Stddev 8 7533.18 ( 0.00%) 8609.90 (-14.29%) 2895.55 ( 61.56%) 3902.53 ( 48.20%) 2536.68 ( 66.33%)
Stddev 9 9223.98 ( 0.00%) 10731.70 (-16.35%) 4726.23 ( 48.76%) 5673.20 ( 38.50%) 3377.59 ( 63.38%)
Stddev 10 4578.09 ( 0.00%) 11136.27 (-143.25%) 6705.48 (-46.47%) 5516.47 (-20.50%) 7227.58 (-57.87%)
Stddev 11 8201.30 ( 0.00%) 3580.27 ( 56.35%) 10915.90 (-33.10%) 4757.42 ( 41.99%) 4056.02 ( 50.54%)
Stddev 12 5713.70 ( 0.00%) 13923.12 (-143.68%) 16555.64 (-189.75%) 4573.05 ( 19.96%) 3678.89 ( 35.61%)
Stddev 13 5878.95 ( 0.00%) 10471.09 (-78.11%) 18628.01 (-216.86%) 1680.65 ( 71.41%) 3947.39 ( 32.86%)
Stddev 14 4783.95 ( 0.00%) 4051.35 ( 15.31%) 18324.63 (-283.04%) 2637.82 ( 44.86%) 4806.09 ( -0.46%)
Stddev 15 6281.48 ( 0.00%) 3357.07 ( 46.56%) 17654.58 (-181.06%) 2003.38 ( 68.11%) 3005.22 ( 52.16%)
Stddev 16 6948.12 ( 0.00%) 3763.32 ( 45.84%) 18280.52 (-163.10%) 3526.10 ( 49.25%) 3309.24 ( 52.37%)
Stddev 17 5603.77 ( 0.00%) 1452.04 ( 74.09%) 18230.53 (-225.33%) 1712.95 ( 69.43%) 3516.09 ( 37.25%)
Stddev 18 6200.90 ( 0.00%) 1870.12 ( 69.84%) 18486.73 (-198.13%) 751.36 ( 87.88%) 2412.60 ( 61.09%)
Stddev 19 6726.31 ( 0.00%) 1045.21 ( 84.46%) 18465.25 (-174.52%) 1750.49 ( 73.98%) 4482.82 ( 33.35%)
Stddev 20 5713.58 ( 0.00%) 2066.90 ( 63.82%) 19947.77 (-249.13%) 1892.91 ( 66.87%) 2612.62 ( 54.27%)
Stddev 21 4566.92 ( 0.00%) 2460.40 ( 46.13%) 21189.08 (-363.97%) 3639.75 ( 20.30%) 1963.17 ( 57.01%)
Stddev 22 6168.05 ( 0.00%) 2770.81 ( 55.08%) 20033.82 (-224.80%) 3682.20 ( 40.30%) 1159.17 ( 81.21%)
Stddev 23 6295.45 ( 0.00%) 1337.32 ( 78.76%) 22610.91 (-259.16%) 2013.53 ( 68.02%) 3842.61 ( 38.96%)
Stddev 24 3108.17 ( 0.00%) 1381.20 ( 55.56%) 21243.56 (-583.47%) 4044.16 (-30.11%) 2673.39 ( 13.99%)
TPut 1 125247.00 ( 0.00%) 111752.00 (-10.77%) 125897.00 ( 0.52%) 124448.00 ( -0.64%) 125126.00 ( -0.10%)
TPut 2 251891.00 ( 0.00%) 207596.00 (-17.58%) 266616.00 ( 5.85%) 251750.00 ( -0.06%) 249934.00 ( -0.78%)
TPut 3 365168.00 ( 0.00%) 323632.00 (-11.37%) 388710.00 ( 6.45%) 362662.00 ( -0.69%) 362668.00 ( -0.68%)
TPut 4 463075.00 ( 0.00%) 397989.00 (-14.06%) 502384.00 ( 8.49%) 467250.00 ( 0.90%) 464774.00 ( 0.37%)
TPut 5 548994.00 ( 0.00%) 371351.00 (-32.36%) 611181.00 ( 11.33%) 556151.00 ( 1.30%) 556222.00 ( 1.32%)
TPut 6 622114.00 ( 0.00%) 422218.00 (-32.13%) 709821.00 ( 14.10%) 623077.00 ( 0.15%) 636518.00 ( 2.32%)
TPut 7 626990.00 ( 0.00%) 490329.00 (-21.80%) 738315.00 ( 17.76%) 628413.00 ( 0.23%) 652936.00 ( 4.14%)
TPut 8 608278.00 ( 0.00%) 489756.00 (-19.48%) 746477.00 ( 22.72%) 630524.00 ( 3.66%) 652311.00 ( 7.24%)
TPut 9 586439.00 ( 0.00%) 449640.00 (-23.33%) 744660.00 ( 26.98%) 610244.00 ( 4.06%) 638624.00 ( 8.90%)
TPut 10 571276.00 ( 0.00%) 445824.00 (-21.96%) 730279.00 ( 27.83%) 581280.00 ( 1.75%) 613657.00 ( 7.42%)
TPut 11 513169.00 ( 0.00%) 392108.00 (-23.59%) 704419.00 ( 37.27%) 554398.00 ( 8.03%) 588777.00 ( 14.73%)
TPut 12 515079.00 ( 0.00%) 517878.00 ( 0.54%) 676012.00 ( 31.24%) 527979.00 ( 2.50%) 560199.00 ( 8.76%)
TPut 13 505954.00 ( 0.00%) 440535.00 (-12.93%) 650903.00 ( 28.65%) 520021.00 ( 2.78%) 556439.00 ( 9.98%)
TPut 14 493600.00 ( 0.00%) 471719.00 ( -4.43%) 655125.00 ( 32.72%) 505363.00 ( 2.38%) 551532.00 ( 11.74%)
TPut 15 488558.00 ( 0.00%) 489617.00 ( 0.22%) 643201.00 ( 31.65%) 514451.00 ( 5.30%) 546496.00 ( 11.86%)
TPut 16 465654.00 ( 0.00%) 498294.00 ( 7.01%) 643531.00 ( 38.20%) 471175.00 ( 1.19%) 536023.00 ( 15.11%)
TPut 17 469053.00 ( 0.00%) 487749.00 ( 3.99%) 636279.00 ( 35.65%) 487967.00 ( 4.03%) 533778.00 ( 13.80%)
TPut 18 469108.00 ( 0.00%) 466535.00 ( -0.55%) 634779.00 ( 35.32%) 476359.00 ( 1.55%) 518603.00 ( 10.55%)
TPut 19 452924.00 ( 0.00%) 444143.00 ( -1.94%) 622253.00 ( 37.39%) 478799.00 ( 5.71%) 493613.00 ( 8.98%)
TPut 20 454515.00 ( 0.00%) 453805.00 ( -0.16%) 619119.00 ( 36.22%) 473603.00 ( 4.20%) 504165.00 ( 10.92%)
TPut 21 443930.00 ( 0.00%) 430642.00 ( -2.99%) 604589.00 ( 36.19%) 462653.00 ( 4.22%) 487626.00 ( 9.84%)
TPut 22 430641.00 ( 0.00%) 419086.00 ( -2.68%) 604722.00 ( 40.42%) 444152.00 ( 3.14%) 502076.00 ( 16.59%)
TPut 23 421282.00 ( 0.00%) 353101.00 (-16.18%) 588128.00 ( 39.60%) 451270.00 ( 7.12%) 496593.00 ( 17.88%)
TPut 24 443602.00 ( 0.00%) 340676.00 (-23.20%) 589628.00 ( 32.92%) 438226.00 ( -1.21%) 490176.00 ( 10.50%)

numacore v17 regressed but we knew that already.

autonuma does the best overall

balancenuma does all right and the scalability patches help quite a bit.

SPECJBB PEAKS
3.7.0-rc7 3.7.0-rc6 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Expctd Warehouse 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%)
Expctd Peak Bops 515079.00 ( 0.00%) 517878.00 ( 0.54%) 676012.00 ( 31.24%) 527979.00 ( 2.50%) 560199.00 ( 8.76%)
Actual Warehouse 7.00 ( 0.00%) 12.00 ( 71.43%) 8.00 ( 14.29%) 8.00 ( 14.29%) 7.00 ( 0.00%)
Actual Peak Bops 626990.00 ( 0.00%) 517878.00 (-17.40%) 746477.00 ( 19.06%) 630524.00 ( 0.56%) 652936.00 ( 4.14%)
SpecJBB Bops 465685.00 ( 0.00%) 447214.00 ( -3.97%) 628328.00 ( 34.93%) 480925.00 ( 3.27%) 521332.00 ( 11.95%)
SpecJBB Bops/JVM 116421.00 ( 0.00%) 111804.00 ( -3.97%) 157082.00 ( 34.93%) 120231.00 ( 3.27%) 130333.00 ( 11.95%)

numacore is pretty old here so ignore the regression.

autonuma is the best but balancenuma sees some of the performance gain.

MMTests Statistics: vmstat
3.7.0-rc7 3.7.0-rc6 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Page Ins 37116 36404 36740 35664 34832
Page Outs 30340 33624 29428 29656 30320
Swap Ins 0 0 0 0 0
Swap Outs 0 0 0 0 0
Direct pages scanned 0 0 0 0 0
Kswapd pages scanned 0 0 0 0 0
Kswapd pages reclaimed 0 0 0 0 0
Direct pages reclaimed 0 0 0 0 0
Kswapd efficiency 100% 100% 100% 100% 100%
Kswapd velocity 0.000 0.000 0.000 0.000 0.000
Direct efficiency 100% 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000 0.000
Percentage direct scans 0% 0% 0% 0% 0%
Page writes by reclaim 0 0 0 0 0
Page writes file 0 0 0 0 0
Page writes anon 0 0 0 0 0
Page reclaim immediate 0 0 0 0 0
Page rescued immediate 0 0 0 0 0
Slabs scanned 0 0 0 0 0
Direct inode steals 0 0 0 0 0
Kswapd inode steals 0 0 0 0 0
Kswapd skipped wait 0 0 0 0 0
THP fault alloc 63322 49889 52514 65794 66963
THP collapse alloc 130 53 463 128 121
THP splits 355 192 376 371 362
THP fault fallback 0 0 0 0 0
THP collapse fail 0 0 0 0 0
Compaction stalls 0 0 0 0 0
Compaction success 0 0 0 0 0
Compaction failures 0 0 0 0 0
Page migrate success 0 0 0 51424061 50195011
Page migrate failure 0 0 0 0 0
Compaction pages isolated 0 0 0 0 0
Compaction migrate scanned 0 0 0 0 0
Compaction free scanned 0 0 0 0 0
Compaction cost 0 0 0 53378 52102
NUMA PTE updates 0 0 0 411047238 404964644
NUMA hint faults 0 0 0 3077302 3075026
NUMA hint local faults 0 0 0 958617 870171
NUMA pages migrated 0 0 0 51424061 50195011
AutoNUMA cost 0 0 0 19240 19163

All it shows really is that THP is enabled and that balancenuma is migrating
more than I'd like -- 48MB/sec on average throughout the test.

SpecJBB, Multiple JVMs, THP is disabled
=======================================
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Mean 1 26036.50 ( 0.00%) 19595.00 (-24.74%) 23791.25 ( -8.62%) 24738.25 ( -4.99%) 25595.00 ( -1.70%) 25610.50 ( -1.64%)
Mean 2 53629.75 ( 0.00%) 38481.50 (-28.25%) 46966.75 (-12.42%) 55646.75 ( 3.76%) 53045.25 ( -1.09%) 53383.00 ( -0.46%)
Mean 3 77385.00 ( 0.00%) 53685.50 (-30.63%) 66913.25 (-13.53%) 82714.75 ( 6.89%) 76596.00 ( -1.02%) 76502.75 ( -1.14%)
Mean 4 100097.75 ( 0.00%) 68253.50 (-31.81%) 72186.50 (-27.88%) 107883.25 ( 7.78%) 98618.00 ( -1.48%) 99786.50 ( -0.31%)
Mean 5 119012.75 ( 0.00%) 74164.50 (-37.68%) 72126.50 (-39.40%) 130260.25 ( 9.45%) 119354.50 ( 0.29%) 121741.75 ( 2.29%)
Mean 6 137419.25 ( 0.00%) 86158.50 (-37.30%) 52123.00 (-62.07%) 154244.50 ( 12.24%) 136901.75 ( -0.38%) 136990.50 ( -0.31%)
Mean 7 138018.25 ( 0.00%) 96059.25 (-30.40%) 55582.50 (-59.73%) 159501.00 ( 15.57%) 138265.50 ( 0.18%) 139398.75 ( 1.00%)
Mean 8 136774.00 ( 0.00%) 97003.50 (-29.08%) 30208.25 (-77.91%) 162868.00 ( 19.08%) 138554.50 ( 1.30%) 137340.75 ( 0.41%)
Mean 9 127966.50 ( 0.00%) 95261.00 (-25.56%) 125900.50 ( -1.61%) 163008.00 ( 27.38%) 137954.00 ( 7.80%) 134200.50 ( 4.87%)
Mean 10 124628.75 ( 0.00%) 96202.25 (-22.81%) 73809.00 (-40.78%) 159696.50 ( 28.14%) 131322.25 ( 5.37%) 126927.50 ( 1.84%)
Mean 11 117269.00 ( 0.00%) 95924.25 (-18.20%) 127804.25 ( 8.98%) 154701.50 ( 31.92%) 125032.75 ( 6.62%) 122925.00 ( 4.82%)
Mean 12 111962.25 ( 0.00%) 94247.25 (-15.82%) 146580.25 ( 30.92%) 150936.50 ( 34.81%) 118119.50 ( 5.50%) 119931.75 ( 7.12%)
Mean 13 111595.50 ( 0.00%) 106538.50 ( -4.53%) 134462.75 ( 20.49%) 147193.25 ( 31.90%) 116398.75 ( 4.30%) 117349.75 ( 5.16%)
Mean 14 110881.00 ( 0.00%) 103549.00 ( -6.61%) 137573.25 ( 24.07%) 144584.00 ( 30.40%) 114934.50 ( 3.66%) 115838.25 ( 4.47%)
Mean 15 109337.50 ( 0.00%) 101729.00 ( -6.96%) 139722.50 ( 27.79%) 143333.00 ( 31.09%) 115523.75 ( 5.66%) 115151.25 ( 5.32%)
Mean 16 107031.75 ( 0.00%) 101983.75 ( -4.72%) 121221.75 ( 13.26%) 141907.75 ( 32.58%) 113666.00 ( 6.20%) 113673.50 ( 6.21%)
Mean 17 105491.25 ( 0.00%) 100205.75 ( -5.01%) 129429.75 ( 22.69%) 140691.00 ( 33.37%) 112751.50 ( 6.88%) 113221.25 ( 7.33%)
Mean 18 101102.75 ( 0.00%) 96635.50 ( -4.42%) 115086.50 ( 13.83%) 137784.25 ( 36.28%) 112582.50 ( 11.35%) 111533.50 ( 10.32%)
Mean 19 103907.25 ( 0.00%) 94578.25 ( -8.98%) 126392.75 ( 21.64%) 135719.25 ( 30.62%) 110152.25 ( 6.01%) 113959.25 ( 9.67%)
Mean 20 100496.00 ( 0.00%) 92683.75 ( -7.77%) 123318.75 ( 22.71%) 135264.25 ( 34.60%) 108861.50 ( 8.32%) 113746.00 ( 13.18%)
Mean 21 99570.00 ( 0.00%) 92955.75 ( -6.64%) 111293.00 ( 11.77%) 133891.00 ( 34.47%) 110094.00 ( 10.57%) 109462.50 ( 9.94%)
Mean 22 98611.75 ( 0.00%) 89781.75 ( -8.95%) 118218.50 ( 19.88%) 132399.75 ( 34.26%) 109322.75 ( 10.86%) 110502.75 ( 12.06%)
Mean 23 98173.00 ( 0.00%) 88846.00 ( -9.50%) 118210.00 ( 20.41%) 130726.00 ( 33.16%) 106046.25 ( 8.02%) 107304.25 ( 9.30%)
Mean 24 92074.75 ( 0.00%) 88581.00 ( -3.79%) 111965.00 ( 21.60%) 127552.25 ( 38.53%) 102362.00 ( 11.17%) 107119.25 ( 16.34%)
Stddev 1 735.13 ( 0.00%) 538.24 ( 26.78%) 854.37 (-16.22%) 121.08 ( 83.53%) 906.62 (-23.33%) 788.06 ( -7.20%)
Stddev 2 406.26 ( 0.00%) 3458.87 (-751.39%) 4220.03 (-938.75%) 477.32 (-17.49%) 1322.57 (-225.55%) 468.57 (-15.34%)
Stddev 3 644.20 ( 0.00%) 1360.89 (-111.25%) 2573.27 (-299.45%) 922.47 (-43.20%) 609.27 ( 5.42%) 599.26 ( 6.98%)
Stddev 4 743.93 ( 0.00%) 2149.34 (-188.92%) 14533.01 (-1853.53%) 1385.42 (-86.23%) 1119.02 (-50.42%) 801.13 ( -7.69%)
Stddev 5 898.53 ( 0.00%) 2521.01 (-180.57%) 15303.97 (-1603.23%) 763.24 ( 15.06%) 942.52 ( -4.90%) 1718.19 (-91.22%)
Stddev 6 1126.61 ( 0.00%) 3818.22 (-238.91%) 23616.59 (-1996.26%) 1527.03 (-35.54%) 2445.69 (-117.08%) 1754.32 (-55.72%)
Stddev 7 2907.61 ( 0.00%) 4419.29 (-51.99%) 29664.97 (-920.25%) 1536.66 ( 47.15%) 4881.65 (-67.89%) 4863.83 (-67.28%)
Stddev 8 3200.64 ( 0.00%) 382.01 ( 88.06%) 10743.99 (-235.68%) 1228.09 ( 61.63%) 5459.06 (-70.56%) 5583.95 (-74.46%)
Stddev 9 2907.92 ( 0.00%) 1813.39 ( 37.64%) 11763.90 (-304.55%) 1502.61 ( 48.33%) 2501.16 ( 13.99%) 2525.02 ( 13.17%)
Stddev 10 5093.23 ( 0.00%) 1313.58 ( 74.21%) 34926.95 (-585.75%) 2763.19 ( 45.75%) 2973.78 ( 41.61%) 2005.95 ( 60.62%)
Stddev 11 4982.41 ( 0.00%) 1163.02 ( 76.66%) 13792.07 (-176.81%) 4776.28 ( 4.14%) 6068.34 (-21.80%) 4256.77 ( 14.56%)
Stddev 12 3051.38 ( 0.00%) 2117.59 ( 30.60%) 5819.48 (-90.72%) 9252.59 (-203.23%) 3885.96 (-27.35%) 2580.44 ( 15.43%)
Stddev 13 2918.03 ( 0.00%) 2252.11 ( 22.82%) 8340.05 (-185.81%) 9384.83 (-221.62%) 1833.07 ( 37.18%) 2523.28 ( 13.53%)
Stddev 14 3178.97 ( 0.00%) 2337.49 ( 26.47%) 6166.98 (-93.99%) 9353.03 (-194.22%) 1072.60 ( 66.26%) 1140.55 ( 64.12%)
Stddev 15 2438.31 ( 0.00%) 1707.72 ( 29.96%) 10687.74 (-338.33%) 10494.03 (-330.38%) 2295.76 ( 5.85%) 1213.75 ( 50.22%)
Stddev 16 2682.25 ( 0.00%) 840.47 ( 68.67%) 10963.32 (-308.74%) 10343.25 (-285.62%) 2416.09 ( 9.92%) 1697.27 ( 36.72%)
Stddev 17 2807.66 ( 0.00%) 1546.16 ( 44.93%) 10755.81 (-283.09%) 11446.15 (-307.68%) 2484.08 ( 11.52%) 563.50 ( 79.93%)
Stddev 18 3049.27 ( 0.00%) 934.11 ( 69.37%) 8523.80 (-179.54%) 11779.80 (-286.31%) 1472.27 ( 51.72%) 1533.68 ( 49.70%)
Stddev 19 2782.65 ( 0.00%) 735.28 ( 73.58%) 9045.84 (-225.08%) 11416.35 (-310.27%) 514.78 ( 81.50%) 1283.38 ( 53.88%)
Stddev 20 2379.12 ( 0.00%) 956.25 ( 59.81%) 3789.62 (-59.29%) 10511.63 (-341.83%) 1641.25 ( 31.01%) 1758.22 ( 26.10%)
Stddev 21 2975.22 ( 0.00%) 438.31 ( 85.27%) 8160.39 (-174.28%) 11292.91 (-279.57%) 1087.60 ( 63.44%) 434.51 ( 85.40%)
Stddev 22 2260.61 ( 0.00%) 718.23 ( 68.23%) 10418.90 (-360.89%) 11993.84 (-430.56%) 909.16 ( 59.78%) 322.32 ( 85.74%)
Stddev 23 2900.85 ( 0.00%) 275.47 ( 90.50%) 9829.57 (-238.85%) 12234.80 (-321.77%) 701.39 ( 75.82%) 1444.19 ( 50.21%)
Stddev 24 2578.98 ( 0.00%) 481.68 ( 81.32%) 7696.37 (-198.43%) 12769.61 (-395.14%) 732.56 ( 71.60%) 1777.60 ( 31.07%)
TPut 1 104146.00 ( 0.00%) 78380.00 (-24.74%) 95165.00 ( -8.62%) 98953.00 ( -4.99%) 102380.00 ( -1.70%) 102442.00 ( -1.64%)
TPut 2 214519.00 ( 0.00%) 153926.00 (-28.25%) 187867.00 (-12.42%) 222587.00 ( 3.76%) 212181.00 ( -1.09%) 213532.00 ( -0.46%)
TPut 3 309540.00 ( 0.00%) 214742.00 (-30.63%) 267653.00 (-13.53%) 330859.00 ( 6.89%) 306384.00 ( -1.02%) 306011.00 ( -1.14%)
TPut 4 400391.00 ( 0.00%) 273014.00 (-31.81%) 288746.00 (-27.88%) 431533.00 ( 7.78%) 394472.00 ( -1.48%) 399146.00 ( -0.31%)
TPut 5 476051.00 ( 0.00%) 296658.00 (-37.68%) 288506.00 (-39.40%) 521041.00 ( 9.45%) 477418.00 ( 0.29%) 486967.00 ( 2.29%)
TPut 6 549677.00 ( 0.00%) 344634.00 (-37.30%) 208492.00 (-62.07%) 616978.00 ( 12.24%) 547607.00 ( -0.38%) 547962.00 ( -0.31%)
TPut 7 552073.00 ( 0.00%) 384237.00 (-30.40%) 222330.00 (-59.73%) 638004.00 ( 15.57%) 553062.00 ( 0.18%) 557595.00 ( 1.00%)
TPut 8 547096.00 ( 0.00%) 388014.00 (-29.08%) 120833.00 (-77.91%) 651472.00 ( 19.08%) 554218.00 ( 1.30%) 549363.00 ( 0.41%)
TPut 9 511866.00 ( 0.00%) 381044.00 (-25.56%) 503602.00 ( -1.61%) 652032.00 ( 27.38%) 551816.00 ( 7.80%) 536802.00 ( 4.87%)
TPut 10 498515.00 ( 0.00%) 384809.00 (-22.81%) 295236.00 (-40.78%) 638786.00 ( 28.14%) 525289.00 ( 5.37%) 507710.00 ( 1.84%)
TPut 11 469076.00 ( 0.00%) 383697.00 (-18.20%) 511217.00 ( 8.98%) 618806.00 ( 31.92%) 500131.00 ( 6.62%) 491700.00 ( 4.82%)
TPut 12 447849.00 ( 0.00%) 376989.00 (-15.82%) 586321.00 ( 30.92%) 603746.00 ( 34.81%) 472478.00 ( 5.50%) 479727.00 ( 7.12%)
TPut 13 446382.00 ( 0.00%) 426154.00 ( -4.53%) 537851.00 ( 20.49%) 588773.00 ( 31.90%) 465595.00 ( 4.30%) 469399.00 ( 5.16%)
TPut 14 443524.00 ( 0.00%) 414196.00 ( -6.61%) 550293.00 ( 24.07%) 578336.00 ( 30.40%) 459738.00 ( 3.66%) 463353.00 ( 4.47%)
TPut 15 437350.00 ( 0.00%) 406916.00 ( -6.96%) 558890.00 ( 27.79%) 573332.00 ( 31.09%) 462095.00 ( 5.66%) 460605.00 ( 5.32%)
TPut 16 428127.00 ( 0.00%) 407935.00 ( -4.72%) 484887.00 ( 13.26%) 567631.00 ( 32.58%) 454664.00 ( 6.20%) 454694.00 ( 6.21%)
TPut 17 421965.00 ( 0.00%) 400823.00 ( -5.01%) 517719.00 ( 22.69%) 562764.00 ( 33.37%) 451006.00 ( 6.88%) 452885.00 ( 7.33%)
TPut 18 404411.00 ( 0.00%) 386542.00 ( -4.42%) 460346.00 ( 13.83%) 551137.00 ( 36.28%) 450330.00 ( 11.35%) 446134.00 ( 10.32%)
TPut 19 415629.00 ( 0.00%) 378313.00 ( -8.98%) 505571.00 ( 21.64%) 542877.00 ( 30.62%) 440609.00 ( 6.01%) 455837.00 ( 9.67%)
TPut 20 401984.00 ( 0.00%) 370735.00 ( -7.77%) 493275.00 ( 22.71%) 541057.00 ( 34.60%) 435446.00 ( 8.32%) 454984.00 ( 13.18%)
TPut 21 398280.00 ( 0.00%) 371823.00 ( -6.64%) 445172.00 ( 11.77%) 535564.00 ( 34.47%) 440376.00 ( 10.57%) 437850.00 ( 9.94%)
TPut 22 394447.00 ( 0.00%) 359127.00 ( -8.95%) 472874.00 ( 19.88%) 529599.00 ( 34.26%) 437291.00 ( 10.86%) 442011.00 ( 12.06%)
TPut 23 392692.00 ( 0.00%) 355384.00 ( -9.50%) 472840.00 ( 20.41%) 522904.00 ( 33.16%) 424185.00 ( 8.02%) 429217.00 ( 9.30%)
TPut 24 368299.00 ( 0.00%) 354324.00 ( -3.79%) 447860.00 ( 21.60%) 510209.00 ( 38.53%) 409448.00 ( 11.17%) 428477.00 ( 16.34%)

Latest numacore has improved dramatically here. In v17, it was regressing
heavily across the board. The latest figures show that it regresses heavily
for small numbers of warehouses and shows very large performance gains
for larger numbers of warehouses. This problem with regressions for smaller
numbers of warehouses has been reported repeatedly and it has been pointed out
multiple times that specjbb by default ignores these results which can be
very misleading.

autonuma shows large gains even for small numbers of warehouses and larger
performance gains than numacore does. This is without the TLB optimisations.

balancenuma is not great, but it's better than mainline.
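
For anyone reading the tables, the bracketed percentages appear to be deltas
against the first column (stats-v8r6), with the sign flipped for Stddev rows so
that a positive number always means "better". The sketch below is only a
reading of the figures above, not code taken from MMTests; the function name
and the lower_is_better flag are made up for illustration.

```python
def pct_vs_baseline(value, baseline, lower_is_better=False):
    """Percentage difference of value against the baseline column.

    For rows where a smaller number is better (Stddev in the tables),
    the sign is flipped so positive still reads as an improvement.
    This convention is inferred from the reported numbers.
    """
    delta = (value - baseline) / baseline * 100.0
    return -delta if lower_is_better else delta


# TPut 1 and TPut 8 from the multiple-JVM, THP-disabled table.
tput1_numacore = pct_vs_baseline(78380.00, 104146.00)
tput8_numafix = pct_vs_baseline(120833.00, 547096.00)

# Stddev 1 and Stddev 2 for numacore-20121130 from the same table.
stddev1_numacore = pct_vs_baseline(538.24, 735.13, lower_is_better=True)
stddev2_numacore = pct_vs_baseline(3458.87, 406.26, lower_is_better=True)

for label, pct in [
    ("TPut 1 numacore-20121130", tput1_numacore),
    ("TPut 8 numafix-20121209", tput8_numafix),
    ("Stddev 1 numacore-20121130", stddev1_numacore),
    ("Stddev 2 numacore-20121130", stddev2_numacore),
]:
    print(f"{label}: {pct:8.2f}%")
```

The four values line up with the table entries (-24.74%), (-77.91%),
( 26.78%) and (-751.39%) respectively.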


SPECJBB PEAKS
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Expctd Warehouse 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%) 12.00 ( 0.00%)
Expctd Peak Bops 447849.00 ( 0.00%) 376989.00 (-15.82%) 586321.00 ( 30.92%) 603746.00 ( 34.81%) 472478.00 ( 5.50%) 479727.00 ( 7.12%)
Actual Warehouse 7.00 ( 0.00%) 13.00 ( 85.71%) 12.00 ( 71.43%) 9.00 ( 28.57%) 8.00 ( 14.29%) 7.00 ( 0.00%)
Actual Peak Bops 552073.00 ( 0.00%) 426154.00 (-22.81%) 586321.00 ( 6.20%) 652032.00 ( 18.11%) 554218.00 ( 0.39%) 557595.00 ( 1.00%)
SpecJBB Bops 415458.00 ( 0.00%) 385328.00 ( -7.25%) 502608.00 ( 20.98%) 554456.00 ( 33.46%) 446405.00 ( 7.45%) 451937.00 ( 8.78%)
SpecJBB Bops/JVM 103865.00 ( 0.00%) 96332.00 ( -7.25%) 125652.00 ( 20.98%) 138614.00 ( 33.46%) 111601.00 ( 7.45%) 112984.00 ( 8.78%)

Latest numacore is showing good performance gains both at the peak and in the
specjbb score. Note that the specjbb score ignored the regressions for
smaller numbers of warehouses.
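
To make the point about ignored warehouses concrete: the "SpecJBB Bops" figure
looks like the mean throughput from the expected peak warehouse count N up to
2N, so nothing below N contributes. That is an assumption about how the score
is derived, but it reproduces the reported numbers to within rounding. The
TPut values are copied from the multiple-JVM, THP-disabled table; the helper
name is hypothetical.

```python
# TPut for warehouses 12..24, multiple JVMs, THP disabled.
TPUT_BASELINE = {
    12: 447849, 13: 446382, 14: 443524, 15: 437350, 16: 428127,
    17: 421965, 18: 404411, 19: 415629, 20: 401984, 21: 398280,
    22: 394447, 23: 392692, 24: 368299,
}
TPUT_NUMAFIX = {
    12: 586321, 13: 537851, 14: 550293, 15: 558890, 16: 484887,
    17: 517719, 18: 460346, 19: 505571, 20: 493275, 21: 445172,
    22: 472874, 23: 472840, 24: 447860,
}


def specjbb_score(tput, expected_peak):
    """Mean throughput over warehouses N..2N (assumed scoring window)."""
    window = range(expected_peak, 2 * expected_peak + 1)
    return sum(tput[w] for w in window) / len(window)


baseline_score = specjbb_score(TPUT_BASELINE, expected_peak=12)
numafix_score = specjbb_score(TPUT_NUMAFIX, expected_peak=12)

print(f"stats-v8r6       : {baseline_score:.0f} (reported 415458)")
print(f"numafix-20121209 : {numafix_score:.0f} (reported 502608)")
print(f"gain             : {(numafix_score / baseline_score - 1) * 100:.2f}%")
```

Warehouses 1 to 11, where latest numacore is down by as much as 77.91%, never
enter the window, which is why the score reads as a gain of about 21%.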

autonuma was still better.

balancenuma was all right, better than mainline.

MMTests Statistics: duration
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
User 177832.71 148340.09 165197.46 177337.90 176411.93 176466.36
System 89.07 28052.02 12438.18 287.31 1464.93 1467.74
Elapsed 4035.81 4041.26 4038.34 4028.05 4041.53 4031.74

numacore's system CPU usage is incredibly high -- over 8 times higher
than balancenuma's.

balancenuma's system CPU usage also sucks to be honest.
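
As a rough check of those two statements, the ratios can be worked out
directly from the System row of the duration table. This is just arithmetic on
the reported figures; the dictionary and variable names are made up.

```python
# System CPU seconds from the duration table (multiple JVMs, THP disabled).
SYSTEM_CPU = {
    "stats-v8r6": 89.07,
    "numacore-20121130": 28052.02,
    "numafix-20121209": 12438.18,
    "autonuma-v28fastr4": 287.31,
    "balancenuma-v9r2": 1464.93,
    "balancenuma-v10r3": 1467.74,
}

balancenuma = SYSTEM_CPU["balancenuma-v10r3"]

# Latest numacore relative to balancenuma.
numafix_vs_balancenuma = SYSTEM_CPU["numafix-20121209"] / balancenuma

# balancenuma relative to the unpatched baseline.
balancenuma_vs_baseline = balancenuma / SYSTEM_CPU["stats-v8r6"]

print(f"numafix / balancenuma  : {numafix_vs_balancenuma:.1f}x")
print(f"balancenuma / baseline : {balancenuma_vs_baseline:.1f}x")
```

The first ratio is where "over 8 times" comes from; the second shows why
balancenuma's own overhead is nothing to be proud of either.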

MMTests Statistics: vmstat
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Page Ins 37380 66040 34576 36416 35452 34948
Page Outs 29224 46900 31972 29584 29612 30892
Swap Ins 0 0 0 0 0 0
Swap Outs 0 0 0 0 0 0
Direct pages scanned 0 0 0 0 0 0
Kswapd pages scanned 0 0 0 0 0 0
Kswapd pages reclaimed 0 0 0 0 0 0
Direct pages reclaimed 0 0 0 0 0 0
Kswapd efficiency 100% 100% 100% 100% 100% 100%
Kswapd velocity 0.000 0.000 0.000 0.000 0.000 0.000
Direct efficiency 100% 100% 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000 0.000 0.000
Percentage direct scans 0% 0% 0% 0% 0% 0%
Page writes by reclaim 0 0 0 0 0 0
Page writes file 0 0 0 0 0 0
Page writes anon 0 0 0 0 0 0
Page reclaim immediate 0 0 0 0 0 0
Page rescued immediate 0 0 0 0 0 0
Slabs scanned 0 0 0 0 0 0
Direct inode steals 0 0 0 0 0 0
Kswapd inode steals 0 0 0 0 0 0
Kswapd skipped wait 0 0 0 0 0 0
THP fault alloc 2 3 1 2 2 2
THP collapse alloc 0 0 0 0 0 0
THP splits 0 0 0 0 0 0
THP fault fallback 0 0 0 0 0 0
THP collapse fail 0 0 0 0 0 0
Compaction stalls 0 0 0 0 0 0
Compaction success 0 0 0 0 0 0
Compaction failures 0 0 0 0 0 0
Page migrate success 0 0 193988041 0 37611432 39796961
Page migrate failure 0 0 0 0 0 0
Compaction pages isolated 0 0 0 0 0 0
Compaction migrate scanned 0 0 0 0 0 0
Compaction free scanned 0 0 0 0 0 0
Compaction cost 0 0 201359 0 39040 41309
NUMA PTE updates 0 0 904384590 0 288455303 286931926
NUMA hint faults 0 0 0 0 270103189 269176121
NUMA hint local faults 0 0 0 0 70822016 70400386
NUMA pages migrated 0 0 193988041 0 37611432 39796961
AutoNUMA cost 0 0 10016 0 1353249 1348645

According to this, numacore never had a NUMA fault. This is completely broken
obviously and it's because PTE NUMA hinting faults are not accounted for
by numacore because that path does not call numa_migration_target(). The
consequences are not severe; it just means that the notional "AutoNUMA
cost" is meaningless for numacore.

What is interesting is numacore's migration rate -- 187MB/sec on average. This
is over quadruple balancenuma's migration rate of 38MB/sec on average.
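
The migration rates quoted above can be reproduced from "NUMA pages migrated"
and the Elapsed time in the duration table. The sketch assumes 4096-byte base
pages (reasonable with THP disabled on x86-64, but an assumption all the same)
and reports MB as 2^20 bytes; the function name is hypothetical.

```python
PAGE_SIZE = 4096  # assumed base page size in bytes


def migration_rate_mb_per_sec(pages_migrated, elapsed_seconds,
                              page_size=PAGE_SIZE):
    """Average migration rate in MB/sec (1 MB = 2**20 bytes)."""
    return pages_migrated * page_size / elapsed_seconds / (1024 * 1024)


# "NUMA pages migrated" and "Elapsed" for the multiple-JVM, THP-disabled run.
numacore_rate = migration_rate_mb_per_sec(193988041, 4038.34)    # numafix-20121209
balancenuma_rate = migration_rate_mb_per_sec(39796961, 4031.74)  # balancenuma-v10r3

print(f"numacore    : {numacore_rate:.1f} MB/sec")
print(f"balancenuma : {balancenuma_rate:.1f} MB/sec")
print(f"ratio       : {numacore_rate / balancenuma_rate:.1f}x")
```

Truncating the results gives the 187MB/sec and 38MB/sec figures, and the
ratio lands between 4x and 5x.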

SpecJBB, Single JVM, THP is enabled
===================================

As with the Multiple JVM test with THP enabled, numacore crashes. This
time the message is

Timing Measurement began Sun Dec 09 17:12:53 GMT 2012 for 0.5 minutes
Exception in thread "Thread-1040" java.lang.NullPointerException
at java.util.TreeMap.access$100(Unknown Source)
at java.util.TreeMap$PrivateEntryIterator.nextEntry(Unknown Source)
at java.util.TreeMap$ValueIterator.next(Unknown Source)
at spec.jbb.DeliveryTransaction.preprocess(Unknown Source)
at spec.jbb.DeliveryHandler.handleDelivery(Unknown Source)
at spec.jbb.DeliveryTransaction.process(Unknown Source)
at spec.jbb.TransactionManager.runTxn(Unknown Source)
at spec.jbb.TransactionManager.goManual(Unknown Source)
at spec.jbb.TransactionManager.go(Unknown Source)
at spec.jbb.JBBmain.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Timing Measurement ended Sun Dec 09 17:13:23 GMT 2012

Here are the rest of the results

3.7.0-rc7 3.7.0-rc6 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
TPut 1 25550.00 ( 0.00%) 25491.00 ( -0.23%) 24233.00 ( -5.15%) 24913.00 ( -2.49%) 26480.00 ( 3.64%)
TPut 2 55943.00 ( 0.00%) 51630.00 ( -7.71%) 55312.00 ( -1.13%) 55042.00 ( -1.61%) 56920.00 ( 1.75%)
TPut 3 87707.00 ( 0.00%) 74497.00 (-15.06%) 88569.00 ( 0.98%) 86135.00 ( -1.79%) 88608.00 ( 1.03%)
TPut 4 117911.00 ( 0.00%) 98435.00 (-16.52%) 118561.00 ( 0.55%) 117486.00 ( -0.36%) 117953.00 ( 0.04%)
TPut 5 143285.00 ( 0.00%) 133964.00 ( -6.51%) 145703.00 ( 1.69%) 142821.00 ( -0.32%) 144926.00 ( 1.15%)
TPut 6 171208.00 ( 0.00%) 152795.00 (-10.75%) 171006.00 ( -0.12%) 170635.00 ( -0.33%) 169394.00 ( -1.06%)
TPut 7 195635.00 ( 0.00%) 162517.00 (-16.93%) 198699.00 ( 1.57%) 196108.00 ( 0.24%) 196491.00 ( 0.44%)
TPut 8 222655.00 ( 0.00%) 168679.00 (-24.24%) 224903.00 ( 1.01%) 223494.00 ( 0.38%) 225978.00 ( 1.49%)
TPut 9 244787.00 ( 0.00%) 193394.00 (-20.99%) 248313.00 ( 1.44%) 251858.00 ( 2.89%) 251569.00 ( 2.77%)
TPut 10 271565.00 ( 0.00%) 237987.00 (-12.36%) 272148.00 ( 0.21%) 275869.00 ( 1.58%) 279049.00 ( 2.76%)
TPut 11 298270.00 ( 0.00%) 207908.00 (-30.30%) 303749.00 ( 1.84%) 301763.00 ( 1.17%) 301399.00 ( 1.05%)
TPut 12 320867.00 ( 0.00%) 257937.00 (-19.61%) 327808.00 ( 2.16%) 329681.00 ( 2.75%) 330506.00 ( 3.00%)
TPut 13 343514.00 ( 0.00%) 248474.00 (-27.67%) 349080.00 ( 1.62%) 340606.00 ( -0.85%) 350817.00 ( 2.13%)
TPut 14 365321.00 ( 0.00%) 298876.00 (-18.19%) 370026.00 ( 1.29%) 379939.00 ( 4.00%) 361752.00 ( -0.98%)
TPut 15 377071.00 ( 0.00%) 296562.00 (-21.35%) 329847.00 (-12.52%) 395421.00 ( 4.87%) 396091.00 ( 5.04%)
TPut 16 404979.00 ( 0.00%) 287964.00 (-28.89%) 411066.00 ( 1.50%) 420551.00 ( 3.85%) 411673.00 ( 1.65%)
TPut 17 420593.00 ( 0.00%) 342590.00 (-18.55%) 428242.00 ( 1.82%) 437461.00 ( 4.01%) 428270.00 ( 1.83%)
TPut 18 440178.00 ( 0.00%) 377508.00 (-14.24%) 440392.00 ( 0.05%) 455014.00 ( 3.37%) 447671.00 ( 1.70%)
TPut 19 448876.00 ( 0.00%) 397727.00 (-11.39%) 462036.00 ( 2.93%) 479223.00 ( 6.76%) 461881.00 ( 2.90%)
TPut 20 460513.00 ( 0.00%) 411831.00 (-10.57%) 476437.00 ( 3.46%) 493176.00 ( 7.09%) 474824.00 ( 3.11%)
TPut 21 474161.00 ( 0.00%) 442153.00 ( -6.75%) 487513.00 ( 2.82%) 505246.00 ( 6.56%) 468938.00 ( -1.10%)
TPut 22 474493.00 ( 0.00%) 429921.00 ( -9.39%) 487920.00 ( 2.83%) 527360.00 ( 11.14%) 475208.00 ( 0.15%)
TPut 23 489559.00 ( 0.00%) 460354.00 ( -5.97%) 508298.00 ( 3.83%) 534820.00 ( 9.25%) 490743.00 ( 0.24%)
TPut 24 495378.00 ( 0.00%) 486826.00 ( -1.73%) 514403.00 ( 3.84%) 545294.00 ( 10.08%) 493974.00 ( -0.28%)
TPut 25 491795.00 ( 0.00%) 520474.00 ( 5.83%) 507373.00 ( 3.17%) 543526.00 ( 10.52%) 489850.00 ( -0.40%)
TPut 26 490038.00 ( 0.00%) 465587.00 ( -4.99%) 376322.00 (-23.21%) 545175.00 ( 11.25%) 491352.00 ( 0.27%)
TPut 27 491233.00 ( 0.00%) 469764.00 ( -4.37%) 366225.00 (-25.45%) 536927.00 ( 9.30%) 489611.00 ( -0.33%)
TPut 28 489058.00 ( 0.00%) 489561.00 ( 0.10%) 414027.00 (-15.34%) 543127.00 ( 11.06%) 473835.00 ( -3.11%)
TPut 29 471539.00 ( 0.00%) 492496.00 ( 4.44%) 400529.00 (-15.06%) 541615.00 ( 14.86%) 486009.00 ( 3.07%)
TPut 30 480343.00 ( 0.00%) 488349.00 ( 1.67%) 405612.00 (-15.56%) 542904.00 ( 13.02%) 478384.00 ( -0.41%)
TPut 31 478109.00 ( 0.00%) 460043.00 ( -3.78%) 401471.00 (-16.03%) 529079.00 ( 10.66%) 466457.00 ( -2.44%)
TPut 32 475736.00 ( 0.00%) 472007.00 ( -0.78%) 401075.00 (-15.69%) 532423.00 ( 11.92%) 467866.00 ( -1.65%)
TPut 33 470758.00 ( 0.00%) 474348.00 ( 0.76%) 399592.00 (-15.12%) 518811.00 ( 10.21%) 464764.00 ( -1.27%)
TPut 34 467304.00 ( 0.00%) 475878.00 ( 1.83%) 394589.00 (-15.56%) 518334.00 ( 10.92%) 446719.00 ( -4.41%)
TPut 35 466391.00 ( 0.00%) 487411.00 ( 4.51%) 382799.00 (-17.92%) 513591.00 ( 10.12%) 447071.00 ( -4.14%)
TPut 36 452722.00 ( 0.00%) 478050.00 ( 5.59%) 381120.00 (-15.82%) 503801.00 ( 11.28%) 452243.00 ( -0.11%)
TPut 37 447878.00 ( 0.00%) 478467.00 ( 6.83%) 382803.00 (-14.53%) 494555.00 ( 10.42%) 442751.00 ( -1.14%)
TPut 38 447907.00 ( 0.00%) 455542.00 ( 1.70%) 341693.00 (-23.71%) 482758.00 ( 7.78%) 444023.00 ( -0.87%)
TPut 39 428322.00 ( 0.00%) 367921.00 (-14.10%) 404210.00 ( -5.63%) 464550.00 ( 8.46%) 440482.00 ( 2.84%)
TPut 40 429157.00 ( 0.00%) 394277.00 ( -8.13%) 378554.00 (-11.79%) 467767.00 ( 9.00%) 411807.00 ( -4.04%)
TPut 41 424339.00 ( 0.00%) 415413.00 ( -2.10%) 399220.00 ( -5.92%) 457669.00 ( 7.85%) 428273.00 ( 0.93%)
TPut 42 397440.00 ( 0.00%) 421027.00 ( 5.93%) 372161.00 ( -6.36%) 458156.00 ( 15.28%) 422535.00 ( 6.31%)
TPut 43 405391.00 ( 0.00%) 433900.00 ( 7.03%) 383936.00 ( -5.29%) 438929.00 ( 8.27%) 410196.00 ( 1.19%)
TPut 44 400692.00 ( 0.00%) 427504.00 ( 6.69%) 374757.00 ( -6.47%) 423538.00 ( 5.70%) 399471.00 ( -0.30%)
TPut 45 399623.00 ( 0.00%) 372622.00 ( -6.76%) 379797.00 ( -4.96%) 407255.00 ( 1.91%) 374068.00 ( -6.39%)
TPut 46 391920.00 ( 0.00%) 351205.00 (-10.39%) 368042.00 ( -6.09%) 411353.00 ( 4.96%) 384363.00 ( -1.93%)
TPut 47 378199.00 ( 0.00%) 358150.00 ( -5.30%) 368744.00 ( -2.50%) 408739.00 ( 8.08%) 385670.00 ( 1.98%)
TPut 48 379346.00 ( 0.00%) 387287.00 ( 2.09%) 373581.00 ( -1.52%) 423791.00 ( 11.72%) 380665.00 ( 0.35%)
TPut 49 373614.00 ( 0.00%) 395793.00 ( 5.94%) 372621.00 ( -0.27%) 423024.00 ( 13.22%) 377985.00 ( 1.17%)
TPut 50 372494.00 ( 0.00%) 366488.00 ( -1.61%) 388778.00 ( 4.37%) 410647.00 ( 10.24%) 378831.00 ( 1.70%)
TPut 51 382195.00 ( 0.00%) 381771.00 ( -0.11%) 387687.00 ( 1.44%) 423249.00 ( 10.74%) 402233.00 ( 5.24%)
TPut 52 369118.00 ( 0.00%) 429441.00 ( 16.34%) 390226.00 ( 5.72%) 410023.00 ( 11.08%) 396558.00 ( 7.43%)
TPut 53 366453.00 ( 0.00%) 445744.00 ( 21.64%) 399257.00 ( 8.95%) 405937.00 ( 10.77%) 383916.00 ( 4.77%)
TPut 54 366571.00 ( 0.00%) 375762.00 ( 2.51%) 395098.00 ( 7.78%) 402220.00 ( 9.72%) 395417.00 ( 7.87%)
TPut 55 367580.00 ( 0.00%) 336113.00 ( -8.56%) 400550.00 ( 8.97%) 420978.00 ( 14.53%) 398098.00 ( 8.30%)
TPut 56 367056.00 ( 0.00%) 375635.00 ( 2.34%) 385743.00 ( 5.09%) 412685.00 ( 12.43%) 384029.00 ( 4.62%)
TPut 57 359163.00 ( 0.00%) 354001.00 ( -1.44%) 389827.00 ( 8.54%) 394688.00 ( 9.89%) 381032.00 ( 6.09%)
TPut 58 360552.00 ( 0.00%) 353312.00 ( -2.01%) 394099.00 ( 9.30%) 388655.00 ( 7.79%) 378132.00 ( 4.88%)
TPut 59 354967.00 ( 0.00%) 368534.00 ( 3.82%) 390746.00 ( 10.08%) 399086.00 ( 12.43%) 387101.00 ( 9.05%)
TPut 60 362976.00 ( 0.00%) 388472.00 ( 7.02%) 383073.00 ( 5.54%) 399713.00 ( 10.12%) 390635.00 ( 7.62%)
TPut 61 368072.00 ( 0.00%) 399476.00 ( 8.53%) 380807.00 ( 3.46%) 372060.00 ( 1.08%) 383187.00 ( 4.11%)
TPut 62 356938.00 ( 0.00%) 385648.00 ( 8.04%) 387736.00 ( 8.63%) 377183.00 ( 5.67%) 378484.00 ( 6.04%)
TPut 63 357491.00 ( 0.00%) 404325.00 ( 13.10%) 396672.00 ( 10.96%) 384221.00 ( 7.48%) 378907.00 ( 5.99%)
TPut 64 357322.00 ( 0.00%) 389552.00 ( 9.02%) 386826.00 ( 8.26%) 378601.00 ( 5.96%) 369852.00 ( 3.51%)
TPut 65 341262.00 ( 0.00%) 394964.00 ( 15.74%) 380271.00 ( 11.43%) 382896.00 ( 12.20%) 382897.00 ( 12.20%)
TPut 66 357807.00 ( 0.00%) 384846.00 ( 7.56%) 362723.00 ( 1.37%) 361530.00 ( 1.04%) 380023.00 ( 6.21%)
TPut 67 345092.00 ( 0.00%) 376842.00 ( 9.20%) 364193.00 ( 5.54%) 374449.00 ( 8.51%) 373877.00 ( 8.34%)
TPut 68 350334.00 ( 0.00%) 358330.00 ( 2.28%) 359368.00 ( 2.58%) 384920.00 ( 9.87%) 381888.00 ( 9.01%)
TPut 69 348372.00 ( 0.00%) 356188.00 ( 2.24%) 364449.00 ( 4.61%) 395611.00 ( 13.56%) 375892.00 ( 7.90%)
TPut 70 335077.00 ( 0.00%) 359313.00 ( 7.23%) 356418.00 ( 6.37%) 375448.00 ( 12.05%) 372358.00 ( 11.13%)
TPut 71 341197.00 ( 0.00%) 364168.00 ( 6.73%) 343847.00 ( 0.78%) 376113.00 ( 10.23%) 384292.00 ( 12.63%)
TPut 72 345032.00 ( 0.00%) 356934.00 ( 3.45%) 345007.00 ( -0.01%) 375313.00 ( 8.78%) 381504.00 ( 10.57%)

numacore v17 was doing reasonably well but we knew that already.

autonuma does not do great on this test.

balancenuma does all right. The scalability patches actually hurt in this case
but it's likely down to variability in the decisions made by the scheduler as much
as anything else.

SPECJBB PEAKS
3.7.0-rc7 3.7.0-rc6 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Expctd Warehouse 48.00 ( 0.00%) 48.00 ( 0.00%) 48.00 ( 0.00%) 48.00 ( 0.00%) 48.00 ( 0.00%)
Expctd Peak Bops 379346.00 ( 0.00%) 387287.00 ( 2.09%) 373581.00 ( -1.52%) 423791.00 ( 11.72%) 380665.00 ( 0.35%)
Actual Warehouse 24.00 ( 0.00%) 25.00 ( 4.17%) 24.00 ( 0.00%) 24.00 ( 0.00%) 24.00 ( 0.00%)
Actual Peak Bops 495378.00 ( 0.00%) 520474.00 ( 5.07%) 514403.00 ( 3.84%) 545294.00 ( 10.08%) 493974.00 ( -0.28%)
SpecJBB Bops 183389.00 ( 0.00%) 193652.00 ( 5.60%) 193461.00 ( 5.49%) 201083.00 ( 9.65%) 195465.00 ( 6.58%)
SpecJBB Bops/JVM 183389.00 ( 0.00%) 193652.00 ( 5.60%) 193461.00 ( 5.49%) 201083.00 ( 9.65%) 195465.00 ( 6.58%)

Balancenuma does all right on its specjbb score but the peak score with
the migration scalability patches applied is hurt. At least it's still
comparable to mainline.

MMTests Statistics: duration
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
User 316340.52 311420.23 31308.52 314589.64 316061.23 315584.37
System 102.08 3067.27 803.23 352.70 428.76 450.71
Elapsed 7433.22 7436.63 1398.05 7434.74 7432.60 7435.03

Usual comments about system CPU usage. You actually see latest numacore
figures here because they are based on what happened up until the crash.

MMTests Statistics: vmstat
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Page Ins 66212 36180 31560 36152 36188 63852
Page Outs 31248 35544 12016 28388 28024 42360
Swap Ins 0 0 0 0 0 0
Swap Outs 0 0 0 0 0 0
Direct pages scanned 0 0 0 0 0 0
Kswapd pages scanned 0 0 0 0 0 0
Kswapd pages reclaimed 0 0 0 0 0 0
Direct pages reclaimed 0 0 0 0 0 0
Kswapd efficiency 100% 100% 100% 100% 100% 100%
Kswapd velocity 0.000 0.000 0.000 0.000 0.000 0.000
Direct efficiency 100% 100% 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000 0.000 0.000
Percentage direct scans 0% 0% 0% 0% 0% 0%
Page writes by reclaim 0 0 0 0 0 0
Page writes file 0 0 0 0 0 0
Page writes anon 0 0 0 0 0 0
Page reclaim immediate 0 0 0 0 0 0
Page rescued immediate 0 0 0 0 0 0
Slabs scanned 0 0 0 0 0 0
Direct inode steals 0 0 0 0 0 0
Kswapd inode steals 0 0 0 0 0 0
Kswapd skipped wait 0 0 0 0 0 0
THP fault alloc 48874 45657 34986 48296 48697 47056
THP collapse alloc 51 2 9 157 53 69
THP splits 70 37 28 83 78 56
THP fault fallback 0 0 0 0 0 0
THP collapse fail 0 0 0 0 0 0
Compaction stalls 0 0 0 0 0 0
Compaction success 0 0 0 0 0 0
Compaction failures 0 0 0 0 0 0
Page migrate success 0 0 110442307 0 45908125 46995604
Page migrate failure 0 0 0 0 0 0
Compaction pages isolated 0 0 0 0 0 0
Compaction migrate scanned 0 0 0 0 0 0
Compaction free scanned 0 0 0 0 0 0
Compaction cost 0 0 114639 0 47652 48781
NUMA PTE updates 0 0 391813174 0 351907231 361308027
NUMA hint faults 0 0 796717 0 2010327 1867697
NUMA hint local faults 0 0 261885 0 677602 572742
NUMA pages migrated 0 0 110442307 0 45908125 46995604
AutoNUMA cost 0 0 8824 0 13387 12760

THP was certainly enabled.

numacore's migration rate was extremely high until it crashed -- 308MB/sec
as opposed to balancenuma's 24MB/sec on average.

SpecJBB, Single JVM, THP is disabled
====================================

3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
TPut 1 19861.00 ( 0.00%) 18255.00 ( -8.09%) 20169.00 ( 1.55%) 19636.00 ( -1.13%) 19838.00 ( -0.12%) 20650.00 ( 3.97%)
TPut 2 47613.00 ( 0.00%) 37136.00 (-22.00%) 45050.00 ( -5.38%) 47153.00 ( -0.97%) 47481.00 ( -0.28%) 48199.00 ( 1.23%)
TPut 3 72438.00 ( 0.00%) 55692.00 (-23.12%) 64075.00 (-11.55%) 69394.00 ( -4.20%) 72029.00 ( -0.56%) 72932.00 ( 0.68%)
TPut 4 98455.00 ( 0.00%) 81301.00 (-17.42%) 93595.00 ( -4.94%) 98577.00 ( 0.12%) 98437.00 ( -0.02%) 99748.00 ( 1.31%)
TPut 5 120831.00 ( 0.00%) 89067.00 (-26.29%) 115796.00 ( -4.17%) 120805.00 ( -0.02%) 117218.00 ( -2.99%) 121254.00 ( 0.35%)
TPut 6 140013.00 ( 0.00%) 108349.00 (-22.62%) 116704.00 (-16.65%) 125079.00 (-10.67%) 139878.00 ( -0.10%) 145360.00 ( 3.82%)
TPut 7 163553.00 ( 0.00%) 116192.00 (-28.96%) 118711.00 (-27.42%) 164368.00 ( 0.50%) 167133.00 ( 2.19%) 169539.00 ( 3.66%)
TPut 8 190148.00 ( 0.00%) 125955.00 (-33.76%) 118079.00 (-37.90%) 188906.00 ( -0.65%) 183058.00 ( -3.73%) 188936.00 ( -0.64%)
TPut 9 211343.00 ( 0.00%) 144068.00 (-31.83%) 170067.00 (-19.53%) 206645.00 ( -2.22%) 205699.00 ( -2.67%) 217322.00 ( 2.83%)
TPut 10 233190.00 ( 0.00%) 148098.00 (-36.49%) 133365.00 (-42.81%) 234533.00 ( 0.58%) 233632.00 ( 0.19%) 227292.00 ( -2.53%)
TPut 11 253333.00 ( 0.00%) 146043.00 (-42.35%) 108866.00 (-57.03%) 254167.00 ( 0.33%) 251938.00 ( -0.55%) 259924.00 ( 2.60%)
TPut 12 270661.00 ( 0.00%) 131739.00 (-51.33%) 146170.00 (-46.00%) 271490.00 ( 0.31%) 271393.00 ( 0.27%) 272536.00 ( 0.69%)
TPut 13 299807.00 ( 0.00%) 169396.00 (-43.50%) 134946.00 (-54.99%) 299758.00 ( -0.02%) 270594.00 ( -9.74%) 299110.00 ( -0.23%)
TPut 14 319243.00 ( 0.00%) 150705.00 (-52.79%) 145135.00 (-54.54%) 318481.00 ( -0.24%) 318566.00 ( -0.21%) 325133.00 ( 1.84%)
TPut 15 339054.00 ( 0.00%) 116872.00 (-65.53%) 127277.00 (-62.46%) 331534.00 ( -2.22%) 344672.00 ( 1.66%) 318119.00 ( -6.17%)
TPut 16 354315.00 ( 0.00%) 124346.00 (-64.91%) 86657.00 (-75.54%) 352600.00 ( -0.48%) 316761.00 (-10.60%) 364648.00 ( 2.92%)
TPut 17 371306.00 ( 0.00%) 118493.00 (-68.09%) 93297.00 (-74.87%) 368260.00 ( -0.82%) 328888.00 (-11.42%) 371088.00 ( -0.06%)
TPut 18 386361.00 ( 0.00%) 138571.00 (-64.13%) 208447.00 (-46.05%) 374358.00 ( -3.11%) 356148.00 ( -7.82%) 399913.00 ( 3.51%)
TPut 19 401827.00 ( 0.00%) 118855.00 (-70.42%) 155803.00 (-61.23%) 399476.00 ( -0.59%) 393918.00 ( -1.97%) 405771.00 ( 0.98%)
TPut 20 411130.00 ( 0.00%) 144024.00 (-64.97%) 116524.00 (-71.66%) 407799.00 ( -0.81%) 377706.00 ( -8.13%) 406038.00 ( -1.24%)
TPut 21 425352.00 ( 0.00%) 154264.00 (-63.73%) 144766.00 (-65.97%) 429226.00 ( 0.91%) 431677.00 ( 1.49%) 431583.00 ( 1.46%)
TPut 22 438150.00 ( 0.00%) 153892.00 (-64.88%) 222211.00 (-49.28%) 385827.00 (-11.94%) 440379.00 ( 0.51%) 438861.00 ( 0.16%)
TPut 23 438425.00 ( 0.00%) 146506.00 (-66.58%) 213367.00 (-51.33%) 433963.00 ( -1.02%) 361427.00 (-17.56%) 445293.00 ( 1.57%)
TPut 24 461598.00 ( 0.00%) 138869.00 (-69.92%) 189745.00 (-58.89%) 439691.00 ( -4.75%) 471567.00 ( 2.16%) 488259.00 ( 5.78%)
TPut 25 459475.00 ( 0.00%) 141698.00 (-69.16%) 105196.00 (-77.11%) 431373.00 ( -6.12%) 487921.00 ( 6.19%) 447353.00 ( -2.64%)
TPut 26 452651.00 ( 0.00%) 142844.00 (-68.44%) 125573.00 (-72.26%) 447517.00 ( -1.13%) 425336.00 ( -6.03%) 469793.00 ( 3.79%)
TPut 27 450436.00 ( 0.00%) 140870.00 (-68.73%) 68802.00 (-84.73%) 430805.00 ( -4.36%) 456114.00 ( 1.26%) 461172.00 ( 2.38%)
TPut 28 459770.00 ( 0.00%) 143078.00 (-68.88%) 144373.00 (-68.60%) 432260.00 ( -5.98%) 478317.00 ( 4.03%) 452144.00 ( -1.66%)
TPut 29 450347.00 ( 0.00%) 142076.00 (-68.45%) 221760.00 (-50.76%) 440423.00 ( -2.20%) 388175.00 (-13.81%) 473273.00 ( 5.09%)
TPut 30 449252.00 ( 0.00%) 146900.00 (-67.30%) 139971.00 (-68.84%) 435082.00 ( -3.15%) 440795.00 ( -1.88%) 435189.00 ( -3.13%)
TPut 31 446802.00 ( 0.00%) 148008.00 (-66.87%) 195143.00 (-56.32%) 418684.00 ( -6.29%) 417343.00 ( -6.59%) 437562.00 ( -2.07%)
TPut 32 439701.00 ( 0.00%) 149591.00 (-65.98%) 159107.00 (-63.81%) 421866.00 ( -4.06%) 438719.00 ( -0.22%) 469763.00 ( 6.84%)
TPut 33 434477.00 ( 0.00%) 142801.00 (-67.13%) 110758.00 (-74.51%) 420631.00 ( -3.19%) 454673.00 ( 4.65%) 451224.00 ( 3.85%)
TPut 34 423014.00 ( 0.00%) 152308.00 (-63.99%) 111701.00 (-73.59%) 415202.00 ( -1.85%) 415194.00 ( -1.85%) 446735.00 ( 5.61%)
TPut 35 429012.00 ( 0.00%) 154116.00 (-64.08%) 118968.00 (-72.27%) 402395.00 ( -6.20%) 425151.00 ( -0.90%) 434230.00 ( 1.22%)
TPut 36 421097.00 ( 0.00%) 157571.00 (-62.58%) 174626.00 (-58.53%) 404770.00 ( -3.88%) 430480.00 ( 2.23%) 425324.00 ( 1.00%)
TPut 37 414815.00 ( 0.00%) 150771.00 (-63.65%) 238764.00 (-42.44%) 388842.00 ( -6.26%) 393351.00 ( -5.17%) 405824.00 ( -2.17%)
TPut 38 412361.00 ( 0.00%) 157070.00 (-61.91%) 173206.00 (-58.00%) 398947.00 ( -3.25%) 401555.00 ( -2.62%) 432074.00 ( 4.78%)
TPut 39 402234.00 ( 0.00%) 161487.00 (-59.85%) 119790.00 (-70.22%) 382645.00 ( -4.87%) 423106.00 ( 5.19%) 401091.00 ( -0.28%)
TPut 40 380278.00 ( 0.00%) 165947.00 (-56.36%) 309375.00 (-18.65%) 394039.00 ( 3.62%) 405371.00 ( 6.60%) 410739.00 ( 8.01%)
TPut 41 393204.00 ( 0.00%) 160540.00 (-59.17%) 146153.00 (-62.83%) 385605.00 ( -1.93%) 403383.00 ( 2.59%) 372466.00 ( -5.27%)
TPut 42 380622.00 ( 0.00%) 151946.00 (-60.08%) 269523.00 (-29.19%) 374843.00 ( -1.52%) 380797.00 ( 0.05%) 396227.00 ( 4.10%)
TPut 43 371566.00 ( 0.00%) 162369.00 (-56.30%) 344584.00 ( -7.26%) 347951.00 ( -6.36%) 386765.00 ( 4.09%) 345633.00 ( -6.98%)
TPut 44 365538.00 ( 0.00%) 161127.00 (-55.92%) 147195.00 (-59.73%) 355070.00 ( -2.86%) 344701.00 ( -5.70%) 391276.00 ( 7.04%)
TPut 45 359305.00 ( 0.00%) 159062.00 (-55.73%) 102716.00 (-71.41%) 350973.00 ( -2.32%) 370666.00 ( 3.16%) 331191.00 ( -7.82%)
TPut 46 343160.00 ( 0.00%) 163889.00 (-52.24%) 309203.00 ( -9.90%) 347960.00 ( 1.40%) 380147.00 ( 10.78%) 323176.00 ( -5.82%)
TPut 47 346983.00 ( 0.00%) 168666.00 (-51.39%) 330345.00 ( -4.80%) 313612.00 ( -9.62%) 362189.00 ( 4.38%) 343154.00 ( -1.10%)
TPut 48 338143.00 ( 0.00%) 153448.00 (-54.62%) 291944.00 (-13.66%) 341809.00 ( 1.08%) 365342.00 ( 8.04%) 354348.00 ( 4.79%)
TPut 49 333941.00 ( 0.00%) 142784.00 (-57.24%) 252850.00 (-24.28%) 336174.00 ( 0.67%) 371700.00 ( 11.31%) 353148.00 ( 5.75%)
TPut 50 334001.00 ( 0.00%) 135713.00 (-59.37%) 252350.00 (-24.45%) 322489.00 ( -3.45%) 367963.00 ( 10.17%) 355823.00 ( 6.53%)
TPut 51 338310.00 ( 0.00%) 133402.00 (-60.57%) 232361.00 (-31.32%) 354805.00 ( 4.88%) 372592.00 ( 10.13%) 351194.00 ( 3.81%)
TPut 52 322897.00 ( 0.00%) 150293.00 (-53.45%) 193895.00 (-39.95%) 353169.00 ( 9.38%) 363024.00 ( 12.43%) 344846.00 ( 6.80%)
TPut 53 329801.00 ( 0.00%) 160792.00 (-51.25%) 180672.00 (-45.22%) 353588.00 ( 7.21%) 365359.00 ( 10.78%) 355499.00 ( 7.79%)
TPut 54 336610.00 ( 0.00%) 164696.00 (-51.07%) 248332.00 (-26.23%) 361189.00 ( 7.30%) 377851.00 ( 12.25%) 363987.00 ( 8.13%)
TPut 55 325920.00 ( 0.00%) 172380.00 (-47.11%) 271331.00 (-16.75%) 365678.00 ( 12.20%) 375735.00 ( 15.28%) 363697.00 ( 11.59%)
TPut 56 318997.00 ( 0.00%) 176071.00 (-44.80%) 155354.00 (-51.30%) 367048.00 ( 15.06%) 380588.00 ( 19.31%) 362614.00 ( 13.67%)
TPut 57 321776.00 ( 0.00%) 174531.00 (-45.76%) 279294.00 (-13.20%) 341874.00 ( 6.25%) 378996.00 ( 17.78%) 360366.00 ( 11.99%)
TPut 58 308532.00 ( 0.00%) 174202.00 (-43.54%) 170351.00 (-44.79%) 348156.00 ( 12.84%) 361623.00 ( 17.21%) 369693.00 ( 19.82%)
TPut 59 318974.00 ( 0.00%) 175343.00 (-45.03%) 243463.00 (-23.67%) 358252.00 ( 12.31%) 360457.00 ( 13.01%) 364556.00 ( 14.29%)
TPut 60 325465.00 ( 0.00%) 173694.00 (-46.63%) 222867.00 (-31.52%) 360808.00 ( 10.86%) 362745.00 ( 11.45%) 354232.00 ( 8.84%)
TPut 61 319151.00 ( 0.00%) 172320.00 (-46.01%) 218542.00 (-31.52%) 350597.00 ( 9.85%) 371277.00 ( 16.33%) 352478.00 ( 10.44%)
TPut 62 320837.00 ( 0.00%) 172312.00 (-46.29%) 251630.00 (-21.57%) 359062.00 ( 11.91%) 361009.00 ( 12.52%) 352930.00 ( 10.00%)
TPut 63 318198.00 ( 0.00%) 172297.00 (-45.85%) 172040.00 (-45.93%) 356137.00 ( 11.92%) 347637.00 ( 9.25%) 335322.00 ( 5.38%)
TPut 64 321438.00 ( 0.00%) 171894.00 (-46.52%) 151337.00 (-52.92%) 347376.00 ( 8.07%) 346756.00 ( 7.88%) 351410.00 ( 9.32%)
TPut 65 314482.00 ( 0.00%) 169147.00 (-46.21%) 143487.00 (-54.37%) 351726.00 ( 11.84%) 357429.00 ( 13.66%) 351236.00 ( 11.69%)
TPut 66 316802.00 ( 0.00%) 170234.00 (-46.26%) 230207.00 (-27.33%) 344548.00 ( 8.76%) 362143.00 ( 14.31%) 347058.00 ( 9.55%)
TPut 67 312139.00 ( 0.00%) 168180.00 (-46.12%) 148468.00 (-52.44%) 329030.00 ( 5.41%) 353305.00 ( 13.19%) 345903.00 ( 10.82%)
TPut 68 323918.00 ( 0.00%) 168392.00 (-48.01%) 184696.00 (-42.98%) 319985.00 ( -1.21%) 344250.00 ( 6.28%) 345703.00 ( 6.73%)
TPut 69 307506.00 ( 0.00%) 167082.00 (-45.67%) 221855.00 (-27.85%) 340673.00 ( 10.79%) 339346.00 ( 10.35%) 336071.00 ( 9.29%)
TPut 70 306799.00 ( 0.00%) 165764.00 (-45.97%) 246518.00 (-19.65%) 331678.00 ( 8.11%) 349583.00 ( 13.95%) 341944.00 ( 11.46%)
TPut 71 304232.00 ( 0.00%) 165289.00 (-45.67%) 225582.00 (-25.85%) 319824.00 ( 5.13%) 335238.00 ( 10.19%) 343396.00 ( 12.87%)
TPut 72 301619.00 ( 0.00%) 163909.00 (-45.66%) 154552.00 (-48.76%) 326875.00 ( 8.37%) 345999.00 ( 14.71%) 343949.00 ( 14.03%)

Latest numacore is regressing really badly here.

autonuma is all right.

balancenuma is all right. Migration scalability patches actually seem to
hurt a little.

SPECJBB PEAKS
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Expctd Warehouse 48.00 ( 0.00%) 48.00 ( 0.00%) 48.00 ( 0.00%) 48.00 ( 0.00%) 48.00 ( 0.00%) 48.00 ( 0.00%)
Expctd Peak Bops 338143.00 ( 0.00%) 153448.00 (-54.62%) 291944.00 (-13.66%) 341809.00 ( 1.08%) 365342.00 ( 8.04%) 354348.00 ( 4.79%)
Actual Warehouse 24.00 ( 0.00%) 56.00 (133.33%) 43.00 ( 79.17%) 26.00 ( 8.33%) 25.00 ( 4.17%) 24.00 ( 0.00%)
Actual Peak Bops 461598.00 ( 0.00%) 176071.00 (-61.86%) 344584.00 (-25.35%) 447517.00 ( -3.05%) 487921.00 ( 5.70%) 488259.00 ( 5.78%)
SpecJBB Bops 163683.00 ( 0.00%) 83963.00 (-48.70%) 109061.00 (-33.37%) 176379.00 ( 7.76%) 184040.00 ( 12.44%) 179621.00 ( 9.74%)
SpecJBB Bops/JVM 163683.00 ( 0.00%) 83963.00 (-48.70%) 109061.00 (-33.37%) 176379.00 ( 7.76%) 184040.00 ( 12.44%) 179621.00 ( 9.74%)

numacore regresses 25.35% at the peak and 33.37% on its specjbb score.

balancenuma does all right -- 5.78% gain at the peak, 9.74% on its overall
specjbb score.
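
The percentages in parentheses can be rechecked from the raw columns. The sketch below is not MMTests code; the function names are made up for illustration, and it assumes the usual convention that a positive figure is an improvement, so the sign is flipped for lower-is-better metrics such as the hackbench times.

```c
#include <assert.h>

/* Relative gain in percent for a higher-is-better metric (e.g. Bops). */
double pct_gain_higher_is_better(double baseline, double value)
{
	assert(baseline != 0.0);
	return (value - baseline) / baseline * 100.0;
}

/* Relative gain in percent for a lower-is-better metric (e.g. seconds). */
double pct_gain_lower_is_better(double baseline, double value)
{
	assert(baseline != 0.0);
	return (baseline - value) / baseline * 100.0;
}
```

For example, pct_gain_higher_is_better(338143.0, 291944.0) comes out at roughly -13.66, matching the "Expctd Peak Bops" entry for numafix-20121209.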

MMTests Statistics: duration
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
User 316751.91 167098.56 227496.59 307598.67 309109.47 313644.48
System 60.28 122511.08 72477.33 4411.81 1820.70 2654.77
Elapsed 7434.08 7451.36 7476.09 7437.52 7438.28 7438.19

numacore's system CPU usage has improved but it's still insane -- 27 times
higher than balancenuma's, which is itself high. Put another way, numacore
is using over 1000 times more system CPU than the mainline kernel is.
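
As a quick sanity check of those two ratios, here is a minimal sketch using the "System" row of the duration table. It assumes the comparison is numafix-20121209 (72477.33) against balancenuma-v10r3 (2654.77) and against the stats-v8r6 baseline (60.28); the helper name is hypothetical.

```c
#include <assert.h>

/* How many times larger `value` is than `reference`. */
double cpu_time_ratio(double value, double reference)
{
	assert(reference > 0.0);
	return value / reference;
}
```

With those inputs, cpu_time_ratio(72477.33, 2654.77) is a little over 27 and cpu_time_ratio(72477.33, 60.28) is about 1200.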

MMTests Statistics: vmstat
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Page Ins 37112 36416 34572 37436 35400 34708
Page Outs 29252 35664 29788 28120 28504 28292
Swap Ins 0 0 0 0 0 0
Swap Outs 0 0 0 0 0 0
Direct pages scanned 0 0 0 0 0 0
Kswapd pages scanned 0 0 0 0 0 0
Kswapd pages reclaimed 0 0 0 0 0 0
Direct pages reclaimed 0 0 0 0 0 0
Kswapd efficiency 100% 100% 100% 100% 100% 100%
Kswapd velocity 0.000 0.000 0.000 0.000 0.000 0.000
Direct efficiency 100% 100% 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000 0.000 0.000
Percentage direct scans 0% 0% 0% 0% 0% 0%
Page writes by reclaim 0 0 0 0 0 0
Page writes file 0 0 0 0 0 0
Page writes anon 0 0 0 0 0 0
Page reclaim immediate 0 0 0 0 0 0
Page rescued immediate 0 0 0 0 0 0
Slabs scanned 0 0 0 0 0 0
Direct inode steals 0 0 0 0 0 0
Kswapd inode steals 0 0 0 0 0 0
Kswapd skipped wait 0 0 0 0 0 0
THP fault alloc 3 2 3 2 2 2
THP collapse alloc 0 0 0 4 0 0
THP splits 0 0 0 1 0 0
THP fault fallback 0 0 0 0 0 0
THP collapse fail 0 0 0 0 0 0
Compaction stalls 0 0 0 0 0 0
Compaction success 0 0 0 0 0 0
Compaction failures 0 0 0 0 0 0
Page migrate success 0 0 472734998 0 24675369 36216149
Page migrate failure 0 0 0 0 0 0
Compaction pages isolated 0 0 0 0 0 0
Compaction migrate scanned 0 0 0 0 0 0
Compaction free scanned 0 0 0 0 0 0
Compaction cost 0 0 490698 0 25613 37592
NUMA PTE updates 0 0 2978374076 0 200854895 256255594
NUMA hint faults 0 0 0 0 195451244 250219588
NUMA hint local faults 0 0 0 0 50377035 63739483
NUMA pages migrated 0 0 472734998 0 24675369 36216149
AutoNUMA cost 0 0 29830 0 979131 1253579

numacore is migrating on average 247MB/sec. balancenuma is migrating
19MB/sec on average.
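
The migration rates can be reproduced from the "NUMA pages migrated" and "Elapsed" rows. This is only a sketch under stated assumptions: 4 KiB base pages, MB read as MiB, and the whole elapsed time of the run used as the divisor. The function name is illustrative and is not part of MMTests.

```c
#include <assert.h>

#define ASSUMED_PAGE_SIZE 4096ULL		/* assumption: 4 KiB base pages */
#define BYTES_PER_MIB (1024.0 * 1024.0)

/* Average migration rate in MiB/sec over the whole benchmark run. */
double migrate_rate_mib_per_sec(unsigned long long pages_migrated,
				double elapsed_sec)
{
	assert(elapsed_sec > 0.0);
	return (double)(pages_migrated * ASSUMED_PAGE_SIZE)
		/ BYTES_PER_MIB / elapsed_sec;
}
```

Under these assumptions, 472734998 pages over 7476.09 seconds (numafix-20121209) works out to about 247 MiB/sec, and 36216149 pages over 7438.19 seconds (balancenuma-v10r3) to about 19 MiB/sec.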

I ran the other normal benchmarks too. kernbench and aim9 are more or less ok. The impact is on hackbench.

HACKBENCH PIPES
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Procs 1 0.0250 ( 0.00%) 0.0260 ( -4.00%) 0.0246 ( 1.48%) 0.0261 ( -4.27%) 0.0325 (-30.07%) 0.0353 (-41.14%)
Procs 4 0.0696 ( 0.00%) 0.0702 ( -0.84%) 0.0602 ( 13.57%) 0.0707 ( -1.65%) 0.0760 ( -9.20%) 0.0738 ( -5.98%)
Procs 8 0.0836 ( 0.00%) 0.0973 (-16.43%) 0.0949 (-13.53%) 0.1030 (-23.21%) 0.0887 ( -6.15%) 0.1031 (-23.36%)
Procs 12 0.0971 ( 0.00%) 0.0969 ( 0.21%) 0.1447 (-49.00%) 0.1235 (-27.19%) 0.0953 ( 1.88%) 0.1394 (-43.56%)
Procs 16 0.1218 ( 0.00%) 0.1286 ( -5.52%) 0.2214 (-81.70%) 0.1775 (-45.69%) 0.1105 ( 9.33%) 0.2188 (-79.57%)
Procs 20 0.1472 ( 0.00%) 0.1508 ( -2.48%) 0.2744 (-86.43%) 0.1584 ( -7.64%) 0.1378 ( 6.38%) 0.2567 (-74.37%)
Procs 24 0.1684 ( 0.00%) 0.1823 ( -8.20%) 0.3602 (-113.82%) 0.4648 (-175.96%) 0.1623 ( 3.68%) 0.3118 (-85.12%)
Procs 28 0.1919 ( 0.00%) 0.1969 ( -2.61%) 0.4632 (-141.39%) 0.5287 (-175.57%) 0.1900 ( 0.96%) 0.4326 (-125.48%)
Procs 32 0.2256 ( 0.00%) 0.2163 ( 4.12%) 0.5040 (-123.40%) 0.4607 (-104.23%) 0.2163 ( 4.13%) 0.4583 (-103.16%)
Procs 36 0.2228 ( 0.00%) 0.2658 (-19.29%) 0.5481 (-145.98%) 0.6190 (-177.83%) 0.2570 (-15.33%) 0.5267 (-136.38%)
Procs 40 0.2811 ( 0.00%) 0.2906 ( -3.37%) 0.6223 (-121.36%) 0.2595 ( 7.69%) 0.2638 ( 6.15%) 0.5941 (-111.35%)

HACKBENCH SOCKETS
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Procs 1 0.0220 ( 0.00%) 0.0220 ( 0.00%) 0.0229 ( -4.20%) 0.0283 (-28.66%) 0.0216 ( 1.89%) 0.0256 (-16.36%)
Procs 4 0.0456 ( 0.00%) 0.0513 (-12.51%) 0.0559 (-22.50%) 0.0820 (-79.73%) 0.0407 ( 10.76%) 0.0627 (-37.46%)
Procs 8 0.0679 ( 0.00%) 0.0714 ( -5.20%) 0.1472 (-116.82%) 0.2772 (-308.32%) 0.0697 ( -2.60%) 0.1715 (-152.63%)
Procs 12 0.0940 ( 0.00%) 0.0973 ( -3.56%) 0.2259 (-140.32%) 0.1155 (-22.87%) 0.0973 ( -3.55%) 0.2459 (-161.55%)
Procs 16 0.1181 ( 0.00%) 0.1263 ( -6.96%) 0.3248 (-174.92%) 0.4467 (-278.19%) 0.1234 ( -4.46%) 0.3231 (-173.55%)
Procs 20 0.1504 ( 0.00%) 0.1531 ( -1.83%) 0.4039 (-168.54%) 0.4917 (-226.94%) 0.1534 ( -1.97%) 0.4172 (-177.36%)
Procs 24 0.1757 ( 0.00%) 0.1826 ( -3.92%) 0.3965 (-125.60%) 0.5142 (-192.57%) 0.1826 ( -3.89%) 0.4759 (-170.78%)
Procs 28 0.2044 ( 0.00%) 0.2166 ( -5.93%) 0.5438 (-165.99%) 0.6600 (-222.85%) 0.2164 ( -5.88%) 0.5455 (-166.83%)
Procs 32 0.2456 ( 0.00%) 0.2501 ( -1.86%) 0.6261 (-154.93%) 0.6391 (-160.22%) 0.2449 ( 0.27%) 0.6093 (-148.11%)
Procs 36 0.2649 ( 0.00%) 0.2747 ( -3.70%) 0.7066 (-166.71%) 0.5775 (-117.97%) 0.2815 ( -6.27%) 0.6840 (-158.19%)
Procs 40 0.3067 ( 0.00%) 0.3114 ( -1.56%) 0.7588 (-147.42%) 0.7517 (-145.12%) 0.3081 ( -0.48%) 0.8871 (-189.27%)

Latest numacore, autonuma and balancenuma are all butchering hackbench
performance. The fact that balancenuma only started hurting performance
with the migration scalability patches leads me to conclude that they
might be directly or indirectly responsible.

MMTests Statistics: vmstat
3.7.0-rc7 3.7.0-rc6 3.7.0-rc8 3.7.0-rc7 3.7.0-rc7 3.7.0-rc7
stats-v8r6 numacore-20121130 numafix-20121209 autonuma-v28fastr4 balancenuma-v9r2 balancenuma-v10r3
Page Ins 4 4 4 4 4 4
Page Outs 1540 1636 2568 2264 1548 2484
Swap Ins 0 0 0 0 0 0
Swap Outs 0 0 0 0 0 0
Direct pages scanned 0 0 0 0 0 0
Kswapd pages scanned 0 0 0 0 0 0
Kswapd pages reclaimed 0 0 0 0 0 0
Direct pages reclaimed 0 0 0 0 0 0
Kswapd efficiency 100% 100% 100% 100% 100% 100%
Kswapd velocity 0.000 0.000 0.000 0.000 0.000 0.000
Direct efficiency 100% 100% 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000 0.000 0.000
Percentage direct scans 0% 0% 0% 0% 0% 0%
Page writes by reclaim 0 0 0 0 0 0
Page writes file 0 0 0 0 0 0
Page writes anon 0 0 0 0 0 0
Page reclaim immediate 0 0 0 0 0 0
Page rescued immediate 0 0 0 0 0 0
Slabs scanned 0 0 0 0 0 0
Direct inode steals 0 0 0 0 0 0
Kswapd inode steals 0 0 0 0 0 0
Kswapd skipped wait 0 0 0 0 0 0
THP fault alloc 5 0 0 0 6 5
THP collapse alloc 0 0 0 0 0 0
THP splits 0 0 0 0 0 0
THP fault fallback 0 0 0 0 0 0
THP collapse fail 0 0 0 0 0 0
Compaction stalls 0 0 0 0 0 0
Compaction success 0 0 0 0 0 0
Compaction failures 0 0 0 0 0 0
Page migrate success 0 0 0 0 1649 49
Page migrate failure 0 0 0 0 0 0
Compaction pages isolated 0 0 0 0 0 0
Compaction migrate scanned 0 0 0 0 0 0
Compaction free scanned 0 0 0 0 0 0
Compaction cost 0 0 0 0 1 0
NUMA PTE updates 0 0 0 0 21646 22884
NUMA hint faults 0 0 0 0 1045 2131
NUMA hint local faults 0 0 0 0 40 1218
NUMA pages migrated 0 0 0 0 1649 49
AutoNUMA cost 0 0 0 0 5 10

Based on this, I believe the migration patches are only indirectly
responsible. No way should hackbench be migrating or receiving a PTE update
at all. Rather than withdrawing the scalability patches it might make more
sense to either increase the length of time before a PTE update takes place or
to delay NUMA PTE updates until the RSS reaches a particular size instead
of just relying on where the task gets scheduled.

So overall, I still believe that balancenuma should be merged at this point
based on these results. Nothing stops you rebasing numacore on top
afterwards and introducing it in parts, validating at each point that it's
actually improving performance and not just assuming it does.
--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Kirill A. Shutemov
2012-12-09 21:20:01 UTC
Post by Mel Gorman
Either way, last night I applied a patch on top of latest tip/master to
remove the nr_cpus_allowed check so that numacore would be enabled again
and tested that. In some places it has indeed much improved. In others
it is still regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or multiple
JVMs. It is likely that a zero page is being inserted due to a race with
migration and causes the JVM to throw a null pointer exception. Here is
the comparison on the rough off-chance you actually read it this time.
You are talking about the huge zero page, right?

I've fixed a race in huge zero page implementation recently[1]. Symptoms
were similar -- SIGSEGV in JVM. The patch is in mmotm-2012-12-05-16-56 and
later.

[1] http://lkml.org/lkml/2012/11/30/279
--
Kirill A. Shutemov
Mel Gorman
2012-12-10 08:50:02 UTC
Post by Kirill A. Shutemov
Post by Mel Gorman
Either way, last night I applied a patch on top of latest tip/master to
remove the nr_cpus_allowed check so that numacore would be enabled again
and tested that. In some places it has indeed much improved. In others
it is still regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or multiple
JVMs. It is likely that a zero page is being inserted due to a race with
migration and causes the JVM to throw a null pointer exception. Here is
the comparison on the rough off-chance you actually read it this time.
You are talking about the huge zero page, right?
No, this is happening in tip/master which does not include the huge zero
page work yet. AFAIK, that's still queued in Andrew's tree for the next
merge window. It is possible that there will be collisions between numa
balancing and the huge zero page work but it hasn't happened yet.
Post by Kirill A. Shutemov
I've fixed a race in huge zero page implementation recently[1]. Symptoms
were similar -- SIGSEGV in JVM. The patch is in mmotm-2012-12-05-16-56 and
later.
It might be a similar class of bug.
--
Mel Gorman
SUSE Labs
Srikar Dronamraju
2012-12-10 05:40:01 UTC
Post by Mel Gorman
Either way, last night I applied a patch on top of latest tip/master to
remove the nr_cpus_allowed check so that numacore would be enabled again
and tested that. In some places it has indeed much improved. In others
it is still regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or multiple
JVMs. It is likely that a zero page is being inserted due to a race with
migration and causes the JVM to throw a null pointer exception. Here is
the comparison on the rough off-chance you actually read it this time.
I see this failure when running with THP and KSM enabled on
Friday's Tip master. Not sure if Mel was talking about the same issue.

------------[ cut here ]------------
kernel BUG at ../kernel/sched/fair.c:2371!
invalid opcode: 0000 [#1] SMP
Modules linked in: ebtable_nat ebtables autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge stp llc iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun iTCO_wdt iTCO_vendor_support kvm_intel kvm microcode cdc_ether usbnet mii serio_raw i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma i7core_edac edac_core bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
CPU 4
Pid: 116, comm: ksmd Not tainted 3.7.0-rc8-tip_master+ #5 IBM BladeCenter HS22V -[7871AC1]-/81Y5995
RIP: 0010:[<ffffffff8108c139>] [<ffffffff8108c139>] task_numa_fault+0x1a9/0x1e0
RSP: 0018:ffff880372237ba8 EFLAGS: 00010246
RAX: 0000000000000074 RBX: 0000000000000001 RCX: 0000000000000001
RDX: 00000000000012ae RSI: 0000000000000004 RDI: 00007faf4fc01000
RBP: ffff880372237be8 R08: 0000000000000000 R09: ffff8803657463f0
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000012
R13: ffff880372210d00 R14: 0000000000010088 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88037fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001d26fec CR3: 000000000169f000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksmd (pid: 116, threadinfo ffff880372236000, task ffff880372210d00)
Stack:
ffffea0016026c58 00007faf4fc00000 ffff880372237c48 0000000000000001
00007faf4fc01000 ffffea000d6df928 0000000000000001 ffffea00166e9268
ffff880372237c48 ffffffff8113cd0e ffff880300000001 0000000000000002
Call Trace:
[<ffffffff8113cd0e>] __do_numa_page+0xde/0x160
[<ffffffff8113de9e>] handle_pte_fault+0x32e/0xcd0
[<ffffffffa01c22c0>] ? drop_large_spte+0x30/0x30 [kvm]
[<ffffffffa01bf215>] ? kvm_set_spte_hva+0x25/0x30 [kvm]
[<ffffffff8113eab9>] handle_mm_fault+0x279/0x760
[<ffffffff8115c024>] break_ksm+0x74/0xa0
[<ffffffff8115c222>] break_cow+0xa2/0xb0
[<ffffffff8115e38c>] ksm_scan_thread+0xb5c/0xd50
[<ffffffff810771c0>] ? wake_up_bit+0x40/0x40
[<ffffffff8115d830>] ? run_store+0x340/0x340
[<ffffffff8107692e>] kthread+0xce/0xe0
[<ffffffff81076860>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff814fa7ac>] ret_from_fork+0x7c/0xb0
[<ffffffff81076860>] ? kthread_freezable_should_stop+0x70/0x70
Code: 89 f0 41 bf 01 00 00 00 8b 1c 10 e9 d7 fe ff ff 8d 14 09 48 63 d2 eb bd 66 2e 0f 1f 84 00 00 00 00 00 49 8b 85 98 07 00 00 eb 91 <0f> 0b eb fe 80 3d 9c 3b 6b 00 01 0f 84 be fe ff ff be 42 09 00
RIP [<ffffffff8108c139>] task_numa_fault+0x1a9/0x1e0
RSP <ffff880372237ba8>
---[ end trace 9584c9b03fc0dbc0 ]---

Srikar Dronamraju
2012-12-10 07:00:02 UTC
Post by Srikar Dronamraju
Post by Mel Gorman
Either way, last night I applied a patch on top of latest tip/master to
remove the nr_cpus_allowed check so that numacore would be enabled again
and tested that. In some places it has indeed much improved. In others
it is still regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or multiple
JVMs. It is likely that a zero page is being inserted due to a race with
migration and causes the JVM to throw a null pointer exception. Here is
the comparison on the rough off-chance you actually read it this time.
I see this failure when running with THP and KSM enabled on
Friday's Tip master. Not sure if Mel was talking about the same issue.
Even occurs with !THP but KSM enabled.
Post by Srikar Dronamraju
------------[ cut here ]------------
kernel BUG at ../kernel/sched/fair.c:2371!
invalid opcode: 0000 [#1] SMP
Modules linked in: ebtable_nat ebtables autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge stp llc iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun iTCO_wdt iTCO_vendor_support kvm_intel kvm microcode cdc_ether usbnet mii serio_raw i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma i7core_edac edac_core bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
CPU 4
Pid: 116, comm: ksmd Not tainted 3.7.0-rc8-tip_master+ #5 IBM BladeCenter HS22V -[7871AC1]-/81Y5995
RIP: 0010:[<ffffffff8108c139>] [<ffffffff8108c139>] task_numa_fault+0x1a9/0x1e0
RSP: 0018:ffff880372237ba8 EFLAGS: 00010246
RAX: 0000000000000074 RBX: 0000000000000001 RCX: 0000000000000001
RDX: 00000000000012ae RSI: 0000000000000004 RDI: 00007faf4fc01000
RBP: ffff880372237be8 R08: 0000000000000000 R09: ffff8803657463f0
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000012
R13: ffff880372210d00 R14: 0000000000010088 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88037fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001d26fec CR3: 000000000169f000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksmd (pid: 116, threadinfo ffff880372236000, task ffff880372210d00)
Stack:
ffffea0016026c58 00007faf4fc00000 ffff880372237c48 0000000000000001
00007faf4fc01000 ffffea000d6df928 0000000000000001 ffffea00166e9268
ffff880372237c48 ffffffff8113cd0e ffff880300000001 0000000000000002
Call Trace:
[<ffffffff8113cd0e>] __do_numa_page+0xde/0x160
[<ffffffff8113de9e>] handle_pte_fault+0x32e/0xcd0
[<ffffffffa01c22c0>] ? drop_large_spte+0x30/0x30 [kvm]
[<ffffffffa01bf215>] ? kvm_set_spte_hva+0x25/0x30 [kvm]
[<ffffffff8113eab9>] handle_mm_fault+0x279/0x760
[<ffffffff8115c024>] break_ksm+0x74/0xa0
[<ffffffff8115c222>] break_cow+0xa2/0xb0
[<ffffffff8115e38c>] ksm_scan_thread+0xb5c/0xd50
[<ffffffff810771c0>] ? wake_up_bit+0x40/0x40
[<ffffffff8115d830>] ? run_store+0x340/0x340
[<ffffffff8107692e>] kthread+0xce/0xe0
[<ffffffff81076860>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff814fa7ac>] ret_from_fork+0x7c/0xb0
[<ffffffff81076860>] ? kthread_freezable_should_stop+0x70/0x70
Code: 89 f0 41 bf 01 00 00 00 8b 1c 10 e9 d7 fe ff ff 8d 14 09 48 63 d2 eb bd 66 2e 0f 1f 84 00 00 00 00 00 49 8b 85 98 07 00 00 eb 91 <0f> 0b eb fe 80 3d 9c 3b 6b 00 01 0f 84 be fe ff ff be 42 09 00
RIP [<ffffffff8108c139>] task_numa_fault+0x1a9/0x1e0
RSP <ffff880372237ba8>
---[ end trace 9584c9b03fc0dbc0 ]---
Ingo Molnar
2012-12-10 12:50:01 UTC
Srikar Dronamraju reported that the following assert triggers on
his box:

kernel BUG at ../kernel/sched/fair.c:2371!

Call Trace:
[<ffffffff8113cd0e>] __do_numa_page+0xde/0x160
[<ffffffff8113de9e>] handle_pte_fault+0x32e/0xcd0
[<ffffffffa01c22c0>] ? drop_large_spte+0x30/0x30 [kvm]
[<ffffffffa01bf215>] ? kvm_set_spte_hva+0x25/0x30 [kvm]
[<ffffffff8113eab9>] handle_mm_fault+0x279/0x760
[<ffffffff8115c024>] break_ksm+0x74/0xa0
[<ffffffff8115c222>] break_cow+0xa2/0xb0
[<ffffffff8115e38c>] ksm_scan_thread+0xb5c/0xd50
[<ffffffff810771c0>] ? wake_up_bit+0x40/0x40
[<ffffffff8115d830>] ? run_store+0x340/0x340
[<ffffffff8107692e>] kthread+0xce/0xe0

This means that task_numa_fault() was called for a kernel thread
which has no fault tracking.

This scenario is actually possible if a kernel thread does
fault processing on behalf of a user-space task - ignore
the page fault in that case.

Also remove the (now never triggering) assert and robustify
a nearby assert.

Reported-by: Srikar Dronamraju <***@linux.vnet.ibm.com>
Cc: Linus Torvalds <***@linux-foundation.org>
Cc: Andrew Morton <***@linux-foundation.org>
Cc: Peter Zijlstra <***@chello.nl>
Cc: Andrea Arcangeli <***@redhat.com>
Cc: Rik van Riel <***@redhat.com>
Cc: Mel Gorman <***@suse.de>
Cc: Hugh Dickins <***@google.com>
Cc: Thomas Gleixner <***@linutronix.de>
Signed-off-by: Ingo Molnar <***@kernel.org>
---
kernel/sched/fair.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d11a8a..61c7a10 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2351,6 +2351,13 @@ void task_numa_fault(unsigned long addr, int node, int last_cpupid, int pages, b
int priv;
int idx;

+ /*
+ * Kernel threads might not have an mm but might still
+ * do fault processing (such as KSM):
+ */
+ if (!p->numa_faults)
+ return;
+
if (last_cpupid != cpu_pid_to_cpupid(-1, -1)) {
/* Did we access it last time around? */
if (last_pid == this_pid) {
@@ -2367,8 +2374,8 @@ void task_numa_fault(unsigned long addr, int node, int last_cpupid, int pages, b

idx = 2*node + priv;

- WARN_ON_ONCE(last_cpu == -1 || node == -1);
- BUG_ON(!p->numa_faults);
+ if (WARN_ON_ONCE(last_cpu == -1 || node == -1))
+ return;

p->numa_faults_curr[idx] += pages;
shared_fault_tick(p, node, last_cpu, pages);
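
The effect of the early return can be illustrated with a small user-space model. This is a sketch only, not the kernel implementation: `struct model_task`, `MODEL_NR_NODES` and `model_task_numa_fault()` are hypothetical names, and the single counter array stands in for the per-task fault statistics. A task without fault tracking (such as ksmd in the report above) is modelled as having a NULL `numa_faults` pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_NODES 4	/* hypothetical node count for the model */

struct model_task {
	/* 2 counters per node (shared, private); NULL means no tracking */
	unsigned long *numa_faults;
};

/* Returns 1 if the fault was accounted, 0 if it was ignored. */
int model_task_numa_fault(struct model_task *p, int node, int priv, int pages)
{
	int idx;

	assert(p != NULL);

	/* Kernel-thread case: no fault statistics, so ignore the fault. */
	if (!p->numa_faults)
		return 0;

	/* Bail out on an invalid node instead of indexing out of bounds. */
	if (node < 0 || node >= MODEL_NR_NODES)
		return 0;

	idx = 2 * node + (priv ? 1 : 0);	/* same layout as the hunk above */
	p->numa_faults[idx] += (unsigned long)pages;
	return 1;
}
```

In this model a task with `numa_faults == NULL` is skipped where the old BUG_ON() would have crashed, while an ordinary task still has its counters updated.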
Srikar Dronamraju
2012-12-13 14:30:02 UTC
Post by Ingo Molnar
Srikar Dronamraju reported that the following assert triggers on
his box:
kernel BUG at ../kernel/sched/fair.c:2371!
Call Trace:
[<ffffffff8113cd0e>] __do_numa_page+0xde/0x160
[<ffffffff8113de9e>] handle_pte_fault+0x32e/0xcd0
[<ffffffffa01c22c0>] ? drop_large_spte+0x30/0x30 [kvm]
[<ffffffffa01bf215>] ? kvm_set_spte_hva+0x25/0x30 [kvm]
[<ffffffff8113eab9>] handle_mm_fault+0x279/0x760
[<ffffffff8115c024>] break_ksm+0x74/0xa0
[<ffffffff8115c222>] break_cow+0xa2/0xb0
[<ffffffff8115e38c>] ksm_scan_thread+0xb5c/0xd50
[<ffffffff810771c0>] ? wake_up_bit+0x40/0x40
[<ffffffff8115d830>] ? run_store+0x340/0x340
[<ffffffff8107692e>] kthread+0xce/0xe0
This means that task_numa_fault() was called for a kernel thread
which has no fault tracking.
This scenario is actually possible if a kernel thread does
fault processing on behalf of a user-space task - ignore
the page fault in that case.
Also remove the (now never triggering) assert and robustify
a nearby assert.
I confirm that with this change, I don't see the assert anymore.
Post by Ingo Molnar
---
kernel/sched/fair.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d11a8a..61c7a10 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2351,6 +2351,13 @@ void task_numa_fault(unsigned long addr, int node, int last_cpupid, int pages, b
int priv;
int idx;
+ /*
+ * Kernel threads might not have an mm but might still
+ * do fault processing (such as KSM):
+ */
+ if (!p->numa_faults)
+ return;
+
if (last_cpupid != cpu_pid_to_cpupid(-1, -1)) {
/* Did we access it last time around? */
if (last_pid == this_pid) {
@@ -2367,8 +2374,8 @@ void task_numa_fault(unsigned long addr, int node, int last_cpupid, int pages, b
idx = 2*node + priv;
- WARN_ON_ONCE(last_cpu == -1 || node == -1);
- BUG_ON(!p->numa_faults);
+ if (WARN_ON_ONCE(last_cpu == -1 || node == -1))
+ return;
p->numa_faults_curr[idx] += pages;
shared_fault_tick(p, node, last_cpu, pages);
Mel Gorman
2012-12-10 08:50:02 UTC
Post by Srikar Dronamraju
Post by Mel Gorman
Either way, last night I applied a patch on top of latest tip/master to
remove the nr_cpus_allowed check so that numacore would be enabled again
and tested that. In some places it has indeed much improved. In others
it is still regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or multiple
JVMs. It is likely that a zero page is being inserted due to a race with
migration and causes the JVM to throw a null pointer exception. Here is
the comparison on the rough off-chance you actually read it this time.
I see this failure when running with THP and KSM enabled on
Friday's Tip master. Not sure if Mel was talking about the same issue.
------------[ cut here ]------------
kernel BUG at ../kernel/sched/fair.c:2371!
I'm not, this is new to me. I grepped the console logs I have and the closest
I see is a WARN_ON triggered in numacore v17 which is no longer relevant.
--
Mel Gorman
SUSE Labs
Ingo Molnar
2012-12-10 12:40:02 UTC
hi Srikar,
Post by Srikar Dronamraju
Post by Mel Gorman
Either way, last night I applied a patch on top of latest tip/master to
remove the nr_cpus_allowed check so that numacore would be enabled again
and tested that. In some places it has indeed much improved. In others
it is still regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or multiple
JVMs. It is likely that a zero page is being inserted due to a race with
migration and causes the JVM to throw a null pointer exception. Here is
the comparison on the rough off-chance you actually read it this time.
I see this failure when running with THP and KSM enabled on
Friday's Tip master. Not sure if Mel was talking about the same issue.
------------[ cut here ]------------
kernel BUG at ../kernel/sched/fair.c:2371!
Could you check whether today's -tip (7ea8701a1a51 or later),
plus the patch below, addresses the crash - while still giving
good NUMA performance?

Thanks,

Ingo

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d11a8a..6a89787 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2351,6 +2351,9 @@ void task_numa_fault(unsigned long addr, int node, int last_cpupid, int pages, b
int priv;
int idx;

+ if (!p->numa_faults)
+ return;
+
if (last_cpupid != cpu_pid_to_cpupid(-1, -1)) {
/* Did we access it last time around? */
if (last_pid == this_pid) {
Ingo Molnar
2012-12-10 11:40:02 UTC
Post by Mel Gorman
Post by Ingo Molnar
This is a full release of all the patches so apologies for the
flood. [...]
I have yet to process all your mails, but assuming I address all
your review feedback and the latest unified tree in tip:master
shows no regression in your testing, would you be willing to
start using it for ongoing work?
Ingo,
If you had read the second paragraph of the mail you just responded to or
the results at the end then you would have seen that I had problems with
the performance. [...]
I've posted a (NUMA-placement sensitive workload centric)
performance comparison between "balancenuma", AutoNUMA and
numa/core unified-v3 to:

https://lkml.org/lkml/2012/12/7/331

I tried to address all performance regressions you and others
have reported.

Here's the direct [bandwidth] comparison of 'balancenuma v10' to
my -v3 tree:

                           balancenuma | NUMA-tip
[test unit]              :        -v10 |      -v3
------------------------------------------------------------
2x1-bw-process           :       6.136 |    9.647:  57.2%
3x1-bw-process           :       7.250 |   14.528: 100.4%
4x1-bw-process           :       6.867 |   18.903: 175.3%
8x1-bw-process           :       7.974 |   26.829: 236.5%
8x1-bw-process-NOTHP     :       5.937 |   22.237: 274.5%
16x1-bw-process          :       5.592 |   29.294: 423.9%
4x1-bw-thread            :      13.598 |   19.290:  41.9%
8x1-bw-thread            :      16.356 |   26.391:  61.4%
16x1-bw-thread           :      24.608 |   29.557:  20.1%
32x1-bw-thread           :      25.477 |   30.232:  18.7%
2x3-bw-thread            :       8.785 |   15.327:  74.5%
4x4-bw-thread            :       6.366 |   27.957: 339.2%
4x6-bw-thread            :       6.287 |   27.877: 343.4%
4x8-bw-thread            :       5.860 |   28.439: 385.3%
4x8-bw-thread-NOTHP      :       6.167 |   25.067: 306.5%
3x3-bw-thread            :       8.235 |   21.560: 161.8%
5x5-bw-thread            :       5.762 |   26.081: 352.6%
2x16-bw-thread           :       5.920 |   23.269: 293.1%
1x32-bw-thread           :       5.828 |   18.985: 225.8%
numa02-bw                :      29.054 |   31.431:   8.2%
numa02-bw-NOTHP          :      27.064 |   29.104:   7.5%
numa01-bw-thread         :      20.338 |   28.607:  40.7%
numa01-bw-thread-NOTHP   :      18.528 |   21.119:  14.0%
------------------------------------------------------------
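
The last column of the table appears to be the relative bandwidth gain of -v3 over -v10. The mail does not state the formula, so the following is only a sketch of the assumed calculation; relative_gain_pct() is a hypothetical helper, not something from perf bench numa.

```c
#include <assert.h>

/*
 * Assumed derivation of the percentage column: how much higher the
 * candidate bandwidth is than the baseline, expressed in percent.
 * Both arguments must be in the same unit.
 */
static double relative_gain_pct(double baseline, double candidate)
{
	assert(baseline > 0.0);
	return (candidate / baseline - 1.0) * 100.0;
}
```

Under that assumption, relative_gain_pct(6.136, 9.647) comes out at roughly 57.2, which matches the 2x1-bw-process row.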

I also tried to reproduce and fix as many bugs you reported as
possible - but my point is that it would be _much_ better if we
actually joined forces.
Post by Mel Gorman
[...] You would also know that tip/master testing for the last
week was failing due to a boot problem (issue was in mainline
not tip and has been already fixed) and would have known that
since the -v18 release that numacore was effectively disabled
on my test machine.
I'm glad it's fixed.
Post by Mel Gorman
Clearly you are not reading the bug reports you are receiving
and you're not seeing the small bit of review feedback or
answering the review questions you have received either. Why
would I be more forthcoming when I feel that it'll simply be
ignored? [...]
I am reading the bug reports and addressing bugs as I can.
Post by Mel Gorman
[...] You simply assume that each batch of patches you place
on top must be fixing all known regressions and ignoring any
evidence to the contrary.
If you had read my mail from last Tuesday you would even know
which patch was causing the problem that effectively disabled
numacore although not why. The comment about p->numa_faults
was completely off the mark (long journey, was tired, assumed
numa_faults was a counter and not a pointer which was
careless). If you had called me on it then I would have
spotted the actual problem sooner. The problem was indeed with
the nr_cpus_allowed == num_online_cpus() check which I had
pointed out was a suspicious check although for different
reasons. As it turns out, a printk() bodge showed that
nr_cpus_allowed == 80 set in sched_init_smp() while
num_online_cpus() == 48. This effectively disabled numacore.
If you had responded to the bug report, this would likely have
been found last Wednesday.
Does changing it from num_online_cpus() to num_possible_cpus()
help? (Can send a patch if you want.)
Post by Mel Gorman
Post by Ingo Molnar
It would make it much easier for me to pick up your
enhancements, fixes, etc.
Changelog since V9
o Migration scalability (mingo)
To *really* see migration scalability bottlenecks you need to
remove the migration-bandwidth throttling kludge from your tree
(or configure it up very high if you want to do it simple).
Why is it a kludge? I already explained what the rationale
behind the rate limiting was. It's not about scalability, it's
about mitigating worst-case behaviour and the amount of time
the kernel spends moving data around which a deliberately
adverse workload can trigger. It is unacceptable if during a
phase change that a process would stall potentially for
milliseconds (seconds if the node is large enough I guess)
while the data is being migrated. Here it is again --
http://www.spinics.net/lists/linux-mm/msg47440.html . You
either ignored the mail or simply could not be bothered
explaining why you thought this was the incorrect decision or
why the concerns about an adverse workload were unimportant.
I think the stalls could have been at least in part due to the
scalability bottlenecks that the rate-limiting code has hidden.

If you think of the NUMA migration as a natural part of the
workload, as a sort of extended cache-miss, and if you assume
that the scheduler is intelligent about not flip-flopping tasks
between nodes (which the latest code certainly is), then I don't
see why the rate of migration should be rate-limited in the VM.

Note that I tried to quantify this effect: the perf bench numa
testcases start from a practical 'worst-case adverse' workload
in essence: all pages concentrated on the wrong node, and the
workload having to migrate all of them over.

We could add a new 'absolutely worst case' testcase, to make sure it
behaves sanely?
Post by Mel Gorman
I have a vague suspicion actually that when you are modelling
the task->data relationship that you make an implicit
assumption that moving data has zero or near-zero cost. In
such a model it would always make sense to move quickly and
immediately but in practice the cost of moving can exceed the
performance benefit of accessing local data and lead to
regressions. It becomes more pronounced if the nodes are not
fully connected.
I make no such assumption - convergence costs were part of my
measurements.
Post by Mel Gorman
Post by Ingo Molnar
Some (certainly not all) of the performance regressions you
reported were certainly due to numa/core code hitting the
migration codepaths as aggressively as the workload demanded
- and hitting scalability bottlenecks.
How are you so certain? [...]
Hm, I don't think my "some (certainly not all)" statement
Post by Mel Gorman
[...] How do you not know it's because your code is migrating
excessively for no good reason because the algorithm has a
flaw in it? [...]
That's another source - but again not something we should fix by
hiding it under the carpet via migration bandwidth rate limits,
right?
Post by Mel Gorman
[...] Or that the cost of excessive migration is not being
offset by local data accesses? [...]
That's another possibility.

The _real_ fix is to avoid excessive migration on the CPU and
memory placement side, not to throttle the basic mechanism
itself!

I don't exclude the possibility that bandwidth limits might be
needed - but only if everything else fails. Meanwhile, the
bandwidth limits were actively hiding scalability bottlenecks,
which bottlenecks only trigger at higher migration rates.
Post by Mel Gorman
[...] The critical point to note is that if it really was only
scalability problems then autonuma would suffer the same
problems and it would be impossible for autonuma's performance
to exceed numacore's. This isn't the case, making it unlikely
that scalability is your only problem.
The scheduling patterns are different - so they can hit
different bottlenecks.
Post by Mel Gorman
Either way, last night I applied a patch on top of latest
tip/master to remove the nr_cpus_allowed check so that
numacore would be enabled again and tested that. In some
places it has indeed much improved. In others it is still
regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or
multiple JVMs. It is likely that a zero page is being inserted
due to a race with migration and causes the JVM to throw a
null pointer exception. Here is the comparison on the rough
off-chance you actually read it this time.
Can you still see the JVM crash with the unified -v3 tree?

Thanks,

Ingo
Ingo Molnar
2012-12-10 12:00:02 UTC
Post by Ingo Molnar
Post by Mel Gorman
reasons. As it turns out, a printk() bodge showed that
nr_cpus_allowed == 80 set in sched_init_smp() while
num_online_cpus() == 48. This effectively disabled
numacore. If you had responded to the bug report, this would
likely have been found last Wednesday.
Does changing it from num_online_cpus() to num_possible_cpus()
help? (Can send a patch if you want.)
I.e. something like the patch below.

Thanks,

Ingo

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 503ec29..9d11a8a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2646,7 +2646,7 @@ static bool task_numa_candidate(struct task_struct *p)

/* Don't disturb hard-bound tasks: */
if (sched_feat(NUMA_EXCLUDE_AFFINE)) {
- if (p->nr_cpus_allowed != num_online_cpus())
+ if (p->nr_cpus_allowed != num_possible_cpus())
return false;
}

Mel Gorman
2012-12-10 15:30:02 UTC
Post by Ingo Molnar
Post by Mel Gorman
Post by Ingo Molnar
This is a full release of all the patches so apologies for the
flood. [...]
I have yet to process all your mails, but assuming I address all
your review feedback and the latest unified tree in tip:master
shows no regression in your testing, would you be willing to
start using it for ongoing work?
Ingo,
If you had read the second paragraph of the mail you just responded to or
the results at the end then you would have seen that I had problems with
the performance. [...]
I've posted a (NUMA-placement sensitive workload centric)
performance comparison between "balancenuma", AutoNUMA and
https://lkml.org/lkml/2012/12/7/331
I tried to address all performance regressions you and others
have reported.
I've responded to this now. I acknowledge that balancenuma does not do
great on them. I've also explained that it's very likely because I did
not hook into the scheduler and I'm reluctant to do so. Once I do that,
we're directly colliding when my intention was to handle all the necessary
MM changes and the bare minimum of the scheduler hook, and to maintain that
side while numacore and all the additional scheduler changes were built on top.
Post by Ingo Molnar
<SNIP>
I also tried to reproduce and fix as many bugs you reported as
possible - but my point is that it would be _much_ better if we
actually joined forces.
Which is what balancenuma was meant to do and what I wanted weeks ago
-- I wanted to keep a handle on the mm side of things and establish a
performance baseline for just the mm side that numacore could be compared
against. I'd then help maintain the result, review patches particularly
affecting mm etc. I was hoping that numacore would be rebased to carry
the necessary scheduler changes but that didn't happen. The unified tree
is not equivalent. Just off-hand:

1. there is no performance comparison possible with just the mm changes
2. the vmstat fault accounting is broken in the unified tree
3. the code to allow balancenuma to be disabled from command line
was removed which the THP experience has told us is very useful
4. The THP patch was wedged in as hard as possible making it effectively
impossible to treat in isolation
5. ptes are treated as effective hugepage faults which potentially
results in remote->remote copies if tasks share data on a
PMD-boundary even if they do not share data on the page boundary.
For this reason I dislike it quite a bit
6. the migrate rate-limiting code was removed

To be fair, the last one is a difference in opinion. I think migrate
rate-limiting is important because I think it's more important for the
workload to run than for the kernel to get too much in the way thinking
it can do better.

Some of the other changes just made no sense to me and I still fail to
see why you didn't rebase numacore a few weeks ago and instead smacked the
trees together. If it had been a plain rebase then I would have switched
to looking at just numacore on top without having to worry if something
unexpected was broken on the MM side. If something had broken on the MM
side, I'd be on it without wondering if it was due to how the trees were
merged.

For example, I think that point 5 above is the potential source of the
corruption because you're not flushing the TLBs for the PTEs you are
updating in batch. Granted, you're relaxing rather than restricting access
so it should be ok and at worst cause a spurious fault but I also find
it suspicious that you do not recheck pte_same under the PTL when doing
the final PTE update. I also find it strange that you hold the PTL while
calling task_numa_fault(). No way should the PTL have to protect structures
in kernel/sched and I wonder whether that was actually part of the reason
why you saw heavy PTL contention.

Basically if I felt that handling ptes in batch like this was of
critical importance I would have implemented it very differently on top of
balancenuma. I would have only taken the PTL lock if updating the PTE to
keep contention down and redone racy checks under PTL, I'd have only used
trylock for every non-faulted PTE and I would only have migrated if it
was a remote->local copy. I certainly would not hold PTL while calling
task_numa_fault(). I would have kept the handling on a per-pmd basis when
it was expected that most PTEs underneath should be on the same node.
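
The "trylock, then redo the racy check under the lock" pattern described above can be illustrated in plain userspace C. This is only a sketch under stated assumptions: struct slot, slot_trylock(), slot_unlock() and update_if_unchanged() are hypothetical names, a C11 atomic_flag stands in for the PTL, and an unsigned long stands in for the PTE. It does not model the remote->local migration decision.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in: one lock guarding one value. */
struct slot {
	atomic_flag lock;	/* plays the role of the page table lock */
	unsigned long val;	/* plays the role of the PTE value */
};

/* Try once to take the lock; never spin. */
static bool slot_trylock(struct slot *s)
{
	return !atomic_flag_test_and_set_explicit(&s->lock,
						  memory_order_acquire);
}

static void slot_unlock(struct slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * 'snapshot' was read earlier without the lock held. The update is
 * applied only if the lock is uncontended and the value still matches
 * the snapshot when rechecked under the lock. Returns true if applied.
 */
static bool update_if_unchanged(struct slot *s, unsigned long snapshot,
				unsigned long new_val)
{
	bool applied = false;

	assert(s);
	if (!slot_trylock(s))
		return false;		/* contended: skip this entry */
	if (s->val == snapshot) {	/* recheck the racy read */
		s->val = new_val;
		applied = true;
	}
	slot_unlock(s);
	return applied;
}
```

A caller would read the value unlocked, decide whether it is worth handling, and only then call update_if_unchanged(); a false return means the entry was either contended or changed underneath and is simply left for a later fault.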
Post by Ingo Molnar
Post by Mel Gorman
[...] You would also know that tip/master testing for the last
week was failing due to a boot problem (issue was in mainline
not tip and has been already fixed) and would have known that
since the -v18 release that numacore was effectively disabled
on my test machine.
I'm glad it's fixed.
Agreed.
Post by Ingo Molnar
Post by Mel Gorman
Clearly you are not reading the bug reports you are receiving
and you're not seeing the small bit of review feedback or
answering the review questions you have received either. Why
would I be more forthcoming when I feel that it'll simply be
ignored? [...]
I am reading the bug reports and addressing bugs as I can.
Post by Mel Gorman
[...] You simply assume that each batch of patches you place
on top must be fixing all known regressions and ignoring any
evidence to the contrary.
If you had read my mail from last Tuesday you would even know
which patch was causing the problem that effectively disabled
numacore although not why. The comment about p->numa_faults
was completely off the mark (long journey, was tired, assumed
numa_faults was a counter and not a pointer which was
careless). If you had called me on it then I would have
spotted the actual problem sooner. The problem was indeed with
the nr_cpus_allowed == num_online_cpus() check which I had
pointed out was a suspicious check although for different
reasons. As it turns out, a printk() bodge showed that
nr_cpus_allowed == 80 set in sched_init_smp() while
num_online_cpus() == 48. This effectively disabled numacore.
If you had responded to the bug report, this would likely have
been found last Wednesday.
Does changing it from num_online_cpus() to num_possible_cpus()
help? (Can send a patch if you want.)
I'll check. The patch would be trivial.
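
To make the failure mode concrete, here is a minimal userspace sketch of the comparison being discussed. is_unbound() is a hypothetical stand-in for the affinity test in task_numa_candidate(), and the assumption that num_possible_cpus() would also report 80 on that machine is mine, not something stated in the report.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the NUMA_EXCLUDE_AFFINE test: a task is
 * treated as "not hard-bound" only if it may run on every CPU in the
 * reference set.
 */
static bool is_unbound(unsigned int nr_cpus_allowed,
		       unsigned int reference_cpus)
{
	assert(reference_cpus > 0);
	return nr_cpus_allowed == reference_cpus;
}
```

With the reported values, is_unbound(80, 48) is false, so every task looks hard-bound and is skipped. If the possible-CPU count is 80 as assumed, is_unbound(80, 80) is true and tasks become candidates again.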
Post by Ingo Molnar
Post by Mel Gorman
Post by Ingo Molnar
It would make it much easier for me to pick up your
enhancements, fixes, etc.
Changelog since V9
o Migration scalability (mingo)
To *really* see migration scalability bottlenecks you need to
remove the migration-bandwidth throttling kludge from your tree
(or configure it up very high if you want to do it simple).
Why is it a kludge? I already explained what the rationale
behind the rate limiting was. It's not about scalability, it's
about mitigating worst-case behaviour and the amount of time
the kernel spends moving data around which a deliberately
adverse workload can trigger. It is unacceptable if during a
phase change that a process would stall potentially for
milliseconds (seconds if the node is large enough I guess)
while the data is being migrated. Here it is again --
http://www.spinics.net/lists/linux-mm/msg47440.html . You
either ignored the mail or simply could not be bothered
explaining why you thought this was the incorrect decision or
why the concerns about an adverse workload were unimportant.
I think the stalls could have been at least in part due to the
scalability bottlenecks that the rate-limiting code has hidden.
In part yes, but the actual data copying will stall as well. If a node
is 16G and all the data has to migrate from one node to another, it could
take up to 2 seconds even if there is no other contention. This is assuming
roughly 8G/sec transfer speeds, which I know is a bit on the low end, and it
can vary a lot.
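
The estimate is simply node size divided by transfer rate. As a sketch, with migration_stall_seconds() being a hypothetical helper and both figures taken from the paragraph above:

```c
#include <assert.h>

/*
 * Lower bound on the stall when a whole node's worth of data has to
 * move: size divided by inter-node bandwidth, ignoring any contention.
 */
static double migration_stall_seconds(double node_gb, double gb_per_sec)
{
	assert(gb_per_sec > 0.0);
	return node_gb / gb_per_sec;
}
```

migration_stall_seconds(16.0, 8.0) gives 2.0, the "up to 2 seconds" figure; a faster interconnect shrinks the number but does not make it zero.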
Post by Ingo Molnar
If you think of the NUMA migration as a natural part of the
workload, as a sort of extended cache-miss, and if you assume
that the scheduler is intelligent about not flip-flopping tasks
between nodes (which the latest code certainly is), then I don't
see why the rate of migration should be rate-limited in the VM.
That's just it. I don't view the NUMA migration as a natural part of
the workload. I treat it as a cost that is optionally paid to get local
memory access, and the cost of the move must be offset. I think care
should be taken to minimise the amount of data that is transferred and
the system CPU cost of working out when to migrate should be as low as
possible and my reports have emphasised this.

To some extent I consider THP to have similar restrictions. THP is useless
if the cost of THP allocation is not offset by performance gains due to
reduced TLB misses. I think it's preferable to fail a THP allocation than
spend a lot of time reclaiming pages and compacting memory to satisfy THP.
Reclaim/compaction is meant to give up very quickly.
Post by Ingo Molnar
Note that I tried to quantify this effect: the perf bench numa
testcases start from a practical 'worst-case adverse' workload
in essence: all pages concentrated on the wrong node, and the
workload having to migrate all of them over.
We could add a new 'absolutely worst case' testcase, to make sure it
behaves sanely?
I don't think it'll tell us anything new. Without rate limiting the process
will stall while the transfer takes place. The duration of the stall will
be related to inter-node bandwidth.
Post by Ingo Molnar
Post by Mel Gorman
I have a vague suspicion actually that when you are modelling
the task->data relationship that you make an implicit
assumption that moving data has zero or near-zero cost. In
such a model it would always make sense to move quickly and
immediately but in practice the cost of moving can exceed the
performance benefit of accessing local data and lead to
regressions. It becomes more pronounced if the nodes are not
fully connected.
I make no such assumption - convergence costs were part of my
measurements.
Then you must expect that squashing all that cost into the smallest period
of time will result in stalls. It's a much higher cost than cache-line
misses when a process changes to running on a new CPU, for example.
Post by Ingo Molnar
Post by Mel Gorman
Post by Ingo Molnar
Some (certainly not all) of the performance regressions you
reported were certainly due to numa/core code hitting the
migration codepaths as aggressively as the workload demanded
- and hitting scalability bottlenecks.
How are you so certain? [...]
Hm, I don't think my "some (certainly not all)" statement
"regressions you reported were certainly due to numa/core code hitting
the migration codepaths" is what led me to believe that you were very sure
about where the source of the regression was.
Post by Ingo Molnar
Post by Mel Gorman
[...] How do you not know it's because your code is migrating
excessively for no good reason because the algorithm has a
flaw in it? [...]
That's another source - but again not something we should fix by
hiding it under the carpet via migration bandwidth rate limits,
right?
I would agree if the point of the migration rate-limiting was
to avoid contention. It's not. It's to prevent the kernel getting in the
way of a workload doing work for long periods of time. As balancenuma is
also dumb as rocks with respect to the scheduler it was also aimed at
mitigating problems related to tasks bouncing around if a particular node
was over-subscribed.
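
For readers who have not looked at the rate limiting, the general idea can be sketched as a per-window page budget. This is an illustration only and not the balancenuma code: struct migrate_budget, migrate_allowed() and every field name are hypothetical, and the window length and budget are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node budget: at most pages_per_window per window. */
struct migrate_budget {
	unsigned long window_start_ms;
	unsigned long window_len_ms;
	unsigned long pages_per_window;
	unsigned long pages_used;
};

/*
 * Returns true and charges the budget if nr_pages may be migrated now.
 * Returns false if the request would exceed the current window's
 * budget, in which case the caller leaves the pages where they are.
 */
static bool migrate_allowed(struct migrate_budget *b,
			    unsigned long now_ms, unsigned long nr_pages)
{
	assert(b->window_len_ms > 0);

	/* Start a fresh window once the old one has expired. */
	if (now_ms - b->window_start_ms >= b->window_len_ms) {
		b->window_start_ms = now_ms;
		b->pages_used = 0;
	}

	if (b->pages_used + nr_pages > b->pages_per_window)
		return false;

	b->pages_used += nr_pages;
	return true;
}
```

The point of such a cap is that a refused migration costs the workload a remote access rather than a stall, so the time the kernel spends copying is bounded per window regardless of how adverse the access pattern is.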
Post by Ingo Molnar
Post by Mel Gorman
[...] Or that the cost of excessive migration is not being
offset by local data accesses? [...]
That's another possibility.
The _real_ fix is to avoid excessive migration on the CPU and
memory placement side, not to throttle the basic mechanism
itself!
I don't exclude the possibility that bandwidth limits might be
needed - but only if everything else fails. Meanwhile, the
bandwidth limits were actively hiding scalability bottlenecks,
which bottlenecks only trigger at higher migration rates.
The bottleneck is visible with or without the migration rate limiting.
If it wasn't then the patches would have made no difference between
balancenuma v9 and v10, but they did make a difference.
Post by Ingo Molnar
Post by Mel Gorman
[...] The critical point to note is that if it really was only
scalability problems then autonuma would suffer the same
problems and it would be impossible for autonuma's performance
to exceed numacore's. This isn't the case, making it unlikely
that scalability is your only problem.
The scheduling patterns are different - so they can hit
different bottlenecks.
Ok, that is fair enough.
Post by Ingo Molnar
Post by Mel Gorman
Either way, last night I applied a patch on top of latest
tip/master to remove the nr_cpus_allowed check so that
numacore would be enabled again and tested that. In some
places it has indeed much improved. In others it is still
regressing badly and in two cases, it's corrupting memory --
specjbb when THP is enabled crashes when running for single or
multiple JVMs. It is likely that a zero page is being inserted
due to a race with migration and causes the JVM to throw a
null pointer exception. Here is the comparison on the rough
off-chance you actually read it this time.
Can you still see the JVM crash with the unified -v3 tree?
The crash was based on tip/master from yesterday. Does that not include
the unified -v3 tree?
--
Mel Gorman
SUSE Labs
Mel Gorman
2012-12-11 01:10:03 UTC
Post by Mel Gorman
For example, I think that point 5 above is the potential source of the
corruption because you're not flushing the TLBs for the PTEs you are
updating in batch. Granted, you're relaxing rather than restricting access
so it should be ok and at worst cause a spurious fault but I also find
it suspicious that you do not recheck pte_same under the PTL when doing
the final PTE update.
Looking again, the lack of a pte_same check should be ok. The addr,
addr_start, ptep and ptep_start handling is a little messy but also looks fine.
You're not accidentally crossing a PMD boundary. You should be protected
against huge pages being collapsed underneath you as you hold mmap_sem for
read. If the first page in the pmd (or VMA) is not present then
target_nid == -1 which gets passed into __do_numa_page. This check

if (target_nid == -1 || target_nid == page_nid)
goto out;

then means you never actually migrate for that whole PMD and will just
clear the PTEs. Possibly wrong, but not what we're looking for. Holding
PTL across task_numa_fault is bad, but not the bad we're looking for.
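
On the point about not crossing a PMD boundary: the range a batch may touch is the PMD containing the faulting address, clipped to the VMA. A minimal sketch, assuming 2MB PMDs; the SKETCH_ macros, struct window and batch_window() are hypothetical names used only for illustration.

```c
#include <assert.h>

#define SKETCH_PMD_SIZE	(1UL << 21)		/* assumed 2MB PMD */
#define SKETCH_PMD_MASK	(~(SKETCH_PMD_SIZE - 1))

struct window {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

/*
 * Clamp the batch range to the PMD that contains fault_addr and to
 * the [vm_start, vm_end) range of the VMA. Overflow at the very top
 * of the address space is not handled in this sketch.
 */
static struct window batch_window(unsigned long fault_addr,
				  unsigned long vm_start,
				  unsigned long vm_end)
{
	struct window w;
	unsigned long pmd_start = fault_addr & SKETCH_PMD_MASK;
	unsigned long pmd_end = pmd_start + SKETCH_PMD_SIZE;

	assert(vm_start <= fault_addr && fault_addr < vm_end);
	w.start = pmd_start > vm_start ? pmd_start : vm_start;
	w.end = pmd_end < vm_end ? pmd_end : vm_end;
	return w;
}
```

For a VMA that starts or ends in the middle of a PMD, the window begins at vm_start or stops at vm_end rather than at the PMD edge, so the first address in the window is not necessarily PMD-aligned.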

/me scratches his head

Machine is still unavailable so in an attempt to rattle this out I prototyped
the equivalent patch for balancenuma and then went back to numacore to see
could I spot a major difference. Comparing them, there is no guarantee you
clear pte_numa for the address that was originally faulted if there was a
racing fault that cleared it underneath you but in itself that should not
be an issue. Your use of ptep++ instead of pte_offset_map() might break
on 32-bit with NUMA support if PTE pages are stored in highmem. Still the
wrong wrong.

If the bug is indeed here, it's not obvious. I don't know why I'm
triggering it or why it only triggers for specjbb as I cannot imagine
what the JVM would be doing that is that weird or that would not have
triggered before. Maybe we both suffer this type of problem but it is
numacore's rate of migration that is able to trigger it.
Post by Mel Gorman
Basically if I felt that handling ptes in batch like this was of
critical importance I would have implemented it very differently on top of
balancenuma. I would have only taken the PTL lock if updating the PTE to
keep contention down and redone racy checks under PTL, I'd have only used
trylock for every non-faulted PTE and I would only have migrated if it
was a remote->local copy. I certainly would not hold PTL while calling
task_numa_fault(). I would have kept the handling on a per-pmd basis when
it was expected that most PTEs underneath should be on the same node.
This is a prototype only, but it is what I was using as a reference to see
whether I could spot a problem in yours. It has not even been boot tested,
but it should avoid remote->remote copies, contending on the PTL, or holding
it longer than necessary.

---8<---
mm: numa: Batch pte handling

diff --git a/mm/memory.c b/mm/memory.c
index 33e20b3..f871d5d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3461,30 +3461,14 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
return mpol_misplaced(page, vma, addr);
}

-int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd)
+static
+int __do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd,
+ spinlock_t *ptl, bool only_local, bool *migrated)
{
struct page *page = NULL;
- spinlock_t *ptl;
int current_nid = -1;
int target_nid;
- bool migrated = false;
-
- /*
- * The "pte" at this point cannot be used safely without
- * validation through pte_unmap_same(). It's of NUMA type but
- * the pfn may be screwed if the read is non atomic.
- *
- * ptep_modify_prot_start is not called as this is clearing
- * the _PAGE_NUMA bit and it is not really expected that there
- * would be concurrent hardware modifications to the PTE.
- */
- ptl = pte_lockptr(mm, pmd);
- spin_lock(ptl);
- if (unlikely(!pte_same(*ptep, pte))) {
- pte_unmap_unlock(ptep, ptl);
- goto out;
- }

pte = pte_mknonnuma(pte);
set_pte_at(mm, addr, ptep, pte);
@@ -3493,7 +3477,7 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
page = vm_normal_page(vma, addr, pte);
if (!page) {
pte_unmap_unlock(ptep, ptl);
- return 0;
+ goto out;
}

current_nid = page_to_nid(page);
@@ -3509,15 +3493,88 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
goto out;
}

+ /*
+ * Only do remote-local copies when handling PTEs in batch. This does
+ * mean we effectively lost the NUMA hinting fault if the workload
+ * was not converged on a PMD boundary. This is bad but is it worse
+ * than doing a remote->remote copy?
+ */
+ if (only_local && target_nid != numa_node_id()) {
+ current_nid = -1;
+ put_page(page);
+ goto out;
+ }
+
/* Migrate to the requested node */
- migrated = migrate_misplaced_page(page, target_nid);
- if (migrated)
+ *migrated = migrate_misplaced_page(page, target_nid);
+ if (*migrated)
current_nid = target_nid;

out:
+ return current_nid;
+}
+
+int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd)
+{
+ spinlock_t *ptl;
+ int current_nid = -1;
+ bool migrated = false;
+ unsigned long end_addr;
+
+ /*
+ * The "pte" at this point cannot be used safely without
+ * validation through pte_unmap_same(). It's of NUMA type but
+ * the pfn may be screwed if the read is non atomic.
+ *
+ * ptep_modify_prot_start is not called as this is clearing
+ * the _PAGE_NUMA bit and it is not really expected that there
+ * would be concurrent hardware modifications to the PTE.
+ */
+ ptl = pte_lockptr(mm, pmd);
+ spin_lock(ptl);
+ if (unlikely(!pte_same(*ptep, pte))) {
+ pte_unmap_unlock(ptep, ptl);
+ goto out;
+ }
+
+ current_nid = __do_numa_page(mm, vma, addr, pte, ptep, pmd, ptl, false, &migrated);
+
+ /* Batch handle all PTEs in this area. PTL is not held initially */
+ addr = max(addr & PMD_MASK, vma->vm_start);
+ end_addr = min((addr + PMD_SIZE) & PMD_MASK, vma->vm_end);
+ for (; addr < end_addr; addr += PAGE_SIZE) {
+ bool batch_migrated = false;
+ int batch_nid = -1;
+
+ ptep = pte_offset_map(pmd, addr);
+ pte = *ptep;
+ if (!pte_present(pte))
+ continue;
+ if (!pte_numa(pte))
+ continue;
+
+ if (!spin_trylock(ptl)) {
+ pte_unmap(ptep);
+ break;
+ }
+
+ /* Recheck PTE under lock */
+ if (!pte_same(*ptep, pte)) {
+ pte_unmap_unlock(ptep, ptl);
+ continue;
+ }
+
+ batch_nid = __do_numa_page(mm, vma, addr, pte, ptep, pmd, ptl, true, &batch_migrated);
+ if (batch_nid != -1)
+ task_numa_fault(batch_nid, 1, batch_migrated);
+ }
+
+out:
if (current_nid != -1)
task_numa_fault(current_nid, 1, migrated);
return 0;
+
}

/* NUMA hinting page fault entry point for regular pmds */
Ingo Molnar
2012-12-11 09:00:02 UTC
Post by Mel Gorman
Post by Mel Gorman
For example, I think that point 5 above is the potential source of the
corruption because you're not flushing the TLBs for the PTEs you are
updating in batch. Granted, you're relaxing rather than restricting access
so it should be ok and at worst cause a spurious fault but I also find
it suspicious that you do not recheck pte_same under the PTL when doing
the final PTE update.
Looking again, the lack of a pte_same check should be ok. The
addr, addr_start, ptep and ptep_start handling is a little messy
but also looks fine. You're not accidentally crossing a PMD
boundary. You should be protected against huge pages being
collapsed underneath you as you hold mmap_sem for read. If the
first page in the pmd (or VMA) is not present then target_nid
== -1 which gets passed into __do_numa_page. This check
if (target_nid == -1 || target_nid == page_nid)
goto out;
then means you never actually migrate for that whole PMD and
will just clear the PTEs. [...]
Yes.
Post by Mel Gorman
[...] Possibly wrong, but not what we're looking for. [...]
It's a detail - I thought not touching partial 2MB pages is just
as valid as picking some other page to represent it, and went
for the simpler option.

But yes, I agree that using the first present page would be
better, as it would better handle partial vmas not
starting/ending at a 2MB boundary - which happens frequently in
practice.
Post by Mel Gorman
[...] Holding PTL across task_numa_fault is bad, but not the
bad we're looking for.
No, holding the PTL across task_numa_fault() is fine, because
this bit got reworked in my tree rather significantly, see:

6030a23a1c66 sched: Move the NUMA placement logic to a worklet

and followup patches.
Post by Mel Gorman
/me scratches his head
Machine is still unavailable so in an attempt to rattle this
out I prototyped the equivalent patch for balancenuma and then
went back to numacore to see could I spot a major difference.
Comparing them, there is no guarantee you clear pte_numa for
the address that was originally faulted if there was a racing
fault that cleared it underneath you but in itself that should
not be an issue. Your use of ptep++ instead of
pte_offset_map() might break on 32-bit with NUMA support if
PTE pages are stored in highmem. Still the wrong wrong.
Yes.
Post by Mel Gorman
If the bug is indeed here, it's not obvious. I don't know why
I'm triggering it or why it only triggers for specjbb as I
cannot imagine what the JVM would be doing that is that weird
or that would not have triggered before. Maybe we both suffer
this type of problem but that numacore's rate of migration is
able to trigger it.
Agreed.
Post by Mel Gorman
Post by Mel Gorman
Basically if I felt that handling ptes in batch like this
was of critical importance I would have implemented it very
differently on top of balancenuma. I would have only taken
the PTL lock if updating the PTE to keep contention down and
redone racy checks under PTL, I'd have only used trylock for
every non-faulted PTE and I would only have migrated if it
was a remote->local copy. I certainly would not hold PTL
while calling task_numa_fault(). I would have kept the
handling on a per-pmd basis when it was expected that most
PTEs underneath should be on the same node.
This is prototype only but what I was using as a reference to
see could I spot a problem in yours. It has not been even boot
tested but avoids remote->remote copies, contending on PTL or
holding it longer than necessary (should anyway)
So ... because time is running out and it would be nice to
progress with this for v3.8, I'd suggest the following approach:

- Please send your current tree to Linus as-is. You already
have my Acked-by/Reviewed-by for its scheduler bits, and my
testing found your tree to have no regression to mainline,
plus it's a nice win in a number of NUMA-intense workloads.
So it's a good, monotonic step forward in terms of NUMA
balancing, very close to what the bits I'm working on need as
infrastructure.

- I'll rebase all my devel bits on top of it. Instead of
removing the migration bandwidth I'll simply increase it for
testing - this should trigger similarly aggressive behavior.
I'll try to touch as little of the mm/ code as possible, to
keep things debuggable.

If the JVM segfault is a bug introduced by some non-obvious
difference only present in numa/core and fixed in your tree then
the bug will be fixed magically and we can forget about it.

If it's something latent in your tree as well, then at least we
will be able to stare at the exact same tree, instead of
endlessly wondering about small, unnecessary differences.

( My gut feeling is that it's 50%/50%, I really cannot exclude
either of the two possibilities. )

Agreed?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Ingo Molnar
2012-12-11 09:20:02 UTC
Post by Ingo Molnar
Post by Mel Gorman
This is prototype only but what I was using as a reference
to see could I spot a problem in yours. It has not been even
boot tested but avoids remote->remote copies, contending on
PTL or holding it longer than necessary (should anyway)
So ... because time is running out and it would be nice to
progress with this for v3.8, I'd suggest the following
- Please send your current tree to Linus as-is. You already
have my Acked-by/Reviewed-by for its scheduler bits, and my
testing found your tree to have no regression to mainline,
plus it's a nice win in a number of NUMA-intense workloads.
So it's a good, monotonic step forward in terms of NUMA
balancing, very close to what the bits I'm working on need as
infrastructure.
- I'll rebase all my devel bits on top of it. Instead of
removing the migration bandwidth I'll simply increase it for
testing - this should trigger similarly aggressive behavior.
I'll try to touch as little of the mm/ code as possible, to
keep things debuggable.
One minor last-minute request/nit before you send it to Linus,
would you mind doing a:

CONFIG_BALANCE_NUMA => CONFIG_NUMA_BALANCING

rename please? (I can do it for you if you don't have the time.)

CONFIG_NUMA_BALANCING is really what fits into our existing NUMA
namespace, CONFIG_NUMA, CONFIG_NUMA_EMU - and, more importantly,
the ordering of words follows the common generic -> less generic
ordering we do in the kernel for config names and methods.

So it would fit nicely into existing Kconfig naming schemes:

CONFIG_TRACING
CONFIG_FILE_LOCKING
CONFIG_EVENT_TRACING

etc.

Thanks,

Ingo
Mel Gorman
2012-12-11 15:30:02 UTC
Post by Ingo Molnar
Post by Ingo Molnar
Post by Mel Gorman
This is prototype only but what I was using as a reference
to see could I spot a problem in yours. It has not been even
boot tested but avoids remote->remote copies, contending on
PTL or holding it longer than necessary (should anyway)
So ... because time is running out and it would be nice to
progress with this for v3.8, I'd suggest the following
- Please send your current tree to Linus as-is. You already
have my Acked-by/Reviewed-by for its scheduler bits, and my
testing found your tree to have no regression to mainline,
plus it's a nice win in a number of NUMA-intense workloads.
So it's a good, monotonic step forward in terms of NUMA
balancing, very close to what the bits I'm working on need as
infrastructure.
- I'll rebase all my devel bits on top of it. Instead of
removing the migration bandwidth I'll simply increase it for
testing - this should trigger similarly aggressive behavior.
I'll try to touch as little of the mm/ code as possible, to
keep things debuggable.
One minor last-minute request/nit before you send it to Linus,
CONFIG_BALANCE_NUMA => CONFIG_NUMA_BALANCING
rename please? (I can do it for you if you don't have the time.)
CONFIG_NUMA_BALANCING is really what fits into our existing NUMA
namespace, CONFIG_NUMA, CONFIG_NUMA_EMU - and, more importantly,
the ordering of words follows the common generic -> less generic
ordering we do in the kernel for config names and methods.
CONFIG_TRACING
CONFIG_FILE_LOCKING
CONFIG_EVENT_TRACING
etc.
Yes, that makes sense. I should have spotted the rationale. I also took
the liberty of renaming the command-line parameter and the variables to
be consistent with this.
--
Mel Gorman
SUSE Labs
Mel Gorman
2012-12-11 16:40:02 UTC
Post by Ingo Molnar
Post by Mel Gorman
Post by Mel Gorman
For example, I think that point 5 above is the potential source of the
corruption because you're not flushing the TLBs for the PTEs you are
updating in batch. Granted, you're relaxing rather than restricting access
so it should be ok and at worst cause a spurious fault but I also find
it suspicious that you do not recheck pte_same under the PTL when doing
the final PTE update.
Looking again, the lack of a pte_same check should be ok. The
addr, addr_start, ptep and ptep_start are a little messy but
also look fine. You're not accidentally crossing a PMD
boundary. You should be protected against huge pages being
collapsed underneath you as you hold mmap_sem for read. If the
first page in the pmd (or VMA) is not present then target_nid
== -1 which gets passed into __do_numa_page. This check
if (target_nid == -1 || target_nid == page_nid)
goto out;
then means you never actually migrate for that whole PMD and
will just clear the PTEs. [...]
Yes.
Post by Mel Gorman
[...] Possibly wrong, but not what we're looking for. [...]
It's a detail - I thought not touching partial 2MB pages is just
as valid as picking some other page to represent it, and went
for the simpler option.
I very strongly suspect that in the majority of cases it behaves just
as well. I considered whether it makes a difference if the first page
or faulting page was used as the hint but concluded it doesn't. If the
workload is converged on the PMD, it makes no difference. If it's not,
then tasks are equally affected at least.
Post by Ingo Molnar
But yes, I agree that using the first present page would be
better, as it would better handle partial vmas not
starting/ending at a 2MB boundary - which happens frequently in
practice.
Post by Mel Gorman
[...] Holding PTL across task_numa_fault is bad, but not the
bad we're looking for.
No, holding the PTL across task_numa_fault() is fine, because
6030a23a1c66 sched: Move the NUMA placement logic to a worklet
and followup patches.
I believe I see your point. After that patch is applied task_numa_fault()
is a relatively small function and is no longer calling task_numa_placement.
Sure, PTL is held longer than necessary but not enough to cause real
scalability issues.
Post by Ingo Molnar
Post by Mel Gorman
If the bug is indeed here, it's not obvious. I don't know why
I'm triggering it or why it only triggers for specjbb as I
cannot imagine what the JVM would be doing that is that weird
or that would not have triggered before. Maybe we both suffer
this type of problem but that numacore's rate of migration is
able to trigger it.
Agreed.
I spent some more time on this today and the bug is *really* hard to trigger
or at least I have been unable to trigger it today. This begs the question
why it triggered three times in relatively quick succession separated by
a few hours when testing numacore on Dec 9th. Other tests ran between the
failures. The first failure results were discarded. I deleted them to see
if the same test reproduced it a second time (it did).

Of the three times this bug triggered in the last week, two were unclear
where they crashed but one showed that the bug was triggered by the JVM's
garbage collector. That at least is a corner case and might explain why
it's hard to trigger.

I feel extremely bad about how I reported this because even though we
differ in how we handle faults, I really cannot see any difference that
would explain this and I've looked long enough. Triggering this by the
kernel would *have* to be something like missing a cache or TLB flush
after page tables have been modified or during migration but in most ways
that matter we share that logic. Where we differ, it shouldn't matter.

I'm even contemplating that this is a JVM timing bug that can be triggered if
page migration happens at the wrong time. numacore would only be indirectly
at fault by migrating more often. If this was the case, balancenuma would
hit the problem given enough time.

I'll keep kicking it in the background.

FWIW, numacore pulled yesterday completed the same tests without any error
this time but none of the commits since Dec 9th would account for fixing it.
Post by Ingo Molnar
Post by Mel Gorman
Post by Mel Gorman
Basically if I felt that handling ptes in batch like this
was of critical importance I would have implemented it very
differently on top of balancenuma. I would have only taken
the PTL lock if updating the PTE to keep contention down and
redone racy checks under PTL, I'd have only used trylock for
every non-faulted PTE and I would only have migrated if it
was a remote->local copy. I certainly would not hold PTL
while calling task_numa_fault(). I would have kept the
handling on a per-pmd basis when it was expected that most
PTEs underneath should be on the same node.
This is prototype only but what I was using as a reference to
see could I spot a problem in yours. It has not been even boot
tested but avoids remote->remote copies, contending on PTL or
holding it longer than necessary (should anyway)
So ... because time is running out and it would be nice to
- Please send your current tree to Linus as-is. You already
have my Acked-by/Reviewed-by for its scheduler bits, and my
testing found your tree to have no regression to mainline,
plus it's a nice win in a number of NUMA-intense workloads.
So it's a good, monotonic step forward in terms of NUMA
balancing, very close to what the bits I'm working on need as
infrastructure.
Thanks.
Post by Ingo Molnar
- I'll rebase all my devel bits on top of it. Instead of
removing the migration bandwidth I'll simply increase it for
testing - this should trigger similarly aggressive behavior.
I'll try to touch as little of the mm/ code as possible, to
keep things debuggable.
Agreed. I'll do my best to review the patches on top and any of the MM
changes you want to make. I know that at the very least you'll want to
change what information is sent to task_numa_fault(), last_nid needs to
be renamed and I should review the flag-packing-patch properly with the
view to seeing can that hurt any of the other flags.
Post by Ingo Molnar
If the JVM segfault is a bug introduced by some non-obvious
difference only present in numa/core and fixed in your tree then
the bug will be fixed magically and we can forget about it.
Magic fix is the worst of all fixes :(. I'd really like to know why this
triggered but now my big mouth has landed me with the problem. If this
magically goes away then it's either a really-hard-to-hit-JVM error or
far worse from my perspective -- this is a transient hardware error that
was triggered by the machine running at maximum capacity for 6 weeks that
went away when the machine was turned off for a day.

If it turns out to be hardware, it has planked me straight into the asshat
end of the spectrum, particularly after the first THP debacle.
Post by Ingo Molnar
If it's something latent in your tree as well, then at least we
will be able to stare at the exact same tree, instead of
endlessly wondering about small, unnecessary differences.
True.
Post by Ingo Molnar
( My gut feeling is that it's 50%/50%, I really cannot exclude
either of the two possibilities. )
Neither can I but I've managed to convince myself that it *has* to be on
my side somewhere (or VM code, the JVM I'm using or the hardware). I just
have to find where.
Post by Ingo Molnar
Agreed?
Yes.

I've queued the following for tests before I send the pull request just in
case. The only difference is adding "mm: Check if PTE is already allocated
during page fault" in case it got lost. I'll send the following request
tomorrow unless you have any objections. If any of the signed-offs are in
error, please shout and I'll get them fixed up.

---8<---
This is a pull request for "Automatic NUMA Balancing V11". The list
of changes since commit f4a75d2eb7b1e2206094b901be09adb31ba63681:

Linux 3.7-rc6 (2012-11-16 17:42:40 -0800)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma.git balancenuma-v11

for you to fetch changes up to 4fc3f1d66b1ef0d7b8dc11f4ff1cc510f78b37d6:

mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable (2012-12-11 14:43:00 +0000)

There are three implementations for NUMA balancing: this tree (balancenuma),
numacore which has been developed in tip/master and autonuma which is in
aa.git. In almost all respects balancenuma is the dumbest of the three
because its main impact is on the VM side with no attempt to be smart
about scheduling. In the interest of getting the ball rolling, it would
be desirable to see this much merged for 3.8 with the view to building
scheduler smarts on top and adapting the VM where required for 3.9.

The most recent set of comparisons available from different people are

mel: https://lkml.org/lkml/2012/12/9/108
mingo: https://lkml.org/lkml/2012/12/7/331
tglx: https://lkml.org/lkml/2012/12/10/437
srikar: https://lkml.org/lkml/2012/12/10/397

The results are a mixed bag. In my own tests, balancenuma does reasonably
well. It's dumb as rocks and does not regress against mainline. On the
other hand, Ingo's tests show that balancenuma is incapable of converging
for the workloads driven by perf which is bad but is potentially explained
by the lack of scheduler smarts. Thomas' results show balancenuma improves
on mainline but falls far short of numacore or autonuma. Srikar's results
indicate we all suck on a large machine with imbalanced node sizes.

My own testing showed that recent numacore results have improved
dramatically, particularly in the last week but not universally. We've
butted heads heavily on system CPU usage and high levels of migration even
when it shows that overall performance is better. There are also cases
where it regresses (in my case, single JVM, THP enabled) but at times the
regressions are for lower numbers of warehouses and not higher numbers so
reports are inconsistent. Recently I reported for numacore that the JVM
was crashing with NullPointerExceptions but currently it's unclear what
the source of this problem is. Initially I thought it was in how numacore
batch handles PTEs but I no longer think this is the case. It's possible
numacore is just able to trigger it due to higher rates of migration.

These reports were quite late in the cycle so I/we would like to start
with this tree as it contains much of the code we can agree on and has
not changed significantly over the last 2-3 weeks.

Andrea Arcangeli (5):
mm: numa: define _PAGE_NUMA
mm: numa: pte_numa() and pmd_numa()
mm: numa: Support NUMA hinting page faults from gup/gup_fast
mm: numa: split_huge_page: transfer the NUMA type from the pmd to the pte
mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting

Hillf Danton (2):
mm: numa: split_huge_page: Transfer last_nid on tail page
mm: numa: migrate: Set last_nid on newly allocated page

Ingo Molnar (3):
mm: Optimize the TLB flush of sys_mprotect() and change_protection() users
mm/rmap: Convert the struct anon_vma::mutex to an rwsem
mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable

Lee Schermerhorn (3):
mm: mempolicy: Add MPOL_NOOP
mm: mempolicy: Check for misplaced page
mm: mempolicy: Add MPOL_MF_LAZY

Mel Gorman (26):
mm: Check if PTE is already allocated during page fault
mm: compaction: Move migration fail/success stats to migrate.c
mm: migrate: Add a tracepoint for migrate_pages
mm: compaction: Add scanned and isolated counters for compaction
mm: numa: Create basic numa page hinting infrastructure
mm: migrate: Drop the misplaced pages reference count if the target node is full
mm: mempolicy: Use _PAGE_NUMA to migrate pages
mm: mempolicy: Implement change_prot_numa() in terms of change_protection()
mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now
sched, numa, mm: Count WS scanning against present PTEs, not virtual memory ranges
mm: numa: Add pte updates, hinting and migration stats
mm: numa: Migrate on reference policy
mm: numa: Migrate pages handled during a pmd_numa hinting fault
mm: numa: Rate limit the amount of memory that is migrated between nodes
mm: numa: Rate limit setting of pte_numa if node is saturated
sched: numa: Slowly increase the scanning period as NUMA faults are handled
mm: numa: Introduce last_nid to the page frame
mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
mm: sched: numa: Control enabling and disabling of NUMA balancing
mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
mm: numa: Add THP migration for the NUMA working set scanning fault case.
mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
mm: numa: Account for failed allocations and isolations as migration failures
mm: migrate: Account a transhuge page properly when rate limiting

Peter Zijlstra (6):
mm: Count the number of pages affected in change_protection()
mm: mempolicy: Make MPOL_LOCAL a real policy
mm: migrate: Introduce migrate_misplaced_page()
mm: numa: Add fault driven placement and migration
mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate
mm: sched: numa: Implement slow start for working set sampling

Rik van Riel (5):
x86: mm: only do a local tlb flush in ptep_set_access_flags()
x86: mm: drop TLB flush from ptep_set_access_flags
mm,generic: only flush the local TLB in ptep_set_access_flags
x86/mm: Introduce pte_accessible()
mm: Only flush the TLB when clearing an accessible pte

Documentation/kernel-parameters.txt | 3 +
arch/sh/mm/Kconfig | 1 +
arch/x86/Kconfig | 2 +
arch/x86/include/asm/pgtable.h | 17 +-
arch/x86/include/asm/pgtable_types.h | 20 ++
arch/x86/mm/pgtable.c | 8 +-
include/asm-generic/pgtable.h | 110 +++++++++++
include/linux/huge_mm.h | 16 +-
include/linux/hugetlb.h | 8 +-
include/linux/mempolicy.h | 8 +
include/linux/migrate.h | 47 ++++-
include/linux/mm.h | 39 ++++
include/linux/mm_types.h | 31 ++++
include/linux/mmzone.h | 13 ++
include/linux/rmap.h | 33 ++--
include/linux/sched.h | 27 +++
include/linux/vm_event_item.h | 12 +-
include/linux/vmstat.h | 8 +
include/trace/events/migrate.h | 51 +++++
include/uapi/linux/mempolicy.h | 15 +-
init/Kconfig | 45 +++++
kernel/fork.c | 3 +
kernel/sched/core.c | 71 +++++--
kernel/sched/fair.c | 227 +++++++++++++++++++++++
kernel/sched/features.h | 11 ++
kernel/sched/sched.h | 12 ++
kernel/sysctl.c | 45 ++++-
mm/compaction.c | 15 +-
mm/huge_memory.c | 108 ++++++++++-
mm/hugetlb.c | 10 +-
mm/internal.h | 7 +-
mm/ksm.c | 6 +-
mm/memcontrol.c | 7 +-
mm/memory-failure.c | 7 +-
mm/memory.c | 199 +++++++++++++++++++-
mm/memory_hotplug.c | 3 +-
mm/mempolicy.c | 283 +++++++++++++++++++++++++---
mm/migrate.c | 337 +++++++++++++++++++++++++++++++++-
mm/mmap.c | 10 +-
mm/mprotect.c | 135 +++++++++++---
mm/mremap.c | 2 +-
mm/page_alloc.c | 10 +-
mm/pgtable-generic.c | 9 +-
mm/rmap.c | 66 +++----
mm/vmstat.c | 16 +-
45 files changed, 1940 insertions(+), 173 deletions(-)
create mode 100644 include/trace/events/migrate.h

Ingo Molnar
2012-12-17 10:40:01 UTC
Post by Mel Gorman
Post by Ingo Molnar
Post by Mel Gorman
[...] Holding PTL across task_numa_fault is bad, but not
the bad we're looking for.
No, holding the PTL across task_numa_fault() is fine,
because this bit got reworked in my tree rather
6030a23a1c66 sched: Move the NUMA placement logic to a
worklet
and followup patches.
I believe I see your point. After that patch is applied
task_numa_fault() is a relatively small function and is no
longer calling task_numa_placement. Sure, PTL is held longer
than necessary but not enough to cause real scalability
issues.
Yes - my motivation for that was three-fold:

1) to push rebalancing into process context and thus make it
essentially lockless and also potentially preemptable.

2) enable the flip-tasks logic, which relies on taking a
balancing decision and acting on it immediately. If you are
in process context then this is doable. If you are in a
balancing irq context then not so much.

3) simplifying the 2M-emu loop was extra icing on the cake:
instead of taking and dropping the PTL 512 times (possibly
interleaving two threads on the same pmd, both of them
taking/dropping the same set of locks?), it only takes the
ptl once.

I'll revive this aspect, it has many positives.
Post by Mel Gorman
Post by Ingo Molnar
Post by Mel Gorman
If the bug is indeed here, it's not obvious. I don't know
why I'm triggering it or why it only triggers for specjbb
as I cannot imagine what the JVM would be doing that is
that weird or that would not have triggered before. Maybe
we both suffer this type of problem but that numacore's
rate of migration is able to trigger it.
Agreed.
I spent some more time on this today and the bug is *really*
hard to trigger or at least I have been unable to trigger it
today. This begs the question why it triggered three times in
relatively quick succession separated by a few hours when
testing numacore on Dec 9th. Other tests ran between the
failures. The first failure results were discarded. I deleted
them to see if the same test reproduced it a second time (it
did).
Of the three times this bug triggered in the last week, two
were unclear where they crashed but one showed that the bug
was triggered by the JVM's garbage collector. That at least is
a corner case and might explain why it's hard to trigger.
I feel extremely bad about how I reported this because even
though we differ in how we handle faults, I really cannot see
any difference that would explain this and I've looked long
enough. Triggering this by the kernel would *have* to be
something like missing a cache or TLB flush after page tables
have been modified or during migration but in most ways that
matter we share that logic. Where we differ, it shouldn't
matter.
Don't worry, I really think you reported a genuine bug, even if
it's hard to hit.
Post by Mel Gorman
FWIW, numacore pulled yesterday completed the same tests
without any error this time but none of the commits since Dec
9th would account for fixing it.
Correct. I think chances are that it's still latent. Either
fixed in your version of the code, which will be hard to
reconstruct - or it's an active upstream bug.

I'd not blame it on the JVM for a good while - JVMs are one of
the most abused pieces of code on the planet, literally running
millions of applications on thousands of kernel variants.

Could you try the patch below on latest upstream with
CONFIG_NUMA_BALANCING=y, it increases migration bandwidth
10-fold - does it make it easier to trigger the bug on the now
upstream NUMA-balancing feature?

It will kill throughput on a number of your tests, but it should
make all the NUMA-specific activities during the JVM test a lot
more frequent.

Thanks,

Ingo

diff --git a/mm/migrate.c b/mm/migrate.c
index 32efd80..8699e8f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1511,7 +1511,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
*/
static unsigned int migrate_interval_millisecs __read_mostly = 100;
static unsigned int pteupdate_interval_millisecs __read_mostly = 1000;
-static unsigned int ratelimit_pages __read_mostly = 128 << (20 - PAGE_SHIFT);
+static unsigned int ratelimit_pages __read_mostly = 1280 << (20 - PAGE_SHIFT);

/* Returns true if NUMA migration is currently rate limited */
bool migrate_ratelimited(int node)
Srikar Dronamraju
2012-12-10 17:20:01 UTC
Hi Mel, Ingo,

Here are the results of running autonumabenchmark on a 64 core, 8 node
machine. It has six 32 GB nodes and two 64 GB nodes.


KernelVersion: 3.7.0-rc8
Testcase:                              Min      Max      Avg
numa01:                            1475.37  1615.39  1555.24
numa01_HARD_BIND:                   900.42  1244.00   993.30
numa01_INVERSE_BIND:               2835.44  5067.22  3634.86
numa01_THREAD_ALLOC:                918.51  1384.21  1121.17
numa01_THREAD_ALLOC_HARD_BIND:      599.58  1178.26   792.73
numa01_THREAD_ALLOC_INVERSE_BIND:  1841.33  2237.34  1988.95
numa02:                             126.95   188.31   147.04
numa02_HARD_BIND:                    26.05    29.17    26.94
numa02_INVERSE_BIND:                341.10   369.37   349.10
numa02_SMT:                         144.32   922.65   386.43
numa02_SMT_HARD_BIND:                26.61   170.71   101.98
numa02_SMT_INVERSE_BIND:            288.12   456.45   325.26

KernelVersion: 3.7.0-rc8-tip_master+(December 7th Snapshot)
Testcase:                              Min      Max      Avg   %Change
numa01:                            2927.89  3217.56  3103.21   -49.88%
numa01_HARD_BIND:                  2653.09  5964.23  3431.35   -71.05%
numa01_INVERSE_BIND:               3567.03  3933.18  3811.91    -4.64%
numa01_THREAD_ALLOC:               1801.80  2339.16  1980.96   -43.40%
numa01_THREAD_ALLOC_HARD_BIND:     1705.84  2110.06  1913.64   -58.57%
numa01_THREAD_ALLOC_INVERSE_BIND:  2266.12  2540.61  2376.67   -16.31%
numa02:                             179.26   358.03   264.19   -44.34%
numa02_HARD_BIND:                    26.07    29.38    27.70    -2.74%
numa02_INVERSE_BIND:                337.99   347.95   343.51     1.63%
numa02_SMT:                          93.65   402.58   213.15    81.29%
numa02_SMT_HARD_BIND:                91.19   140.47   116.26   -12.28%
numa02_SMT_INVERSE_BIND:            289.03   299.57   297.01     9.51%

KernelVersion: 3.7.0-rc6-mel_auto_balance(mm-balancenuma-v10r3)
Testcase:                              Min      Max      Avg   %Change
numa01:                            1536.93  1819.85  1694.54    -8.22%
numa01_HARD_BIND:                   909.67  1145.32  1055.57    -5.90%
numa01_INVERSE_BIND:               2882.07  3287.24  2976.89    22.10%
numa01_THREAD_ALLOC:                995.79  4845.27  1905.85   -41.17%
numa01_THREAD_ALLOC_HARD_BIND:      582.36   818.11   655.18    20.99%
numa01_THREAD_ALLOC_INVERSE_BIND:  1790.91  1927.90  1868.49     6.45%
numa02:                             131.53   287.93   209.15   -29.70%
numa02_HARD_BIND:                    25.68    31.90    27.66    -2.60%
numa02_INVERSE_BIND:                341.09   401.37   353.84    -1.34%
numa02_SMT:                         156.61  2036.63   731.97   -47.21%
numa02_SMT_HARD_BIND:                25.10   196.60    79.72    27.92%
numa02_SMT_INVERSE_BIND:            294.22  1801.59   824.41   -60.55%

KernelVersion: 3.7.0-rc6-autonuma+(mm-autonuma-v28fastr4-mels-rebase)
Testcase:                              Min      Max      Avg   %Change
numa01:                            1596.13  1715.34  1649.44    -5.71%
numa01_HARD_BIND:                   920.75  1127.86  1012.50    -1.90%
numa01_INVERSE_BIND:               2858.79  3146.74  2977.16    22.09%
numa01_THREAD_ALLOC:                250.55   374.27   290.12   286.45%
numa01_THREAD_ALLOC_HARD_BIND:      572.29   712.74   630.62    25.71%
numa01_THREAD_ALLOC_INVERSE_BIND:  1835.94  2401.04  2011.20    -1.11%
numa02:                              33.93   104.80    50.99   188.37%
numa02_HARD_BIND:                    25.94    27.51    26.42     1.97%
numa02_INVERSE_BIND:                334.57   349.51   341.23     2.31%
numa02_SMT:                          43.72   114.82    62.41   519.18%
numa02_SMT_HARD_BIND:                34.98    45.61    42.07   142.41%
numa02_SMT_INVERSE_BIND:            284.57   310.62   298.51     8.96%

Avg refers to mean of 5 iterations of autonuma-benchmark.
%Change refers to percentage change from 3.7-rc8

Please do let me know if you have questions/suggestions.
--
Thanks and Regards
Srikar

Ingo Molnar
2012-12-10 19:30:03 UTC
Post by Srikar Dronamraju
KernelVersion: 3.7.0-rc8-tip_master+(December 7th Snapshot)
Please do let me know if you have questions/suggestions.
Do you still have the exact sha1 by any chance?

By the date of the snapshot I'd say that this fix:

f0c77b62ba9d sched: Fix NUMA_EXCLUDE_AFFINE check

could improve performance on your box.

Thanks,

Ingo
Srikar Dronamraju
2012-12-11 00:50:01 UTC
Post by Ingo Molnar
Post by Srikar Dronamraju
KernelVersion: 3.7.0-rc8-tip_master+(December 7th Snapshot)
Please do let me know if you have questions/suggestions.
Do you still have the exact sha1 by any chance?
commit ea8432f29a702cf5a4bf9d91bf4542f9fb190529
Merge: bca2293 18a2f37
Author: Ingo Molnar <***@kernel.org>
Date: Fri Dec 7 10:46:05 2012 +0100

Merge branch 'linus'



git log --oneline shows something like this.

ea8432f Merge branch 'linus'
bca2293 Merge branch 'x86/nuke386'
11a4441 Merge branch 'x86/cleanups'
b8ae5b0 Merge branch 'x86/bsp-hotplug'
232e4c0 Merge branch 'timers/core'
24a0668 Merge branch 'core/urgent'
9ee046a Merge branch 'core/rcu'
f1ab78f Merge branch 'core/locking'
2e44b38 Merge branch 'numa/base'
b12fe81 numa, sched: Streamline and fix numa_allow_migration() use
ef88e22 numa, sched: Improve directed convergence
2948b6d numa, sched: Improve staggered convergence
540431e numa, mm: Fix !THP, 4K-pte "2M-emu" NUMA fault handling
6de1a2e numa, mm, sched: Fix NUMA affinity tracking logic
ff2a9f9 numa, mm, sched: Implement last-CPU+PID hash tracking
490a116 numa, sched: Implement wake-cpu migration support
41ea712 numa, sched: Add tracking of runnable NUMA tasks
78fb84e numa, sched: Fix NUMA tick ->numa_shared setting
18a2f37 tmpfs: fix shared mempolicy leak
c702418 mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones
60177d3 mm: compaction: validate pfn range passed to isolate_freepages_block
Post by Ingo Molnar
f0c77b62ba9d sched: Fix NUMA_EXCLUDE_AFFINE check
could improve performance on your box.
Yeah, I don't see that commit in my tree. I will add it and test.


Also, here is the .config for tip_master:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 3.7.0-rc8 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_GPIO=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION="-tip_master"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_FHANDLE is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y
# CONFIG_AUDIT_LOGINUID_IMMUTABLE is not set
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_USER_QS is not set
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_NOCB_CPU is not set
CONFIG_IKCONFIG=m
# CONFIG_IKCONFIG_PROC is not set
CONFIG_LOG_BUF_SHIFT=19
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_PROT_NUMA_PROT_NONE=y
CONFIG_ARCH_USES_NUMA_PROT_NONE=y
CONFIG_NUMA_BALANCING=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
# CONFIG_MEMCG is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EXPERT is not set
CONFIG_HAVE_UID16=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
# CONFIG_SLUB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# CONFIG_OPROFILE is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_OPTPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_GENERIC_KERNEL_THREAD=y
CONFIG_GENERIC_KERNEL_EXECVE=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_MODULES_USE_ELF_RELA=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_MODULE_SIG is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_CFQ_GROUP_IOSCHED=y
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_KVMTOOL_TEST_ENABLE is not set
CONFIG_PARAVIRT_GUEST=y
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_XEN=y
CONFIG_XEN_DOM0=y
CONFIG_XEN_PRIVILEGED_GUEST=y
CONFIG_XEN_PVHVM=y
CONFIG_XEN_MAX_DOMAIN_MEMORY=500
CONFIG_XEN_SAVE_RESTORE=y
CONFIG_XEN_DEBUG_FS=y
CONFIG_KVM_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_SPINLOCKS is not set
CONFIG_PARAVIRT_CLOCK=y
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
# CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_NR_CPUS=512
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_X86_MCE_INJECT is not set
CONFIG_X86_THERMAL_VECTOR=y
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
# CONFIG_NUMA_EMU is not set
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_ARCH_DISCARD_MEMBLOCK=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTREMOVE is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_CLEANCACHE is not set
# CONFIG_FRONTSWAP is not set
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_EFI=y
# CONFIG_EFI_STUB is not set
# CONFIG_SECCOMP is not set
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_HOTPLUG_CPU=y
# CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM_RUNTIME=y
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
# CONFIG_ACPI_EC_DEBUGFS is not set
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
# CONFIG_ACPI_INITRD_TABLE_OVERRIDE is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_PCI_SLOT=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
# CONFIG_ACPI_SBS is not set
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_CUSTOM_METHOD is not set
# CONFIG_ACPI_BGRT is not set
CONFIG_ACPI_APEI=y
# CONFIG_ACPI_APEI_GHES is not set
# CONFIG_ACPI_APEI_PCIEAER is not set
# CONFIG_ACPI_APEI_MEMORY_FAILURE is not set
# CONFIG_ACPI_APEI_EINJ is not set
# CONFIG_ACPI_APEI_ERST_DEBUG is not set
CONFIG_SFI=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
# CONFIG_CPU_FREQ_STAT is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set

#
# x86 CPU frequency scaling drivers
#
# CONFIG_X86_PCC_CPUFREQ is not set
# CONFIG_X86_ACPI_CPUFREQ is not set
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
CONFIG_INTEL_IDLE=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_XEN=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIE_ECRC=y
# CONFIG_PCIEAER_INJECT is not set
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
CONFIG_PCIE_PME=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_XEN_PCIDEV_FRONTEND=y
CONFIG_HT_IRQ=y
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y
# CONFIG_PCI_IOAPIC is not set
CONFIG_PCI_LABEL=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
CONFIG_PCCARD=y
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
# CONFIG_YENTA is not set
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
# CONFIG_HOTPLUG_PCI_ACPI_IBM is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
# CONFIG_RAPIDIO is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
# CONFIG_X86_X32 is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_KEYS_COMPAT=y
CONFIG_HAVE_TEXT_POKE_SMP=y
CONFIG_X86_DEV_DMA_OPS=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_DIAG is not set
CONFIG_UNIX=y
# CONFIG_UNIX_DIAG is not set
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE_DEMUX is not set
CONFIG_IP_MROUTE=y
# CONFIG_IP_MROUTE_MULTIPLE_TABLES is not set
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_LRO=y
# CONFIG_INET_DIAG is not set
CONFIG_TCP_CONG_ADVANCED=y
# CONFIG_TCP_CONG_BIC is not set
CONFIG_TCP_CONG_CUBIC=y
# CONFIG_TCP_CONG_WESTWOOD is not set
# CONFIG_TCP_CONG_HTCP is not set
# CONFIG_TCP_CONG_HSTCP is not set
# CONFIG_TCP_CONG_HYBLA is not set
# CONFIG_TCP_CONG_VEGAS is not set
# CONFIG_TCP_CONG_SCALABLE is not set
# CONFIG_TCP_CONG_LP is not set
# CONFIG_TCP_CONG_VENO is not set
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=m
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_BEET is not set
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
# CONFIG_IPV6_SIT is not set
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_GRE is not set
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
CONFIG_IPV6_MROUTE=y
# CONFIG_IPV6_MROUTE_MULTIPLE_TABLES is not set
CONFIG_IPV6_PIMSM_V2=y
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y

#
# Core Netfilter Configuration
#
# CONFIG_NETFILTER_NETLINK_ACCT is not set
# CONFIG_NETFILTER_NETLINK_QUEUE is not set
# CONFIG_NETFILTER_NETLINK_LOG is not set
# CONFIG_NF_CONNTRACK is not set
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
# CONFIG_NETFILTER_XT_MARK is not set

#
# Xtables targets
#
# CONFIG_NETFILTER_XT_TARGET_AUDIT is not set
# CONFIG_NETFILTER_XT_TARGET_CLASSIFY is not set
# CONFIG_NETFILTER_XT_TARGET_HMARK is not set
# CONFIG_NETFILTER_XT_TARGET_IDLETIMER is not set
# CONFIG_NETFILTER_XT_TARGET_LED is not set
# CONFIG_NETFILTER_XT_TARGET_LOG is not set
# CONFIG_NETFILTER_XT_TARGET_MARK is not set
# CONFIG_NETFILTER_XT_TARGET_NFLOG is not set
# CONFIG_NETFILTER_XT_TARGET_NFQUEUE is not set
# CONFIG_NETFILTER_XT_TARGET_RATEEST is not set
# CONFIG_NETFILTER_XT_TARGET_TEE is not set
# CONFIG_NETFILTER_XT_TARGET_SECMARK is not set
# CONFIG_NETFILTER_XT_TARGET_TCPMSS is not set

#
# Xtables matches
#
# CONFIG_NETFILTER_XT_MATCH_ADDRTYPE is not set
# CONFIG_NETFILTER_XT_MATCH_COMMENT is not set
# CONFIG_NETFILTER_XT_MATCH_CPU is not set
# CONFIG_NETFILTER_XT_MATCH_DCCP is not set
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
# CONFIG_NETFILTER_XT_MATCH_DSCP is not set
# CONFIG_NETFILTER_XT_MATCH_ECN is not set
# CONFIG_NETFILTER_XT_MATCH_ESP is not set
# CONFIG_NETFILTER_XT_MATCH_HASHLIMIT is not set
# CONFIG_NETFILTER_XT_MATCH_HL is not set
# CONFIG_NETFILTER_XT_MATCH_IPRANGE is not set
# CONFIG_NETFILTER_XT_MATCH_LENGTH is not set
# CONFIG_NETFILTER_XT_MATCH_LIMIT is not set
# CONFIG_NETFILTER_XT_MATCH_MAC is not set
# CONFIG_NETFILTER_XT_MATCH_MARK is not set
# CONFIG_NETFILTER_XT_MATCH_MULTIPORT is not set
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
# CONFIG_NETFILTER_XT_MATCH_POLICY is not set
# CONFIG_NETFILTER_XT_MATCH_PKTTYPE is not set
# CONFIG_NETFILTER_XT_MATCH_QUOTA is not set
# CONFIG_NETFILTER_XT_MATCH_RATEEST is not set
# CONFIG_NETFILTER_XT_MATCH_REALM is not set
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
# CONFIG_NETFILTER_XT_MATCH_SCTP is not set
# CONFIG_NETFILTER_XT_MATCH_STATISTIC is not set
# CONFIG_NETFILTER_XT_MATCH_STRING is not set
# CONFIG_NETFILTER_XT_MATCH_TCPMSS is not set
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV4 is not set
# CONFIG_IP_NF_QUEUE is not set
# CONFIG_IP_NF_IPTABLES is not set
# CONFIG_IP_NF_ARPTABLES is not set

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV6 is not set
# CONFIG_IP6_NF_IPTABLES is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
# CONFIG_BRIDGE is not set
CONFIG_NET_DSA=y
CONFIG_NET_DSA_TAG_DSA=y
CONFIG_NET_DSA_TAG_EDSA=y
CONFIG_NET_DSA_TAG_TRAILER=y
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFB is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set
# CONFIG_NET_SCH_MQPRIO is not set
# CONFIG_NET_SCH_CHOKE is not set
# CONFIG_NET_SCH_QFQ is not set
# CONFIG_NET_SCH_CODEL is not set
# CONFIG_NET_SCH_FQ_CODEL is not set
# CONFIG_NET_SCH_INGRESS is not set
# CONFIG_NET_SCH_PLUG is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
# CONFIG_NET_EMATCH_CMP is not set
# CONFIG_NET_EMATCH_NBYTE is not set
# CONFIG_NET_EMATCH_U32 is not set
# CONFIG_NET_EMATCH_META is not set
# CONFIG_NET_EMATCH_TEXT is not set
CONFIG_NET_CLS_ACT=y
# CONFIG_NET_ACT_POLICE is not set
# CONFIG_NET_ACT_GACT is not set
# CONFIG_NET_ACT_MIRRED is not set
# CONFIG_NET_ACT_NAT is not set
# CONFIG_NET_ACT_PEDIT is not set
# CONFIG_NET_ACT_SIMP is not set
# CONFIG_NET_ACT_SKBEDIT is not set
# CONFIG_NET_ACT_CSUM is not set
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
# CONFIG_DNS_RESOLVER is not set
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
# CONFIG_NETPRIO_CGROUP is not set
CONFIG_BQL=y
# CONFIG_BPF_JIT is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
CONFIG_NET_DROP_MONITOR=y
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set
# CONFIG_LIB80211 is not set

#
# CFG80211 needs to be enabled for MAC80211
#
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_RFKILL_REGULATOR is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set
CONFIG_HAVE_BPF_JIT=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
CONFIG_SYS_HYPERVISOR=y
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_DMA_SHARED_BUFFER=y

#
# Bus devices
#
# CONFIG_OMAP_OCP2SCP is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_MTD=y
# CONFIG_MTD_TESTS is not set
# CONFIG_MTD_REDBOOT_PARTS is not set
CONFIG_MTD_CMDLINE_PARTS=y
# CONFIG_MTD_AR7_PARTS is not set

#
# User Modules And Translation Layers
#
# CONFIG_MTD_CHAR is not set
# CONFIG_MTD_BLKDEVS is not set
# CONFIG_MTD_BLOCK is not set
# CONFIG_MTD_BLOCK_RO is not set
# CONFIG_FTL is not set
# CONFIG_NFTL is not set
# CONFIG_INFTL is not set
# CONFIG_RFD_FTL is not set
# CONFIG_SSFDC is not set
# CONFIG_SM_FTL is not set
# CONFIG_MTD_OOPS is not set
# CONFIG_MTD_SWAP is not set

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
# CONFIG_MTD_JEDECPROBE is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
# CONFIG_MTD_RAM is not set
# CONFIG_MTD_ROM is not set
# CONFIG_MTD_ABSENT is not set

#
# Mapping drivers for chip access
#
CONFIG_MTD_COMPLEX_MAPPINGS=y
# CONFIG_MTD_TS5500 is not set
# CONFIG_MTD_PCI is not set
# CONFIG_MTD_PCMCIA is not set
# CONFIG_MTD_GPIO_ADDR is not set
# CONFIG_MTD_INTEL_VR_NOR is not set
# CONFIG_MTD_PLATRAM is not set
# CONFIG_MTD_LATCH_ADDR is not set

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
# CONFIG_MTD_SLRAM is not set
# CONFIG_MTD_PHRAM is not set
# CONFIG_MTD_MTDRAM is not set
# CONFIG_MTD_BLOCK2MTD is not set

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOCG3 is not set
# CONFIG_MTD_NAND is not set
# CONFIG_MTD_ONENAND is not set

#
# LPDDR flash memory drivers
#
# CONFIG_MTD_LPDDR is not set
# CONFIG_MTD_UBI is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_NVME is not set
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_XEN_BLKDEV_FRONTEND is not set
# CONFIG_XEN_BLKDEV_BACKEND is not set
# CONFIG_BLK_DEV_HD is not set
# CONFIG_BLK_DEV_RBD is not set

#
# Misc devices
#
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_AD525X_DPOT is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_INTEL_MID_PTI is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_HP_ILO is not set
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
# CONFIG_ISL29020 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_SENSORS_BH1780 is not set
# CONFIG_SENSORS_BH1770 is not set
# CONFIG_SENSORS_APDS990X is not set
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
# CONFIG_VMWARE_BALLOON is not set
# CONFIG_BMP085_I2C is not set
# CONFIG_PCH_PHUB is not set
# CONFIG_USB_SWITCH_FSA9480 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
# CONFIG_EEPROM_MAX6875 is not set
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# CONFIG_SENSORS_LIS3_I2C is not set

#
# Altera FPGA firmware download module
#
# CONFIG_ALTERA_STAPL is not set
# CONFIG_INTEL_MEI is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_TGT=m
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
# CONFIG_CHR_DEV_SCH is not set
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
# CONFIG_SCSI_SPI_ATTRS is not set
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_FC_TGT_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_SCSI_BNX2X_FCOE is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
CONFIG_SCSI_AACRAID=m
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
CONFIG_SCSI_AIC94XX=m
# CONFIG_AIC94XX_DEBUG is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
CONFIG_MEGARAID_NEWGEN=y
# CONFIG_MEGARAID_MM is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_FCOE_FNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_ISCI is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
CONFIG_SCSI_QLA_FC=m
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_LOWLEVEL_PCMCIA=y
# CONFIG_PCMCIA_AHA152X is not set
# CONFIG_PCMCIA_FDOMAIN is not set
# CONFIG_PCMCIA_QLOGIC is not set
# CONFIG_PCMCIA_SYM53C500 is not set
CONFIG_SCSI_DH=y
# CONFIG_SCSI_DH_RDAC is not set
# CONFIG_SCSI_DH_HP_SW is not set
# CONFIG_SCSI_DH_EMC is not set
# CONFIG_SCSI_DH_ALUA is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
# CONFIG_SATA_AHCI is not set
# CONFIG_SATA_AHCI_PLATFORM is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
# CONFIG_ATA_PIIX is not set
# CONFIG_SATA_HIGHBANK is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARASAN_CF is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CS5536 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SCH is not set
CONFIG_PATA_SERVERWORKS=m
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_PCMCIA is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
CONFIG_PATA_ACPI=m
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID10 is not set
# CONFIG_MD_RAID456 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_THIN_PROVISIONING is not set
CONFIG_DM_MIRROR=m
# CONFIG_DM_RAID is not set
# CONFIG_DM_LOG_USERSPACE is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
CONFIG_DM_UEVENT=y
# CONFIG_DM_FLAKEY is not set
# CONFIG_DM_VERITY is not set
# CONFIG_TARGET_CORE is not set
CONFIG_FUSION=y
# CONFIG_FUSION_SPI is not set
# CONFIG_FUSION_FC is not set
# CONFIG_FUSION_SAS is not set
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# CONFIG_I2O is not set
CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_EQUALIZER is not set
CONFIG_NET_FC=y
# CONFIG_MII is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_VXLAN is not set
CONFIG_NETCONSOLE=m
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_TUN is not set
# CONFIG_VETH is not set
# CONFIG_ARCNET is not set

#
# CAIF transport drivers
#

#
# Distributed Switch Architecture drivers
#
CONFIG_NET_DSA_MV88E6XXX=y
CONFIG_NET_DSA_MV88E6060=y
CONFIG_NET_DSA_MV88E6XXX_NEED_PPU=y
CONFIG_NET_DSA_MV88E6131=y
CONFIG_NET_DSA_MV88E6123_61_65=y
CONFIG_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_PCMCIA_3C574 is not set
# CONFIG_PCMCIA_3C589 is not set
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_PCMCIA_NMCLAN is not set
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
CONFIG_TIGON3=m
# CONFIG_BNX2X is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
# CONFIG_NET_CALXEDA_XGMAC is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
# CONFIG_TULIP is not set
# CONFIG_DE4X5 is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_DM9102 is not set
# CONFIG_ULI526X is not set
# CONFIG_PCMCIA_XIRCOM is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_EXAR=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_FUJITSU=y
# CONFIG_PCMCIA_FMVJ18X is not set
CONFIG_NET_VENDOR_HP=y
# CONFIG_HP100 is not set
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
# CONFIG_E1000 is not set
# CONFIG_E1000E is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
# CONFIG_IXGBE is not set
# CONFIG_IXGBEVF is not set
CONFIG_NET_VENDOR_I825XX=y
# CONFIG_ZNET is not set
# CONFIG_IP1000 is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX4_CORE is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_PCMCIA_AXNET is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_PCMCIA_PCNET is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_PCH_GBE is not set
# CONFIG_ETHOC is not set
CONFIG_NET_PACKET_ENGINE=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_QLGE is not set
# CONFIG_NETXEN_NIC is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
# CONFIG_R8169 is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_SEEQ=y
# CONFIG_SEEQ8005 is not set
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
# CONFIG_SFC is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_PCMCIA_SMC91C92 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XIRCOM=y
# CONFIG_PCMCIA_XIRC2PS is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
# CONFIG_AT803X_PHY is not set
# CONFIG_AMD_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MICREL_PHY is not set
CONFIG_FIXED_PHY=y
# CONFIG_MDIO_BITBANG is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_USB_IPHETH is not set
CONFIG_WLAN=y
# CONFIG_PCMCIA_RAYCS is not set
# CONFIG_AIRO is not set
# CONFIG_ATMEL is not set
# CONFIG_AIRO_CS is not set
# CONFIG_PCMCIA_WL3501 is not set
# CONFIG_PRISM54 is not set
# CONFIG_USB_ZD1201 is not set
# CONFIG_HOSTAP is not set
# CONFIG_WL_TI is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
CONFIG_WAN=y
# CONFIG_HDLC is not set
# CONFIG_DLCI is not set
# CONFIG_SBNI is not set
# CONFIG_XEN_NETDEV_FRONTEND is not set
# CONFIG_XEN_NETDEV_BACKEND is not set
# CONFIG_VMXNET3 is not set
CONFIG_ISDN=y
# CONFIG_ISDN_I4L is not set
# CONFIG_ISDN_CAPI is not set
# CONFIG_ISDN_DRV_GIGASET is not set
# CONFIG_HYSDN is not set
# CONFIG_MISDN is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
# CONFIG_INPUT_POLLDEV is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_OMAP4 is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_GPIO is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
CONFIG_INPUT_TABLET=y
# CONFIG_TABLET_USB_ACECAD is not set
# CONFIG_TABLET_USB_AIPTEK is not set
# CONFIG_TABLET_USB_GTCO is not set
# CONFIG_TABLET_USB_HANWANG is not set
# CONFIG_TABLET_USB_KBTAB is not set
# CONFIG_TABLET_USB_WACOM is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_AD7879 is not set
# CONFIG_TOUCHSCREEN_ATMEL_MXT is not set
# CONFIG_TOUCHSCREEN_AUO_PIXCIR is not set
# CONFIG_TOUCHSCREEN_BU21013 is not set
# CONFIG_TOUCHSCREEN_CY8CTMG110 is not set
# CONFIG_TOUCHSCREEN_CYTTSP_CORE is not set
# CONFIG_TOUCHSCREEN_DYNAPRO is not set
# CONFIG_TOUCHSCREEN_HAMPSHIRE is not set
# CONFIG_TOUCHSCREEN_EETI is not set
# CONFIG_TOUCHSCREEN_FUJITSU is not set
# CONFIG_TOUCHSCREEN_ILI210X is not set
# CONFIG_TOUCHSCREEN_GUNZE is not set
# CONFIG_TOUCHSCREEN_ELO is not set
# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set
# CONFIG_TOUCHSCREEN_WACOM_I2C is not set
# CONFIG_TOUCHSCREEN_MAX11801 is not set
# CONFIG_TOUCHSCREEN_MCS5000 is not set
# CONFIG_TOUCHSCREEN_MMS114 is not set
# CONFIG_TOUCHSCREEN_MTOUCH is not set
# CONFIG_TOUCHSCREEN_INEXIO is not set
# CONFIG_TOUCHSCREEN_MK712 is not set
# CONFIG_TOUCHSCREEN_PENMOUNT is not set
# CONFIG_TOUCHSCREEN_EDT_FT5X06 is not set
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
# CONFIG_TOUCHSCREEN_PIXCIR is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
# CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
# CONFIG_TOUCHSCREEN_TSC_SERIO is not set
# CONFIG_TOUCHSCREEN_TSC2007 is not set
# CONFIG_TOUCHSCREEN_ST1232 is not set
# CONFIG_TOUCHSCREEN_TPS6507X is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_AD714X is not set
# CONFIG_INPUT_BMA150 is not set
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_MMA8450 is not set
# CONFIG_INPUT_MPU3050 is not set
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_GP2A is not set
# CONFIG_INPUT_GPIO_TILT_POLLED is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_KXTJ9 is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
# CONFIG_INPUT_UINPUT is not set
# CONFIG_INPUT_PCF8574 is not set
# CONFIG_INPUT_GPIO_ROTARY_ENCODER is not set
# CONFIG_INPUT_ADXL34X is not set
# CONFIG_INPUT_CMA3000 is not set
CONFIG_INPUT_XEN_KBDDEV_FRONTEND=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
# CONFIG_CYCLADES is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_SYNCLINK is not set
# CONFIG_SYNCLINKMP is not set
# CONFIG_SYNCLINK_GT is not set
# CONFIG_NOZOMI is not set
# CONFIG_ISI is not set
# CONFIG_N_HDLC is not set
# CONFIG_N_GSM is not set
# CONFIG_TRACE_SINK is not set
# CONFIG_DEVKMEM is not set
# CONFIG_STALDRV is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
# CONFIG_SERIAL_8250_CS is not set
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MFD_HSU is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_TIMBERDALE is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_PCH_UART is not set
# CONFIG_SERIAL_XILINX_PS_UART is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IRQ=y
CONFIG_HVC_XEN=y
CONFIG_HVC_XEN_FRONTEND=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_HW_RANDOM_INTEL is not set
# CONFIG_HW_RANDOM_AMD is not set
# CONFIG_HW_RANDOM_VIA is not set
CONFIG_HW_RANDOM_TPM=y
CONFIG_NVRAM=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_CARDMAN_4000 is not set
# CONFIG_CARDMAN_4040 is not set
# CONFIG_IPWIRELESS is not set
# CONFIG_MWAVE is not set
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
# CONFIG_HANGCHECK_TIMER is not set
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_I2C_INFINEON is not set
# CONFIG_TCG_NSC is not set
# CONFIG_TCG_ATMEL is not set
# CONFIG_TCG_INFINEON is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
CONFIG_I2C_PIIX4=m
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
# CONFIG_I2C_SCMI is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EG20T is not set
# CONFIG_I2C_GPIO is not set
# CONFIG_I2C_INTEL_MID is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA_PCI is not set
# CONFIG_I2C_SIMTEC is not set
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_SPI is not set
# CONFIG_HSI is not set

#
# PPS support
#
# CONFIG_PPS is not set

#
# PPS generators support
#

#
# PTP clock support
#

#
# Enable Device Drivers -> PPS to see the PTP clock options.
#
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
CONFIG_GPIOLIB=y
# CONFIG_GPIO_SYSFS is not set

#
# Memory mapped GPIO drivers:
#
# CONFIG_GPIO_GENERIC_PLATFORM is not set
# CONFIG_GPIO_IT8761E is not set
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_ICH is not set
# CONFIG_GPIO_VX855 is not set

#
# I2C GPIO expanders:
#
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCF857X is not set
# CONFIG_GPIO_ADP5588 is not set

#
# PCI GPIO expanders:
#
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_LANGWELL is not set
# CONFIG_GPIO_PCH is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_RDC321X is not set

#
# SPI GPIO expanders:
#
# CONFIG_GPIO_MCP23S08 is not set

#
# AC97 GPIO expanders:
#

#
# MODULbus GPIO expanders:
#
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_BATTERY_BQ27x00 is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_MANAGER is not set
# CONFIG_CHARGER_SMB347 is not set
# CONFIG_POWER_AVS is not set
CONFIG_HWMON=m
# CONFIG_HWMON_VID is not set
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7410 is not set
# CONFIG_SENSORS_ADT7411 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_ASC7621 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_K10TEMP is not set
# CONFIG_SENSORS_FAM15H_POWER is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS620 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
# CONFIG_SENSORS_GPIO_FAN is not set
# CONFIG_SENSORS_HIH6130 is not set
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_JC42 is not set
# CONFIG_SENSORS_LINEAGE is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM73 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4151 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_LM95245 is not set
# CONFIG_SENSORS_MAX16065 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX1668 is not set
# CONFIG_SENSORS_MAX197 is not set
# CONFIG_SENSORS_MAX6639 is not set
# CONFIG_SENSORS_MAX6642 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_MCP3021 is not set
# CONFIG_SENSORS_NTC_THERMISTOR is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_PMBUS is not set
# CONFIG_SENSORS_SHT15 is not set
# CONFIG_SENSORS_SHT21 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_EMC1403 is not set
# CONFIG_SENSORS_EMC2103 is not set
# CONFIG_SENSORS_EMC6W201 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_SCH56XX_COMMON is not set
# CONFIG_SENSORS_SCH5627 is not set
# CONFIG_SENSORS_SCH5636 is not set
# CONFIG_SENSORS_ADS1015 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_AMC6821 is not set
# CONFIG_SENSORS_INA2XX is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_TMP102 is not set
# CONFIG_SENSORS_TMP401 is not set
# CONFIG_SENSORS_TMP421 is not set
# CONFIG_SENSORS_VIA_CPUTEMP is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_APPLESMC is not set

#
# ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_THERMAL=y
# CONFIG_CPU_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_F71808E_WDT is not set
# CONFIG_SP5100_TCO is not set
# CONFIG_SC520_WDT is not set
# CONFIG_SBC_FITPC2_WATCHDOG is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
# CONFIG_IBMASR is not set
# CONFIG_WAFER_WDT is not set
# CONFIG_I6300ESB_WDT is not set
# CONFIG_IE6XX_WDT is not set
# CONFIG_ITCO_WDT is not set
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_NV_TCO is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_VIA_WDT is not set
# CONFIG_W83627HF_WDT is not set
# CONFIG_W83697HF_WDT is not set
# CONFIG_W83697UG_WDT is not set
# CONFIG_W83877F_WDT is not set
# CONFIG_W83977F_WDT is not set
# CONFIG_MACHZ_WDT is not set
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
# CONFIG_XEN_WDT is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65217 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_CS5535 is not set
# CONFIG_MFD_TIMBERDALE is not set
# CONFIG_LPC_SCH is not set
# CONFIG_LPC_ICH is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_MFD_WL1273_CORE is not set
CONFIG_REGULATOR=y
# CONFIG_REGULATOR_DEBUG is not set
# CONFIG_REGULATOR_DUMMY is not set
# CONFIG_REGULATOR_FIXED_VOLTAGE is not set
# CONFIG_REGULATOR_VIRTUAL_CONSUMER is not set
# CONFIG_REGULATOR_USERSPACE_CONSUMER is not set
# CONFIG_REGULATOR_GPIO is not set
# CONFIG_REGULATOR_AD5398 is not set
# CONFIG_REGULATOR_FAN53555 is not set
# CONFIG_REGULATOR_ISL6271A is not set
# CONFIG_REGULATOR_MAX1586 is not set
# CONFIG_REGULATOR_MAX8649 is not set
# CONFIG_REGULATOR_MAX8660 is not set
# CONFIG_REGULATOR_MAX8952 is not set
# CONFIG_REGULATOR_LP3971 is not set
# CONFIG_REGULATOR_LP3972 is not set
# CONFIG_REGULATOR_TPS62360 is not set
# CONFIG_REGULATOR_TPS65023 is not set
# CONFIG_REGULATOR_TPS6507X is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
CONFIG_AGP_SIS=y
CONFIG_AGP_VIA=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=64
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=m
CONFIG_DRM_KMS_HELPER=m
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
CONFIG_DRM_TTM=m
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
CONFIG_DRM_RADEON=m
CONFIG_DRM_RADEON_KMS=y
# CONFIG_DRM_NOUVEAU is not set

#
# I2C encoder or helper chips
#
# CONFIG_DRM_I2C_CH7006 is not set
# CONFIG_DRM_I2C_SIL164 is not set
# CONFIG_DRM_I810 is not set
# CONFIG_DRM_I915 is not set
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
# CONFIG_DRM_VMWGFX is not set
# CONFIG_DRM_GMA500 is not set
# CONFIG_DRM_UDL is not set
# CONFIG_DRM_AST is not set
# CONFIG_DRM_MGAG200 is not set
# CONFIG_DRM_CIRRUS_QEMU is not set
# CONFIG_STUB_POULSBO is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=y
CONFIG_FB_SYS_COPYAREA=y
CONFIG_FB_SYS_IMAGEBLIT=y
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=y
# CONFIG_FB_WMT_GE_ROPS is not set
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_VIRTUAL is not set
CONFIG_XEN_FBDEV_FRONTEND=y
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
# CONFIG_FB_AUO_K190X is not set
# CONFIG_EXYNOS_VIDEO is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_BACKLIGHT_APPLE is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630 is not set
# CONFIG_BACKLIGHT_LM3639 is not set
# CONFIG_BACKLIGHT_LP855X is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
# CONFIG_HID_BATTERY_STRENGTH is not set
CONFIG_HIDRAW=y
# CONFIG_UHID is not set
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
# CONFIG_HID_ACRUX is not set
CONFIG_HID_APPLE=y
# CONFIG_HID_AUREAL is not set
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
CONFIG_HID_DRAGONRISE=y
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
CONFIG_HID_EZKEY=y
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_KEYTOUCH is not set
CONFIG_HID_KYE=y
# CONFIG_HID_UCLOGIC is not set
# CONFIG_HID_WALTOP is not set
CONFIG_HID_GYRATION=y
CONFIG_HID_TWINHAN=y
CONFIG_HID_KENSINGTON=y
# CONFIG_HID_LCPOWER is not set
# CONFIG_HID_LENOVO_TPKBD is not set
CONFIG_HID_LOGITECH=y
# CONFIG_HID_LOGITECH_DJ is not set
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
# CONFIG_HID_MULTITOUCH is not set
CONFIG_HID_NTRIG=y
# CONFIG_HID_ORTEK is not set
CONFIG_HID_PANTHERLORD=y
# CONFIG_PANTHERLORD_FF is not set
CONFIG_HID_PETALYNX=y
# CONFIG_HID_PICOLCD is not set
# CONFIG_HID_PRIMAX is not set
# CONFIG_HID_ROCCAT is not set
# CONFIG_HID_SAITEK is not set
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
# CONFIG_HID_SPEEDLINK is not set
CONFIG_HID_SUNPLUS=y
CONFIG_HID_GREENASIA=y
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_SMARTJOYPLUS=y
CONFIG_SMARTJOYPLUS_FF=y
# CONFIG_HID_TIVO is not set
CONFIG_HID_TOPSEED=y
CONFIG_HID_THRUSTMASTER=y
# CONFIG_THRUSTMASTER_FF is not set
CONFIG_HID_ZEROPLUS=y
# CONFIG_ZEROPLUS_FF is not set
# CONFIG_HID_ZYDACRON is not set
# CONFIG_HID_SENSOR_HUB is not set

#
# USB HID support
#
CONFIG_USB_HID=y
CONFIG_HID_PID=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB_ARCH_HAS_XHCI=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
# CONFIG_USB_DYNAMIC_MINORS is not set
CONFIG_USB_SUSPEND=y
# CONFIG_USB_OTG is not set
CONFIG_USB_MON=y
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
# CONFIG_USB_ISP1362_HCD is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_CHIPIDEA is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
# CONFIG_USB_STORAGE is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set

#
# USB Physical Layer drivers
#
# CONFIG_OMAP_USB2 is not set
# CONFIG_USB_ISP1301 is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_LM3530 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_GPIO is not set
# CONFIG_LEDS_LP3944 is not set
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA9633 is not set
# CONFIG_LEDS_REGULATOR is not set
# CONFIG_LEDS_BD2802 is not set
# CONFIG_LEDS_INTEL_SS4200 is not set
# CONFIG_LEDS_LT3593 is not set
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_LM355x is not set
# CONFIG_LEDS_OT200 is not set
# CONFIG_LEDS_BLINKM is not set
CONFIG_LEDS_TRIGGERS=y

#
# LED Triggers
#
# CONFIG_LEDS_TRIGGER_TIMER is not set
# CONFIG_LEDS_TRIGGER_ONESHOT is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_GPIO is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_LEDS_TRIGGER_TRANSIENT is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_EDAC=y

#
# Reporting subsystems
#
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
# CONFIG_EDAC_DECODE_MCE is not set
# CONFIG_EDAC_MM_EDAC is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_DEBUG is not set

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_DS3232 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_ISL12022 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_BQ32K is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set
# CONFIG_RTC_DRV_EM3027 is not set
# CONFIG_RTC_DRV_RV3029C2 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set
# CONFIG_RTC_DRV_DS2404 is not set

#
# on-CPU RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
# CONFIG_INTEL_MID_DMAC is not set
# CONFIG_INTEL_IOATDMA is not set
# CONFIG_TIMB_DMA is not set
# CONFIG_PCH_DMA is not set
CONFIG_AUXDISPLAY=y
# CONFIG_UIO is not set
# CONFIG_VFIO is not set

#
# Virtio drivers
#
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_MMIO is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_HYPERV is not set

#
# Xen driver support
#
CONFIG_XEN_BALLOON=y
# CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not set
CONFIG_XEN_SCRUB_PAGES=y
# CONFIG_XEN_DEV_EVTCHN is not set
CONFIG_XEN_BACKEND=y
# CONFIG_XENFS is not set
CONFIG_XEN_SYS_HYPERVISOR=y
CONFIG_XEN_XENBUS_FRONTEND=y
# CONFIG_XEN_GNTDEV is not set
# CONFIG_XEN_GRANT_DEV_ALLOC is not set
CONFIG_SWIOTLB_XEN=y
# CONFIG_XEN_PCIDEV_BACKEND is not set
CONFIG_XEN_PRIVCMD=m
# CONFIG_XEN_ACPI_PROCESSOR is not set
# CONFIG_XEN_MCE_LOG is not set
CONFIG_STAGING=y
# CONFIG_ET131X is not set
# CONFIG_SLICOSS is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_ECHO is not set
# CONFIG_COMEDI is not set
# CONFIG_ASUS_OLED is not set
# CONFIG_R8187SE is not set
# CONFIG_RTL8192U is not set
# CONFIG_RTLLIB is not set
# CONFIG_R8712U is not set
# CONFIG_RTS_PSTOR is not set
# CONFIG_RTS5139 is not set
# CONFIG_TRANZPORT is not set
# CONFIG_IDE_PHISON is not set
# CONFIG_VT6655 is not set
# CONFIG_VT6656 is not set
# CONFIG_DX_SEP is not set
# CONFIG_ZSMALLOC is not set
# CONFIG_WLAGS49_H2 is not set
# CONFIG_WLAGS49_H25 is not set
# CONFIG_FB_SM7XX is not set
# CONFIG_CRYSTALHD is not set
# CONFIG_FB_XGI is not set
# CONFIG_ACPI_QUICKSTART is not set
# CONFIG_USB_ENESTORAGE is not set
# CONFIG_BCM_WIMAX is not set
# CONFIG_FT1000 is not set

#
# Speakup console speech
#
# CONFIG_SPEAKUP is not set
# CONFIG_TOUCHSCREEN_CLEARPAD_TM1217 is not set
# CONFIG_TOUCHSCREEN_SYNAPTICS_I2C_RMI4 is not set
# CONFIG_STAGING_MEDIA is not set

#
# Android
#
# CONFIG_ANDROID is not set
# CONFIG_PHONE is not set
# CONFIG_USB_WPAN_HCD is not set
# CONFIG_IPACK_BUS is not set
# CONFIG_WIMAX_GDM72XX is not set
CONFIG_NET_VENDOR_SILICOM=y
# CONFIG_SBYPASS is not set
# CONFIG_BPCTL is not set
# CONFIG_CED1401 is not set
# CONFIG_DGRP is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ACERHDF is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_FUJITSU_TABLET is not set
# CONFIG_HP_ACCEL is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_TOSHIBA_BT_RFKILL is not set
# CONFIG_ACPI_CMPC is not set
# CONFIG_INTEL_IPS is not set
# CONFIG_IBM_RTL is not set
# CONFIG_XO15_EBOOK is not set
# CONFIG_SAMSUNG_LAPTOP is not set
# CONFIG_SAMSUNG_Q10 is not set
# CONFIG_APPLE_GMUX is not set

#
# Hardware Spinlock drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_STATS=y
# CONFIG_AMD_IOMMU_V2 is not set
# CONFIG_INTEL_IOMMU is not set
# CONFIG_IRQ_REMAP is not set

#
# Remoteproc drivers (EXPERIMENTAL)
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers (EXPERIMENTAL)
#
# CONFIG_VIRT_DRIVERS is not set
# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_EFI_VARS=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_DMI_SYSFS is not set
CONFIG_ISCSI_IBFT_FIND=y
# CONFIG_ISCSI_IBFT is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_DEFAULTS_TO_ORDERED=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4_FS is not set
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=m
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_BTRFS_FS is not set
# CONFIG_NILFS2_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_FANOTIFY is not set
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
CONFIG_GENERIC_ACL=y

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
# CONFIG_CONFIGFS_FS is not set
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_JFFS2_FS is not set
# CONFIG_LOGFS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
# CONFIG_PSTORE_CONSOLE is not set
# CONFIG_PSTORE_FTRACE is not set
# CONFIG_PSTORE_RAM is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_CEPH_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
# CONFIG_NLS_MAC_ICELAND is not set
# CONFIG_NLS_MAC_INUIT is not set
# CONFIG_NLS_MAC_ROMANIAN is not set
# CONFIG_NLS_MAC_TURKISH is not set
# CONFIG_NLS_UTF8 is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_DEFAULT_MESSAGE_LOGLEVEL=7
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
# CONFIG_DEBUG_KERNEL is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_STACKTRACE=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
# CONFIG_FRAME_POINTER is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_LKDTM is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_EVENT_POWER_TRACING_DEPRECATED=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_STACK_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENT=y
# CONFIG_UPROBE_EVENT is not set
CONFIG_PROBE_EVENTS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_TEST_KSTRTOX is not set
CONFIG_STRICT_DEVMEM=y
# CONFIG_X86_VERBOSE_BOOTUP is not set
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
# CONFIG_DEBUG_SET_MODULE_RONX is not set
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_OPTIMIZE_INLINING=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_TRUSTED_KEYS is not set
# CONFIG_ENCRYPTED_KEYS is not set
CONFIG_KEYS_DEBUG_PROC_KEYS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
# CONFIG_SECURITY_PATH is not set
CONFIG_LSM_MMAP_MIN_ADDR=65535
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
# CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_YAMA is not set
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_AUDIT=y
CONFIG_IMA_LSM_RULES=y
# CONFIG_IMA_APPRAISE is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_DEFAULT_SECURITY="selinux"
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_USER is not set
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
# CONFIG_CRYPTO_GF128MUL is not set
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
# CONFIG_CRYPTO_CBC is not set
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_GHASH is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA1_SSSE3 is not set
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_X86_64 is not set
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_BLOWFISH_X86_64 is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAMELLIA_X86_64 is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST5_AVX_X86_64 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_CAST6_AVX_X86_64 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_SERPENT_SSE2_X86_64 is not set
# CONFIG_CRYPTO_SERPENT_AVX_X86_64 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set
# CONFIG_CRYPTO_TWOFISH_X86_64_3WAY is not set
# CONFIG_CRYPTO_TWOFISH_AVX_X86_64 is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_USER_API_HASH is not set
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_PADLOCK is not set
# CONFIG_ASYMMETRIC_KEY_TYPE is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
# CONFIG_KVM_AMD is not set
# CONFIG_KVM_MMU_AUDIT is not set
# CONFIG_VHOST_NET is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_IO=y
# CONFIG_CRC_CCITT is not set
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=m
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
# CONFIG_CRC8 is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_AVERAGE=y
# CONFIG_CORDIC is not set
# CONFIG_DDR is not set
Post by Ingo Molnar
Thanks,
Ingo
Srikar Dronamraju
2012-12-11 00:20:02 UTC
Post by Srikar Dronamraju
Here are the results of running autonumabenchmark on a 64-core, 8-node
machine with six 32 GB nodes and two 64 GB nodes.
KernelVersion: 3.7.0-rc6-mel_auto_balance(mm-balancenuma-v10r3)
Testcase:                               Min      Max      Avg  %Change
numa01:                             1536.93  1819.85  1694.54   -8.22%
numa01_HARD_BIND:                    909.67  1145.32  1055.57   -5.90%
numa01_INVERSE_BIND:                2882.07  3287.24  2976.89   22.10%
numa01_THREAD_ALLOC:                 995.79  4845.27  1905.85  -41.17%
numa01_THREAD_ALLOC_HARD_BIND:       582.36   818.11   655.18   20.99%
numa01_THREAD_ALLOC_INVERSE_BIND:   1790.91  1927.90  1868.49    6.45%
numa02:                              131.53   287.93   209.15  -29.70%
numa02_HARD_BIND:                     25.68    31.90    27.66   -2.60%
numa02_INVERSE_BIND:                 341.09   401.37   353.84   -1.34%
numa02_SMT:                          156.61  2036.63   731.97  -47.21%
numa02_SMT_HARD_BIND:                 25.10   196.60    79.72   27.92%
numa02_SMT_INVERSE_BIND:             294.22  1801.59   824.41  -60.55%
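Several testcases above show a wide gap between Min and Max, which matters when reading the Avg and %Change columns. As a rough sketch (not part of the original report), the Python below re-enters the rows and ranks testcases by their Max/Min ratio. The names parse_results and spread are made up for illustration. The baseline behind %Change is not stated in this mail, so the sketch does not interpret that column.

```python
# Rows copied from the results table above:
# testcase, min, max, avg, %change (units as reported; not stated in the mail).
RESULTS = """\
numa01 1536.93 1819.85 1694.54 -8.22
numa01_HARD_BIND 909.67 1145.32 1055.57 -5.90
numa01_INVERSE_BIND 2882.07 3287.24 2976.89 22.10
numa01_THREAD_ALLOC 995.79 4845.27 1905.85 -41.17
numa01_THREAD_ALLOC_HARD_BIND 582.36 818.11 655.18 20.99
numa01_THREAD_ALLOC_INVERSE_BIND 1790.91 1927.90 1868.49 6.45
numa02 131.53 287.93 209.15 -29.70
numa02_HARD_BIND 25.68 31.90 27.66 -2.60
numa02_INVERSE_BIND 341.09 401.37 353.84 -1.34
numa02_SMT 156.61 2036.63 731.97 -47.21
numa02_SMT_HARD_BIND 25.10 196.60 79.72 27.92
numa02_SMT_INVERSE_BIND 294.22 1801.59 824.41 -60.55
"""


def parse_results(text):
    """Map each testcase name to a (min, max, avg, change) tuple of floats."""
    rows = {}
    for line in text.splitlines():
        name, lo, hi, avg, change = line.split()
        rows[name] = (float(lo), float(hi), float(avg), float(change))
    return rows


def spread(row):
    """Ratio of slowest to fastest run; 1.0 means no run-to-run variation."""
    lo, hi, _avg, _change = row
    return hi / lo


rows = parse_results(RESULTS)

# Rank testcases from most to least variable across runs.
ranked = sorted(rows, key=lambda name: spread(rows[name]), reverse=True)
noisiest = ranked[0]

for name in ranked:
    print(f"{name:34s} max/min = {spread(rows[name]):6.2f}")
```

A ratio close to 1.0 suggests the Avg is representative of the individual runs. A large ratio suggests the Avg is dominated by one or two slow runs and is worth re-measuring before drawing conclusions.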
Here is the config I used for balancenuma.

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 3.7.0-rc6 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_GPIO=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION="-mel_auto_balance"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_FHANDLE is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y
# CONFIG_AUDIT_LOGINUID_IMMUTABLE is not set
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_USER_QS is not set
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_IKCONFIG=m
# CONFIG_IKCONFIG_PROC is not set
CONFIG_LOG_BUF_SHIFT=19
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
CONFIG_ARCH_USES_NUMA_PROT_NONE=y
CONFIG_BALANCE_NUMA_DEFAULT_ENABLED=y
CONFIG_BALANCE_NUMA=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
# CONFIG_MEMCG is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EXPERT is not set
CONFIG_HAVE_UID16=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
# CONFIG_SLUB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
# CONFIG_OPROFILE is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_OPTPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_GENERIC_KERNEL_THREAD=y
CONFIG_GENERIC_KERNEL_EXECVE=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_HAVE_RCU_USER_QS=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_MODULES_USE_ELF_RELA=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
# CONFIG_MODULE_SIG is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_BSGLIB=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_DEV_THROTTLING=y

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_CFQ_GROUP_IOSCHED=y
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PARAVIRT_GUEST=y
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_XEN=y
CONFIG_XEN_DOM0=y
CONFIG_XEN_PRIVILEGED_GUEST=y
CONFIG_XEN_PVHVM=y
CONFIG_XEN_MAX_DOMAIN_MEMORY=500
CONFIG_XEN_SAVE_RESTORE=y
CONFIG_XEN_DEBUG_FS=y
CONFIG_KVM_GUEST=y
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_SPINLOCKS is not set
CONFIG_PARAVIRT_CLOCK=y
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
# CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_NR_CPUS=512
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_X86_MCE_INJECT is not set
CONFIG_X86_THERMAL_VECTOR=y
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_NUMA=y
CONFIG_AMD_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
# CONFIG_NUMA_EMU is not set
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_ARCH_DISCARD_MEMBLOCK=y
CONFIG_MEMORY_ISOLATION=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTREMOVE is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_MEMORY_FAILURE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_CLEANCACHE is not set
# CONFIG_FRONTSWAP is not set
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_EFI=y
# CONFIG_EFI_STUB is not set
# CONFIG_SECCOMP is not set
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_HOTPLUG_CPU=y
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y

#
# Power management and ACPI options
#
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM_RUNTIME=y
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
# CONFIG_ACPI_EC_DEBUGFS is not set
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_ACPI_PROCESSOR_AGGREGATOR is not set
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_PCI_SLOT=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
# CONFIG_ACPI_SBS is not set
# CONFIG_ACPI_HED is not set
# CONFIG_ACPI_CUSTOM_METHOD is not set
# CONFIG_ACPI_BGRT is not set
CONFIG_ACPI_APEI=y
# CONFIG_ACPI_APEI_GHES is not set
# CONFIG_ACPI_APEI_PCIEAER is not set
# CONFIG_ACPI_APEI_MEMORY_FAILURE is not set
# CONFIG_ACPI_APEI_EINJ is not set
# CONFIG_ACPI_APEI_ERST_DEBUG is not set
CONFIG_SFI=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
# CONFIG_CPU_FREQ_STAT is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set

#
# x86 CPU frequency scaling drivers
#
# CONFIG_X86_PCC_CPUFREQ is not set
# CONFIG_X86_ACPI_CPUFREQ is not set
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
CONFIG_INTEL_IDLE=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_XEN=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_PCIEAER=y
CONFIG_PCIE_ECRC=y
# CONFIG_PCIEAER_INJECT is not set
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
CONFIG_PCIEASPM_DEFAULT=y
# CONFIG_PCIEASPM_POWERSAVE is not set
# CONFIG_PCIEASPM_PERFORMANCE is not set
CONFIG_PCIE_PME=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_REALLOC_ENABLE_AUTO is not set
CONFIG_PCI_STUB=y
CONFIG_XEN_PCIDEV_FRONTEND=y
CONFIG_HT_IRQ=y
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
CONFIG_PCI_PRI=y
CONFIG_PCI_PASID=y
# CONFIG_PCI_IOAPIC is not set
CONFIG_PCI_LABEL=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
CONFIG_PCCARD=y
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
# CONFIG_YENTA is not set
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
# CONFIG_HOTPLUG_PCI_ACPI_IBM is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set
# CONFIG_RAPIDIO is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
# CONFIG_X86_X32 is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_KEYS_COMPAT=y
CONFIG_HAVE_TEXT_POKE_SMP=y
CONFIG_X86_DEV_DMA_OPS=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_DIAG is not set
CONFIG_UNIX=y
# CONFIG_UNIX_DIAG is not set
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_IP_FIB_TRIE_STATS is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE_DEMUX is not set
CONFIG_IP_MROUTE=y
# CONFIG_IP_MROUTE_MULTIPLE_TABLES is not set
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_LRO=y
# CONFIG_INET_DIAG is not set
CONFIG_TCP_CONG_ADVANCED=y
# CONFIG_TCP_CONG_BIC is not set
CONFIG_TCP_CONG_CUBIC=y
# CONFIG_TCP_CONG_WESTWOOD is not set
# CONFIG_TCP_CONG_HTCP is not set
# CONFIG_TCP_CONG_HSTCP is not set
# CONFIG_TCP_CONG_HYBLA is not set
# CONFIG_TCP_CONG_VEGAS is not set
# CONFIG_TCP_CONG_SCALABLE is not set
# CONFIG_TCP_CONG_LP is not set
# CONFIG_TCP_CONG_VENO is not set
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6=m
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_BEET is not set
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
# CONFIG_IPV6_SIT is not set
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_GRE is not set
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
CONFIG_IPV6_MROUTE=y
# CONFIG_IPV6_MROUTE_MULTIPLE_TABLES is not set
CONFIG_IPV6_PIMSM_V2=y
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
# CONFIG_NETWORK_PHY_TIMESTAMPING is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y

#
# Core Netfilter Configuration
#
# CONFIG_NETFILTER_NETLINK_ACCT is not set
# CONFIG_NETFILTER_NETLINK_QUEUE is not set
# CONFIG_NETFILTER_NETLINK_LOG is not set
# CONFIG_NF_CONNTRACK is not set
CONFIG_NETFILTER_XTABLES=y

#
# Xtables combined modules
#
# CONFIG_NETFILTER_XT_MARK is not set

#
# Xtables targets
#
# CONFIG_NETFILTER_XT_TARGET_AUDIT is not set
# CONFIG_NETFILTER_XT_TARGET_CLASSIFY is not set
# CONFIG_NETFILTER_XT_TARGET_HMARK is not set
# CONFIG_NETFILTER_XT_TARGET_IDLETIMER is not set
# CONFIG_NETFILTER_XT_TARGET_LED is not set
# CONFIG_NETFILTER_XT_TARGET_LOG is not set
# CONFIG_NETFILTER_XT_TARGET_MARK is not set
# CONFIG_NETFILTER_XT_TARGET_NFLOG is not set
# CONFIG_NETFILTER_XT_TARGET_NFQUEUE is not set
# CONFIG_NETFILTER_XT_TARGET_RATEEST is not set
# CONFIG_NETFILTER_XT_TARGET_TEE is not set
# CONFIG_NETFILTER_XT_TARGET_SECMARK is not set
# CONFIG_NETFILTER_XT_TARGET_TCPMSS is not set

#
# Xtables matches
#
# CONFIG_NETFILTER_XT_MATCH_ADDRTYPE is not set
# CONFIG_NETFILTER_XT_MATCH_COMMENT is not set
# CONFIG_NETFILTER_XT_MATCH_CPU is not set
# CONFIG_NETFILTER_XT_MATCH_DCCP is not set
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
# CONFIG_NETFILTER_XT_MATCH_DSCP is not set
# CONFIG_NETFILTER_XT_MATCH_ECN is not set
# CONFIG_NETFILTER_XT_MATCH_ESP is not set
# CONFIG_NETFILTER_XT_MATCH_HASHLIMIT is not set
# CONFIG_NETFILTER_XT_MATCH_HL is not set
# CONFIG_NETFILTER_XT_MATCH_IPRANGE is not set
# CONFIG_NETFILTER_XT_MATCH_LENGTH is not set
# CONFIG_NETFILTER_XT_MATCH_LIMIT is not set
# CONFIG_NETFILTER_XT_MATCH_MAC is not set
# CONFIG_NETFILTER_XT_MATCH_MARK is not set
# CONFIG_NETFILTER_XT_MATCH_MULTIPORT is not set
# CONFIG_NETFILTER_XT_MATCH_NFACCT is not set
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
# CONFIG_NETFILTER_XT_MATCH_POLICY is not set
# CONFIG_NETFILTER_XT_MATCH_PKTTYPE is not set
# CONFIG_NETFILTER_XT_MATCH_QUOTA is not set
# CONFIG_NETFILTER_XT_MATCH_RATEEST is not set
# CONFIG_NETFILTER_XT_MATCH_REALM is not set
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
# CONFIG_NETFILTER_XT_MATCH_SCTP is not set
# CONFIG_NETFILTER_XT_MATCH_STATISTIC is not set
# CONFIG_NETFILTER_XT_MATCH_STRING is not set
# CONFIG_NETFILTER_XT_MATCH_TCPMSS is not set
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV4 is not set
# CONFIG_IP_NF_QUEUE is not set
# CONFIG_IP_NF_IPTABLES is not set
# CONFIG_IP_NF_ARPTABLES is not set

#
# IPv6: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV6 is not set
# CONFIG_IP6_NF_IPTABLES is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_RDS is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_L2TP is not set
# CONFIG_BRIDGE is not set
CONFIG_NET_DSA=y
CONFIG_NET_DSA_TAG_DSA=y
CONFIG_NET_DSA_TAG_EDSA=y
CONFIG_NET_DSA_TAG_TRAILER=y
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFB is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set
# CONFIG_NET_SCH_MQPRIO is not set
# CONFIG_NET_SCH_CHOKE is not set
# CONFIG_NET_SCH_QFQ is not set
# CONFIG_NET_SCH_CODEL is not set
# CONFIG_NET_SCH_FQ_CODEL is not set
# CONFIG_NET_SCH_INGRESS is not set
# CONFIG_NET_SCH_PLUG is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_CLS_CGROUP=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
# CONFIG_NET_EMATCH_CMP is not set
# CONFIG_NET_EMATCH_NBYTE is not set
# CONFIG_NET_EMATCH_U32 is not set
# CONFIG_NET_EMATCH_META is not set
# CONFIG_NET_EMATCH_TEXT is not set
CONFIG_NET_CLS_ACT=y
# CONFIG_NET_ACT_POLICE is not set
# CONFIG_NET_ACT_GACT is not set
# CONFIG_NET_ACT_MIRRED is not set
# CONFIG_NET_ACT_NAT is not set
# CONFIG_NET_ACT_PEDIT is not set
# CONFIG_NET_ACT_SIMP is not set
# CONFIG_NET_ACT_SKBEDIT is not set
# CONFIG_NET_ACT_CSUM is not set
CONFIG_NET_SCH_FIFO=y
CONFIG_DCB=y
# CONFIG_DNS_RESOLVER is not set
# CONFIG_BATMAN_ADV is not set
# CONFIG_OPENVSWITCH is not set
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
# CONFIG_NETPRIO_CGROUP is not set
CONFIG_BQL=y
# CONFIG_BPF_JIT is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
CONFIG_NET_DROP_MONITOR=y
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set
# CONFIG_LIB80211 is not set

#
# CFG80211 needs to be enabled for MAC80211
#
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_RFKILL_REGULATOR is not set
# CONFIG_NET_9P is not set
# CONFIG_CAIF is not set
# CONFIG_CEPH_LIB is not set
# CONFIG_NFC is not set
CONFIG_HAVE_BPF_JIT=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
CONFIG_SYS_HYPERVISOR=y
# CONFIG_GENERIC_CPU_DEVICES is not set
CONFIG_DMA_SHARED_BUFFER=y

#
# Bus devices
#
# CONFIG_OMAP_OCP2SCP is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_MTD=y
# CONFIG_MTD_TESTS is not set
# CONFIG_MTD_REDBOOT_PARTS is not set
CONFIG_MTD_CMDLINE_PARTS=y
# CONFIG_MTD_AR7_PARTS is not set

#
# User Modules And Translation Layers
#
# CONFIG_MTD_CHAR is not set
# CONFIG_MTD_BLKDEVS is not set
# CONFIG_MTD_BLOCK is not set
# CONFIG_MTD_BLOCK_RO is not set
# CONFIG_FTL is not set
# CONFIG_NFTL is not set
# CONFIG_INFTL is not set
# CONFIG_RFD_FTL is not set
# CONFIG_SSFDC is not set
# CONFIG_SM_FTL is not set
# CONFIG_MTD_OOPS is not set
# CONFIG_MTD_SWAP is not set

#
# RAM/ROM/Flash chip drivers
#
# CONFIG_MTD_CFI is not set
# CONFIG_MTD_JEDECPROBE is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
# CONFIG_MTD_RAM is not set
# CONFIG_MTD_ROM is not set
# CONFIG_MTD_ABSENT is not set

#
# Mapping drivers for chip access
#
CONFIG_MTD_COMPLEX_MAPPINGS=y
# CONFIG_MTD_TS5500 is not set
# CONFIG_MTD_PCI is not set
# CONFIG_MTD_PCMCIA is not set
# CONFIG_MTD_GPIO_ADDR is not set
# CONFIG_MTD_INTEL_VR_NOR is not set
# CONFIG_MTD_PLATRAM is not set
# CONFIG_MTD_LATCH_ADDR is not set

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
# CONFIG_MTD_SLRAM is not set
# CONFIG_MTD_PHRAM is not set
# CONFIG_MTD_MTDRAM is not set
# CONFIG_MTD_BLOCK2MTD is not set

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOCG3 is not set
# CONFIG_MTD_NAND is not set
# CONFIG_MTD_ONENAND is not set

#
# LPDDR flash memory drivers
#
# CONFIG_MTD_LPDDR is not set
# CONFIG_MTD_UBI is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_PCIESSD_MTIP32XX is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_DRBD is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_NVME is not set
# CONFIG_BLK_DEV_SX8 is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_XEN_BLKDEV_FRONTEND is not set
# CONFIG_XEN_BLKDEV_BACKEND is not set
# CONFIG_BLK_DEV_HD is not set
# CONFIG_BLK_DEV_RBD is not set

#
# Misc devices
#
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_AD525X_DPOT is not set
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_INTEL_MID_PTI is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
CONFIG_ENCLOSURE_SERVICES=m
# CONFIG_HP_ILO is not set
# CONFIG_APDS9802ALS is not set
# CONFIG_ISL29003 is not set
# CONFIG_ISL29020 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_SENSORS_BH1780 is not set
# CONFIG_SENSORS_BH1770 is not set
# CONFIG_SENSORS_APDS990X is not set
# CONFIG_HMC6352 is not set
# CONFIG_DS1682 is not set
# CONFIG_VMWARE_BALLOON is not set
# CONFIG_BMP085_I2C is not set
# CONFIG_PCH_PHUB is not set
# CONFIG_USB_SWITCH_FSA9480 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
# CONFIG_EEPROM_MAX6875 is not set
# CONFIG_EEPROM_93CX6 is not set
# CONFIG_CB710_CORE is not set

#
# Texas Instruments shared transport line discipline
#
# CONFIG_TI_ST is not set
# CONFIG_SENSORS_LIS3_I2C is not set

#
# Altera FPGA firmware download module
#
# CONFIG_ALTERA_STAPL is not set
# CONFIG_INTEL_MEI is not set
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=y
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_SCSI_TGT=m
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
# CONFIG_CHR_DEV_SCH is not set
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y

#
# SCSI Transports
#
# CONFIG_SCSI_SPI_ATTRS is not set
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_FC_TGT_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_ISCSI_BOOT_SYSFS is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_SCSI_CXGB4_ISCSI is not set
# CONFIG_SCSI_BNX2_ISCSI is not set
# CONFIG_SCSI_BNX2X_FCOE is not set
# CONFIG_BE2ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_HPSA is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_3W_SAS is not set
# CONFIG_SCSI_ACARD is not set
CONFIG_SCSI_AACRAID=m
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
CONFIG_SCSI_AIC94XX=m
# CONFIG_AIC94XX_DEBUG is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_MVUMI is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
CONFIG_MEGARAID_NEWGEN=y
# CONFIG_MEGARAID_MM is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_UFSHCD is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_VMWARE_PVSCSI is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_FCOE_FNIC is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_ISCI is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
CONFIG_SCSI_QLA_FC=m
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_PMCRAID is not set
# CONFIG_SCSI_PM8001 is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_BFA_FC is not set
CONFIG_SCSI_LOWLEVEL_PCMCIA=y
# CONFIG_PCMCIA_AHA152X is not set
# CONFIG_PCMCIA_FDOMAIN is not set
# CONFIG_PCMCIA_QLOGIC is not set
# CONFIG_PCMCIA_SYM53C500 is not set
CONFIG_SCSI_DH=y
# CONFIG_SCSI_DH_RDAC is not set
# CONFIG_SCSI_DH_HP_SW is not set
# CONFIG_SCSI_DH_EMC is not set
# CONFIG_SCSI_DH_ALUA is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=y
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y

#
# Controllers with non-SFF native interface
#
# CONFIG_SATA_AHCI is not set
# CONFIG_SATA_AHCI_PLATFORM is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_SATA_ACARD_AHCI is not set
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y

#
# SFF controllers with custom DMA interface
#
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_SX4 is not set
CONFIG_ATA_BMDMA=y

#
# SATA SFF controllers with BMDMA
#
# CONFIG_ATA_PIIX is not set
# CONFIG_SATA_HIGHBANK is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_SVW is not set
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set

#
# PATA SFF controllers with BMDMA
#
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARASAN_CF is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_ATP867X is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CS5536 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RDC is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SCH is not set
CONFIG_PATA_SERVERWORKS=m
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_TOSHIBA is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set

#
# PIO-only SFF controllers
#
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_PCMCIA is not set
# CONFIG_PATA_RZ1000 is not set

#
# Generic fallback / legacy drivers
#
CONFIG_PATA_ACPI=m
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_LEGACY is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID10 is not set
# CONFIG_MD_RAID456 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_THIN_PROVISIONING is not set
CONFIG_DM_MIRROR=m
# CONFIG_DM_RAID is not set
# CONFIG_DM_LOG_USERSPACE is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
CONFIG_DM_UEVENT=y
# CONFIG_DM_FLAKEY is not set
# CONFIG_DM_VERITY is not set
# CONFIG_TARGET_CORE is not set
CONFIG_FUSION=y
# CONFIG_FUSION_SPI is not set
# CONFIG_FUSION_FC is not set
# CONFIG_FUSION_SAS is not set
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#
# CONFIG_FIREWIRE is not set
# CONFIG_FIREWIRE_NOSY is not set
# CONFIG_I2O is not set
CONFIG_MACINTOSH_DRIVERS=y
CONFIG_MAC_EMUMOUSEBTN=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
# CONFIG_BONDING is not set
# CONFIG_DUMMY is not set
# CONFIG_EQUALIZER is not set
CONFIG_NET_FC=y
# CONFIG_MII is not set
# CONFIG_IFB is not set
# CONFIG_NET_TEAM is not set
# CONFIG_MACVLAN is not set
# CONFIG_VXLAN is not set
CONFIG_NETCONSOLE=m
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_TUN is not set
# CONFIG_VETH is not set
# CONFIG_ARCNET is not set

#
# CAIF transport drivers
#

#
# Distributed Switch Architecture drivers
#
CONFIG_NET_DSA_MV88E6XXX=y
CONFIG_NET_DSA_MV88E6060=y
CONFIG_NET_DSA_MV88E6XXX_NEED_PPU=y
CONFIG_NET_DSA_MV88E6131=y
CONFIG_NET_DSA_MV88E6123_61_65=y
CONFIG_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
# CONFIG_PCMCIA_3C574 is not set
# CONFIG_PCMCIA_3C589 is not set
# CONFIG_VORTEX is not set
# CONFIG_TYPHOON is not set
CONFIG_NET_VENDOR_ADAPTEC=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_NET_VENDOR_ALTEON=y
# CONFIG_ACENIC is not set
CONFIG_NET_VENDOR_AMD=y
# CONFIG_AMD8111_ETH is not set
# CONFIG_PCNET32 is not set
# CONFIG_PCMCIA_NMCLAN is not set
CONFIG_NET_VENDOR_ATHEROS=y
# CONFIG_ATL2 is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
CONFIG_NET_VENDOR_BROADCOM=y
# CONFIG_B44 is not set
# CONFIG_BNX2 is not set
# CONFIG_CNIC is not set
CONFIG_TIGON3=m
# CONFIG_BNX2X is not set
CONFIG_NET_VENDOR_BROCADE=y
# CONFIG_BNA is not set
# CONFIG_NET_CALXEDA_XGMAC is not set
CONFIG_NET_VENDOR_CHELSIO=y
# CONFIG_CHELSIO_T1 is not set
# CONFIG_CHELSIO_T3 is not set
# CONFIG_CHELSIO_T4 is not set
# CONFIG_CHELSIO_T4VF is not set
CONFIG_NET_VENDOR_CISCO=y
# CONFIG_ENIC is not set
# CONFIG_DNET is not set
CONFIG_NET_VENDOR_DEC=y
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
# CONFIG_TULIP is not set
# CONFIG_DE4X5 is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_DM9102 is not set
# CONFIG_ULI526X is not set
# CONFIG_PCMCIA_XIRCOM is not set
CONFIG_NET_VENDOR_DLINK=y
# CONFIG_DL2K is not set
# CONFIG_SUNDANCE is not set
CONFIG_NET_VENDOR_EMULEX=y
# CONFIG_BE2NET is not set
CONFIG_NET_VENDOR_EXAR=y
# CONFIG_S2IO is not set
# CONFIG_VXGE is not set
CONFIG_NET_VENDOR_FUJITSU=y
# CONFIG_PCMCIA_FMVJ18X is not set
CONFIG_NET_VENDOR_HP=y
# CONFIG_HP100 is not set
CONFIG_NET_VENDOR_INTEL=y
# CONFIG_E100 is not set
# CONFIG_E1000 is not set
# CONFIG_E1000E is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_IXGB is not set
# CONFIG_IXGBE is not set
# CONFIG_IXGBEVF is not set
CONFIG_NET_VENDOR_I825XX=y
# CONFIG_ZNET is not set
# CONFIG_IP1000 is not set
# CONFIG_JME is not set
CONFIG_NET_VENDOR_MARVELL=y
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
CONFIG_NET_VENDOR_MELLANOX=y
# CONFIG_MLX4_EN is not set
# CONFIG_MLX4_CORE is not set
CONFIG_NET_VENDOR_MICREL=y
# CONFIG_KS8851_MLL is not set
# CONFIG_KSZ884X_PCI is not set
CONFIG_NET_VENDOR_MYRI=y
# CONFIG_MYRI10GE is not set
# CONFIG_FEALNX is not set
CONFIG_NET_VENDOR_NATSEMI=y
# CONFIG_NATSEMI is not set
# CONFIG_NS83820 is not set
CONFIG_NET_VENDOR_8390=y
# CONFIG_PCMCIA_AXNET is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_PCMCIA_PCNET is not set
CONFIG_NET_VENDOR_NVIDIA=y
# CONFIG_FORCEDETH is not set
CONFIG_NET_VENDOR_OKI=y
# CONFIG_PCH_GBE is not set
# CONFIG_ETHOC is not set
CONFIG_NET_PACKET_ENGINE=y
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_NET_VENDOR_QLOGIC=y
# CONFIG_QLA3XXX is not set
# CONFIG_QLCNIC is not set
# CONFIG_QLGE is not set
# CONFIG_NETXEN_NIC is not set
CONFIG_NET_VENDOR_REALTEK=y
# CONFIG_8139CP is not set
# CONFIG_8139TOO is not set
# CONFIG_R8169 is not set
CONFIG_NET_VENDOR_RDC=y
# CONFIG_R6040 is not set
CONFIG_NET_VENDOR_SEEQ=y
# CONFIG_SEEQ8005 is not set
CONFIG_NET_VENDOR_SILAN=y
# CONFIG_SC92031 is not set
CONFIG_NET_VENDOR_SIS=y
# CONFIG_SIS900 is not set
# CONFIG_SIS190 is not set
# CONFIG_SFC is not set
CONFIG_NET_VENDOR_SMSC=y
# CONFIG_PCMCIA_SMC91C92 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
CONFIG_NET_VENDOR_STMICRO=y
# CONFIG_STMMAC_ETH is not set
CONFIG_NET_VENDOR_SUN=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NIU is not set
CONFIG_NET_VENDOR_TEHUTI=y
# CONFIG_TEHUTI is not set
CONFIG_NET_VENDOR_TI=y
# CONFIG_TLAN is not set
CONFIG_NET_VENDOR_VIA=y
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_NET_VENDOR_WIZNET=y
# CONFIG_WIZNET_W5100 is not set
# CONFIG_WIZNET_W5300 is not set
CONFIG_NET_VENDOR_XIRCOM=y
# CONFIG_PCMCIA_XIRC2PS is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
# CONFIG_NET_SB1000 is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
# CONFIG_AT803X_PHY is not set
# CONFIG_AMD_PHY is not set
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_BCM87XX_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MICREL_PHY is not set
CONFIG_FIXED_PHY=y
# CONFIG_MDIO_BITBANG is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_USB_IPHETH is not set
CONFIG_WLAN=y
# CONFIG_PCMCIA_RAYCS is not set
# CONFIG_AIRO is not set
# CONFIG_ATMEL is not set
# CONFIG_AIRO_CS is not set
# CONFIG_PCMCIA_WL3501 is not set
# CONFIG_PRISM54 is not set
# CONFIG_USB_ZD1201 is not set
# CONFIG_HOSTAP is not set
# CONFIG_WL_TI is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
CONFIG_WAN=y
# CONFIG_HDLC is not set
# CONFIG_DLCI is not set
# CONFIG_SBNI is not set
# CONFIG_XEN_NETDEV_FRONTEND is not set
# CONFIG_XEN_NETDEV_BACKEND is not set
# CONFIG_VMXNET3 is not set
CONFIG_ISDN=y
# CONFIG_ISDN_I4L is not set
# CONFIG_ISDN_CAPI is not set
# CONFIG_ISDN_DRV_GIGASET is not set
# CONFIG_HYSDN is not set
# CONFIG_MISDN is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
# CONFIG_INPUT_POLLDEV is not set
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_INPUT_MATRIXKMAP is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ADP5588 is not set
# CONFIG_KEYBOARD_ADP5589 is not set
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_QT1070 is not set
# CONFIG_KEYBOARD_QT2160 is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_GPIO is not set
# CONFIG_KEYBOARD_GPIO_POLLED is not set
# CONFIG_KEYBOARD_TCA6416 is not set
# CONFIG_KEYBOARD_TCA8418 is not set
# CONFIG_KEYBOARD_MATRIX is not set
# CONFIG_KEYBOARD_LM8323 is not set
# CONFIG_KEYBOARD_LM8333 is not set
# CONFIG_KEYBOARD_MAX7359 is not set
# CONFIG_KEYBOARD_MCS is not set
# CONFIG_KEYBOARD_MPR121 is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_OPENCORES is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_OMAP4 is not set
# CONFIG_KEYBOARD_XTKBD is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_MOUSE_PS2_SENTELIC=y
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_MOUSE_GPIO is not set
# CONFIG_MOUSE_SYNAPTICS_I2C is not set
# CONFIG_MOUSE_SYNAPTICS_USB is not set
# CONFIG_INPUT_JOYSTICK is not set
CONFIG_INPUT_TABLET=y
# CONFIG_TABLET_USB_ACECAD is not set
# CONFIG_TABLET_USB_AIPTEK is not set
# CONFIG_TABLET_USB_GTCO is not set
# CONFIG_TABLET_USB_HANWANG is not set
# CONFIG_TABLET_USB_KBTAB is not set
# CONFIG_TABLET_USB_WACOM is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_AD7879 is not set
# CONFIG_TOUCHSCREEN_ATMEL_MXT is not set
# CONFIG_TOUCHSCREEN_AUO_PIXCIR is not set
# CONFIG_TOUCHSCREEN_BU21013 is not set
# CONFIG_TOUCHSCREEN_CY8CTMG110 is not set
# CONFIG_TOUCHSCREEN_CYTTSP_CORE is not set
# CONFIG_TOUCHSCREEN_DYNAPRO is not set
# CONFIG_TOUCHSCREEN_HAMPSHIRE is not set
# CONFIG_TOUCHSCREEN_EETI is not set
# CONFIG_TOUCHSCREEN_FUJITSU is not set
# CONFIG_TOUCHSCREEN_ILI210X is not set
# CONFIG_TOUCHSCREEN_GUNZE is not set
# CONFIG_TOUCHSCREEN_ELO is not set
# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set
# CONFIG_TOUCHSCREEN_WACOM_I2C is not set
# CONFIG_TOUCHSCREEN_MAX11801 is not set
# CONFIG_TOUCHSCREEN_MCS5000 is not set
# CONFIG_TOUCHSCREEN_MMS114 is not set
# CONFIG_TOUCHSCREEN_MTOUCH is not set
# CONFIG_TOUCHSCREEN_INEXIO is not set
# CONFIG_TOUCHSCREEN_MK712 is not set
# CONFIG_TOUCHSCREEN_PENMOUNT is not set
# CONFIG_TOUCHSCREEN_EDT_FT5X06 is not set
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
# CONFIG_TOUCHSCREEN_PIXCIR is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
# CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
# CONFIG_TOUCHSCREEN_TSC_SERIO is not set
# CONFIG_TOUCHSCREEN_TSC2007 is not set
# CONFIG_TOUCHSCREEN_ST1232 is not set
# CONFIG_TOUCHSCREEN_TPS6507X is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_AD714X is not set
# CONFIG_INPUT_BMA150 is not set
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_MMA8450 is not set
# CONFIG_INPUT_MPU3050 is not set
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_GP2A is not set
# CONFIG_INPUT_GPIO_TILT_POLLED is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_KXTJ9 is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
# CONFIG_INPUT_UINPUT is not set
# CONFIG_INPUT_PCF8574 is not set
# CONFIG_INPUT_GPIO_ROTARY_ENCODER is not set
# CONFIG_INPUT_ADXL34X is not set
# CONFIG_INPUT_CMA3000 is not set
CONFIG_INPUT_XEN_KBDDEV_FRONTEND=y

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
# CONFIG_SERIO_PS2MULT is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_ROCKETPORT is not set
# CONFIG_CYCLADES is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_SYNCLINK is not set
# CONFIG_SYNCLINKMP is not set
# CONFIG_SYNCLINK_GT is not set
# CONFIG_NOZOMI is not set
# CONFIG_ISI is not set
# CONFIG_N_HDLC is not set
# CONFIG_N_GSM is not set
# CONFIG_TRACE_SINK is not set
# CONFIG_DEVKMEM is not set
# CONFIG_STALDRV is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
# CONFIG_SERIAL_8250_CS is not set
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_MFD_HSU is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_TIMBERDALE is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_PCH_UART is not set
# CONFIG_SERIAL_XILINX_PS_UART is not set
CONFIG_HVC_DRIVER=y
CONFIG_HVC_IRQ=y
CONFIG_HVC_XEN=y
CONFIG_HVC_XEN_FRONTEND=y
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_HW_RANDOM_INTEL is not set
# CONFIG_HW_RANDOM_AMD is not set
# CONFIG_HW_RANDOM_VIA is not set
CONFIG_HW_RANDOM_TPM=y
CONFIG_NVRAM=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_CARDMAN_4000 is not set
# CONFIG_CARDMAN_4040 is not set
# CONFIG_IPWIRELESS is not set
# CONFIG_MWAVE is not set
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
# CONFIG_HANGCHECK_TIMER is not set
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_TIS_I2C_INFINEON is not set
# CONFIG_TCG_NSC is not set
# CONFIG_TCG_ATMEL is not set
# CONFIG_TCG_INFINEON is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
# CONFIG_I2C_MUX is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
CONFIG_I2C_PIIX4=m
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# ACPI drivers
#
# CONFIG_I2C_SCMI is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_DESIGNWARE_PCI is not set
# CONFIG_I2C_EG20T is not set
# CONFIG_I2C_GPIO is not set
# CONFIG_I2C_INTEL_MID is not set
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_PXA_PCI is not set
# CONFIG_I2C_SIMTEC is not set
# CONFIG_I2C_XILINX is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_DIOLAN_U2C is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_STUB is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_SPI is not set
# CONFIG_HSI is not set

#
# PPS support
#
# CONFIG_PPS is not set

#
# PPS generators support
#

#
# PTP clock support
#

#
# Enable Device Drivers -> PPS to see the PTP clock options.
#
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
CONFIG_GPIOLIB=y
# CONFIG_GPIO_SYSFS is not set

#
# Memory mapped GPIO drivers:
#
# CONFIG_GPIO_GENERIC_PLATFORM is not set
# CONFIG_GPIO_IT8761E is not set
# CONFIG_GPIO_SCH is not set
# CONFIG_GPIO_ICH is not set
# CONFIG_GPIO_VX855 is not set

#
# I2C GPIO expanders:
#
# CONFIG_GPIO_MAX7300 is not set
# CONFIG_GPIO_MAX732X is not set
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCF857X is not set
# CONFIG_GPIO_ADP5588 is not set

#
# PCI GPIO expanders:
#
# CONFIG_GPIO_BT8XX is not set
# CONFIG_GPIO_AMD8111 is not set
# CONFIG_GPIO_LANGWELL is not set
# CONFIG_GPIO_PCH is not set
# CONFIG_GPIO_ML_IOH is not set
# CONFIG_GPIO_RDC321X is not set

#
# SPI GPIO expanders:
#
# CONFIG_GPIO_MCP23S08 is not set

#
# AC97 GPIO expanders:
#

#
# MODULbus GPIO expanders:
#
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_TEST_POWER is not set
# CONFIG_BATTERY_DS2780 is not set
# CONFIG_BATTERY_DS2781 is not set
# CONFIG_BATTERY_DS2782 is not set
# CONFIG_BATTERY_SBS is not set
# CONFIG_BATTERY_BQ27x00 is not set
# CONFIG_BATTERY_MAX17040 is not set
# CONFIG_BATTERY_MAX17042 is not set
# CONFIG_CHARGER_MAX8903 is not set
# CONFIG_CHARGER_LP8727 is not set
# CONFIG_CHARGER_GPIO is not set
# CONFIG_CHARGER_MANAGER is not set
# CONFIG_CHARGER_SMB347 is not set
# CONFIG_POWER_AVS is not set
CONFIG_HWMON=m
# CONFIG_HWMON_VID is not set
# CONFIG_HWMON_DEBUG_CHIP is not set

#
# Native drivers
#
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7410 is not set
# CONFIG_SENSORS_ADT7411 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_ASC7621 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_K10TEMP is not set
# CONFIG_SENSORS_FAM15H_POWER is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS620 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
# CONFIG_SENSORS_GPIO_FAN is not set
# CONFIG_SENSORS_HIH6130 is not set
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_JC42 is not set
# CONFIG_SENSORS_LINEAGE is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM73 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4151 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LTC4261 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_LM95245 is not set
# CONFIG_SENSORS_MAX16065 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX1668 is not set
# CONFIG_SENSORS_MAX197 is not set
# CONFIG_SENSORS_MAX6639 is not set
# CONFIG_SENSORS_MAX6642 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_MCP3021 is not set
# CONFIG_SENSORS_NTC_THERMISTOR is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_PMBUS is not set
# CONFIG_SENSORS_SHT15 is not set
# CONFIG_SENSORS_SHT21 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_SMM665 is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_EMC1403 is not set
# CONFIG_SENSORS_EMC2103 is not set
# CONFIG_SENSORS_EMC6W201 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_SCH56XX_COMMON is not set
# CONFIG_SENSORS_SCH5627 is not set
# CONFIG_SENSORS_SCH5636 is not set
# CONFIG_SENSORS_ADS1015 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_AMC6821 is not set
# CONFIG_SENSORS_INA2XX is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_TMP102 is not set
# CONFIG_SENSORS_TMP401 is not set
# CONFIG_SENSORS_TMP421 is not set
# CONFIG_SENSORS_VIA_CPUTEMP is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83795 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_APPLESMC is not set

#
# ACPI drivers
#
# CONFIG_SENSORS_ACPI_POWER is not set
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_THERMAL=y
# CONFIG_CPU_THERMAL is not set
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
# CONFIG_SOFT_WATCHDOG is not set
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
# CONFIG_ALIM1535_WDT is not set
# CONFIG_ALIM7101_WDT is not set
# CONFIG_F71808E_WDT is not set
# CONFIG_SP5100_TCO is not set
# CONFIG_SC520_WDT is not set
# CONFIG_SBC_FITPC2_WATCHDOG is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
# CONFIG_IBMASR is not set
# CONFIG_WAFER_WDT is not set
# CONFIG_I6300ESB_WDT is not set
# CONFIG_IE6XX_WDT is not set
# CONFIG_ITCO_WDT is not set
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_NV_TCO is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
# CONFIG_VIA_WDT is not set
# CONFIG_W83627HF_WDT is not set
# CONFIG_W83697HF_WDT is not set
# CONFIG_W83697UG_WDT is not set
# CONFIG_W83877F_WDT is not set
# CONFIG_W83977F_WDT is not set
# CONFIG_MACHZ_WDT is not set
# CONFIG_SBC_EPX_C3_WATCHDOG is not set
# CONFIG_XEN_WDT is not set

#
# PCI-based Watchdog Cards
#
# CONFIG_PCIPCWATCHDOG is not set
# CONFIG_WDTPCI is not set

#
# USB-based Watchdog Cards
#
# CONFIG_USBPCWATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_BCMA_POSSIBLE=y

#
# Broadcom specific AMBA
#
# CONFIG_BCMA is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_LM3533 is not set
# CONFIG_TPS6105X is not set
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TPS65217 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_ARIZONA_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_MFD_MC13XXX_I2C is not set
# CONFIG_ABX500_CORE is not set
# CONFIG_MFD_CS5535 is not set
# CONFIG_MFD_TIMBERDALE is not set
# CONFIG_LPC_SCH is not set
# CONFIG_LPC_ICH is not set
# CONFIG_MFD_RDC321X is not set
# CONFIG_MFD_JANZ_CMODIO is not set
# CONFIG_MFD_VX855 is not set
# CONFIG_MFD_WL1273_CORE is not set
CONFIG_REGULATOR=y
# CONFIG_REGULATOR_DEBUG is not set
# CONFIG_REGULATOR_DUMMY is not set
# CONFIG_REGULATOR_FIXED_VOLTAGE is not set
# CONFIG_REGULATOR_VIRTUAL_CONSUMER is not set
# CONFIG_REGULATOR_USERSPACE_CONSUMER is not set
# CONFIG_REGULATOR_GPIO is not set
# CONFIG_REGULATOR_AD5398 is not set
# CONFIG_REGULATOR_FAN53555 is not set
# CONFIG_REGULATOR_ISL6271A is not set
# CONFIG_REGULATOR_MAX1586 is not set
# CONFIG_REGULATOR_MAX8649 is not set
# CONFIG_REGULATOR_MAX8660 is not set
# CONFIG_REGULATOR_MAX8952 is not set
# CONFIG_REGULATOR_LP3971 is not set
# CONFIG_REGULATOR_LP3972 is not set
# CONFIG_REGULATOR_TPS62360 is not set
# CONFIG_REGULATOR_TPS65023 is not set
# CONFIG_REGULATOR_TPS6507X is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
CONFIG_AGP_SIS=y
CONFIG_AGP_VIA=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=64
# CONFIG_VGA_SWITCHEROO is not set
CONFIG_DRM=m
CONFIG_DRM_KMS_HELPER=m
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
CONFIG_DRM_TTM=m
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_R128 is not set
CONFIG_DRM_RADEON=m
CONFIG_DRM_RADEON_KMS=y
# CONFIG_DRM_NOUVEAU is not set

#
# I2C encoder or helper chips
#
# CONFIG_DRM_I2C_CH7006 is not set
# CONFIG_DRM_I2C_SIL164 is not set
# CONFIG_DRM_I810 is not set
# CONFIG_DRM_I915 is not set
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
# CONFIG_DRM_VMWGFX is not set
# CONFIG_DRM_GMA500 is not set
# CONFIG_DRM_UDL is not set
# CONFIG_DRM_AST is not set
# CONFIG_DRM_MGAG200 is not set
# CONFIG_DRM_CIRRUS_QEMU is not set
# CONFIG_STUB_POULSBO is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=y
CONFIG_FB_SYS_COPYAREA=y
CONFIG_FB_SYS_IMAGEBLIT=y
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=y
# CONFIG_FB_WMT_GE_ROPS is not set
CONFIG_FB_DEFERRED_IO=y
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I740 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_SMSCUFX is not set
# CONFIG_FB_UDL is not set
# CONFIG_FB_VIRTUAL is not set
CONFIG_XEN_FBDEV_FRONTEND=y
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
# CONFIG_FB_AUO_K190X is not set
# CONFIG_EXYNOS_VIDEO is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
# CONFIG_BACKLIGHT_GENERIC is not set
# CONFIG_BACKLIGHT_APPLE is not set
# CONFIG_BACKLIGHT_SAHARA is not set
# CONFIG_BACKLIGHT_ADP8860 is not set
# CONFIG_BACKLIGHT_ADP8870 is not set
# CONFIG_BACKLIGHT_LM3630 is not set
# CONFIG_BACKLIGHT_LM3639 is not set
# CONFIG_BACKLIGHT_LP855X is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# CONFIG_SOUND is not set

#
# HID support
#
CONFIG_HID=y
# CONFIG_HID_BATTERY_STRENGTH is not set
CONFIG_HIDRAW=y
# CONFIG_UHID is not set
CONFIG_HID_GENERIC=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
# CONFIG_HID_ACRUX is not set
CONFIG_HID_APPLE=y
# CONFIG_HID_AUREAL is not set
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
CONFIG_HID_DRAGONRISE=y
# CONFIG_DRAGONRISE_FF is not set
# CONFIG_HID_EMS_FF is not set
CONFIG_HID_EZKEY=y
# CONFIG_HID_HOLTEK is not set
# CONFIG_HID_KEYTOUCH is not set
CONFIG_HID_KYE=y
# CONFIG_HID_UCLOGIC is not set
# CONFIG_HID_WALTOP is not set
CONFIG_HID_GYRATION=y
CONFIG_HID_TWINHAN=y
CONFIG_HID_KENSINGTON=y
# CONFIG_HID_LCPOWER is not set
# CONFIG_HID_LENOVO_TPKBD is not set
CONFIG_HID_LOGITECH=y
# CONFIG_HID_LOGITECH_DJ is not set
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
# CONFIG_LOGIG940_FF is not set
# CONFIG_LOGIWHEELS_FF is not set
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
# CONFIG_HID_MULTITOUCH is not set
CONFIG_HID_NTRIG=y
# CONFIG_HID_ORTEK is not set
CONFIG_HID_PANTHERLORD=y
# CONFIG_PANTHERLORD_FF is not set
CONFIG_HID_PETALYNX=y
# CONFIG_HID_PICOLCD is not set
# CONFIG_HID_PRIMAX is not set
# CONFIG_HID_ROCCAT is not set
# CONFIG_HID_SAITEK is not set
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
# CONFIG_HID_SPEEDLINK is not set
CONFIG_HID_SUNPLUS=y
CONFIG_HID_GREENASIA=y
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_SMARTJOYPLUS=y
CONFIG_SMARTJOYPLUS_FF=y
# CONFIG_HID_TIVO is not set
CONFIG_HID_TOPSEED=y
CONFIG_HID_THRUSTMASTER=y
# CONFIG_THRUSTMASTER_FF is not set
CONFIG_HID_ZEROPLUS=y
# CONFIG_ZEROPLUS_FF is not set
# CONFIG_HID_ZYDACRON is not set
# CONFIG_HID_SENSOR_HUB is not set

#
# USB HID support
#
CONFIG_USB_HID=y
CONFIG_HID_PID=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB_ARCH_HAS_XHCI=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y

#
# Miscellaneous USB options
#
# CONFIG_USB_DYNAMIC_MINORS is not set
CONFIG_USB_SUSPEND=y
# CONFIG_USB_OTG is not set
CONFIG_USB_MON=y
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
# CONFIG_USB_ISP1362_HCD is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_OHCI_HCD_PLATFORM is not set
# CONFIG_USB_EHCI_HCD_PLATFORM is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_CHIPIDEA is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
# CONFIG_USB_STORAGE is not set
# CONFIG_USB_UAS is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_YUREX is not set
# CONFIG_USB_EZUSB_FX2 is not set

#
# USB Physical Layer drivers
#
# CONFIG_OMAP_USB2 is not set
# CONFIG_USB_ISP1301 is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_LM3530 is not set
# CONFIG_LEDS_LM3642 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_GPIO is not set
# CONFIG_LEDS_LP3944 is not set
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_PCA9633 is not set
# CONFIG_LEDS_REGULATOR is not set
# CONFIG_LEDS_BD2802 is not set
# CONFIG_LEDS_INTEL_SS4200 is not set
# CONFIG_LEDS_LT3593 is not set
# CONFIG_LEDS_TCA6507 is not set
# CONFIG_LEDS_LM355x is not set
# CONFIG_LEDS_OT200 is not set
# CONFIG_LEDS_BLINKM is not set
CONFIG_LEDS_TRIGGERS=y

#
# LED Triggers
#
# CONFIG_LEDS_TRIGGER_TIMER is not set
# CONFIG_LEDS_TRIGGER_ONESHOT is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_CPU is not set
# CONFIG_LEDS_TRIGGER_GPIO is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_LEDS_TRIGGER_TRANSIENT is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
CONFIG_EDAC=y

#
# Reporting subsystems
#
CONFIG_EDAC_LEGACY_SYSFS=y
# CONFIG_EDAC_DEBUG is not set
# CONFIG_EDAC_DECODE_MCE is not set
# CONFIG_EDAC_MM_EDAC is not set
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
# CONFIG_RTC_DEBUG is not set

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_DS3232 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_ISL12022 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_BQ32K is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set
# CONFIG_RTC_DRV_EM3027 is not set
# CONFIG_RTC_DRV_RV3029C2 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=y
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set
# CONFIG_RTC_DRV_DS2404 is not set

#
# on-CPU RTC drivers
#
CONFIG_DMADEVICES=y
# CONFIG_DMADEVICES_DEBUG is not set

#
# DMA Devices
#
# CONFIG_INTEL_MID_DMAC is not set
# CONFIG_INTEL_IOATDMA is not set
# CONFIG_TIMB_DMA is not set
# CONFIG_PCH_DMA is not set
CONFIG_AUXDISPLAY=y
# CONFIG_UIO is not set
# CONFIG_VFIO is not set

#
# Virtio drivers
#
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_MMIO is not set

#
# Microsoft Hyper-V guest support
#
# CONFIG_HYPERV is not set

#
# Xen driver support
#
CONFIG_XEN_BALLOON=y
# CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not set
CONFIG_XEN_SCRUB_PAGES=y
# CONFIG_XEN_DEV_EVTCHN is not set
CONFIG_XEN_BACKEND=y
# CONFIG_XENFS is not set
CONFIG_XEN_SYS_HYPERVISOR=y
CONFIG_XEN_XENBUS_FRONTEND=y
# CONFIG_XEN_GNTDEV is not set
# CONFIG_XEN_GRANT_DEV_ALLOC is not set
CONFIG_SWIOTLB_XEN=y
# CONFIG_XEN_PCIDEV_BACKEND is not set
CONFIG_XEN_PRIVCMD=m
# CONFIG_XEN_ACPI_PROCESSOR is not set
# CONFIG_XEN_MCE_LOG is not set
CONFIG_STAGING=y
# CONFIG_ET131X is not set
# CONFIG_SLICOSS is not set
# CONFIG_USBIP_CORE is not set
# CONFIG_ECHO is not set
# CONFIG_COMEDI is not set
# CONFIG_ASUS_OLED is not set
# CONFIG_R8187SE is not set
# CONFIG_RTL8192U is not set
# CONFIG_RTLLIB is not set
# CONFIG_R8712U is not set
# CONFIG_RTS_PSTOR is not set
# CONFIG_RTS5139 is not set
# CONFIG_TRANZPORT is not set
# CONFIG_IDE_PHISON is not set
# CONFIG_VT6655 is not set
# CONFIG_VT6656 is not set
# CONFIG_DX_SEP is not set
# CONFIG_ZSMALLOC is not set
# CONFIG_WLAGS49_H2 is not set
# CONFIG_WLAGS49_H25 is not set
# CONFIG_FB_SM7XX is not set
# CONFIG_CRYSTALHD is not set
# CONFIG_FB_XGI is not set
# CONFIG_ACPI_QUICKSTART is not set
# CONFIG_USB_ENESTORAGE is not set
# CONFIG_BCM_WIMAX is not set
# CONFIG_FT1000 is not set

#
# Speakup console speech
#
# CONFIG_SPEAKUP is not set
# CONFIG_TOUCHSCREEN_CLEARPAD_TM1217 is not set
# CONFIG_TOUCHSCREEN_SYNAPTICS_I2C_RMI4 is not set
# CONFIG_STAGING_MEDIA is not set

#
# Android
#
# CONFIG_ANDROID is not set
# CONFIG_PHONE is not set
# CONFIG_USB_WPAN_HCD is not set
# CONFIG_IPACK_BUS is not set
# CONFIG_WIMAX_GDM72XX is not set
CONFIG_NET_VENDOR_SILICOM=y
# CONFIG_SBYPASS is not set
# CONFIG_BPCTL is not set
# CONFIG_CED1401 is not set
# CONFIG_DGRP is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ACERHDF is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_FUJITSU_TABLET is not set
# CONFIG_HP_ACCEL is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_TOPSTAR_LAPTOP is not set
# CONFIG_TOSHIBA_BT_RFKILL is not set
# CONFIG_ACPI_CMPC is not set
# CONFIG_INTEL_IPS is not set
# CONFIG_IBM_RTL is not set
# CONFIG_XO15_EBOOK is not set
# CONFIG_SAMSUNG_LAPTOP is not set
# CONFIG_SAMSUNG_Q10 is not set
# CONFIG_APPLE_GMUX is not set

#
# Hardware Spinlock drivers
#
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_STATS=y
# CONFIG_AMD_IOMMU_V2 is not set
# CONFIG_INTEL_IOMMU is not set
# CONFIG_IRQ_REMAP is not set

#
# Remoteproc drivers (EXPERIMENTAL)
#
# CONFIG_STE_MODEM_RPROC is not set

#
# Rpmsg drivers (EXPERIMENTAL)
#
# CONFIG_VIRT_DRIVERS is not set
# CONFIG_PM_DEVFREQ is not set
# CONFIG_EXTCON is not set
# CONFIG_MEMORY is not set
# CONFIG_IIO is not set
# CONFIG_VME_BUS is not set
# CONFIG_PWM is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_EFI_VARS=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_DMI_SYSFS is not set
CONFIG_ISCSI_IBFT_FIND=y
# CONFIG_ISCSI_IBFT is not set
# CONFIG_GOOGLE_FIRMWARE is not set

#
# File systems
#
CONFIG_DCACHE_WORD_ACCESS=y
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=m
CONFIG_EXT3_DEFAULTS_TO_ORDERED=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4_FS is not set
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=m
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_BTRFS_FS is not set
# CONFIG_NILFS2_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_FANOTIFY is not set
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
# CONFIG_QUOTA_DEBUG is not set
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set
CONFIG_GENERIC_ACL=y

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
# CONFIG_CONFIGFS_FS is not set
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_ECRYPT_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_JFFS2_FS is not set
# CONFIG_LOGFS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX6FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_PSTORE=y
# CONFIG_PSTORE_CONSOLE is not set
# CONFIG_PSTORE_FTRACE is not set
# CONFIG_PSTORE_RAM is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_CEPH_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_MAC_ROMAN is not set
# CONFIG_NLS_MAC_CELTIC is not set
# CONFIG_NLS_MAC_CENTEURO is not set
# CONFIG_NLS_MAC_CROATIAN is not set
# CONFIG_NLS_MAC_CYRILLIC is not set
# CONFIG_NLS_MAC_GAELIC is not set
# CONFIG_NLS_MAC_GREEK is not set
# CONFIG_NLS_MAC_ICELAND is not set
# CONFIG_NLS_MAC_INUIT is not set
# CONFIG_NLS_MAC_ROMANIAN is not set
# CONFIG_NLS_MAC_TURKISH is not set
# CONFIG_NLS_UTF8 is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_DEFAULT_MESSAGE_LOGLEVEL=7
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
# CONFIG_DEBUG_KERNEL is not set
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_STACKTRACE=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
# CONFIG_FRAME_POINTER is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_LKDTM is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_EVENT_POWER_TRACING_DEPRECATED=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
CONFIG_STACK_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KPROBE_EVENT=y
# CONFIG_UPROBE_EVENT is not set
CONFIG_PROBE_EVENTS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_TEST_KSTRTOX is not set
CONFIG_STRICT_DEVMEM=y
# CONFIG_X86_VERBOSE_BOOTUP is not set
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
# CONFIG_DEBUG_SET_MODULE_RONX is not set
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_OPTIMIZE_INLINING=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_TRUSTED_KEYS is not set
# CONFIG_ENCRYPTED_KEYS is not set
CONFIG_KEYS_DEBUG_PROC_KEYS=y
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
# CONFIG_SECURITY_PATH is not set
CONFIG_LSM_MMAP_MIN_ADDR=65535
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
# CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_SECURITY_YAMA is not set
CONFIG_INTEGRITY=y
# CONFIG_INTEGRITY_SIGNATURE is not set
CONFIG_IMA=y
CONFIG_IMA_MEASURE_PCR_IDX=10
CONFIG_IMA_AUDIT=y
CONFIG_IMA_LSM_RULES=y
# CONFIG_IMA_APPRAISE is not set
# CONFIG_EVM is not set
CONFIG_DEFAULT_SECURITY_SELINUX=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_DEFAULT_SECURITY="selinux"
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_USER is not set
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
# CONFIG_CRYPTO_GF128MUL is not set
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_PCRYPT is not set
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
# CONFIG_CRYPTO_CBC is not set
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set
# CONFIG_CRYPTO_VMAC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_GHASH is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA1_SSSE3 is not set
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=y
# CONFIG_CRYPTO_AES_X86_64 is not set
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_BLOWFISH_X86_64 is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAMELLIA_X86_64 is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST5_AVX_X86_64 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_CAST6_AVX_X86_64 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_SERPENT_SSE2_X86_64 is not set
# CONFIG_CRYPTO_SERPENT_AVX_X86_64 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set
# CONFIG_CRYPTO_TWOFISH_X86_64_3WAY is not set
# CONFIG_CRYPTO_TWOFISH_AVX_X86_64 is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_USER_API_HASH is not set
# CONFIG_CRYPTO_USER_API_SKCIPHER is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_PADLOCK is not set
# CONFIG_ASYMMETRIC_KEY_TYPE is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
# CONFIG_KVM_AMD is not set
# CONFIG_KVM_MMU_AUDIT is not set
# CONFIG_VHOST_NET is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_IO=y
# CONFIG_CRC_CCITT is not set
CONFIG_CRC16=y
CONFIG_CRC_T10DIF=m
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
# CONFIG_CRC8 is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
# CONFIG_XZ_DEC_TEST is not set
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_AVERAGE=y
# CONFIG_CORDIC is not set
# CONFIG_DDR is not set

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Srikar Dronamraju
2012-12-13 14:00:02 UTC
This is a full release of all the patches so apologies for the flood. V9 was
just a MIPS build fix and did not justify a full release. V10 includes Ingo's
scalability patches because even though they increase system CPU usage,
they also helped in a number of test cases. It would be worthwhile trying
to reduce the system CPU usage by looking closer at how rwsem works and
dealing with the contended case a bit better. Otherwise the rate of change
in the last few weeks has been tiny as the preliminary objectives had been
met and I did not want to invalidate any testing other people had conducted.
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma.git mm-balancenuma-v10r3
git tag: git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma.git mm-balancenuma-v10
Here are the specjbb results on a 2-node, 24 GB machine.
vm_1 was allocated 12 GB, while vm_2 and vm_3 were allocated 6 GB each.
All VMs were running the specjbb2005 workload.

All numbers presented are improvements/regressions relative to v3.7-rc8.

-----------------------------------------------------------------------------------------------
|                       |     |             nofit             |              fit              |
-----------------------------------------------------------------------------------------------
|                       |     |     noksm     |      ksm      |     noksm     |      ksm      |
-----------------------------------------------------------------------------------------------
|                       |     |  nothp|    thp|  nothp|    thp|  nothp|    thp|  nothp|    thp|
-----------------------------------------------------------------------------------------------
| autonuma-mels-rebase  | vm_1|   2.48|  14.25|   1.80|  15.59|   8.16|  14.62|   8.56|  17.49|
| autonuma-mels-rebase  | vm_2|  23.59|  18.67|  14.20|  23.25|  10.73|  13.18|  17.94|  21.72|
| autonuma-mels-rebase  | vm_3|  16.19|  19.40|  14.42|  22.54|  11.08|  12.04|   9.79|  20.34|
-----------------------------------------------------------------------------------------------
| mel-balancenuma v10r3 | vm_1|   0.10|   1.49|   1.78|   4.00|  -1.01|  -1.16|  -1.02|  -0.60|
| mel-balancenuma v10r3 | vm_2|   3.45|  -0.67|  -1.54|   2.65|  -2.83|  -7.10|   0.10|  -2.41|
| mel-balancenuma v10r3 | vm_3|   0.56|   5.49|  -0.63|   0.09|  -7.41|  -4.52|  -0.77|  -1.80|
-----------------------------------------------------------------------------------------------
| tip-master 11-dec     | vm_1|  -5.68|  12.34|  35.96|  13.33|  10.79|  15.22|   9.65|  12.80|
| tip-master 11-dec     | vm_2|  14.70|  15.54|  77.45|  15.10|  12.82|  11.20|  12.66|     na|
| tip-master 11-dec     | vm_3|   6.66|  19.26|     na|  14.93|   7.62|  14.72|  14.73|  12.34|
-----------------------------------------------------------------------------------------------


There are a couple of na's. In those cases, the test log for some weird
reason didn't have any data. This somehow seems to happen with the tip/master
kernel only. Maybe it's just a coincidence.
--
Thanks and Regards
Srikar

PS: The benchmark was run under non-standard conditions, only for the
purpose of relative comparison of different kernels.
