[PATCH v5 01/19] ARM: LPAE: Use long long printk format for displaying the pud

From: Will Deacon <***@arm.com>

Before we enable the MMU, we must ensure that the TTBR registers contain
sane values. After the MMU has been enabled, we jump to the *virtual*
address of the following function, so we also need to ensure that the
SCTLR write has taken effect.

This patch adds ISB instructions around the SCTLR write to ensure the
visibility of the above.

Signed-off-by: Will Deacon <***@arm.com>
Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/assembler.h | 11 +++++++++++
arch/arm/kernel/head.S | 2 ++
2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index bc2d2d7..2bcc456 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -184,6 +184,17 @@
#endif

/*
+ * Instruction barrier
+ */
+ .macro instr_sync
+#if __LINUX_ARM_ARCH__ >= 7
+ isb
+#elif __LINUX_ARM_ARCH__ == 6
+ mcr p15, 0, r0, c7, c5, 4
+#endif
+ .endm
+
+/*
* SMP data memory barrier
*/
.macro smp_dmb mode
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index c9173cf..ea8fae7 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -385,8 +385,10 @@ ENDPROC(__enable_mmu)
.align 5
__turn_mmu_on:
mov r0, r0
+ instr_sync
mcr p15, 0, r0, c1, c0, 0 @ write control reg
mrc p15, 0, r3, c0, c0, 0 @ read id reg
+ instr_sync
mov r3, r3
mov r3, r13
mov pc, r3

Russell King - ARM Linux

2011-05-08 21:41:01 UTC

Post by Catalin Marinas
Before we enable the MMU, we must ensure that the TTBR registers contain
sane values. After the MMU has been enabled, we jump to the *virtual*
address of the following function, so we also need to ensure that the
SCTLR write has taken effect.
This patch adds ISB instructions around the SCTLR write to ensure the
visibility of the above.

Maybe this should be extended to the arch/arm/kernel/sleep.S code too?

Post by Catalin Marinas
mov r0, r0
+ instr_sync
+ instr_sync
mov r3, r3
mov r3, r13
mov pc, r3

Could we avoid the second isb by doing something like this instead:

mrc p15, 0, r3, c0, c0, 0 @ read id reg
and r3, r3, r13
orr r3, r3, r13
mov pc, r3

The read from the ID register must complete before the branch can be
taken as the value is involved in computing the address to jump to
(even though that value has no actual effect on that address.) This
assumes that the read from CP15 can't complete until the previous
write has completed.

What I'm concerned about is adding additional code to this path - we
know it has some strict alignment requirements on some CPUs which
otherwise misbehave, normally by faulting in some way.

Catalin Marinas

2011-05-09 10:22:19 UTC

Maybe this should be extended to the arch/arm/kernel/sleep.S code too?

Yes.

Post by Catalin Marinas
mov r0, r0
+ instr_sync
+ instr_sync
mov r3, r3
mov r3, r13
mov pc, r3

and r3, r3, r13
orr r3, r3, r13
mov pc, r3
The read from the ID register must complete before the branch can be
taken as the value is involved in computing the address to jump to
(even though that value has no actual effect on that address.) This
assumes that the read from CP15 can't complete until the previous
write has completed.

I'm not entirely sure this would work on all (future) implementations.
There may be a slight difference between completion vs visibility to
subsequent instructions.

The MMU enable bit status may be already sampled by instructions in the
pipeline. Even if the "mov pc, r3" waits (pipeline stalled) for the read
back from SCTLR, it may still consider the MMU as being disabled by
having sampled the corresponding bit earlier. That's why CP15 operations
changing translations etc. require ISB and A15 is more restrictive here
(or we could say more relaxed on when the CP15 operation have an
effect).

Alternatively an exception return would do as well (like movs pc, lr)
but I think we still add some code for setting up the SPSR.

Post by Russell King - ARM Linux
What I'm concerned about is adding additional code to this path - we
know it has some strict alignment requirements on some CPUs which
otherwise misbehave, normally by faulting in some way.

The code path would be only changed on ARMv6+, otherwise the macro is
empty. Have you seen any issues with changing this code on newer CPUs?

--
Catalin

Russell King - ARM Linux

2011-05-09 10:32:42 UTC

Post by Catalin Marinas
Alternatively an exception return would do as well (like movs pc, lr)
but I think we still add some code for setting up the SPSR.

That gives us a way out of both of these without introducing any CPU
specific code. We can setup the SPSR before this block of code, and
call it with two movs pc, reg instructions which will provide the
necessary synchronization.

That sounds to me like an all-round better solution here.

Catalin Marinas

2011-05-09 10:59:54 UTC

Post by Catalin Marinas
Alternatively an exception return would do as well (like movs pc, lr)
but I think we still add some code for setting up the SPSR.

We still need an ISB before enabling the MMU to make sure that the TTBR
changing is visible. We may run with the MMU enabled (in the identity
mapping) before the exception return but with random data in TTBR.

--
Catalin

Russell King - ARM Linux

2011-05-09 12:05:09 UTC

Alternatively an exception return would do as well (like movs pc,=

lr)

but I think we still add some code for setting up the SPSR.

=20
That gives us a way out of both of these without introducing any CP=

specific code. We can setup the SPSR before this block of code, an=

call it with two movs pc, reg instructions which will provide the
necessary synchronization.

=20
We still need an ISB before enabling the MMU to make sure that the TT=

changing is visible. We may run with the MMU enabled (in the identity
mapping) before the exception return but with random data in TTBR.

Changes to CP15 registers and the memory order model
All changes to CP15 registers that appear in program order after any
explicit memory operations are guaranteed not to affect those memory
operations.

Any change to CP15 registers is guaranteed to be visible to subsequen=
t
instructions only after one of:
=E2=80=A2 the execution of an ISB instruction
=E2=80=A2 the taking of an exception
=E2=80=A2 the return from an exception.

To guarantee the visibility of changes to some CP15 registers, additi=
onal
operations might be required, on a case by case basis, before the ISB
instruction, exception or return from exception. These cases are
identified specifically in the definition of the registers.

However, for CP15 register accesses, all MRC and MCR instructions to
the same register using the same register number appear to occur in
program order relative to each other without context synchronization.

So, my reading of this suggests that ISB and returning from an exceptio=
n
(iow, movs pc, reg) have the same properties. So:

mcr p15, 0, r5, c3, c0, 0 @ load domain access re=
gister
mcr p15, 0, r4, c2, c0, 0 @ load page table point=
er
- b __turn_mmu_on
+ mrs r4, cpsr @ copy cpsr to spsr
+ msr spsr, r4
+ adr r4, BSYM(__turn_mmu_on)
+ movs pc, r4 @ synchronizing

.align 5
__turn_mmu_on:
mov r0, r0
mcr p15, 0, r0, c1, c0, 0 @ write control reg
mrc p15, 0, r3, c0, c0, 0 @ read id reg
mov r3, r3
mov r3, r13
- mov pc, r3
+ movs pc, r3 @ synchronizing

should be sufficient - and has the advantage that it should work on
existing CPUs.

Catalin Marinas

2011-05-09 13:36:35 UTC

Alternatively an exception return would do as well (like movs p=

c, lr)

but I think we still add some code for setting up the SPSR.

That gives us a way out of both of these without introducing any =

CPU

specific code. We can setup the SPSR before this block of code, =

and

call it with two movs pc, reg instructions which will provide the
necessary synchronization.

We still need an ISB before enabling the MMU to make sure that the =

TTBR

changing is visible. We may run with the MMU enabled (in the identi=

mapping) before the exception return but with random data in TTBR.

=20
Changes to CP15 registers and the memory order model
All changes to CP15 registers that appear in program order after an=

Post by Russell King - ARM Linux
explicit memory operations are guaranteed not to affect those memor=

Post by Russell King - ARM Linux
operations.
=20
Any change to CP15 registers is guaranteed to be visible to subsequ=

ent

Post by Russell King - ARM Linux
=E2=80=A2 the execution of an ISB instruction
=E2=80=A2 the taking of an exception
=E2=80=A2 the return from an exception.

=2E..

Post by Russell King - ARM Linux
So, my reading of this suggests that ISB and returning from an except=

ion

Post by Russell King - ARM Linux
=20

Post by Russell King - ARM Linux
- b __turn_mmu_on
+ msr spsr, r4
+ adr r4, BSYM(__turn_mmu_on)
=20
.align 5
mov r0, r0
mov r3, r3
mov r3, r13
- mov pc, r3
=20
should be sufficient - and has the advantage that it should work on
existing CPUs.

With two exception returns it should work. Only that we need to use LR
so that it compiles fine on Thumb-2 (found a bug in my TTBR1 patch as
well with using r13 as general purpose register, I'll fix it).

--=20
Catalin

Catalin Marinas

2011-05-09 15:01:56 UTC

Alternatively an exception return would do as well (like movs p=

c, lr)

but I think we still add some code for setting up the SPSR.

That gives us a way out of both of these without introducing any =

CPU

specific code. We can setup the SPSR before this block of code, =

and

call it with two movs pc, reg instructions which will provide the
necessary synchronization.

We still need an ISB before enabling the MMU to make sure that the =

TTBR

changing is visible. We may run with the MMU enabled (in the identi=

mapping) before the exception return but with random data in TTBR.

=20
Changes to CP15 registers and the memory order model
All changes to CP15 registers that appear in program order after an=

Post by Russell King - ARM Linux
explicit memory operations are guaranteed not to affect those memor=

Post by Russell King - ARM Linux
operations.
=20
Any change to CP15 registers is guaranteed to be visible to subsequ=

ent

Post by Russell King - ARM Linux
=E2=80=A2 the execution of an ISB instruction
=E2=80=A2 the taking of an exception
=E2=80=A2 the return from an exception.

=2E..

Post by Russell King - ARM Linux
So, my reading of this suggests that ISB and returning from an except=

ion

Post by Russell King - ARM Linux
=20

Post by Russell King - ARM Linux
- b __turn_mmu_on
+ msr spsr, r4

This doesn't work. From the ARM ARM (B1.3.3):

The execution state bits are the IT[7:0], J, E, and T bits. In
exception modes you can read or write these bits in the current
SPSR.
In the CPSR, unless the processor is in Debug state:
=E2=80=A2 The execution state bits, other than the E bit, are R=
AZ when
read by an MRS instruction.

So reading the CPSR doesn't copy the T and E bits. Of course, we could
set them explicitly but I find the ISB much simpler (and in practice we
only need it for ARMv7 onwards but adding the ARMv6 in case we have a
kernel compiled for both).
=20
Catalin

Russell King - ARM Linux

2011-05-09 15:34:16 UTC

=20
The execution state bits are the IT[7:0], J, E, and T bits. I=

exception modes you can read or write these bits in the curre=

SPSR.
=E2=80=A2 The execution state bits, other than the E bit, are=

RAZ when

read by an MRS instruction.
=20
So reading the CPSR doesn't copy the T and E bits. Of course, we coul=

set them explicitly but I find the ISB much simpler (and in practice =

only need it for ARMv7 onwards but adding the ARMv6 in case we have a
kernel compiled for both).

Err. If that's correct then the Linux kernel is totally broken, and
that's an incompatible change to the behaviour of the MRS and MSR
instructions which has gone unnoticed.

We use "MRS reg, cpsr" for saving the IRQ state in SVC mode and
"MSR cpsr, reg" to restore the interrupt state. If the T bit gets
reset by that, then Thumb kernels can never work.

What you've just said tells me that our implementation of:
- arch_local_irq_save()
- arch_local_save_flags()
- arch_local_irq_restore()
won't work because we can't read or write the I and F bits using
MSR/MRS, even in SVC mode.

What is the replacement method for doing this?

If there isn't a replacement method...

Catalin Marinas

2011-05-09 15:38:50 UTC

The execution state bits are the IT[7:0], J, E, and T bits.=

exception modes you can read or write these bits in the cur=

rent

SPSR.
=E2=80=A2 The execution state bits, other than the E bit, a=

re RAZ when

read by an MRS instruction.
So reading the CPSR doesn't copy the T and E bits. Of course, we co=

uld

set them explicitly but I find the ISB much simpler (and in practic=

e we

only need it for ARMv7 onwards but adding the ARMv6 in case we have=

kernel compiled for both).

=20
Err. If that's correct then the Linux kernel is totally broken, and
that's an incompatible change to the behaviour of the MRS and MSR
instructions which has gone unnoticed.
=20
We use "MRS reg, cpsr" for saving the IRQ state in SVC mode and
"MSR cpsr, reg" to restore the interrupt state. If the T bit gets
reset by that, then Thumb kernels can never work.
=20
- arch_local_irq_save()
- arch_local_save_flags()
- arch_local_irq_restore()
won't work because we can't read or write the I and F bits using
MSR/MRS, even in SVC mode.

You can't write the execution state bits: IT[7:0], E and T.

You can write mask bits A, I and F using MSR.

Post by Russell King - ARM Linux
What is the replacement method for doing this?

=46or changing the execution state - SETEND, BX etc.

--=20
Catalin

Russell King - ARM Linux

2011-05-09 15:48:22 UTC

On Mon, May 09, 2011 at 04:34:16PM +0100, Russell King - ARM Linux wrot=

=20
The execution state bits are the IT[7:0], J, E, and T bits.=

exception modes you can read or write these bits in the cur=

rent

SPSR.
=E2=80=A2 The execution state bits, other than the E bit, a=

re RAZ when

read by an MRS instruction.
=20
So reading the CPSR doesn't copy the T and E bits. Of course, we co=

uld

set them explicitly but I find the ISB much simpler (and in practic=

e we

only need it for ARMv7 onwards but adding the ARMv6 in case we have=

kernel compiled for both).

And actually, that whole paragraph in the ARM ARM seems to be inconsist=
ent
with other bits of the ARM ARM. For example:

In the CPSR, unless the processor is in Debug state:
=E2=80=A2 The execution state bits, other than the E bit, are RAZ when =
read by
an MRS instruction.
=E2=80=A2 Writes to the execution state bits, other than the E bit, by =
an MSR
instruction are:
=E2=80=94 For ARMv7 and ARMv6T2, ignored in all modes.
=E2=80=94 For architecture variants before ARMv6T2, ignored in User m=
ode and
required to write zeros in privileged modes. If a nonzero value is
written in a privileged mode, behavior is UNPREDICTABLE.

Now, G.5.1 says this about ARMv6 (which is an 'architecture variants be=
fore
ARMv6T2'):

ARMv6 and ARMv6K have the following differences:
=E2=80=A2 Bits[26:25] are RAZ/WI.
=E2=80=A2 Bits[15:10] are reserved.
=E2=80=A2 The J and T bits of the CPSR must not be changed when the CPS=
R is written
by an MSR instruction, or else the behavior is UNPREDICTABLE. MSR
instructions exist only in ARM state in these architecture variants, =
so
this is equivalent to saying the MSR instructions in privileged modes=
must
treat these bits as SBZP. MSR instructions in User mode still ignore
writes to these bits.

The thing is, if you write zeros into the mode bits from supervisor mod=
e
(as required by B1.3.3), you're going to take the CPU back to 26-bit us=
er
mode, which on many CPUs is an undefined mode.

So I suggest that the ARM ARM B1.3.3 is basically wrong and misleading.
Or if it's right, the architecture is broken as there's no way for
operating systems to save the current interrupt mask state and restore
it later.

Catalin Marinas

2011-05-09 16:02:10 UTC

On Mon, May 09, 2011 at 04:34:16PM +0100, Russell King - ARM Linux wr=

The execution state bits are the IT[7:0], J, E, and T bit=

s. In

exception modes you can read or write these bits in the c=

urrent

SPSR.
=E2=80=A2 The execution state bits, other than the E bit,=

are RAZ when

read by an MRS instruction.
So reading the CPSR doesn't copy the T and E bits. Of course, we =

could

set them explicitly but I find the ISB much simpler (and in pract=

ice we

only need it for ARMv7 onwards but adding the ARMv6 in case we ha=

ve a

kernel compiled for both).

Err. If that's correct then the Linux kernel is totally broken, an=

that's an incompatible change to the behaviour of the MRS and MSR
instructions which has gone unnoticed.
We use "MRS reg, cpsr" for saving the IRQ state in SVC mode and
"MSR cpsr, reg" to restore the interrupt state. If the T bit gets
reset by that, then Thumb kernels can never work.
- arch_local_irq_save()
- arch_local_save_flags()
- arch_local_irq_restore()
won't work because we can't read or write the I and F bits using
MSR/MRS, even in SVC mode.
What is the replacement method for doing this?
If there isn't a replacement method...

=20
And actually, that whole paragraph in the ARM ARM seems to be inconsi=

stent

=20
=E2=80=A2 The execution state bits, other than the E bit, are RAZ whe=

n read by

an MRS instruction.
=E2=80=A2 Writes to the execution state bits, other than the E bit, b=

y an MSR

=E2=80=94 For ARMv7 and ARMv6T2, ignored in all modes.
=E2=80=94 For architecture variants before ARMv6T2, ignored in User=

mode and

required to write zeros in privileged modes. If a nonzero value i=

written in a privileged mode, behavior is UNPREDICTABLE.
=20
Now, G.5.1 says this about ARMv6 (which is an 'architecture variants =

before

=20
=E2=80=A2 Bits[26:25] are RAZ/WI.
=E2=80=A2 Bits[15:10] are reserved.
=E2=80=A2 The J and T bits of the CPSR must not be changed when the C=

PSR is written

by an MSR instruction, or else the behavior is UNPREDICTABLE. MSR
instructions exist only in ARM state in these architecture variants=

, so

this is equivalent to saying the MSR instructions in privileged mod=

es must

treat these bits as SBZP. MSR instructions in User mode still ignor=

writes to these bits.
=20
The thing is, if you write zeros into the mode bits from supervisor m=

ode

(as required by B1.3.3), you're going to take the CPU back to 26-bit =

user

mode, which on many CPUs is an undefined mode.

I still don't get what you mean. B1.3.3 paragraph above refers to the
"execution state bits, other than the E bit", IOW - IT[7:0], J and T.
The execution state bits are the IT[7:0], J, E, and T bits.

The mode bits you are talking about are called M[4:0]. These are note
"execution state bits" and can be read by MRS.

--=20
Catalin

Catalin Marinas

2011-05-08 12:51:24 UTC

PGDIR_SHIFT and PMD_SHIFT for the classic 2-level page table format have
the same value (21). This patch converts the PGDIR_* uses in the kernel
to the PMD_* equivalent so that LPAE builds can reuse the same code.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/kernel/module.c | 2 +-
arch/arm/mm/dma-mapping.c | 6 +++---
arch/arm/mm/mmu.c | 10 +++++-----
3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index fee7c36..116016d 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -33,7 +33,7 @@
* recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
*/
#undef MODULES_VADDR
-#define MODULES_VADDR (((unsigned long)_etext + ~PGDIR_MASK) & PGDIR_MASK)
+#define MODULES_VADDR (((unsigned long)_etext + ~PMD_MASK) & PMD_MASK)
#endif

#ifdef CONFIG_MMU
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..334e9af 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -121,8 +121,8 @@ static void __dma_free_buffer(struct page *page, size_t size)
#endif

#define CONSISTENT_OFFSET(x) (((unsigned long)(x) - CONSISTENT_BASE) >> PAGE_SHIFT)
-#define CONSISTENT_PTE_INDEX(x) (((unsigned long)(x) - CONSISTENT_BASE) >> PGDIR_SHIFT)
-#define NUM_CONSISTENT_PTES (CONSISTENT_DMA_SIZE >> PGDIR_SHIFT)
+#define CONSISTENT_PTE_INDEX(x) (((unsigned long)(x) - CONSISTENT_BASE) >> PMD_SHIFT)
+#define NUM_CONSISTENT_PTES (CONSISTENT_DMA_SIZE >> PMD_SHIFT)

/*
* These are the page tables (2MB each) covering uncached, DMA consistent allocations
@@ -181,7 +181,7 @@ static int __init consistent_init(void)
}

consistent_pte[i++] = pte;
- base += (1 << PGDIR_SHIFT);
+ base += (1 << PMD_SHIFT);
} while (base < CONSISTENT_END);

return ret;
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 6cf76b3..a855648 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -864,14 +864,14 @@ static inline void prepare_page_table(void)
/*
* Clear out all the mappings below the kernel image.
*/
- for (addr = 0; addr < MODULES_VADDR; addr += PGDIR_SIZE)
+ for (addr = 0; addr < MODULES_VADDR; addr += PMD_SIZE)
pmd_clear(pmd_off_k(addr));

#ifdef CONFIG_XIP_KERNEL
/* The XIP kernel is mapped in the module area -- skip over it */
- addr = ((unsigned long)_etext + PGDIR_SIZE - 1) & PGDIR_MASK;
+ addr = ((unsigned long)_etext + PMD_SIZE - 1) & PMD_MASK;
#endif
- for ( ; addr < PAGE_OFFSET; addr += PGDIR_SIZE)
+ for ( ; addr < PAGE_OFFSET; addr += PMD_SIZE)
pmd_clear(pmd_off_k(addr));

/*
@@ -886,7 +886,7 @@ static inline void prepare_page_table(void)
* memory bank, up to the end of the vmalloc region.
*/
for (addr = __phys_to_virt(end);
- addr < VMALLOC_END; addr += PGDIR_SIZE)
+ addr < VMALLOC_END; addr += PMD_SIZE)
pmd_clear(pmd_off_k(addr));
}

@@ -927,7 +927,7 @@ static void __init devicemaps_init(struct machine_desc *mdesc)
*/
vectors_page = early_alloc(PAGE_SIZE);

- for (addr = VMALLOC_END; addr; addr += PGDIR_SIZE)
+ for (addr = VMALLOC_END; addr; addr += PMD_SIZE)
pmd_clear(pmd_off_k(addr));

/*

Catalin Marinas

2011-05-08 12:51:26 UTC

This patch defines the (pte|pmd|pgd|pgprot)val_t as u32 and changes the
page table types to be based on these. The PMD bits are converted to the
corresponding type using the _AT macro.

The flush_pmd_entry/clean_pmd_entry argument was changed to (void *) to
allow them to be used with both PGD and PMD pointers and avoid code
duplication.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/pgalloc.h | 4 +-
arch/arm/include/asm/pgtable-2level-hwdef.h | 82 +++++++++++++-------------
arch/arm/include/asm/pgtable-2level-types.h | 17 +++--
arch/arm/include/asm/tlbflush.h | 4 +-
arch/arm/mm/mm.h | 4 +-
arch/arm/mm/mmu.c | 4 +-
6 files changed, 59 insertions(+), 56 deletions(-)

diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index a87d4cf..7418894 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -105,9 +105,9 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t pte)
}

static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t pte,
- unsigned long prot)
+ pmdval_t prot)
{
- unsigned long pmdval = (pte + PTE_HWTABLE_OFF) | prot;
+ pmdval_t pmdval = (pte + PTE_HWTABLE_OFF) | prot;
pmdp[0] = __pmd(pmdval);
pmdp[1] = __pmd(pmdval + 256 * sizeof(pte_t));
flush_pmd_entry(pmdp);
diff --git a/arch/arm/include/asm/pgtable-2level-hwdef.h b/arch/arm/include/asm/pgtable-2level-hwdef.h
index 436529c..2b52c40 100644
--- a/arch/arm/include/asm/pgtable-2level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-2level-hwdef.h
@@ -16,29 +16,29 @@
* + Level 1 descriptor (PMD)
* - common
*/
-#define PMD_TYPE_MASK (3 << 0)
-#define PMD_TYPE_FAULT (0 << 0)
-#define PMD_TYPE_TABLE (1 << 0)
-#define PMD_TYPE_SECT (2 << 0)
-#define PMD_BIT4 (1 << 4)
-#define PMD_DOMAIN(x) ((x) << 5)
-#define PMD_PROTECTION (1 << 9) /* v5 */
+#define PMD_TYPE_MASK (_AT(pmdval_t, 3) << 0)
+#define PMD_TYPE_FAULT (_AT(pmdval_t, 0) << 0)
+#define PMD_TYPE_TABLE (_AT(pmdval_t, 1) << 0)
+#define PMD_TYPE_SECT (_AT(pmdval_t, 2) << 0)
+#define PMD_BIT4 (_AT(pmdval_t, 1) << 4)
+#define PMD_DOMAIN(x) (_AT(pmdval_t, (x)) << 5)
+#define PMD_PROTECTION (_AT(pmdval_t, 1) << 9) /* v5 */
/*
* - section
*/
-#define PMD_SECT_BUFFERABLE (1 << 2)
-#define PMD_SECT_CACHEABLE (1 << 3)
-#define PMD_SECT_XN (1 << 4) /* v6 */
-#define PMD_SECT_AP_WRITE (1 << 10)
-#define PMD_SECT_AP_READ (1 << 11)
-#define PMD_SECT_TEX(x) ((x) << 12) /* v5 */
-#define PMD_SECT_APX (1 << 15) /* v6 */
-#define PMD_SECT_S (1 << 16) /* v6 */
-#define PMD_SECT_nG (1 << 17) /* v6 */
-#define PMD_SECT_SUPER (1 << 18) /* v6 */
-#define PMD_SECT_AF (0)
+#define PMD_SECT_BUFFERABLE (_AT(pmdval_t, 1) << 2)
+#define PMD_SECT_CACHEABLE (_AT(pmdval_t, 1) << 3)
+#define PMD_SECT_XN (_AT(pmdval_t, 1) << 4) /* v6 */
+#define PMD_SECT_AP_WRITE (_AT(pmdval_t, 1) << 10)
+#define PMD_SECT_AP_READ (_AT(pmdval_t, 1) << 11)
+#define PMD_SECT_TEX(x) (_AT(pmdval_t, (x)) << 12) /* v5 */
+#define PMD_SECT_APX (_AT(pmdval_t, 1) << 15) /* v6 */
+#define PMD_SECT_S (_AT(pmdval_t, 1) << 16) /* v6 */
+#define PMD_SECT_nG (_AT(pmdval_t, 1) << 17) /* v6 */
+#define PMD_SECT_SUPER (_AT(pmdval_t, 1) << 18) /* v6 */
+#define PMD_SECT_AF (_AT(pmdval_t, 0))

-#define PMD_SECT_UNCACHED (0)
+#define PMD_SECT_UNCACHED (_AT(pmdval_t, 0))
#define PMD_SECT_BUFFERED (PMD_SECT_BUFFERABLE)
#define PMD_SECT_WT (PMD_SECT_CACHEABLE)
#define PMD_SECT_WB (PMD_SECT_CACHEABLE | PMD_SECT_BUFFERABLE)
@@ -54,38 +54,38 @@
* + Level 2 descriptor (PTE)
* - common
*/
-#define PTE_TYPE_MASK (3 << 0)
-#define PTE_TYPE_FAULT (0 << 0)
-#define PTE_TYPE_LARGE (1 << 0)
-#define PTE_TYPE_SMALL (2 << 0)
-#define PTE_TYPE_EXT (3 << 0) /* v5 */
-#define PTE_BUFFERABLE (1 << 2)
-#define PTE_CACHEABLE (1 << 3)
+#define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0)
+#define PTE_TYPE_FAULT (_AT(pteval_t, 0) << 0)
+#define PTE_TYPE_LARGE (_AT(pteval_t, 1) << 0)
+#define PTE_TYPE_SMALL (_AT(pteval_t, 2) << 0)
+#define PTE_TYPE_EXT (_AT(pteval_t, 3) << 0) /* v5 */
+#define PTE_BUFFERABLE (_AT(pteval_t, 1) << 2)
+#define PTE_CACHEABLE (_AT(pteval_t, 1) << 3)

/*
* - extended small page/tiny page
*/
-#define PTE_EXT_XN (1 << 0) /* v6 */
-#define PTE_EXT_AP_MASK (3 << 4)
-#define PTE_EXT_AP0 (1 << 4)
-#define PTE_EXT_AP1 (2 << 4)
-#define PTE_EXT_AP_UNO_SRO (0 << 4)
+#define PTE_EXT_XN (_AT(pteval_t, 1) << 0) /* v6 */
+#define PTE_EXT_AP_MASK (_AT(pteval_t, 3) << 4)
+#define PTE_EXT_AP0 (_AT(pteval_t, 1) << 4)
+#define PTE_EXT_AP1 (_AT(pteval_t, 2) << 4)
+#define PTE_EXT_AP_UNO_SRO (_AT(pteval_t, 0) << 4)
#define PTE_EXT_AP_UNO_SRW (PTE_EXT_AP0)
#define PTE_EXT_AP_URO_SRW (PTE_EXT_AP1)
#define PTE_EXT_AP_URW_SRW (PTE_EXT_AP1|PTE_EXT_AP0)
-#define PTE_EXT_TEX(x) ((x) << 6) /* v5 */
-#define PTE_EXT_APX (1 << 9) /* v6 */
-#define PTE_EXT_COHERENT (1 << 9) /* XScale3 */
-#define PTE_EXT_SHARED (1 << 10) /* v6 */
-#define PTE_EXT_NG (1 << 11) /* v6 */
+#define PTE_EXT_TEX(x) (_AT(pteval_t, (x)) << 6) /* v5 */
+#define PTE_EXT_APX (_AT(pteval_t, 1) << 9) /* v6 */
+#define PTE_EXT_COHERENT (_AT(pteval_t, 1) << 9) /* XScale3 */
+#define PTE_EXT_SHARED (_AT(pteval_t, 1) << 10) /* v6 */
+#define PTE_EXT_NG (_AT(pteval_t, 1) << 11) /* v6 */

/*
* - small page
*/
-#define PTE_SMALL_AP_MASK (0xff << 4)
-#define PTE_SMALL_AP_UNO_SRO (0x00 << 4)
-#define PTE_SMALL_AP_UNO_SRW (0x55 << 4)
-#define PTE_SMALL_AP_URO_SRW (0xaa << 4)
-#define PTE_SMALL_AP_URW_SRW (0xff << 4)
+#define PTE_SMALL_AP_MASK (_AT(pteval_t, 0xff) << 4)
+#define PTE_SMALL_AP_UNO_SRO (_AT(pteval_t, 0x00) << 4)
+#define PTE_SMALL_AP_UNO_SRW (_AT(pteval_t, 0x55) << 4)
+#define PTE_SMALL_AP_URO_SRW (_AT(pteval_t, 0xaa) << 4)
+#define PTE_SMALL_AP_URW_SRW (_AT(pteval_t, 0xff) << 4)

#endif
diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h
index 8ff6941..a4a4067 100644
--- a/arch/arm/include/asm/pgtable-2level-types.h
+++ b/arch/arm/include/asm/pgtable-2level-types.h
@@ -19,7 +19,10 @@
#ifndef _ASM_PGTABLE_2LEVEL_TYPES_H
#define _ASM_PGTABLE_2LEVEL_TYPES_H

-typedef unsigned long pteval_t;
+typedef u32 pteval_t;
+typedef u32 pmdval_t;
+typedef u32 pgdval_t;
+typedef u32 pgprotval_t;

#undef STRICT_MM_TYPECHECKS

@@ -28,9 +31,9 @@ typedef unsigned long pteval_t;
* These are used to make use of C type-checking..
*/
typedef struct { pteval_t pte; } pte_t;
-typedef struct { unsigned long pmd; } pmd_t;
-typedef struct { unsigned long pgd[2]; } pgd_t;
-typedef struct { unsigned long pgprot; } pgprot_t;
+typedef struct { pmdval_t pmd; } pmd_t;
+typedef struct { pgdval_t pgd[2]; } pgd_t;
+typedef struct { pgprotval_t pgprot; } pgprot_t;

#define pte_val(x) ((x).pte)
#define pmd_val(x) ((x).pmd)
@@ -46,9 +49,9 @@ typedef struct { unsigned long pgprot; } pgprot_t;
* .. while these make it easier on the compiler
*/
typedef pteval_t pte_t;
-typedef unsigned long pmd_t;
-typedef unsigned long pgd_t[2];
-typedef unsigned long pgprot_t;
+typedef pmdval_t pmd_t;
+typedef pgdval_t pgd_t[2];
+typedef pgprotval_t pgprot_t;

#define pte_val(x) (x)
#define pmd_val(x) (x)
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index d2005de..2a49568 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -509,7 +509,7 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
* these operations. This is typically used when we are removing
* PMD entries.
*/
-static inline void flush_pmd_entry(pmd_t *pmd)
+static inline void flush_pmd_entry(void *pmd)
{
const unsigned int __tlb_flag = __cpu_tlb_flags;

@@ -525,7 +525,7 @@ static inline void flush_pmd_entry(pmd_t *pmd)
dsb();
}

-static inline void clean_pmd_entry(pmd_t *pmd)
+static inline void clean_pmd_entry(void *pmd)
{
const unsigned int __tlb_flag = __cpu_tlb_flags;

diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index d238410..2b179af 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -17,8 +17,8 @@ static inline pmd_t *pmd_off_k(unsigned long virt)

struct mem_type {
pteval_t prot_pte;
- unsigned int prot_l1;
- unsigned int prot_sect;
+ pgprotval_t prot_l1;
+ pgprotval_t prot_sect;
unsigned int domain;
};

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index a855648..1e4e05a 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -62,7 +62,7 @@ EXPORT_SYMBOL(pgprot_kernel);
struct cachepolicy {
const char policy[16];
unsigned int cr_mask;
- unsigned int pmd;
+ pmdval_t pmd;
pteval_t pte;
};

@@ -290,7 +290,7 @@ static void __init build_mem_type_table(void)
{
struct cachepolicy *cp;
unsigned int cr = get_cr();
- unsigned int user_pgprot, kern_pgprot, vecs_pgprot;
+ pgprotval_t user_pgprot, kern_pgprot, vecs_pgprot;
int cpu_arch = cpu_architecture();
int i;

Catalin Marinas

2011-05-08 12:51:23 UTC

On the secondary CPUs, TTBR1 points to the temporary pgd set up in
__cpu_up() which is later removed. With LPAE, TTBR1 is used for the
kernel mappings.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/smp.h | 1 +
arch/arm/kernel/head.S | 7 +++++--
arch/arm/kernel/smp.c | 1 +
arch/arm/mm/proc-v7.S | 4 +++-
4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/smp.h b/arch/arm/include/asm/smp.h
index afea1b1..d924cf8 100644
--- a/arch/arm/include/asm/smp.h
+++ b/arch/arm/include/asm/smp.h
@@ -76,6 +76,7 @@ extern void platform_smp_prepare_cpus(unsigned int);
*/
struct secondary_data {
unsigned long pgdir;
+ unsigned long swapper_pg_dir;
void *stack;
};
extern struct secondary_data secondary_data;
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index ea8fae7..ac368e6 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -113,6 +113,7 @@ ENTRY(stext)
ldr r13, =__mmap_switched @ address to jump to after
@ mmu has been enabled
adr lr, BSYM(1f) @ return (PIC) address
+ mov r8, r4 @ set TTBR1 to swapper_pg_dir
ARM( add pc, r10, #PROCINFO_INITFUNC )
THUMB( add r12, r10, #PROCINFO_INITFUNC )
THUMB( mov pc, r12 )
@@ -302,8 +303,10 @@ ENTRY(secondary_startup)
*/
adr r4, __secondary_data
ldmia r4, {r5, r7, r12} @ address to jump to after
- sub r4, r4, r5 @ mmu has been enabled
- ldr r4, [r7, r4] @ get secondary_data.pgdir
+ sub r13, r4, r5 @ mmu has been enabled
+ ldr r4, [r7, r13] @ get secondary_data.pgdir
+ add r7, r7, #4
+ ldr r8, [r7, r13] @ get secondary_data.swapper_pg_dir
adr lr, BSYM(__enable_mmu) @ return address
mov r13, r12 @ __secondary_switched address
ARM( add pc, r10, #PROCINFO_INITFUNC ) @ initialise processor
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 556bd54..e2ea1bf 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -105,6 +105,7 @@ int __cpuinit __cpu_up(unsigned int cpu)
*/
secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
secondary_data.pgdir = virt_to_phys(pgd);
+ secondary_data.swapper_pg_dir = virt_to_phys(swapper_pg_dir);
__cpuc_flush_dcache_area(&secondary_data, sizeof(secondary_data));
outer_clean_range(__pa(&secondary_data), __pa(&secondary_data + 1));

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 5fd5bc0..864a5c9 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -375,7 +375,9 @@ __v7_setup:
mcr p15, 0, r10, c2, c0, 2 @ TTB control register
ALT_SMP(orr r4, r4, #TTB_FLAGS_SMP)
ALT_UP(orr r4, r4, #TTB_FLAGS_UP)
- mcr p15, 0, r4, c2, c0, 1 @ load TTB1
+ ALT_SMP(orr r8, r8, #TTB_FLAGS_SMP)
+ ALT_UP(orr r8, r8, #TTB_FLAGS_UP)
+ mcr p15, 0, r8, c2, c0, 1 @ load TTB1
ldr r5, =PRRR @ PRRR
ldr r6, =NMRR @ NMRR
mcr p15, 0, r5, c10, c2, 0 @ write PRRR

Catalin Marinas

2011-05-08 12:51:22 UTC

The !CONFIG_ARM_PATCH_PHYS_VIRT case uses macros for __phys_to_virt and
__virt_to_phys but does not use any type casting. This causes issues
with LPAE where the phys_addr_t is 64-bit. Note that these macros are
only valid for lowmem physical addresses where the range is within
32-bit.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/memory.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 431077c..10e4b4c 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -194,8 +194,8 @@ static inline unsigned long __phys_to_virt(unsigned long x)
return t;
}
#else
-#define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
-#define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
+#define __virt_to_phys(x) ((unsigned long)(x) - PAGE_OFFSET + PHYS_OFFSET)
+#define __phys_to_virt(x) ((unsigned long)(x) - PHYS_OFFSET + PAGE_OFFSET)
#endif
#endif

Russell King - ARM Linux

2011-05-08 21:44:37 UTC

Post by Catalin Marinas
The !CONFIG_ARM_PATCH_PHYS_VIRT case uses macros for __phys_to_virt and
__virt_to_phys but does not use any type casting. This causes issues

It might be a good idea to include the compiler warning message in the
commit log, so that the 'issues' being addressed are readily known.

Post by Catalin Marinas
with LPAE where the phys_addr_t is 64-bit. Note that these macros are
only valid for lowmem physical addresses where the range is within
32-bit.
---
arch/arm/include/asm/memory.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 431077c..10e4b4c 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -194,8 +194,8 @@ static inline unsigned long __phys_to_virt(unsigned long x)
return t;
}
#else
-#define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
-#define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
+#define __virt_to_phys(x) ((unsigned long)(x) - PAGE_OFFSET + PHYS_OFFSET)
+#define __phys_to_virt(x) ((unsigned long)(x) - PHYS_OFFSET + PAGE_OFFSET)
#endif
#endif

Catalin Marinas

2011-05-16 17:28:37 UTC

Post by Catalin Marinas
The !CONFIG_ARM_PATCH_PHYS_VIRT case uses macros for __phys_to_virt and
__virt_to_phys but does not use any type casting. This causes issues

It might be a good idea to include the compiler warning message in the
commit log, so that the 'issues' being addressed are readily known.

Here's the new log:

ARM: LPAE: Use unsigned long for __phys_to_virt and __virt_to_phys

The !CONFIG_ARM_PATCH_PHYS_VIRT case uses macros for __phys_to_virt and
__virt_to_phys but does not use any type casting. This causes compiler
warnings with LPAE where the phys_addr_t and dma_addr_t are 64-bit:

CC arch/arm/mm/dma-mapping.o
In file included from /work/Linux/linux-marc/include/linux/dma-mapping.h:93:0,
from /work/Linux/linux-marc/arch/arm/mm/dma-mapping.c:19:
/work/Linux/linux-marc/arch/arm/include/asm/dma-mapping.h: In function 'dma_to_virt':
/work/Linux/linux-marc/arch/arm/include/asm/dma-mapping.h:35:9: warning:
cast to pointer from integer of different size

Note that these macros are only valid for lowmem physical addresses
where the range is within 32-bit address range.

Signed-off-by: Catalin Marinas <***@arm.com>

--
Catalin

Catalin Marinas

2011-05-08 12:51:38 UTC

This patch adds the ARM_LPAE and ARCH_PHYS_ADDR_T_64BIT Kconfig entries
allowing LPAE support to be compiled into the kernel.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/Kconfig | 2 +-
arch/arm/mm/Kconfig | 13 +++++++++++++
2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b7f5f2f..4f91988 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1762,7 +1762,7 @@ config CMDLINE_FORCE

config XIP_KERNEL
bool "Kernel Execute-In-Place from ROM"
- depends on !ZBOOT_ROM
+ depends on !ZBOOT_ROM && !ARM_LPAE
help
Execute-In-Place allows the kernel to run from non-volatile storage
directly addressable by the CPU, such as NOR flash. This saves RAM
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index d4cc7ff..fa70bbd 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -629,6 +629,19 @@ config IO_36

comment "Processor Features"

+config ARM_LPAE
+ bool "Support for the Large Physical Address Extension"
+ depends on MMU && CPU_V7
+ help
+ Say Y if you have an ARMv7 processor supporting the LPAE page table
+ format and you would like to access memory beyond the 4GB limit.
+
+config ARCH_PHYS_ADDR_T_64BIT
+ def_bool ARM_LPAE
+
+config ARCH_DMA_ADDR_T_64BIT
+ def_bool ARM_LPAE
+
config ARM_THUMB
bool "Support Thumb user binaries"
depends on CPU_ARM720T || CPU_ARM740T || CPU_ARM920T || CPU_ARM922T || CPU_ARM925T || CPU_ARM926T || CPU_ARM940T || CPU_ARM946E || CPU_ARM1020 || CPU_ARM1020E || CPU_ARM1022 || CPU_ARM1026 || CPU_XSCALE || CPU_XSC3 || CPU_MOHAWK || CPU_V6 || CPU_V6K || CPU_V7 || CPU_FEROCEON

Catalin Marinas

2011-05-08 12:51:37 UTC

From: Will Deacon <***@arm.com>

LPAE provides support for memory banks with physical addresses of up
to 40 bits.

This patch adds a new atag, ATAG_MEM64, so that the Kernel can be
informed about memory that exists above the 4GB boundary.

Signed-off-by: Will Deacon <***@arm.com>
Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/setup.h | 10 +++++++++-
arch/arm/kernel/compat.c | 4 ++--
arch/arm/kernel/setup.c | 12 +++++++++++-
3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/setup.h b/arch/arm/include/asm/setup.h
index 95176af..5e3587c 100644
--- a/arch/arm/include/asm/setup.h
+++ b/arch/arm/include/asm/setup.h
@@ -43,6 +43,13 @@ struct tag_mem32 {
__u32 start; /* physical start address */
};

+#define ATAG_MEM64 0x54420002
+
+struct tag_mem64 {
+ __u64 size;
+ __u64 start; /* physical start address */
+};
+
/* VGA text type displays */
#define ATAG_VIDEOTEXT 0x54410003

@@ -147,7 +154,8 @@ struct tag {
struct tag_header hdr;
union {
struct tag_core core;
- struct tag_mem32 mem;
+ struct tag_mem32 mem32;
+ struct tag_mem64 mem64;
struct tag_videotext videotext;
struct tag_ramdisk ramdisk;
struct tag_initrd initrd;
diff --git a/arch/arm/kernel/compat.c b/arch/arm/kernel/compat.c
index 9256523..f224d95 100644
--- a/arch/arm/kernel/compat.c
+++ b/arch/arm/kernel/compat.c
@@ -86,8 +86,8 @@ static struct tag * __init memtag(struct tag *tag, unsigned long start, unsigned
tag = tag_next(tag);
tag->hdr.tag = ATAG_MEM;
tag->hdr.size = tag_size(tag_mem32);
- tag->u.mem.size = size;
- tag->u.mem.start = start;
+ tag->u.mem32.size = size;
+ tag->u.mem32.start = start;

return tag;
}
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 006c1e8..4158e4d 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -611,11 +611,21 @@ __tagtable(ATAG_CORE, parse_tag_core);

static int __init parse_tag_mem32(const struct tag *tag)
{
- return arm_add_memory(tag->u.mem.start, tag->u.mem.size);
+ return arm_add_memory(tag->u.mem32.start, tag->u.mem32.size);
}

__tagtable(ATAG_MEM, parse_tag_mem32);

+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+static int __init parse_tag_mem64(const struct tag *tag)
+{
+ /* We only use 32-bits for the size. */
+ return arm_add_memory(tag->u.mem64.start, (unsigned long)tag->u.mem64.size);
+}
+
+__tagtable(ATAG_MEM64, parse_tag_mem64);
+#endif /* CONFIG_PHYS_ADDR_T_64BIT */
+
#if defined(CONFIG_VGA_CONSOLE) || defined(CONFIG_DUMMY_CONSOLE)
struct screen_info screen_info = {
.orig_video_lines = 30,

Catalin Marinas

2011-05-08 12:51:36 UTC

From: Will Deacon <***@arm.com>

Memory banks living outside of the 32-bit physical address
space do not have a 1:1 pa <-> va mapping and therefore the
__va macro may wrap.

This patch ensures that such banks are marked as highmem so
that the Kernel doesn't try to split them up when it sees that
the wrapped virtual address overlaps the vmalloc space.

Signed-off-by: Will Deacon <***@arm.com>
Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/mm/mmu.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 6794c92..ec80ca1 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -795,7 +795,8 @@ static void __init sanity_check_meminfo(void)
*bank = meminfo.bank[i];

#ifdef CONFIG_HIGHMEM
- if (__va(bank->start) > vmalloc_min ||
+ if (bank->start > ULONG_MAX ||
+ __va(bank->start) > vmalloc_min ||
__va(bank->start) < (void *)PAGE_OFFSET)
highmem = 1;

@@ -805,7 +806,7 @@ static void __init sanity_check_meminfo(void)
* Split those memory banks which are partially overlapping
* the vmalloc area greatly simplifying things later.
*/
- if (__va(bank->start) < vmalloc_min &&
+ if (!highmem && __va(bank->start) < vmalloc_min &&
bank->size > vmalloc_min - __va(bank->start)) {
if (meminfo.nr_banks >= NR_BANKS) {
printk(KERN_CRIT "NR_BANKS too low, "

Catalin Marinas

2011-05-08 12:51:35 UTC

From: Will Deacon <***@arm.com>

This patch uses the types.h implementation in asm-generic to define the
dma_addr_t type as the same width as phys_addr_t.

NOTE: this is a temporary patch until the corresponding patches unifying
the dma_addr_t and removing the dma64_addr_t are merged into mainline.

Signed-off-by: Will Deacon <***@arm.com>
Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/types.h | 11 +----------
1 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/types.h b/arch/arm/include/asm/types.h
index 48192ac..dc1bdbb 100644
--- a/arch/arm/include/asm/types.h
+++ b/arch/arm/include/asm/types.h
@@ -1,17 +1,8 @@
#ifndef __ASM_ARM_TYPES_H
#define __ASM_ARM_TYPES_H

-#include <asm-generic/int-ll64.h>
+#include <asm-generic/types.h>

-#ifndef __ASSEMBLY__
-
-typedef unsigned short umode_t;
-
-#endif /* __ASSEMBLY__ */
-
-/*
- * These aren't exported outside the kernel to avoid name space clashes
- */
#ifdef __KERNEL__

#define BITS_PER_LONG 32

Catalin Marinas

2011-05-08 12:51:34 UTC

With LPAE, the TTBRx size is 64-bit so make sure that all the
information is saved and restored.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/mm/proc-v7.S | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index ad22628..3e6999e 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -260,19 +260,32 @@ cpu_v7_name:

/* Suspend/resume support: derived from arch/arm/mach-s5pv210/sleep.S */
.globl cpu_v7_suspend_size
+#ifdef CONFIG_ARM_LPAE
+.equ cpu_v7_suspend_size, 4 * 10
+#else
.equ cpu_v7_suspend_size, 4 * 8
+#endif
#ifdef CONFIG_PM_SLEEP
ENTRY(cpu_v7_do_suspend)
stmfd sp!, {r4 - r11, lr}
mrc p15, 0, r4, c13, c0, 0 @ FCSE/PID
mrc p15, 0, r5, c13, c0, 1 @ Context ID
mrc p15, 0, r6, c3, c0, 0 @ Domain ID
+#ifdef CONFIG_ARM_LPAE
+ mrrc p15, 0, r7, r8, c2 @ TTB 0
+ mrrc p15, 1, r2, r3, c2 @ TTB 1
+#else
mrc p15, 0, r7, c2, c0, 0 @ TTB 0
mrc p15, 0, r8, c2, c0, 1 @ TTB 1
+#endif
mrc p15, 0, r9, c1, c0, 0 @ Control register
mrc p15, 0, r10, c1, c0, 1 @ Auxiliary control register
mrc p15, 0, r11, c1, c0, 2 @ Co-processor access control
+#ifdef CONFIG_ARM_LPAE
+ stmia r0, {r2 - r11}
+#else
stmia r0, {r4 - r11}
+#endif
ldmfd sp!, {r4 - r11, pc}
ENDPROC(cpu_v7_do_suspend)

@@ -280,12 +293,21 @@ ENTRY(cpu_v7_do_resume)
mov ip, #0
mcr p15, 0, ip, c8, c7, 0 @ invalidate TLBs
mcr p15, 0, ip, c7, c5, 0 @ invalidate I cache
+#ifdef CONFIG_ARM_LPAE
+ ldmia r0, {r2 - r11}
+#else
ldmia r0, {r4 - r11}
+#endif
mcr p15, 0, r4, c13, c0, 0 @ FCSE/PID
mcr p15, 0, r5, c13, c0, 1 @ Context ID
mcr p15, 0, r6, c3, c0, 0 @ Domain ID
+#ifdef CONFIG_ARM_LPAE
+ mcrr p15, 0, r7, r8, c2 @ TTB 0
+ mcrr p15, 1, r2, r3, c2 @ TTB 1
+#else
mcr p15, 0, r7, c2, c0, 0 @ TTB 0
mcr p15, 0, r8, c2, c0, 1 @ TTB 1
+#endif
mcr p15, 0, ip, c2, c0, 2 @ TTB control register
mcr p15, 0, r10, c1, c0, 1 @ Auxiliary control register
mcr p15, 0, r11, c1, c0, 2 @ Co-processor access control

Tony Lindgren

2011-05-18 07:27:38 UTC

Hi,

One question below regarding the ifdefs in this series.

Post by Catalin Marinas
With LPAE, the TTBRx size is 64-bit so make sure that all the
information is saved and restored.
---
arch/arm/mm/proc-v7.S | 22 ++++++++++++++++++++++
1 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index ad22628..3e6999e 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
/* Suspend/resume support: derived from arch/arm/mach-s5pv210/sleep.S */
.globl cpu_v7_suspend_size
+#ifdef CONFIG_ARM_LPAE
+.equ cpu_v7_suspend_size, 4 * 10
+#else
.equ cpu_v7_suspend_size, 4 * 8
+#endif
#ifdef CONFIG_PM_SLEEP
ENTRY(cpu_v7_do_suspend)
stmfd sp!, {r4 - r11, lr}
+#ifdef CONFIG_ARM_LPAE
+#else
+#endif
+#ifdef CONFIG_ARM_LPAE
+ stmia r0, {r2 - r11}
+#else
stmia r0, {r4 - r11}
+#endif
ldmfd sp!, {r4 - r11, pc}
ENDPROC(cpu_v7_do_suspend)
@@ -280,12 +293,21 @@ ENTRY(cpu_v7_do_resume)
mov ip, #0
+#ifdef CONFIG_ARM_LPAE
+ ldmia r0, {r2 - r11}
+#else
ldmia r0, {r4 - r11}
+#endif
+#ifdef CONFIG_ARM_LPAE
+#else
+#endif

Do we really need all this ifdef else throughout this series?

I think we already have things in place to do this dynamically
like we already do for thumb, smp_on_up, v6 vs v7 and so on.

Otherwise we'll end up with every second line of ifdef else..

Regards,

Tony

Catalin Marinas

2011-05-20 13:21:07 UTC

Tony,

Post by Tony Lindgren

Post by Catalin Marinas
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
/* Suspend/resume support: derived from arch/arm/mach-s5pv210/sleep.S */
.globl cpu_v7_suspend_size
+#ifdef CONFIG_ARM_LPAE
+.equ cpu_v7_suspend_size, 4 * 10
+#else
.equ cpu_v7_suspend_size, 4 * 8
+#endif
#ifdef CONFIG_PM_SLEEP
ENTRY(cpu_v7_do_suspend)
stmfd sp!, {r4 - r11, lr}
+#ifdef CONFIG_ARM_LPAE
+#else
+#endif
+#ifdef CONFIG_ARM_LPAE
+ stmia r0, {r2 - r11}
+#else
stmia r0, {r4 - r11}
+#endif
ldmfd sp!, {r4 - r11, pc}
ENDPROC(cpu_v7_do_suspend)
@@ -280,12 +293,21 @@ ENTRY(cpu_v7_do_resume)
mov ip, #0
+#ifdef CONFIG_ARM_LPAE
+ ldmia r0, {r2 - r11}
+#else
ldmia r0, {r4 - r11}
+#endif
+#ifdef CONFIG_ARM_LPAE
+#else
+#endif

Do we really need all this ifdef else throughout this series?
I think we already have things in place to do this dynamically
like we already do for thumb, smp_on_up, v6 vs v7 and so on.

By dynamically, do you mean at run-time? We won't be able to compile
both classic and LPAE in the same kernel, there is just too much
difference between them (2 vs 3 levels of page tables - LPAE is an
entirely new format).

If you mean some simpler macros like what we have for ARM/THUMB to
reduce the number of lines, I'm fine with it though we don't always have
a 1:1 mapping between LPAE and non-LPAE instructions.

Alternatively, I'm happy to create a separate proc-v7lpae.S file.

--
Catalin

Jean-Christophe PLAGNIOL-VILLARD

2011-05-20 15:17:27 UTC

Post by Catalin Marinas
Tony,

Post by Tony Lindgren

Do we really need all this ifdef else throughout this series?
I think we already have things in place to do this dynamically
like we already do for thumb, smp_on_up, v6 vs v7 and so on.

By dynamically, do you mean at run-time? We won't be able to compile
both classic and LPAE in the same kernel, there is just too much
difference between them (2 vs 3 levels of page tables - LPAE is an
entirely new format).
If you mean some simpler macros like what we have for ARM/THUMB to
reduce the number of lines, I'm fine with it though we don't always have
a 1:1 mapping between LPAE and non-LPAE instructions.

create the same macro as done do arm/thumb is good will make the code more
readable

Post by Catalin Marinas
Alternatively, I'm happy to create a separate proc-v7lpae.S file.

maybe a good idea

Best Regards,
J.

Nicolas Pitre

2011-05-20 18:09:20 UTC

Post by Tony Lindgren
Do we really need all this ifdef else throughout this series?
I think we already have things in place to do this dynamically
like we already do for thumb, smp_on_up, v6 vs v7 and so on.

By dynamically, do you mean at run-time? We won't be able to compile
both classic and LPAE in the same kernel, there is just too much
difference between them (2 vs 3 levels of page tables - LPAE is an
entirely new format).
If you mean some simpler macros like what we have for ARM/THUMB to
reduce the number of lines, I'm fine with it though we don't always have
a 1:1 mapping between LPAE and non-LPAE instructions.
Alternatively, I'm happy to create a separate proc-v7lpae.S file.

That would probably be the best option.

Nicolas

Catalin Marinas

2011-05-22 21:09:47 UTC

Post by Nicolas Pitre

By dynamically, do you mean at run-time? We won't be able to compile
both classic and LPAE in the same kernel, there is just too much
difference between them (2 vs 3 levels of page tables - LPAE is an
entirely new format).
If you mean some simpler macros like what we have for ARM/THUMB to
reduce the number of lines, I'm fine with it though we don't always have
a 1:1 mapping between LPAE and non-LPAE instructions.
Alternatively, I'm happy to create a separate proc-v7lpae.S file.

That would probably be the best option.

OK, I'll move this code to a separate file. The v7 setup code got
pretty hard to read.

--
Catalin

Tony Lindgren

2011-05-24 06:26:12 UTC

Hi,

Sorry for the delay in replying, we got a baby girl last Thursday :)

Post by Nicolas Pitre

Post by Nicolas Pitre

Post by Catalin Marinas
If you mean some simpler macros like what we have for ARM/THUMB to
reduce the number of lines, I'm fine with it though we don't always have
a 1:1 mapping between LPAE and non-LPAE instructions.
Alternatively, I'm happy to create a separate proc-v7lpae.S file.

That would probably be the best option.

OK, I'll move this code to a separate file. The v7 setup code got
pretty hard to read.

Separate file or macros sounds good to me too depending on how much
of existing code you can recycle.

Regards,

Tony

Catalin Marinas

2011-05-08 12:51:27 UTC

With LPAE, the physical address mask is 40-bit while the page table
entry is 64-bit. This patch introduces PHYS_MASK for the 2-level page
table format, defined as ~0UL.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/pgtable-2level-hwdef.h | 2 ++
arch/arm/include/asm/pgtable.h | 6 +++---
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level-hwdef.h b/arch/arm/include/asm/pgtable-2level-hwdef.h
index 2b52c40..5cfba15 100644
--- a/arch/arm/include/asm/pgtable-2level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-2level-hwdef.h
@@ -88,4 +88,6 @@
#define PTE_SMALL_AP_URO_SRW (_AT(pteval_t, 0xaa) << 4)
#define PTE_SMALL_AP_URW_SRW (_AT(pteval_t, 0xff) << 4)

+#define PHYS_MASK (~0UL)
+
#endif
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 9618052..8f9e1dd 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -199,10 +199,10 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)

static inline pte_t *pmd_page_vaddr(pmd_t pmd)
{
- return __va(pmd_val(pmd) & PAGE_MASK);
+ return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
}

-#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd)))
+#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))

/* we don't need complex calculations here as the pmd is folded into the pgd */
#define pmd_addr_end(addr,end) (end)
@@ -223,7 +223,7 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
#define pte_offset_map(pmd,addr) (__pte_map(pmd) + pte_index(addr))
#define pte_unmap(pte) __pte_unmap(pte)

-#define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT)
+#define pte_pfn(pte) ((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT)
#define pfn_pte(pfn,prot) __pte(__pfn_to_phys(pfn) | pgprot_val(prot))

#define pte_page(pte) pfn_to_page(pte_pfn(pte))

Catalin Marinas

2011-05-08 12:51:33 UTC

With LPAE, the pgd is a separate page table with entries pointing to the
pmd. The identity_mapping_add() function needs to ensure that the pgd is
populated before populating the pmd level. The do..while blocks now loop
over the pmd in order to have the same implementation for the two page
table formats. The pmd_addr_end() definition has been removed and the
generic one used instead. The pmd clean-up is done in the pgd_free()
function.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/pgtable.h | 4 ----
arch/arm/mm/idmap.c | 36 ++++++++++++++++++++++++++++++++++--
2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1db9ad6..9645e52 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -263,10 +263,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)

#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))

-/* we don't need complex calculations here as the pmd is folded into the pgd */
-#define pmd_addr_end(addr,end) (end)
-
-
#ifndef CONFIG_HIGHPTE
#define __pte_map(pmd) pmd_page_vaddr(*(pmd))
#define __pte_unmap(pte) do { } while (0)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 2be9139..24e0655 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,9 +1,36 @@
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/kernel.h>

#include <asm/cputype.h>
#include <asm/pgalloc.h>
#include <asm/pgtable.h>

+#ifdef CONFIG_ARM_LPAE
+static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
+ unsigned long prot)
+{
+ pmd_t *pmd;
+ unsigned long next;
+
+ if (pud_none_or_clear_bad(pud) || (pud_val(*pud) & L_PGD_SWAPPER)) {
+ pmd = pmd_alloc_one(NULL, addr);
+ if (!pmd) {
+ pr_warning("Failed to allocate identity pmd.\n");
+ return;
+ }
+ pud_populate(NULL, pud, pmd);
+ pmd += pmd_index(addr);
+ } else
+ pmd = pmd_offset(pud, addr);
+
+ do {
+ next = pmd_addr_end(addr, end);
+ *pmd = __pmd((addr & PMD_MASK) | prot);
+ flush_pmd_entry(pmd);
+ } while (pmd++, addr = next, addr != end);
+}
+#else /* !CONFIG_ARM_LPAE */
static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
unsigned long prot)
{
@@ -15,6 +42,7 @@ static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end,
pmd[1] = __pmd(addr);
flush_pmd_entry(pmd);
}
+#endif /* CONFIG_ARM_LPAE */

static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
unsigned long prot)
@@ -32,7 +60,7 @@ void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
{
unsigned long prot, next;

- prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE;
+ prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
prot |= PMD_BIT4;

@@ -46,7 +74,11 @@ void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
#ifdef CONFIG_SMP
static void idmap_del_pmd(pud_t *pud, unsigned long addr, unsigned long end)
{
- pmd_t *pmd = pmd_offset(pud, addr);
+ pmd_t *pmd;
+
+ if (pud_none_or_clear_bad(pud))
+ return;
+ pmd = pmd_offset(pud, addr);
pmd_clear(pmd);
}

Catalin Marinas

2011-05-08 12:51:32 UTC

With LPAE, TTBRx registers are 64-bit. The ASID is stored in TTBR0
rather than a separate Context ID register. This patch makes the
necessary changes to handle context switching on LPAE.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/mm/context.c | 19 +++++++++++++++++--
arch/arm/mm/proc-v7.S | 10 ++++++++--
2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index b0ee9ba..fcdb101 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -22,6 +22,21 @@ unsigned int cpu_last_asid = ASID_FIRST_VERSION;
DEFINE_PER_CPU(struct mm_struct *, current_mm);
#endif

+#ifdef CONFIG_ARM_LPAE
+#define cpu_set_asid(asid) { \
+ unsigned long ttbl, ttbh; \
+ asm volatile( \
+ " mrrc p15, 0, %0, %1, c2 @ read TTBR0\n" \
+ " mov %1, %2, lsl #(48 - 32) @ set ASID\n" \
+ " mcrr p15, 0, %0, %1, c2 @ set TTBR0\n" \
+ : "=&r" (ttbl), "=&r" (ttbh) \
+ : "r" (asid & ~ASID_MASK)); \
+}
+#else
+#define cpu_set_asid(asid) \
+ asm(" mcr p15, 0, %0, c13, c0, 1\n" : : "r" (asid))
+#endif
+
/*
* We fork()ed a process, and we need a new context for the child
* to run in. We reserve version 0 for initial tasks so we will
@@ -37,7 +52,7 @@ void __init_new_context(struct task_struct *tsk, struct mm_struct *mm)
static void flush_context(void)
{
/* set the reserved ASID before flushing the TLB */
- asm("mcr p15, 0, %0, c13, c0, 1\n" : : "r" (0));
+ cpu_set_asid(0);
isb();
local_flush_tlb_all();
if (icache_is_vivt_asid_tagged()) {
@@ -99,7 +114,7 @@ static void reset_context(void *info)
set_mm_context(mm, asid);

/* set the new ASID */
- asm("mcr p15, 0, %0, c13, c0, 1\n" : : "r" (mm->context.id));
+ cpu_set_asid(mm->context.id);
isb();
}

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 0996713..ad22628 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -115,8 +115,13 @@ ENDPROC(cpu_v7_dcache_clean_area)
*/
ENTRY(cpu_v7_switch_mm)
#ifdef CONFIG_MMU
- mov r2, #0
ldr r1, [r1, #MM_CONTEXT_ID] @ get mm->context.id
+ mov r2, #0
+#ifdef CONFIG_ARM_LPAE
+ and r3, r1, #0xff
+ mov r3, r3, lsl #(48 - 32) @ ASID
+ mcrr p15, 0, r0, r3, c2 @ set TTB 0
+#else /* !CONFIG_ARM_LPAE */
ALT_SMP(orr r0, r0, #TTB_FLAGS_SMP)
ALT_UP(orr r0, r0, #TTB_FLAGS_UP)
#ifdef CONFIG_ARM_ERRATA_430973
@@ -127,12 +132,13 @@ ENTRY(cpu_v7_switch_mm)
#endif
mcr p15, 0, r2, c13, c0, 1 @ set reserved context ID
isb
-1: mcr p15, 0, r0, c2, c0, 0 @ set TTB 0
+ mcr p15, 0, r0, c2, c0, 0 @ set TTB 0
isb
#ifdef CONFIG_ARM_ERRATA_754322
dsb
#endif
mcr p15, 0, r1, c13, c0, 1 @ set context ID
+#endif /* CONFIG_ARM_LPAE */
isb
#endif
mov pc, lr

Catalin Marinas

2011-05-08 12:51:30 UTC

This patch adds the MMU initialisation for the LPAE page table format.
The swapper_pg_dir size with LPAE is 5 rather than 4 pages. The
__v7_setup function configures the TTBRx split based on the PAGE_OFFSET
and sets the corresponding TTB control and MAIRx bits (similar to
PRRR/NMRR for TEX remapping). The 36-bit mappings (supersections) and
a few other memory types in mmu.c are conditionally compiled.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/kernel/head.S | 117 ++++++++++++++++++++++++++++++--------------
arch/arm/mm/mmu.c | 32 ++++++++++++-
arch/arm/mm/proc-macros.S | 5 +-
arch/arm/mm/proc-v7.S | 108 +++++++++++++++++++++++++++++++++++++-----
4 files changed, 210 insertions(+), 52 deletions(-)

diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index ac368e6..4eea9cf 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -21,6 +21,7 @@
#include <asm/memory.h>
#include <asm/thread_info.h>
#include <asm/system.h>
+#include <asm/pgtable.h>

#ifdef CONFIG_DEBUG_LL
#include <mach/debug-macro.S>
@@ -38,11 +39,20 @@
#error KERNEL_RAM_VADDR must start at 0xXXXX8000
#endif

+#ifdef CONFIG_ARM_LPAE
+ /* LPAE requires an additional page for the PGD */
+#define PG_DIR_SIZE 0x5000
+#define PMD_ORDER 3
+#else
+#define PG_DIR_SIZE 0x4000
+#define PMD_ORDER 2
+#endif
+
.globl swapper_pg_dir
- .equ swapper_pg_dir, KERNEL_RAM_VADDR - 0x4000
+ .equ swapper_pg_dir, KERNEL_RAM_VADDR - PG_DIR_SIZE

.macro pgtbl, rd, phys
- add \rd, \phys, #TEXT_OFFSET - 0x4000
+ add \rd, \phys, #TEXT_OFFSET - PG_DIR_SIZE
.endm

#ifdef CONFIG_XIP_KERNEL
@@ -140,11 +150,11 @@ __create_page_tables:
pgtbl r4, r8 @ page table address

/*
- * Clear the 16K level 1 swapper page table
+ * Clear the swapper page table
*/
mov r0, r4
mov r3, #0
- add r6, r0, #0x4000
+ add r6, r0, #PG_DIR_SIZE
1: str r3, [r0], #4
str r3, [r0], #4
str r3, [r0], #4
@@ -152,6 +162,25 @@ __create_page_tables:
teq r0, r6
bne 1b

+#ifdef CONFIG_ARM_LPAE
+ /*
+ * Build the PGD table (first level) to point to the PMD table. A PGD
+ * entry is 64-bit wide.
+ */
+ mov r0, r4
+ add r3, r4, #0x1000 @ first PMD table address
+ orr r3, r3, #3 @ PGD block type
+ mov r6, #4 @ PTRS_PER_PGD
+ mov r7, #1 << (55 - 32) @ L_PGD_SWAPPER
+1: str r3, [r0], #4 @ set bottom PGD entry bits
+ str r7, [r0], #4 @ set top PGD entry bits
+ add r3, r3, #0x1000 @ next PMD table
+ subs r6, r6, #1
+ bne 1b
+
+ add r4, r4, #0x1000 @ point to the PMD tables
+#endif
+
ldr r7, [r10, #PROCINFO_MM_MMUFLAGS] @ mm_mmuflags

/*
@@ -163,30 +192,30 @@ __create_page_tables:
sub r0, r0, r3 @ virt->phys offset
add r5, r5, r0 @ phys __enable_mmu
add r6, r6, r0 @ phys __enable_mmu_end
- mov r5, r5, lsr #20
- mov r6, r6, lsr #20
+ mov r5, r5, lsr #SECTION_SHIFT
+ mov r6, r6, lsr #SECTION_SHIFT

-1: orr r3, r7, r5, lsl #20 @ flags + kernel base
- str r3, [r4, r5, lsl #2] @ identity mapping
- teq r5, r6
- addne r5, r5, #1 @ next section
- bne 1b
+1: orr r3, r7, r5, lsl #SECTION_SHIFT @ flags + kernel base
+ str r3, [r4, r5, lsl #PMD_ORDER] @ identity mapping
+ cmp r5, r6
+ addlo r5, r5, #SECTION_SHIFT >> 20 @ next section
+ blo 1b

/*
* Now setup the pagetables for our kernel direct
* mapped region.
*/
mov r3, pc
- mov r3, r3, lsr #20
- orr r3, r7, r3, lsl #20
- add r0, r4, #(KERNEL_START & 0xff000000) >> 18
- str r3, [r0, #(KERNEL_START & 0x00f00000) >> 18]!
+ mov r3, r3, lsr #SECTION_SHIFT
+ orr r3, r7, r3, lsl #SECTION_SHIFT
+ add r0, r4, #(KERNEL_START & 0xff000000) >> (SECTION_SHIFT - PMD_ORDER)
+ str r3, [r0, #(KERNEL_START & 0x00e00000) >> (SECTION_SHIFT - PMD_ORDER)]!
ldr r6, =(KERNEL_END - 1)
- add r0, r0, #4
- add r6, r4, r6, lsr #18
+ add r0, r0, #1 << PMD_ORDER
+ add r6, r4, r6, lsr #(SECTION_SHIFT - PMD_ORDER)
1: cmp r0, r6
- add r3, r3, #1 << 20
- strls r3, [r0], #4
+ add r3, r3, #1 << SECTION_SHIFT
+ strls r3, [r0], #1 << PMD_ORDER
bls 1b

#ifdef CONFIG_XIP_KERNEL
@@ -195,11 +224,11 @@ __create_page_tables:
*/
add r3, r8, #TEXT_OFFSET
orr r3, r3, r7
- add r0, r4, #(KERNEL_RAM_VADDR & 0xff000000) >> 18
- str r3, [r0, #(KERNEL_RAM_VADDR & 0x00f00000) >> 18]!
+ add r0, r4, #(KERNEL_RAM_VADDR & 0xff000000) >> (SECTION_SHIFT - PMD_ORDER)
+ str r3, [r0, #(KERNEL_RAM_VADDR & 0x00f00000) >> (SECTION_SHIFT - PMD_ORDER)]!
ldr r6, =(_end - 1)
add r0, r0, #4
- add r6, r4, r6, lsr #18
+ add r6, r4, r6, lsr #(SECTION_SHIFT - PMD_ORDER)
1: cmp r0, r6
add r3, r3, #1 << 20
strls r3, [r0], #4
@@ -207,15 +236,15 @@ __create_page_tables:
#endif

/*
- * Then map boot params address in r2 or
- * the first 1MB of ram if boot params address is not specified.
+ * Then map boot params address in r2 or the first 1MB (2MB with LPAE)
+ * of ram if boot params address is not specified.
*/
- mov r0, r2, lsr #20
- movs r0, r0, lsl #20
+ mov r0, r2, lsr #SECTION_SHIFT
+ movs r0, r0, lsl #SECTION_SHIFT
moveq r0, r8
sub r3, r0, r8
add r3, r3, #PAGE_OFFSET
- add r3, r4, r3, lsr #18
+ add r3, r4, r3, lsr #(SECTION_SHIFT - PMD_ORDER)
orr r6, r7, r0
str r6, [r3]

@@ -228,21 +257,27 @@ __create_page_tables:
*/
addruart r7, r3

- mov r3, r3, lsr #20
- mov r3, r3, lsl #2
+ mov r3, r3, lsr #SECTION_SHIFT
+ mov r3, r3, lsl #PMD_ORDER

add r0, r4, r3
rsb r3, r3, #0x4000 @ PTRS_PER_PGD*sizeof(long)
cmp r3, #0x0800 @ limit to 512MB
movhi r3, #0x0800
add r6, r0, r3
- mov r3, r7, lsr #20
+ mov r3, r7, lsr #SECTION_SHIFT
ldr r7, [r10, #PROCINFO_IO_MMUFLAGS] @ io_mmuflags
- orr r3, r7, r3, lsl #20
+ orr r3, r7, r3, lsl #SECTION_SHIFT
+#ifdef CONFIG_ARM_LPAE
+ mov r7, #1 << (54 - 32) @ XN
+#endif
1: str r3, [r0], #4
- add r3, r3, #1 << 20
- teq r0, r6
- bne 1b
+#ifdef CONFIG_ARM_LPAE
+ str r7, [r0], #4
+#endif
+ add r3, r3, #1 << SECTION_SHIFT
+ cmp r0, r6
+ blo 1b

#else /* CONFIG_DEBUG_ICEDCC */
/* we don't need any serial debugging mappings for ICEDCC */
@@ -254,7 +289,7 @@ __create_page_tables:
* If we're using the NetWinder or CATS, we also need to map
* in the 16550-type serial port for the debug messages
*/
- add r0, r4, #0xff000000 >> 18
+ add r0, r4, #0xff000000 >> (SECTION_SHIFT - PMD_ORDER)
orr r3, r7, #0x7c000000
str r3, [r0]
#endif
@@ -264,13 +299,16 @@ __create_page_tables:
* Similar reasons here - for debug. This is
* only for Acorn RiscPC architectures.
*/
- add r0, r4, #0x02000000 >> 18
+ add r0, r4, #0x02000000 >> (SECTION_SHIFT - PMD_ORDER)
orr r3, r7, #0x02000000
str r3, [r0]
- add r0, r4, #0xd8000000 >> 18
+ add r0, r4, #0xd8000000 >> (SECTION_SHIFT - PMD_ORDER)
str r3, [r0]
#endif
#endif
+#ifdef CONFIG_ARM_LPAE
+ sub r4, r4, #0x1000 @ point to the PGD table
+#endif
mov pc, lr
ENDPROC(__create_page_tables)
.ltorg
@@ -362,12 +400,17 @@ __enable_mmu:
#ifdef CONFIG_CPU_ICACHE_DISABLE
bic r0, r0, #CR_I
#endif
+#ifdef CONFIG_ARM_LPAE
+ mov r5, #0
+ mcrr p15, 0, r4, r5, c2 @ load TTBR0
+#else
mov r5, #(domain_val(DOMAIN_USER, DOMAIN_MANAGER) | \
domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
domain_val(DOMAIN_TABLE, DOMAIN_MANAGER) | \
domain_val(DOMAIN_IO, DOMAIN_CLIENT))
mcr p15, 0, r5, c3, c0, 0 @ load domain access register
mcr p15, 0, r4, c2, c0, 0 @ load page table pointer
+#endif
b __turn_mmu_on
ENDPROC(__enable_mmu)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 1e4e05a..6794c92 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -152,6 +152,7 @@ static int __init early_nowrite(char *__unused)
}
early_param("nowb", early_nowrite);

+#ifndef CONFIG_ARM_LPAE
static int __init early_ecc(char *p)
{
if (memcmp(p, "on", 2) == 0)
@@ -161,6 +162,7 @@ static int __init early_ecc(char *p)
return 0;
}
early_param("ecc", early_ecc);
+#endif

static int __init noalign_setup(char *__unused)
{
@@ -230,10 +232,12 @@ static struct mem_type mem_types[] = {
.prot_sect = PMD_TYPE_SECT | PMD_SECT_XN,
.domain = DOMAIN_KERNEL,
},
+#ifndef CONFIG_ARM_LPAE
[MT_MINICLEAN] = {
.prot_sect = PMD_TYPE_SECT | PMD_SECT_XN | PMD_SECT_MINICACHE,
.domain = DOMAIN_KERNEL,
},
+#endif
[MT_LOW_VECTORS] = {
.prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
L_PTE_RDONLY,
@@ -423,6 +427,7 @@ static void __init build_mem_type_table(void)
* ARMv6 and above have extended page tables.
*/
if (cpu_arch >= CPU_ARCH_ARMv6 && (cr & CR_XP)) {
+#ifndef CONFIG_ARM_LPAE
/*
* Mark cache clean areas and XIP ROM read only
* from SVC mode and no access from userspace.
@@ -430,6 +435,7 @@ static void __init build_mem_type_table(void)
mem_types[MT_ROM].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
mem_types[MT_MINICLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
mem_types[MT_CACHECLEAN].prot_sect |= PMD_SECT_APX|PMD_SECT_AP_WRITE;
+#endif

if (is_smp()) {
/*
@@ -468,6 +474,18 @@ static void __init build_mem_type_table(void)
mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_BUFFERABLE;
}

+#ifdef CONFIG_ARM_LPAE
+ /*
+ * Do not generate access flag faults for the kernel mappings.
+ */
+ for (i = 0; i < ARRAY_SIZE(mem_types); i++) {
+ mem_types[i].prot_pte |= PTE_EXT_AF;
+ mem_types[i].prot_sect |= PMD_SECT_AF;
+ }
+ kern_pgprot |= PTE_EXT_AF;
+ vecs_pgprot |= PTE_EXT_AF;
+#endif
+
for (i = 0; i < 16; i++) {
unsigned long v = pgprot_val(protection_map[i]);
protection_map[i] = __pgprot(v | user_pgprot);
@@ -597,6 +615,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
} while (pud++, addr = next, addr != end);
}

+#ifndef CONFIG_ARM_LPAE
static void __init create_36bit_mapping(struct map_desc *md,
const struct mem_type *type)
{
@@ -656,6 +675,7 @@ static void __init create_36bit_mapping(struct map_desc *md,
pgd += SUPERSECTION_SIZE >> PGDIR_SHIFT;
} while (addr != end);
}
+#endif /* !CONFIG_ARM_LPAE */

/*
* Create the page directory entries and any necessary
@@ -687,6 +707,7 @@ static void __init create_mapping(struct map_desc *md)

type = &mem_types[md->type];

+#ifndef CONFIG_ARM_LPAE
/*
* Catch 36-bit addresses
*/
@@ -694,6 +715,7 @@ static void __init create_mapping(struct map_desc *md)
create_36bit_mapping(md, type);
return;
}
+#endif

addr = md->virtual & PAGE_MASK;
phys = __pfn_to_phys(md->pfn);
@@ -890,6 +912,14 @@ static inline void prepare_page_table(void)
pmd_clear(pmd_off_k(addr));
}

+#ifdef CONFIG_ARM_LPAE
+/* the first page is reserved for pgd */
+#define SWAPPER_PG_DIR_SIZE (PAGE_SIZE + \
+ PTRS_PER_PGD * PTRS_PER_PMD * sizeof(pmd_t))
+#else
+#define SWAPPER_PG_DIR_SIZE (PTRS_PER_PGD * sizeof(pgd_t))
+#endif
+
/*
* Reserve the special regions of memory
*/
@@ -899,7 +929,7 @@ void __init arm_mm_memblock_reserve(void)
* Reserve the page tables. These are already in use,
* and can only be in node 0.
*/
- memblock_reserve(__pa(swapper_pg_dir), PTRS_PER_PGD * sizeof(pgd_t));
+ memblock_reserve(__pa(swapper_pg_dir), SWAPPER_PG_DIR_SIZE);

#ifdef CONFIG_SA1111
/*
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 34261f9..48b85f6 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -91,8 +91,9 @@
#if L_PTE_SHARED != PTE_EXT_SHARED
#error PTE shared bit mismatch
#endif
-#if (L_PTE_XN+L_PTE_USER+L_PTE_RDONLY+L_PTE_DIRTY+L_PTE_YOUNG+\
- L_PTE_FILE+L_PTE_PRESENT) > L_PTE_SHARED
+#if !defined (CONFIG_ARM_LPAE) && \
+ (L_PTE_XN+L_PTE_USER+L_PTE_RDONLY+L_PTE_DIRTY+L_PTE_YOUNG+\
+ L_PTE_FILE+L_PTE_PRESENT) > L_PTE_SHARED
#error Invalid Linux PTE bit settings
#endif
#endif /* CONFIG_MMU */
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 0459397..0996713 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -19,6 +19,19 @@

#include "proc-macros.S"

+#ifdef CONFIG_ARM_LPAE
+#define TTB_IRGN_NC (0 << 8)
+#define TTB_IRGN_WBWA (1 << 8)
+#define TTB_IRGN_WT (2 << 8)
+#define TTB_IRGN_WB (3 << 8)
+#define TTB_RGN_NC (0 << 10)
+#define TTB_RGN_OC_WBWA (1 << 10)
+#define TTB_RGN_OC_WT (2 << 10)
+#define TTB_RGN_OC_WB (3 << 10)
+#define TTB_S (3 << 12)
+#define TTB_NOS (0)
+#define TTB_EAE (1 << 31)
+#else
#define TTB_S (1 << 1)
#define TTB_RGN_NC (0 << 3)
#define TTB_RGN_OC_WBWA (1 << 3)
@@ -29,14 +42,15 @@
#define TTB_IRGN_WBWA ((0 << 0) | (1 << 6))
#define TTB_IRGN_WT ((1 << 0) | (0 << 6))
#define TTB_IRGN_WB ((1 << 0) | (1 << 6))
+#endif

/* PTWs cacheable, inner WB not shareable, outer WB not shareable */
-#define TTB_FLAGS_UP TTB_IRGN_WB|TTB_RGN_OC_WB
-#define PMD_FLAGS_UP PMD_SECT_WB
+#define TTB_FLAGS_UP (TTB_IRGN_WB|TTB_RGN_OC_WB)
+#define PMD_FLAGS_UP (PMD_SECT_WB)

/* PTWs cacheable, inner WBWA shareable, outer WBWA not shareable */
-#define TTB_FLAGS_SMP TTB_IRGN_WBWA|TTB_S|TTB_NOS|TTB_RGN_OC_WBWA
-#define PMD_FLAGS_SMP PMD_SECT_WBWA|PMD_SECT_S
+#define TTB_FLAGS_SMP (TTB_IRGN_WBWA|TTB_S|TTB_NOS|TTB_RGN_OC_WBWA)
+#define PMD_FLAGS_SMP (PMD_SECT_WBWA|PMD_SECT_S)

ENTRY(cpu_v7_proc_init)
mov pc, lr
@@ -212,9 +226,31 @@ cpu_v7_name:
* NS0 = PRRR[18] = 0 - normal shareable property
* NS1 = PRRR[19] = 1 - normal shareable property
* NOS = PRRR[24+n] = 1 - not outer shareable
+ *
+ * Memory region attributes for LPAE (defined in pgtable-3level.h):
+ *
+ * n = AttrIndx[2:0]
+ *
+ * n MAIR
+ * UNCACHED 000 00000000
+ * BUFFERABLE 001 01000100
+ * DEV_WC 001 01000100
+ * WRITETHROUGH 010 10101010
+ * WRITEBACK 011 11101110
+ * DEV_CACHED 011 11101110
+ * DEV_SHARED 100 00000100
+ * DEV_NONSHARED 100 00000100
+ * unused 101
+ * unused 110
+ * WRITEALLOC 111 11111111
*/
+#ifdef CONFIG_ARM_LPAE
+.equ PRRR, 0xeeaa4400 @ MAIR0
+.equ NMRR, 0xff000004 @ MAIR1
+#else
.equ PRRR, 0xff0a81a8
.equ NMRR, 0x40e040e0
+#endif

/* Suspend/resume support: derived from arch/arm/mach-s5pv210/sleep.S */
.globl cpu_v7_suspend_size
@@ -380,16 +416,52 @@ __v7_setup:
dsb
#ifdef CONFIG_MMU
mcr p15, 0, r10, c8, c7, 0 @ invalidate I + D TLBs
+#ifdef CONFIG_ARM_LPAE
+ mov r5, #TTB_EAE
+ ALT_SMP(orr r5, r5, #TTB_FLAGS_SMP)
+ ALT_SMP(orr r5, r5, #TTB_FLAGS_SMP << 16)
+ ALT_UP(orr r5, r5, #TTB_FLAGS_UP)
+ ALT_UP(orr r5, r5, #TTB_FLAGS_UP << 16)
+ mrc p15, 0, r10, c2, c0, 2
+ orr r10, r10, r5
+#if PHYS_OFFSET <= PAGE_OFFSET
+ /*
+ * TTBR0/TTBR1 split (PAGE_OFFSET):
+ * 0x40000000: T0SZ = 2, T1SZ = 0 (not used)
+ * 0x80000000: T0SZ = 0, T1SZ = 1
+ * 0xc0000000: T0SZ = 0, T1SZ = 2
+ *
+ * Only use this feature if PAGE_OFFSET <= PAGE_OFFSET, otherwise
+ * booting secondary CPUs would end up using TTBR1 for the identity
+ * mapping set up in TTBR0.
+ */
+ orr r10, r10, #(((PAGE_OFFSET >> 30) - 1) << 16) @ TTBCR.T1SZ
+#endif
+#endif
mcr p15, 0, r10, c2, c0, 2 @ TTB control register
+#ifdef CONFIG_ARM_LPAE
+ mov r5, #0
+#if defined CONFIG_VMSPLIT_2G
+ /* PAGE_OFFSET == 0x80000000, T1SZ == 1 */
+ add r6, r8, #1 << 4 @ skip two L1 entries
+#elif defined CONFIG_VMSPLIT_3G
+ /* PAGE_OFFSET == 0xc0000000, T1SZ == 2 */
+ add r6, r8, #4096 * (1 + 3) @ only L2 used, skip pgd+3*pmd
+#else
+ mov r6, r8
+#endif
+ mcrr p15, 1, r6, r5, c2 @ load TTBR1
+#else /* !CONFIG_ARM_LPAE */
ALT_SMP(orr r4, r4, #TTB_FLAGS_SMP)
ALT_UP(orr r4, r4, #TTB_FLAGS_UP)
ALT_SMP(orr r8, r8, #TTB_FLAGS_SMP)
ALT_UP(orr r8, r8, #TTB_FLAGS_UP)
mcr p15, 0, r8, c2, c0, 1 @ load TTB1
- ldr r5, =PRRR @ PRRR
- ldr r6, =NMRR @ NMRR
- mcr p15, 0, r5, c10, c2, 0 @ write PRRR
- mcr p15, 0, r6, c10, c2, 1 @ write NMRR
+#endif /* CONFIG_ARM_LPAE */
+ ldr r5, =PRRR @ PRRR/MAIR0
+ ldr r6, =NMRR @ NMRR/MAIR1
+ mcr p15, 0, r5, c10, c2, 0 @ write PRRR/MAIR0
+ mcr p15, 0, r6, c10, c2, 1 @ write NMRR/MAIR1
#endif
adr r5, v7_crval
ldmia r5, {r5, r6}
@@ -408,14 +480,19 @@ __v7_setup:
ENDPROC(__v7_setup)

/* AT
- * TFR EV X F I D LR S
- * .EEE ..EE PUI. .T.T 4RVI ZWRS BLDP WCAM
+ * TFR EV X F IHD LR S
+ * .EEE ..EE PUI. .TAT 4RVI ZWRS BLDP WCAM
* rxxx rrxx xxx0 0101 xxxx xxxx x111 xxxx < forced
* 1 0 110 0011 1100 .111 1101 < we want
+ * 11 0 110 1 0011 1100 .111 1101 < we want (LPAE)
*/
.type v7_crval, #object
v7_crval:
+#ifdef CONFIG_ARM_LPAE
+ crval clear=0x0120c302, mmuset=0x30c23c7d, ucset=0x00c01c7c
+#else
crval clear=0x0120c302, mmuset=0x10c03c7d, ucset=0x00c01c7c
+#endif

__v7_setup_stack:
.space 4 * 11 @ 11 registers
@@ -519,17 +596,20 @@ __v7_ca15mp_proc_info:
PMD_TYPE_SECT | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ | \
+ PMD_SECT_AF | \
PMD_FLAGS_SMP)
ALT_UP(.long \
PMD_TYPE_SECT | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ | \
+ PMD_SECT_AF | \
PMD_FLAGS_UP)
/* PMD_SECT_XN is set explicitly in head.S for LPAE */
.long PMD_TYPE_SECT | \
PMD_SECT_XN | \
PMD_SECT_AP_WRITE | \
- PMD_SECT_AP_READ
+ PMD_SECT_AP_READ | \
+ PMD_SECT_AF
b __v7_ca15mp_setup
.long cpu_arch_name
.long cpu_elf_name
@@ -552,16 +632,20 @@ __v7_proc_info:
PMD_TYPE_SECT | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ | \
+ PMD_SECT_AF | \
PMD_FLAGS_SMP)
ALT_UP(.long \
PMD_TYPE_SECT | \
PMD_SECT_AP_WRITE | \
PMD_SECT_AP_READ | \
+ PMD_SECT_AF | \
PMD_FLAGS_UP)
+ /* PMD_SECT_XN is set explicitly in head.S for LPAE */
.long PMD_TYPE_SECT | \
PMD_SECT_XN | \
PMD_SECT_AP_WRITE | \
- PMD_SECT_AP_READ
+ PMD_SECT_AP_READ | \
+ PMD_SECT_AF
W(b) __v7_setup
.long cpu_arch_name
.long cpu_elf_name

Catalin Marinas

2011-05-08 12:51:25 UTC

This patch moves page table definitions from asm/page.h, asm/pgtable.h
and asm/ptgable-hwdef.h into corresponding *-2level* files.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/page.h | 42 +--------
arch/arm/include/asm/pgtable-2level-hwdef.h | 91 +++++++++++++++++
arch/arm/include/asm/pgtable-2level-types.h | 64 ++++++++++++
arch/arm/include/asm/pgtable-2level.h | 143 +++++++++++++++++++++++++++
arch/arm/include/asm/pgtable-hwdef.h | 77 +--------------
arch/arm/include/asm/pgtable.h | 135 +-------------------------
6 files changed, 302 insertions(+), 250 deletions(-)
create mode 100644 arch/arm/include/asm/pgtable-2level-hwdef.h
create mode 100644 arch/arm/include/asm/pgtable-2level-types.h
create mode 100644 arch/arm/include/asm/pgtable-2level.h

diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index f51a695..3848105 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -151,47 +151,7 @@ extern void __cpu_copy_user_highpage(struct page *to, struct page *from,
#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
extern void copy_page(void *to, const void *from);

-typedef unsigned long pteval_t;
-
-#undef STRICT_MM_TYPECHECKS
-
-#ifdef STRICT_MM_TYPECHECKS
-/*
- * These are used to make use of C type-checking..
- */
-typedef struct { pteval_t pte; } pte_t;
-typedef struct { unsigned long pmd; } pmd_t;
-typedef struct { unsigned long pgd[2]; } pgd_t;
-typedef struct { unsigned long pgprot; } pgprot_t;
-
-#define pte_val(x) ((x).pte)
-#define pmd_val(x) ((x).pmd)
-#define pgd_val(x) ((x).pgd[0])
-#define pgprot_val(x) ((x).pgprot)
-
-#define __pte(x) ((pte_t) { (x) } )
-#define __pmd(x) ((pmd_t) { (x) } )
-#define __pgprot(x) ((pgprot_t) { (x) } )
-
-#else
-/*
- * .. while these make it easier on the compiler
- */
-typedef pteval_t pte_t;
-typedef unsigned long pmd_t;
-typedef unsigned long pgd_t[2];
-typedef unsigned long pgprot_t;
-
-#define pte_val(x) (x)
-#define pmd_val(x) (x)
-#define pgd_val(x) ((x)[0])
-#define pgprot_val(x) (x)
-
-#define __pte(x) (x)
-#define __pmd(x) (x)
-#define __pgprot(x) (x)
-
-#endif /* STRICT_MM_TYPECHECKS */
+#include <asm/pgtable-2level-types.h>

#endif /* CONFIG_MMU */

diff --git a/arch/arm/include/asm/pgtable-2level-hwdef.h b/arch/arm/include/asm/pgtable-2level-hwdef.h
new file mode 100644
index 0000000..436529c
--- /dev/null
+++ b/arch/arm/include/asm/pgtable-2level-hwdef.h
@@ -0,0 +1,91 @@
+/*
+ * arch/arm/include/asm/pgtable-2level-hwdef.h
+ *
+ * Copyright (C) 1995-2002 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_PGTABLE_2LEVEL_HWDEF_H
+#define _ASM_PGTABLE_2LEVEL_HWDEF_H
+
+/*
+ * Hardware page table definitions.
+ *
+ * + Level 1 descriptor (PMD)
+ * - common
+ */
+#define PMD_TYPE_MASK (3 << 0)
+#define PMD_TYPE_FAULT (0 << 0)
+#define PMD_TYPE_TABLE (1 << 0)
+#define PMD_TYPE_SECT (2 << 0)
+#define PMD_BIT4 (1 << 4)
+#define PMD_DOMAIN(x) ((x) << 5)
+#define PMD_PROTECTION (1 << 9) /* v5 */
+/*
+ * - section
+ */
+#define PMD_SECT_BUFFERABLE (1 << 2)
+#define PMD_SECT_CACHEABLE (1 << 3)
+#define PMD_SECT_XN (1 << 4) /* v6 */
+#define PMD_SECT_AP_WRITE (1 << 10)
+#define PMD_SECT_AP_READ (1 << 11)
+#define PMD_SECT_TEX(x) ((x) << 12) /* v5 */
+#define PMD_SECT_APX (1 << 15) /* v6 */
+#define PMD_SECT_S (1 << 16) /* v6 */
+#define PMD_SECT_nG (1 << 17) /* v6 */
+#define PMD_SECT_SUPER (1 << 18) /* v6 */
+#define PMD_SECT_AF (0)
+
+#define PMD_SECT_UNCACHED (0)
+#define PMD_SECT_BUFFERED (PMD_SECT_BUFFERABLE)
+#define PMD_SECT_WT (PMD_SECT_CACHEABLE)
+#define PMD_SECT_WB (PMD_SECT_CACHEABLE | PMD_SECT_BUFFERABLE)
+#define PMD_SECT_MINICACHE (PMD_SECT_TEX(1) | PMD_SECT_CACHEABLE)
+#define PMD_SECT_WBWA (PMD_SECT_TEX(1) | PMD_SECT_CACHEABLE | PMD_SECT_BUFFERABLE)
+#define PMD_SECT_NONSHARED_DEV (PMD_SECT_TEX(2))
+
+/*
+ * - coarse table (not used)
+ */
+
+/*
+ * + Level 2 descriptor (PTE)
+ * - common
+ */
+#define PTE_TYPE_MASK (3 << 0)
+#define PTE_TYPE_FAULT (0 << 0)
+#define PTE_TYPE_LARGE (1 << 0)
+#define PTE_TYPE_SMALL (2 << 0)
+#define PTE_TYPE_EXT (3 << 0) /* v5 */
+#define PTE_BUFFERABLE (1 << 2)
+#define PTE_CACHEABLE (1 << 3)
+
+/*
+ * - extended small page/tiny page
+ */
+#define PTE_EXT_XN (1 << 0) /* v6 */
+#define PTE_EXT_AP_MASK (3 << 4)
+#define PTE_EXT_AP0 (1 << 4)
+#define PTE_EXT_AP1 (2 << 4)
+#define PTE_EXT_AP_UNO_SRO (0 << 4)
+#define PTE_EXT_AP_UNO_SRW (PTE_EXT_AP0)
+#define PTE_EXT_AP_URO_SRW (PTE_EXT_AP1)
+#define PTE_EXT_AP_URW_SRW (PTE_EXT_AP1|PTE_EXT_AP0)
+#define PTE_EXT_TEX(x) ((x) << 6) /* v5 */
+#define PTE_EXT_APX (1 << 9) /* v6 */
+#define PTE_EXT_COHERENT (1 << 9) /* XScale3 */
+#define PTE_EXT_SHARED (1 << 10) /* v6 */
+#define PTE_EXT_NG (1 << 11) /* v6 */
+
+/*
+ * - small page
+ */
+#define PTE_SMALL_AP_MASK (0xff << 4)
+#define PTE_SMALL_AP_UNO_SRO (0x00 << 4)
+#define PTE_SMALL_AP_UNO_SRW (0x55 << 4)
+#define PTE_SMALL_AP_URO_SRW (0xaa << 4)
+#define PTE_SMALL_AP_URW_SRW (0xff << 4)
+
+#endif
diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h
new file mode 100644
index 0000000..8ff6941
--- /dev/null
+++ b/arch/arm/include/asm/pgtable-2level-types.h
@@ -0,0 +1,64 @@
+/*
+ * arch/arm/include/asm/pgtable_32_types.h
+ *
+ * Copyright (C) 1995-2003 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#ifndef _ASM_PGTABLE_2LEVEL_TYPES_H
+#define _ASM_PGTABLE_2LEVEL_TYPES_H
+
+typedef unsigned long pteval_t;
+
+#undef STRICT_MM_TYPECHECKS
+
+#ifdef STRICT_MM_TYPECHECKS
+/*
+ * These are used to make use of C type-checking..
+ */
+typedef struct { pteval_t pte; } pte_t;
+typedef struct { unsigned long pmd; } pmd_t;
+typedef struct { unsigned long pgd[2]; } pgd_t;
+typedef struct { unsigned long pgprot; } pgprot_t;
+
+#define pte_val(x) ((x).pte)
+#define pmd_val(x) ((x).pmd)
+#define pgd_val(x) ((x).pgd[0])
+#define pgprot_val(x) ((x).pgprot)
+
+#define __pte(x) ((pte_t) { (x) } )
+#define __pmd(x) ((pmd_t) { (x) } )
+#define __pgprot(x) ((pgprot_t) { (x) } )
+
+#else
+/*
+ * .. while these make it easier on the compiler
+ */
+typedef pteval_t pte_t;
+typedef unsigned long pmd_t;
+typedef unsigned long pgd_t[2];
+typedef unsigned long pgprot_t;
+
+#define pte_val(x) (x)
+#define pmd_val(x) (x)
+#define pgd_val(x) ((x)[0])
+#define pgprot_val(x) (x)
+
+#define __pte(x) (x)
+#define __pmd(x) (x)
+#define __pgprot(x) (x)
+
+#endif /* STRICT_MM_TYPECHECKS */
+
+#endif /* _ASM_PGTABLE_2LEVEL_TYPES_H */
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
new file mode 100644
index 0000000..470457e
--- /dev/null
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -0,0 +1,143 @@
+/*
+ * arch/arm/include/asm/pgtable-2level.h
+ *
+ * Copyright (C) 1995-2002 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_PGTABLE_2LEVEL_H
+#define _ASM_PGTABLE_2LEVEL_H
+
+/*
+ * Hardware-wise, we have a two level page table structure, where the first
+ * level has 4096 entries, and the second level has 256 entries. Each entry
+ * is one 32-bit word. Most of the bits in the second level entry are used
+ * by hardware, and there aren't any "accessed" and "dirty" bits.
+ *
+ * Linux on the other hand has a three level page table structure, which can
+ * be wrapped to fit a two level page table structure easily - using the PGD
+ * and PTE only. However, Linux also expects one "PTE" table per page, and
+ * at least a "dirty" bit.
+ *
+ * Therefore, we tweak the implementation slightly - we tell Linux that we
+ * have 2048 entries in the first level, each of which is 8 bytes (iow, two
+ * hardware pointers to the second level.) The second level contains two
+ * hardware PTE tables arranged contiguously, preceded by Linux versions
+ * which contain the state information Linux needs. We, therefore, end up
+ * with 512 entries in the "PTE" level.
+ *
+ * This leads to the page tables having the following layout:
+ *
+ * pgd pte
+ * | |
+ * +--------+
+ * | | +------------+ +0
+ * +- - - - + | Linux pt 0 |
+ * | | +------------+ +1024
+ * +--------+ +0 | Linux pt 1 |
+ * | |-----> +------------+ +2048
+ * +- - - - + +4 | h/w pt 0 |
+ * | |-----> +------------+ +3072
+ * +--------+ +8 | h/w pt 1 |
+ * | | +------------+ +4096
+ *
+ * See L_PTE_xxx below for definitions of bits in the "Linux pt", and
+ * PTE_xxx for definitions of bits appearing in the "h/w pt".
+ *
+ * PMD_xxx definitions refer to bits in the first level page table.
+ *
+ * The "dirty" bit is emulated by only granting hardware write permission
+ * iff the page is marked "writable" and "dirty" in the Linux PTE. This
+ * means that a write to a clean page will cause a permission fault, and
+ * the Linux MM layer will mark the page dirty via handle_pte_fault().
+ * For the hardware to notice the permission change, the TLB entry must
+ * be flushed, and ptep_set_access_flags() does that for us.
+ *
+ * The "accessed" or "young" bit is emulated by a similar method; we only
+ * allow accesses to the page if the "young" bit is set. Accesses to the
+ * page will cause a fault, and handle_pte_fault() will set the young bit
+ * for us as long as the page is marked present in the corresponding Linux
+ * PTE entry. Again, ptep_set_access_flags() will ensure that the TLB is
+ * up to date.
+ *
+ * However, when the "young" bit is cleared, we deny access to the page
+ * by clearing the hardware PTE. Currently Linux does not flush the TLB
+ * for us in this case, which means the TLB will retain the transation
+ * until either the TLB entry is evicted under pressure, or a context
+ * switch which changes the user space mapping occurs.
+ */
+#define PTRS_PER_PTE 512
+#define PTRS_PER_PMD 1
+#define PTRS_PER_PGD 2048
+
+#define PTE_HWTABLE_PTRS (PTRS_PER_PTE)
+#define PTE_HWTABLE_OFF (PTE_HWTABLE_PTRS * sizeof(pte_t))
+#define PTE_HWTABLE_SIZE (PTRS_PER_PTE * sizeof(u32))
+
+/*
+ * PMD_SHIFT determines the size of the area a second-level page table can map
+ * PGDIR_SHIFT determines what a third-level page table entry can map
+ */
+#define PMD_SHIFT 21
+#define PGDIR_SHIFT 21
+
+#define PMD_SIZE (1UL << PMD_SHIFT)
+#define PMD_MASK (~(PMD_SIZE-1))
+#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
+#define PGDIR_MASK (~(PGDIR_SIZE-1))
+
+/*
+ * section address mask and size definitions.
+ */
+#define SECTION_SHIFT 20
+#define SECTION_SIZE (1UL << SECTION_SHIFT)
+#define SECTION_MASK (~(SECTION_SIZE-1))
+
+/*
+ * ARMv6 supersection address mask and size definitions.
+ */
+#define SUPERSECTION_SHIFT 24
+#define SUPERSECTION_SIZE (1UL << SUPERSECTION_SHIFT)
+#define SUPERSECTION_MASK (~(SUPERSECTION_SIZE-1))
+
+#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
+
+/*
+ * "Linux" PTE definitions.
+ *
+ * We keep two sets of PTEs - the hardware and the linux version.
+ * This allows greater flexibility in the way we map the Linux bits
+ * onto the hardware tables, and allows us to have YOUNG and DIRTY
+ * bits.
+ *
+ * The PTE table pointer refers to the hardware entries; the "Linux"
+ * entries are stored 1024 bytes below.
+ */
+#define L_PTE_PRESENT (_AT(pteval_t, 1) << 0)
+#define L_PTE_YOUNG (_AT(pteval_t, 1) << 1)
+#define L_PTE_FILE (_AT(pteval_t, 1) << 2) /* only when !PRESENT */
+#define L_PTE_DIRTY (_AT(pteval_t, 1) << 6)
+#define L_PTE_RDONLY (_AT(pteval_t, 1) << 7)
+#define L_PTE_USER (_AT(pteval_t, 1) << 8)
+#define L_PTE_XN (_AT(pteval_t, 1) << 9)
+#define L_PTE_SHARED (_AT(pteval_t, 1) << 10) /* shared(v6), coherent(xsc3) */
+
+/*
+ * These are the memory types, defined to be compatible with
+ * pre-ARMv6 CPUs cacheable and bufferable bits: XXCB
+ */
+#define L_PTE_MT_UNCACHED (_AT(pteval_t, 0x00) << 2) /* 0000 */
+#define L_PTE_MT_BUFFERABLE (_AT(pteval_t, 0x01) << 2) /* 0001 */
+#define L_PTE_MT_WRITETHROUGH (_AT(pteval_t, 0x02) << 2) /* 0010 */
+#define L_PTE_MT_WRITEBACK (_AT(pteval_t, 0x03) << 2) /* 0011 */
+#define L_PTE_MT_MINICACHE (_AT(pteval_t, 0x06) << 2) /* 0110 (sa1100, xscale) */
+#define L_PTE_MT_WRITEALLOC (_AT(pteval_t, 0x07) << 2) /* 0111 */
+#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
+#define L_PTE_MT_DEV_NONSHARED (_AT(pteval_t, 0x0c) << 2) /* 1100 */
+#define L_PTE_MT_DEV_WC (_AT(pteval_t, 0x09) << 2) /* 1001 */
+#define L_PTE_MT_DEV_CACHED (_AT(pteval_t, 0x0b) << 2) /* 1011 */
+#define L_PTE_MT_MASK (_AT(pteval_t, 0x0f) << 2)
+
+#endif /* _ASM_PGTABLE_2LEVEL_H */
diff --git a/arch/arm/include/asm/pgtable-hwdef.h b/arch/arm/include/asm/pgtable-hwdef.h
index fd1521d..1831111 100644
--- a/arch/arm/include/asm/pgtable-hwdef.h
+++ b/arch/arm/include/asm/pgtable-hwdef.h
@@ -10,81 +10,6 @@
#ifndef _ASMARM_PGTABLE_HWDEF_H
#define _ASMARM_PGTABLE_HWDEF_H

-/*
- * Hardware page table definitions.
- *
- * + Level 1 descriptor (PMD)
- * - common
- */
-#define PMD_TYPE_MASK (3 << 0)
-#define PMD_TYPE_FAULT (0 << 0)
-#define PMD_TYPE_TABLE (1 << 0)
-#define PMD_TYPE_SECT (2 << 0)
-#define PMD_BIT4 (1 << 4)
-#define PMD_DOMAIN(x) ((x) << 5)
-#define PMD_PROTECTION (1 << 9) /* v5 */
-/*
- * - section
- */
-#define PMD_SECT_BUFFERABLE (1 << 2)
-#define PMD_SECT_CACHEABLE (1 << 3)
-#define PMD_SECT_XN (1 << 4) /* v6 */
-#define PMD_SECT_AP_WRITE (1 << 10)
-#define PMD_SECT_AP_READ (1 << 11)
-#define PMD_SECT_TEX(x) ((x) << 12) /* v5 */
-#define PMD_SECT_APX (1 << 15) /* v6 */
-#define PMD_SECT_S (1 << 16) /* v6 */
-#define PMD_SECT_nG (1 << 17) /* v6 */
-#define PMD_SECT_SUPER (1 << 18) /* v6 */
-
-#define PMD_SECT_UNCACHED (0)
-#define PMD_SECT_BUFFERED (PMD_SECT_BUFFERABLE)
-#define PMD_SECT_WT (PMD_SECT_CACHEABLE)
-#define PMD_SECT_WB (PMD_SECT_CACHEABLE | PMD_SECT_BUFFERABLE)
-#define PMD_SECT_MINICACHE (PMD_SECT_TEX(1) | PMD_SECT_CACHEABLE)
-#define PMD_SECT_WBWA (PMD_SECT_TEX(1) | PMD_SECT_CACHEABLE | PMD_SECT_BUFFERABLE)
-#define PMD_SECT_NONSHARED_DEV (PMD_SECT_TEX(2))
-
-/*
- * - coarse table (not used)
- */
-
-/*
- * + Level 2 descriptor (PTE)
- * - common
- */
-#define PTE_TYPE_MASK (3 << 0)
-#define PTE_TYPE_FAULT (0 << 0)
-#define PTE_TYPE_LARGE (1 << 0)
-#define PTE_TYPE_SMALL (2 << 0)
-#define PTE_TYPE_EXT (3 << 0) /* v5 */
-#define PTE_BUFFERABLE (1 << 2)
-#define PTE_CACHEABLE (1 << 3)
-
-/*
- * - extended small page/tiny page
- */
-#define PTE_EXT_XN (1 << 0) /* v6 */
-#define PTE_EXT_AP_MASK (3 << 4)
-#define PTE_EXT_AP0 (1 << 4)
-#define PTE_EXT_AP1 (2 << 4)
-#define PTE_EXT_AP_UNO_SRO (0 << 4)
-#define PTE_EXT_AP_UNO_SRW (PTE_EXT_AP0)
-#define PTE_EXT_AP_URO_SRW (PTE_EXT_AP1)
-#define PTE_EXT_AP_URW_SRW (PTE_EXT_AP1|PTE_EXT_AP0)
-#define PTE_EXT_TEX(x) ((x) << 6) /* v5 */
-#define PTE_EXT_APX (1 << 9) /* v6 */
-#define PTE_EXT_COHERENT (1 << 9) /* XScale3 */
-#define PTE_EXT_SHARED (1 << 10) /* v6 */
-#define PTE_EXT_NG (1 << 11) /* v6 */
-
-/*
- * - small page
- */
-#define PTE_SMALL_AP_MASK (0xff << 4)
-#define PTE_SMALL_AP_UNO_SRO (0x00 << 4)
-#define PTE_SMALL_AP_UNO_SRW (0x55 << 4)
-#define PTE_SMALL_AP_URO_SRW (0xaa << 4)
-#define PTE_SMALL_AP_URW_SRW (0xff << 4)
+#include <asm/pgtable-2level-hwdef.h>

#endif
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index c2663f4..9618052 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -25,6 +25,8 @@
#include <mach/vmalloc.h>
#include <asm/pgtable-hwdef.h>

+#include <asm/pgtable-2level.h>
+
/*
* Just any arbitrary offset to the start of the vmalloc VM area: the
* current 8MB value just means that there will be a 8MB "hole" after the
@@ -42,79 +44,6 @@
#define VMALLOC_START (((unsigned long)high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET-1))
#endif

-/*
- * Hardware-wise, we have a two level page table structure, where the first
- * level has 4096 entries, and the second level has 256 entries. Each entry
- * is one 32-bit word. Most of the bits in the second level entry are used
- * by hardware, and there aren't any "accessed" and "dirty" bits.
- *
- * Linux on the other hand has a three level page table structure, which can
- * be wrapped to fit a two level page table structure easily - using the PGD
- * and PTE only. However, Linux also expects one "PTE" table per page, and
- * at least a "dirty" bit.
- *
- * Therefore, we tweak the implementation slightly - we tell Linux that we
- * have 2048 entries in the first level, each of which is 8 bytes (iow, two
- * hardware pointers to the second level.) The second level contains two
- * hardware PTE tables arranged contiguously, preceded by Linux versions
- * which contain the state information Linux needs. We, therefore, end up
- * with 512 entries in the "PTE" level.
- *
- * This leads to the page tables having the following layout:
- *
- * pgd pte
- * | |
- * +--------+
- * | | +------------+ +0
- * +- - - - + | Linux pt 0 |
- * | | +------------+ +1024
- * +--------+ +0 | Linux pt 1 |
- * | |-----> +------------+ +2048
- * +- - - - + +4 | h/w pt 0 |
- * | |-----> +------------+ +3072
- * +--------+ +8 | h/w pt 1 |
- * | | +------------+ +4096
- *
- * See L_PTE_xxx below for definitions of bits in the "Linux pt", and
- * PTE_xxx for definitions of bits appearing in the "h/w pt".
- *
- * PMD_xxx definitions refer to bits in the first level page table.
- *
- * The "dirty" bit is emulated by only granting hardware write permission
- * iff the page is marked "writable" and "dirty" in the Linux PTE. This
- * means that a write to a clean page will cause a permission fault, and
- * the Linux MM layer will mark the page dirty via handle_pte_fault().
- * For the hardware to notice the permission change, the TLB entry must
- * be flushed, and ptep_set_access_flags() does that for us.
- *
- * The "accessed" or "young" bit is emulated by a similar method; we only
- * allow accesses to the page if the "young" bit is set. Accesses to the
- * page will cause a fault, and handle_pte_fault() will set the young bit
- * for us as long as the page is marked present in the corresponding Linux
- * PTE entry. Again, ptep_set_access_flags() will ensure that the TLB is
- * up to date.
- *
- * However, when the "young" bit is cleared, we deny access to the page
- * by clearing the hardware PTE. Currently Linux does not flush the TLB
- * for us in this case, which means the TLB will retain the transation
- * until either the TLB entry is evicted under pressure, or a context
- * switch which changes the user space mapping occurs.
- */
-#define PTRS_PER_PTE 512
-#define PTRS_PER_PMD 1
-#define PTRS_PER_PGD 2048
-
-#define PTE_HWTABLE_PTRS (PTRS_PER_PTE)
-#define PTE_HWTABLE_OFF (PTE_HWTABLE_PTRS * sizeof(pte_t))
-#define PTE_HWTABLE_SIZE (PTRS_PER_PTE * sizeof(u32))
-
-/*
- * PMD_SHIFT determines the size of the area a second-level page table can map
- * PGDIR_SHIFT determines what a third-level page table entry can map
- */
-#define PMD_SHIFT 21
-#define PGDIR_SHIFT 21
-
#define LIBRARY_TEXT_START 0x0c000000

#ifndef __ASSEMBLY__
@@ -125,12 +54,6 @@ extern void __pgd_error(const char *file, int line, pgd_t);
#define pte_ERROR(pte) __pte_error(__FILE__, __LINE__, pte)
#define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd)
#define pgd_ERROR(pgd) __pgd_error(__FILE__, __LINE__, pgd)
-#endif /* !__ASSEMBLY__ */
-
-#define PMD_SIZE (1UL << PMD_SHIFT)
-#define PMD_MASK (~(PMD_SIZE-1))
-#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
-#define PGDIR_MASK (~(PGDIR_SIZE-1))

/*
* This is the lowest virtual address we can permit any user space
@@ -139,60 +62,6 @@ extern void __pgd_error(const char *file, int line, pgd_t);
*/
#define FIRST_USER_ADDRESS PAGE_SIZE

-#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
-
-/*
- * section address mask and size definitions.
- */
-#define SECTION_SHIFT 20
-#define SECTION_SIZE (1UL << SECTION_SHIFT)
-#define SECTION_MASK (~(SECTION_SIZE-1))
-
-/*
- * ARMv6 supersection address mask and size definitions.
- */
-#define SUPERSECTION_SHIFT 24
-#define SUPERSECTION_SIZE (1UL << SUPERSECTION_SHIFT)
-#define SUPERSECTION_MASK (~(SUPERSECTION_SIZE-1))
-
-/*
- * "Linux" PTE definitions.
- *
- * We keep two sets of PTEs - the hardware and the linux version.
- * This allows greater flexibility in the way we map the Linux bits
- * onto the hardware tables, and allows us to have YOUNG and DIRTY
- * bits.
- *
- * The PTE table pointer refers to the hardware entries; the "Linux"
- * entries are stored 1024 bytes below.
- */
-#define L_PTE_PRESENT (_AT(pteval_t, 1) << 0)
-#define L_PTE_YOUNG (_AT(pteval_t, 1) << 1)
-#define L_PTE_FILE (_AT(pteval_t, 1) << 2) /* only when !PRESENT */
-#define L_PTE_DIRTY (_AT(pteval_t, 1) << 6)
-#define L_PTE_RDONLY (_AT(pteval_t, 1) << 7)
-#define L_PTE_USER (_AT(pteval_t, 1) << 8)
-#define L_PTE_XN (_AT(pteval_t, 1) << 9)
-#define L_PTE_SHARED (_AT(pteval_t, 1) << 10) /* shared(v6), coherent(xsc3) */
-
-/*
- * These are the memory types, defined to be compatible with
- * pre-ARMv6 CPUs cacheable and bufferable bits: XXCB
- */
-#define L_PTE_MT_UNCACHED (_AT(pteval_t, 0x00) << 2) /* 0000 */
-#define L_PTE_MT_BUFFERABLE (_AT(pteval_t, 0x01) << 2) /* 0001 */
-#define L_PTE_MT_WRITETHROUGH (_AT(pteval_t, 0x02) << 2) /* 0010 */
-#define L_PTE_MT_WRITEBACK (_AT(pteval_t, 0x03) << 2) /* 0011 */
-#define L_PTE_MT_MINICACHE (_AT(pteval_t, 0x06) << 2) /* 0110 (sa1100, xscale) */
-#define L_PTE_MT_WRITEALLOC (_AT(pteval_t, 0x07) << 2) /* 0111 */
-#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
-#define L_PTE_MT_DEV_NONSHARED (_AT(pteval_t, 0x0c) << 2) /* 1100 */
-#define L_PTE_MT_DEV_WC (_AT(pteval_t, 0x09) << 2) /* 1001 */
-#define L_PTE_MT_DEV_CACHED (_AT(pteval_t, 0x0b) << 2) /* 1011 */
-#define L_PTE_MT_MASK (_AT(pteval_t, 0x0f) << 2)
-
-#ifndef __ASSEMBLY__
-
/*
* The pgprot_* and protection_map entries will be fixed up in runtime
* to include the cachable and bufferable bits based on memory policy,

Catalin Marinas

2011-05-08 12:51:28 UTC

This patch introduces the pgtable-3level*.h files with definitions
specific to the LPAE page table format (3 levels of page tables).

Each table is 4KB and has 512 64-bit entries. An entry can point to a
40-bit physical address. The young, write and exec software bits share
the corresponding hardware bits (negated). Other software bits use spare
bits in the PTE.

The patch also changes some variable types from unsigned long or int to
pteval_t or pgprot_t.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/page.h | 4 +
arch/arm/include/asm/pgtable-3level-hwdef.h | 81 +++++++++++++++++++++
arch/arm/include/asm/pgtable-3level-types.h | 68 ++++++++++++++++++
arch/arm/include/asm/pgtable-3level.h | 101 +++++++++++++++++++++++++++
arch/arm/include/asm/pgtable-hwdef.h | 4 +
arch/arm/include/asm/pgtable.h | 4 +
6 files changed, 262 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/include/asm/pgtable-3level-hwdef.h
create mode 100644 arch/arm/include/asm/pgtable-3level-types.h
create mode 100644 arch/arm/include/asm/pgtable-3level.h

diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 3848105..e5124db 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -151,7 +151,11 @@ extern void __cpu_copy_user_highpage(struct page *to, struct page *from,
#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
extern void copy_page(void *to, const void *from);

+#ifdef CONFIG_ARM_LPAE
+#include <asm/pgtable-3level-types.h>
+#else
#include <asm/pgtable-2level-types.h>
+#endif

#endif /* CONFIG_MMU */

diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
new file mode 100644
index 0000000..6c0fb9b
--- /dev/null
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -0,0 +1,81 @@
+/*
+ * arch/arm/include/asm/pgtable-3level-hwdef.h
+ *
+ * Copyright (C) 2011 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#ifndef _ASM_PGTABLE_3LEVEL_HWDEF_H
+#define _ASM_PGTABLE_3LEVEL_HWDEF_H
+
+/*
+ * Hardware page table definitions.
+ *
+ * + Level 1/2 descriptor
+ * - common
+ */
+#define PMD_TYPE_MASK (_AT(pmdval_t, 3) << 0)
+#define PMD_TYPE_FAULT (_AT(pmdval_t, 0) << 0)
+#define PMD_TYPE_TABLE (_AT(pmdval_t, 3) << 0)
+#define PMD_TYPE_SECT (_AT(pmdval_t, 1) << 0)
+#define PMD_BIT4 (_AT(pmdval_t, 0))
+#define PMD_DOMAIN(x) (_AT(pmdval_t, 0))
+
+/*
+ * - section
+ */
+#define PMD_SECT_BUFFERABLE (_AT(pmdval_t, 1) << 2)
+#define PMD_SECT_CACHEABLE (_AT(pmdval_t, 1) << 3)
+#define PMD_SECT_S (_AT(pmdval_t, 3) << 8)
+#define PMD_SECT_AF (_AT(pmdval_t, 1) << 10)
+#define PMD_SECT_nG (_AT(pmdval_t, 1) << 11)
+#ifdef __ASSEMBLY__
+/* avoid 'shift count out of range' warning */
+#define PMD_SECT_XN (0)
+#else
+#define PMD_SECT_XN ((pmdval_t)1 << 54)
+#endif
+#define PMD_SECT_AP_WRITE (_AT(pmdval_t, 0))
+#define PMD_SECT_AP_READ (_AT(pmdval_t, 0))
+#define PMD_SECT_TEX(x) (_AT(pmdval_t, 0))
+
+/*
+ * AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
+ */
+#define PMD_SECT_UNCACHED (_AT(pmdval_t, 0) << 2) /* strongly ordered */
+#define PMD_SECT_BUFFERED (_AT(pmdval_t, 1) << 2) /* normal non-cacheable */
+#define PMD_SECT_WT (_AT(pmdval_t, 2) << 2) /* normal inner write-through */
+#define PMD_SECT_WB (_AT(pmdval_t, 3) << 2) /* normal inner write-back */
+#define PMD_SECT_WBWA (_AT(pmdval_t, 7) << 2) /* normal inner write-alloc */
+
+/*
+ * + Level 3 descriptor (PTE)
+ */
+#define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0)
+#define PTE_TYPE_FAULT (_AT(pteval_t, 0) << 0)
+#define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0)
+#define PTE_BUFFERABLE (_AT(pteval_t, 1) << 2) /* AttrIndx[0] */
+#define PTE_CACHEABLE (_AT(pteval_t, 1) << 3) /* AttrIndx[1] */
+#define PTE_EXT_SHARED (_AT(pteval_t, 3) << 8) /* SH[1:0], inner shareable */
+#define PTE_EXT_AF (_AT(pteval_t, 1) << 10) /* Access Flag */
+#define PTE_EXT_NG (_AT(pteval_t, 1) << 11) /* nG */
+#define PTE_EXT_XN (_AT(pteval_t, 1) << 54) /* XN */
+
+/*
+ * 40-bit physical address supported.
+ */
+#define PHYS_MASK_SHIFT (40)
+#define PHYS_MASK ((1ULL << PHYS_MASK_SHIFT) - 1)
+
+#endif
diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h
new file mode 100644
index 0000000..a3dd5cf
--- /dev/null
+++ b/arch/arm/include/asm/pgtable-3level-types.h
@@ -0,0 +1,68 @@
+/*
+ * arch/arm/include/asm/pgtable-3level-types.h
+ *
+ * Copyright (C) 2011 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#ifndef _ASM_PGTABLE_3LEVEL_TYPES_H
+#define _ASM_PGTABLE_3LEVEL_TYPES_H
+
+typedef u64 pteval_t;
+typedef u64 pmdval_t;
+typedef u64 pgdval_t;
+typedef u64 pgprotval_t;
+
+#undef STRICT_MM_TYPECHECKS
+
+#ifdef STRICT_MM_TYPECHECKS
+
+/*
+ * These are used to make use of C type-checking..
+ */
+typedef struct { pteval_t pte; } pte_t;
+typedef struct { pmdval_t pmd; } pmd_t;
+typedef struct { pgdval_t pgd; } pgd_t;
+typedef struct { pgprotval_t pgprot; } pgprot_t;
+
+#define pte_val(x) ((x).pte)
+#define pmd_val(x) ((x).pmd)
+#define pgd_val(x) ((x).pgd)
+#define pgprot_val(x) ((x).pgprot)
+
+#define __pte(x) ((pte_t) { (x) } )
+#define __pmd(x) ((pmd_t) { (x) } )
+#define __pgd(x) ((pgd_t) { (x) } )
+#define __pgprot(x) ((pgprot_t) { (x) } )
+
+#else /* !STRICT_MM_TYPECHECKS */
+
+typedef pteval_t pte_t;
+typedef pmdval_t pmd_t;
+typedef pgdval_t pgd_t;
+typedef pgprotval_t pgprot_t;
+
+#define pte_val(x) (x)
+#define pmd_val(x) (x)
+#define pgd_val(x) (x)
+#define pgprot_val(x) (x)
+
+#define __pte(x) (x)
+#define __pmd(x) (x)
+#define __pgd(x) (x)
+#define __pgprot(x) (x)
+
+#endif /* STRICT_MM_TYPECHECKS */
+
+#endif /* _ASM_PGTABLE_3LEVEL_TYPES_H */
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
new file mode 100644
index 0000000..ac45358
--- /dev/null
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -0,0 +1,101 @@
+/*
+ * arch/arm/include/asm/pgtable-3level.h
+ *
+ * Copyright (C) 2011 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#ifndef _ASM_PGTABLE_3LEVEL_H
+#define _ASM_PGTABLE_3LEVEL_H
+
+/*
+ * With LPAE, there are 3 levels of page tables. Each level has 512 entries of
+ * 8 bytes each, occupying a 4K page. The first level table covers a range of
+ * 512GB, each entry representing 1GB. Since we are limited to 4GB input
+ * address range, only 4 entries in the PGD are used.
+ *
+ * There are enough spare bits in a page table entry for the kernel specific
+ * state.
+ */
+#define PTRS_PER_PTE 512
+#define PTRS_PER_PMD 512
+#define PTRS_PER_PGD 4
+
+#define PTE_HWTABLE_PTRS (PTRS_PER_PTE)
+#define PTE_HWTABLE_OFF (0)
+#define PTE_HWTABLE_SIZE (PTRS_PER_PTE * sizeof(u64))
+
+/*
+ * PGDIR_SHIFT determines the size a top-level page table entry can map.
+ */
+#define PGDIR_SHIFT 30
+
+/*
+ * PMD_SHIFT determines the size a middle-level page table entry can map.
+ */
+#define PMD_SHIFT 21
+
+#define PMD_SIZE (1UL << PMD_SHIFT)
+#define PMD_MASK (~(PMD_SIZE-1))
+#define PGDIR_SIZE (1UL << PGDIR_SHIFT)
+#define PGDIR_MASK (~(PGDIR_SIZE-1))
+
+/*
+ * section address mask and size definitions.
+ */
+#define SECTION_SHIFT 21
+#define SECTION_SIZE (1UL << SECTION_SHIFT)
+#define SECTION_MASK (~(SECTION_SIZE-1))
+
+#define USER_PTRS_PER_PGD (PAGE_OFFSET / PGDIR_SIZE)
+
+/*
+ * "Linux" PTE definitions for LPAE.
+ *
+ * These bits overlap with the hardware bits but the naming is preserved for
+ * consistency with the classic page table format.
+ */
+#define L_PTE_PRESENT (_AT(pteval_t, 3) << 0) /* Valid */
+#define L_PTE_FILE (_AT(pteval_t, 1) << 2) /* only when !PRESENT */
+#define L_PTE_BUFFERABLE (_AT(pteval_t, 1) << 2) /* AttrIndx[0] */
+#define L_PTE_CACHEABLE (_AT(pteval_t, 1) << 3) /* AttrIndx[1] */
+#define L_PTE_USER (_AT(pteval_t, 1) << 6) /* AP[1] */
+#define L_PTE_RDONLY (_AT(pteval_t, 1) << 7) /* AP[2] */
+#define L_PTE_SHARED (_AT(pteval_t, 3) << 8) /* SH[1:0], inner shareable */
+#define L_PTE_YOUNG (_AT(pteval_t, 1) << 10) /* AF */
+#define L_PTE_XN (_AT(pteval_t, 1) << 54) /* XN */
+#define L_PTE_DIRTY (_AT(pteval_t, 1) << 55) /* unused */
+#define L_PTE_SPECIAL (_AT(pteval_t, 1) << 56) /* unused */
+
+/*
+ * To be used in assembly code with the upper page attributes.
+ */
+#define L_PTE_XN_HIGH (1 << (54 - 32))
+#define L_PTE_DIRTY_HIGH (1 << (55 - 32))
+
+/*
+ * AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
+ */
+#define L_PTE_MT_UNCACHED (_AT(pteval_t, 0) << 2) /* strongly ordered */
+#define L_PTE_MT_BUFFERABLE (_AT(pteval_t, 1) << 2) /* normal non-cacheable */
+#define L_PTE_MT_WRITETHROUGH (_AT(pteval_t, 2) << 2) /* normal inner write-through */
+#define L_PTE_MT_WRITEBACK (_AT(pteval_t, 3) << 2) /* normal inner write-back */
+#define L_PTE_MT_WRITEALLOC (_AT(pteval_t, 7) << 2) /* normal inner write-alloc */
+#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 4) << 2) /* device */
+#define L_PTE_MT_DEV_NONSHARED (_AT(pteval_t, 4) << 2) /* device */
+#define L_PTE_MT_DEV_WC (_AT(pteval_t, 1) << 2) /* normal non-cacheable */
+#define L_PTE_MT_DEV_CACHED (_AT(pteval_t, 3) << 2) /* normal inner write-back */
+#define L_PTE_MT_MASK (_AT(pteval_t, 7) << 2)
+
+#endif /* _ASM_PGTABLE_3LEVEL_H */
diff --git a/arch/arm/include/asm/pgtable-hwdef.h b/arch/arm/include/asm/pgtable-hwdef.h
index 1831111..8426229 100644
--- a/arch/arm/include/asm/pgtable-hwdef.h
+++ b/arch/arm/include/asm/pgtable-hwdef.h
@@ -10,6 +10,10 @@
#ifndef _ASMARM_PGTABLE_HWDEF_H
#define _ASMARM_PGTABLE_HWDEF_H

+#ifdef CONFIG_ARM_LPAE
+#include <asm/pgtable-3level-hwdef.h>
+#else
#include <asm/pgtable-2level-hwdef.h>
+#endif

#endif
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 8f9e1dd..95fefd9 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -25,7 +25,11 @@
#include <mach/vmalloc.h>
#include <asm/pgtable-hwdef.h>

+#ifdef CONFIG_ARM_LPAE
+#include <asm/pgtable-3level.h>
+#else
#include <asm/pgtable-2level.h>
+#endif

/*
* Just any arbitrary offset to the start of the vmalloc VM area: the

Catalin Marinas

2011-05-08 12:51:29 UTC

This patch modifies the pgd/pmd/pte manipulation functions to support
the 3-level page table format. Since there is no need for an 'ext'
argument to cpu_set_pte_ext(), this patch conditionally defines a
different prototype for this function when CONFIG_ARM_LPAE.

The patch also introduces the L_PGD_SWAPPER flag to mark pgd entries
pointing to pmd tables pre-allocated in the swapper_pg_dir and avoid
trying to free them at run-time. This flag is 0 with the classic page
table format.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/include/asm/pgalloc.h | 24 +++++++++++++
arch/arm/include/asm/pgtable-3level.h | 5 +++
arch/arm/include/asm/pgtable.h | 62 ++++++++++++++++++++++++++++++++-
arch/arm/include/asm/proc-fns.h | 25 +++++++++++++
arch/arm/mm/ioremap.c | 8 +++--
arch/arm/mm/pgd.c | 51 +++++++++++++++++++++++++--
arch/arm/mm/proc-v7.S | 8 ++++
7 files changed, 175 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index 7418894..943504f 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -25,6 +25,26 @@
#define _PAGE_USER_TABLE (PMD_TYPE_TABLE | PMD_BIT4 | PMD_DOMAIN(DOMAIN_USER))
#define _PAGE_KERNEL_TABLE (PMD_TYPE_TABLE | PMD_BIT4 | PMD_DOMAIN(DOMAIN_KERNEL))

+#ifdef CONFIG_ARM_LPAE
+
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+ return (pmd_t *)get_zeroed_page(GFP_KERNEL | __GFP_REPEAT);
+}
+
+static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+{
+ BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
+ free_page((unsigned long)pmd);
+}
+
+static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
+{
+ set_pud(pud, __pud(__pa(pmd) | PMD_TYPE_TABLE));
+}
+
+#else /* !CONFIG_ARM_LPAE */
+
/*
* Since we have only two-level page tables, these are trivial
*/
@@ -32,6 +52,8 @@
#define pmd_free(mm, pmd) do { } while (0)
#define pud_populate(mm,pmd,pte) BUG()

+#endif /* CONFIG_ARM_LPAE */
+
extern pgd_t *pgd_alloc(struct mm_struct *mm);
extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);

@@ -109,7 +131,9 @@ static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t pte,
{
pmdval_t pmdval = (pte + PTE_HWTABLE_OFF) | prot;
pmdp[0] = __pmd(pmdval);
+#ifndef CONFIG_ARM_LPAE
pmdp[1] = __pmd(pmdval + 256 * sizeof(pte_t));
+#endif
flush_pmd_entry(pmdp);
}

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index ac45358..14a3e28 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -98,4 +98,9 @@
#define L_PTE_MT_DEV_CACHED (_AT(pteval_t, 3) << 2) /* normal inner write-back */
#define L_PTE_MT_MASK (_AT(pteval_t, 7) << 2)

+/*
+ * Software PGD flags.
+ */
+#define L_PGD_SWAPPER (_AT(pgdval_t, 1) << 55) /* swapper_pg_dir entry */
+
#endif /* _ASM_PGTABLE_3LEVEL_H */
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 95fefd9..1db9ad6 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -165,6 +165,31 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(addr) pgd_offset(&init_mm, addr)

+#ifdef CONFIG_ARM_LPAE
+
+#define pud_none(pud) (!pud_val(pud))
+#define pud_bad(pud) (!(pud_val(pud) & 2))
+#define pud_present(pud) (pud_val(pud))
+
+#define pud_clear(pudp) \
+ do { \
+ *pudp = __pud(0); \
+ clean_pmd_entry(pudp); \
+ } while (0)
+
+#define set_pud(pudp, pud) \
+ do { \
+ *pudp = pud; \
+ flush_pmd_entry(pudp); \
+ } while (0)
+
+static inline pmd_t *pud_page_vaddr(pud_t pud)
+{
+ return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
+}
+
+#else /* !CONFIG_ARM_LPAE */
+
/*
* The "pud_xxx()" functions here are trivial when the pmd is folded into
* the pud: the pud entry is never bad, always exists, and can't be set or
@@ -176,15 +201,43 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
#define pud_clear(pudp) do { } while (0)
#define set_pud(pud,pudp) do { } while (0)

+#endif /* CONFIG_ARM_LPAE */

/* Find an entry in the second-level page table.. */
+#ifdef CONFIG_ARM_LPAE
+#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
+static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
+{
+ return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+}
+#else
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
{
return (pmd_t *)pud;
}
+#endif

#define pmd_none(pmd) (!pmd_val(pmd))
#define pmd_present(pmd) (pmd_val(pmd))
+
+#ifdef CONFIG_ARM_LPAE
+
+#define pmd_bad(pmd) (!(pmd_val(pmd) & 2))
+
+#define copy_pmd(pmdpd,pmdps) \
+ do { \
+ *pmdpd = *pmdps; \
+ flush_pmd_entry(pmdpd); \
+ } while (0)
+
+#define pmd_clear(pmdp) \
+ do { \
+ *pmdp = __pmd(0); \
+ clean_pmd_entry(pmdp); \
+ } while (0)
+
+#else /* !CONFIG_ARM_LPAE */
+
#define pmd_bad(pmd) (pmd_val(pmd) & 2)

#define copy_pmd(pmdpd,pmdps) \
@@ -201,6 +254,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
clean_pmd_entry(pmdp); \
} while (0)

+#endif /* CONFIG_ARM_LPAE */
+
static inline pte_t *pmd_page_vaddr(pmd_t pmd)
{
return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
@@ -233,9 +288,14 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
#define pte_page(pte) pfn_to_page(pte_pfn(pte))
#define mk_pte(page,prot) pfn_pte(page_to_pfn(page), prot)

-#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
#define pte_clear(mm,addr,ptep) set_pte_ext(ptep, __pte(0), 0)

+#ifdef CONFIG_ARM_LPAE
+#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,__pte(pte_val(pte)|(ext)))
+#else
+#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
+#endif
+
#if __LINUX_ARM_ARCH__ < 6
static inline void __sync_icache_dcache(pte_t pteval)
{
diff --git a/arch/arm/include/asm/proc-fns.h b/arch/arm/include/asm/proc-fns.h
index 8ec535e..b5db4f4 100644
--- a/arch/arm/include/asm/proc-fns.h
+++ b/arch/arm/include/asm/proc-fns.h
@@ -65,7 +65,11 @@ extern struct processor {
* Set a possibly extended PTE. Non-extended PTEs should
* ignore 'ext'.
*/
+#ifdef CONFIG_ARM_LPAE
+ void (*set_pte_ext)(pte_t *ptep, pte_t pte);
+#else
void (*set_pte_ext)(pte_t *ptep, pte_t pte, unsigned int ext);
+#endif

/* Suspend/resume */
unsigned int suspend_size;
@@ -79,7 +83,11 @@ extern void cpu_proc_fin(void);
extern int cpu_do_idle(void);
extern void cpu_dcache_clean_area(void *, int);
extern void cpu_do_switch_mm(unsigned long pgd_phys, struct mm_struct *mm);
+#ifdef CONFIG_ARM_LPAE
+extern void cpu_set_pte_ext(pte_t *ptep, pte_t pte);
+#else
extern void cpu_set_pte_ext(pte_t *ptep, pte_t pte, unsigned int ext);
+#endif
extern void cpu_reset(unsigned long addr) __attribute__((noreturn));
#else
#define cpu_proc_init() processor._proc_init()
@@ -87,7 +95,11 @@ extern void cpu_reset(unsigned long addr) __attribute__((noreturn));
#define cpu_reset(addr) processor.reset(addr)
#define cpu_do_idle() processor._do_idle()
#define cpu_dcache_clean_area(addr,sz) processor.dcache_clean_area(addr,sz)
+#ifdef CONFIG_ARM_LPAE
+#define cpu_set_pte_ext(ptep,pte) processor.set_pte_ext(ptep,pte)
+#else
#define cpu_set_pte_ext(ptep,pte,ext) processor.set_pte_ext(ptep,pte,ext)
+#endif
#define cpu_do_switch_mm(pgd,mm) processor.switch_mm(pgd,mm)
#endif

@@ -99,6 +111,18 @@ extern void cpu_resume(void);

#define cpu_switch_mm(pgd,mm) cpu_do_switch_mm(virt_to_phys(pgd),mm)

+#ifdef CONFIG_ARM_LPAE
+#define cpu_get_pgd() \
+ ({ \
+ unsigned long pg, pg2; \
+ __asm__("mrrc p15, 0, %0, %1, c2" \
+ : "=r" (pg), "=r" (pg2) \
+ : \
+ : "cc"); \
+ pg &= ~(PTRS_PER_PGD*sizeof(pgd_t)-1); \
+ (pgd_t *)phys_to_virt(pg); \
+ })
+#else
#define cpu_get_pgd() \
({ \
unsigned long pg; \
@@ -107,6 +131,7 @@ extern void cpu_resume(void);
pg &= ~0x3fff; \
(pgd_t *)phys_to_virt(pg); \
})
+#endif

#endif

diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index ab50627..6bdf42c 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -64,7 +64,7 @@ void __check_kvm_seq(struct mm_struct *mm)
} while (seq != init_mm.context.kvm_seq);
}

-#ifndef CONFIG_SMP
+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
/*
* Section support is unsafe on SMP - If you iounmap and ioremap a region,
* the other CPUs will not see this change until their next context switch.
@@ -195,11 +195,13 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
unsigned long addr;
struct vm_struct * area;

+#ifndef CONFIG_ARM_LPAE
/*
* High mappings must be supersection aligned
*/
if (pfn >= 0x100000 && (__pfn_to_phys(pfn) & ~SUPERSECTION_MASK))
return NULL;
+#endif

/*
* Don't allow RAM to be mapped - this causes problems with ARMv6+
@@ -221,7 +223,7 @@ void __iomem * __arm_ioremap_pfn_caller(unsigned long pfn,
return NULL;
addr = (unsigned long)area->addr;

-#ifndef CONFIG_SMP
+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
if (DOMAIN_IO == 0 &&
(((cpu_architecture() >= CPU_ARCH_ARMv6) && (get_cr() & CR_XP)) ||
cpu_is_xsc3()) && pfn >= 0x100000 &&
@@ -292,7 +294,7 @@ EXPORT_SYMBOL(__arm_ioremap);
void __iounmap(volatile void __iomem *io_addr)
{
void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
-#ifndef CONFIG_SMP
+#if !defined(CONFIG_SMP) && !defined(CONFIG_ARM_LPAE)
struct vm_struct **p, *tmp;

/*
diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index b2027c1..a3e78cc 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/highmem.h>
+#include <linux/slab.h>

#include <asm/pgalloc.h>
#include <asm/page.h>
@@ -17,6 +18,14 @@

#include "mm.h"

+#ifdef CONFIG_ARM_LPAE
+#define __pgd_alloc() kmalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL)
+#define __pgd_free(pgd) kfree(pgd)
+#else
+#define __pgd_alloc() (pgd_t *)__get_free_pages(GFP_KERNEL, 2)
+#define __pgd_free(pgd) free_pages((unsigned long)pgd, 2)
+#endif
+
/*
* need to get a 16k page for level 1
*/
@@ -27,7 +36,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
pmd_t *new_pmd, *init_pmd;
pte_t *new_pte, *init_pte;

- new_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, 2);
+ new_pgd = __pgd_alloc();
if (!new_pgd)
goto no_pgd;

@@ -42,10 +51,25 @@ pgd_t *pgd_alloc(struct mm_struct *mm)

clean_dcache_area(new_pgd, PTRS_PER_PGD * sizeof(pgd_t));

+#ifdef CONFIG_ARM_LPAE
+ /*
+ * Allocate PMD table for modules and pkmap mappings.
+ */
+ new_pud = pud_alloc(mm, new_pgd + pgd_index(MODULES_VADDR),
+ MODULES_VADDR);
+ if (!new_pud)
+ goto no_pud;
+
+ new_pmd = pmd_alloc(mm, new_pud, 0);
+ if (!new_pmd)
+ goto no_pmd;
+#endif
+
if (!vectors_high()) {
/*
* On ARM, first page must always be allocated since it
- * contains the machine vectors.
+ * contains the machine vectors. The vectors are always high
+ * with LPAE.
*/
new_pud = pud_alloc(mm, new_pgd, 0);
if (!new_pud)
@@ -74,7 +98,7 @@ no_pte:
no_pmd:
pud_free(mm, new_pud);
no_pud:
- free_pages((unsigned long)new_pgd, 2);
+ __pgd_free(new_pgd);
no_pgd:
return NULL;
}
@@ -111,5 +135,24 @@ no_pud:
pgd_clear(pgd);
pud_free(mm, pud);
no_pgd:
- free_pages((unsigned long) pgd_base, 2);
+#ifdef CONFIG_ARM_LPAE
+ /*
+ * Free modules/pkmap or identity pmd tables.
+ */
+ for (pgd = pgd_base; pgd < pgd_base + PTRS_PER_PGD; pgd++) {
+ if (pgd_none_or_clear_bad(pgd))
+ continue;
+ if (pgd_val(*pgd) & L_PGD_SWAPPER)
+ continue;
+ pud = pud_offset(pgd, 0);
+ if (pud_none_or_clear_bad(pud))
+ continue;
+ pmd = pmd_offset(pud, 0);
+ pud_clear(pud);
+ pmd_free(mm, pmd);
+ pgd_clear(pgd);
+ pud_free(mm, pud);
+ }
+#endif
+ __pgd_free(pgd_base);
}
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 864a5c9..0459397 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -136,6 +136,13 @@ ENDPROC(cpu_v7_switch_mm)
*/
ENTRY(cpu_v7_set_pte_ext)
#ifdef CONFIG_MMU
+#ifdef CONFIG_ARM_LPAE
+ tst r2, #L_PTE_PRESENT
+ beq 1f
+ tst r3, #1 << (55 - 32) @ L_PTE_DIRTY
+ orreq r2, #L_PTE_RDONLY
+1: strd r2, r3, [r0]
+#else /* !CONFIG_ARM_LPAE */
str r1, [r0] @ linux version

bic r3, r1, #0x000003f0
@@ -168,6 +175,7 @@ ENTRY(cpu_v7_set_pte_ext)
ARM( str r3, [r0, #2048]! )
THUMB( add r0, r0, #2048 )
THUMB( str r3, [r0] )
+#endif /* CONFIG_ARM_LPAE */
mcr p15, 0, r0, c7, c10, 1 @ flush_pte
#endif
mov pc, lr

Catalin Marinas

2011-05-08 12:51:31 UTC

The DFSR and IFSR register format is different when LPAE is enabled. In
addition, DFSR and IFSR have the similar definitions for the fault type.
This modifies modifies the fault code to correctly handle the new
format.

Signed-off-by: Catalin Marinas <***@arm.com>
---
arch/arm/mm/alignment.c | 8 ++++-
arch/arm/mm/fault.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 87 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 724ba3b..bc98a6e 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -906,6 +906,12 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
return 0;
}

+#ifdef CONFIG_ARM_LPAE
+#define ALIGNMENT_FAULT 33
+#else
+#define ALIGNMENT_FAULT 1
+#endif
+
/*
* This needs to be done after sysctl_init, otherwise sys/ will be
* overwritten. Actually, this shouldn't be in sys/ at all since
@@ -939,7 +945,7 @@ static int __init alignment_init(void)
ai_usermode = UM_FIXUP;
}

- hook_fault_code(1, do_alignment, SIGBUS, BUS_ADRALN,
+ hook_fault_code(ALIGNMENT_FAULT, do_alignment, SIGBUS, BUS_ADRALN,
"alignment exception");

/*
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index ee76923..e06918b 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -33,10 +33,15 @@
#define FSR_WRITE (1 << 11)
#define FSR_FS4 (1 << 10)
#define FSR_FS3_0 (15)
+#define FSR_FS5_0 (0x3f)

static inline int fsr_fs(unsigned int fsr)
{
+#ifdef CONFIG_ARM_LPAE
+ return fsr & FSR_FS5_0;
+#else
return (fsr & FSR_FS3_0) | (fsr & FSR_FS4) >> 6;
+#endif
}

#ifdef CONFIG_MMU
@@ -122,8 +127,10 @@ void show_pte(struct mm_struct *mm, unsigned long addr)

pte = pte_offset_map(pmd, addr);
printk(", *pte=%08llx", (long long)pte_val(*pte));
+#ifndef CONFIG_ARM_LPAE
printk(", *ppte=%08llx",
(long long)pte_val(pte[PTE_HWTABLE_PTRS]));
+#endif
pte_unmap(pte);
} while(0);

@@ -490,6 +497,72 @@ static struct fsr_info {
int code;
const char *name;
} fsr_info[] = {
+#ifdef CONFIG_ARM_LPAE
+ { do_bad, SIGBUS, 0, "unknown 0" },
+ { do_bad, SIGBUS, 0, "unknown 1" },
+ { do_bad, SIGBUS, 0, "unknown 2" },
+ { do_bad, SIGBUS, 0, "unknown 3" },
+ { do_bad, SIGBUS, 0, "reserved translation fault" },
+ { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault" },
+ { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 2 translation fault" },
+ { do_page_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
+ { do_bad, SIGBUS, 0, "reserved access flag fault" },
+ { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
+ { do_bad, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
+ { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
+ { do_bad, SIGBUS, 0, "reserved permission fault" },
+ { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
+ { do_sect_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" },
+ { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" },
+ { do_bad, SIGBUS, 0, "synchronous external abort" },
+ { do_bad, SIGBUS, 0, "asynchronous external abort" },
+ { do_bad, SIGBUS, 0, "unknown 18" },
+ { do_bad, SIGBUS, 0, "unknown 19" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous parity error" },
+ { do_bad, SIGBUS, 0, "asynchronous parity error" },
+ { do_bad, SIGBUS, 0, "unknown 26" },
+ { do_bad, SIGBUS, 0, "unknown 27" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "unknown 32" },
+ { do_bad, SIGBUS, BUS_ADRALN, "alignment fault" },
+ { do_bad, SIGBUS, 0, "debug event" },
+ { do_bad, SIGBUS, 0, "unknown 35" },
+ { do_bad, SIGBUS, 0, "unknown 36" },
+ { do_bad, SIGBUS, 0, "unknown 37" },
+ { do_bad, SIGBUS, 0, "unknown 38" },
+ { do_bad, SIGBUS, 0, "unknown 39" },
+ { do_bad, SIGBUS, 0, "unknown 40" },
+ { do_bad, SIGBUS, 0, "unknown 41" },
+ { do_bad, SIGBUS, 0, "unknown 42" },
+ { do_bad, SIGBUS, 0, "unknown 43" },
+ { do_bad, SIGBUS, 0, "unknown 44" },
+ { do_bad, SIGBUS, 0, "unknown 45" },
+ { do_bad, SIGBUS, 0, "unknown 46" },
+ { do_bad, SIGBUS, 0, "unknown 47" },
+ { do_bad, SIGBUS, 0, "unknown 48" },
+ { do_bad, SIGBUS, 0, "unknown 49" },
+ { do_bad, SIGBUS, 0, "unknown 50" },
+ { do_bad, SIGBUS, 0, "unknown 51" },
+ { do_bad, SIGBUS, 0, "implementation fault (lockdown abort)" },
+ { do_bad, SIGBUS, 0, "unknown 53" },
+ { do_bad, SIGBUS, 0, "unknown 54" },
+ { do_bad, SIGBUS, 0, "unknown 55" },
+ { do_bad, SIGBUS, 0, "unknown 56" },
+ { do_bad, SIGBUS, 0, "unknown 57" },
+ { do_bad, SIGBUS, 0, "implementation fault (coprocessor abort)" },
+ { do_bad, SIGBUS, 0, "unknown 59" },
+ { do_bad, SIGBUS, 0, "unknown 60" },
+ { do_bad, SIGBUS, 0, "unknown 61" },
+ { do_bad, SIGBUS, 0, "unknown 62" },
+ { do_bad, SIGBUS, 0, "unknown 63" },
+#else /* !CONFIG_ARM_LPAE */
/*
* The following are the standard ARMv3 and ARMv4 aborts. ARMv5
* defines these to be "precise" aborts.
@@ -531,6 +604,7 @@ static struct fsr_info {
{ do_bad, SIGBUS, 0, "unknown 29" },
{ do_bad, SIGBUS, 0, "unknown 30" },
{ do_bad, SIGBUS, 0, "unknown 31" }
+#endif /* CONFIG_ARM_LPAE */
};

void __init
@@ -569,6 +643,9 @@ do_DataAbort(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
}

+#ifdef CONFIG_ARM_LPAE
+#define ifsr_info fsr_info
+#else /* !CONFIG_ARM_LPAE */
static struct fsr_info ifsr_info[] = {
{ do_bad, SIGBUS, 0, "unknown 0" },
{ do_bad, SIGBUS, 0, "unknown 1" },
@@ -603,6 +680,7 @@ static struct fsr_info ifsr_info[] = {
{ do_bad, SIGBUS, 0, "unknown 30" },
{ do_bad, SIGBUS, 0, "unknown 31" },
};
+#endif /* CONFIG_ARM_LPAE */

void __init
hook_ifault_code(int nr, int (*fn)(unsigned long, unsigned int, struct pt_regs *),
@@ -638,6 +716,7 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, struct pt_regs *regs)

static int __init exceptions_init(void)
{
+#ifndef CONFIG_ARM_LPAE
if (cpu_architecture() >= CPU_ARCH_ARMv6) {
hook_fault_code(4, do_translation_fault, SIGSEGV, SEGV_MAPERR,
"I-cache maintenance fault");
@@ -653,6 +732,7 @@ static int __init exceptions_init(void)
hook_fault_code(6, do_bad, SIGSEGV, SEGV_MAPERR,
"section access flag fault");
}
+#endif

return 0;
}

Catalin Marinas

2011-05-11 10:23:19 UTC

Similar to the PTE freeing, this patch introduced __pmd_free_tlb() which
invalidates the TLB before freeing a PMD page. This is needed because on
newer processors the entry in the upper page table may be cached by the
TLB and point to random data after the PMD has been freed.

Signed-off-by: Catalin Marinas <***@arm.com>
---

This patch should be part of the LPAE series but I haven't included it in the
latest series post.

arch/arm/include/asm/tlb.h | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f9f6ecd..ef72f19 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -181,8 +181,18 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
tlb_remove_page(tlb, pte);
}

+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr)
+{
+#ifdef CONFIG_ARM_LPAE
+ tlb_add_flush(tlb, addr);
+ tlb_flush(tlb);
+ pmd_free((tlb)->mm, pmdp);
+#endif
+}
+
#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
-#define pmd_free_tlb(tlb, pmdp, addr) pmd_free((tlb)->mm, pmdp)
+#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)

#define tlb_migrate_finish(mm) do { } while (0)

Sergei Shtylyov

2011-05-11 10:31:47 UTC

Hello.

Post by Catalin Marinas
Similar to the PTE freeing, this patch introduced __pmd_free_tlb() which
invalidates the TLB before freeing a PMD page. This is needed because on
newer processors the entry in the upper page table may be cached by the
TLB and point to random data after the PMD has been freed.
---
This patch should be part of the LPAE series but I haven't included it in the
latest series post.
arch/arm/include/asm/tlb.h | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f9f6ecd..ef72f19 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -181,8 +181,18 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
tlb_remove_page(tlb, pte);
}
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr)
+{
+#ifdef CONFIG_ARM_LPAE
+ tlb_add_flush(tlb, addr);
+ tlb_flush(tlb);
+ pmd_free((tlb)->mm, pmdp);

This is not a macro, so parens around 'tlb' are not needed.

Post by Catalin Marinas
+#endif
+}
+

Perhaps a better style would be (as SubmittingPatches suggest):

+#ifdef CONFIG_ARM_LPAE
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr)
+{
+ tlb_add_flush(tlb, addr);
+ tlb_flush(tlb);
+ pmd_free(tlb->mm, pmdp);
+}
+#else
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr) {}
+#endif
+

WBR, Sergei

Catalin Marinas

2011-05-11 10:40:45 UTC

Post by Sergei Shtylyov

This is not a macro, so parens around 'tlb' are not needed.

True, just copy/paste error.

Post by Sergei Shtylyov

Post by Catalin Marinas
+#endif
+}
+

+#ifdef CONFIG_ARM_LPAE
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr)
+{
+ tlb_add_flush(tlb, addr);
+ tlb_flush(tlb);
+ pmd_free(tlb->mm, pmdp);
+}
+#else
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr) {}
+#endif
+
WBR, Sergei

No real preference here though smaller number of lines changed in my
initial patch.

Thanks.

--
Catalin

Russell King - ARM Linux

2011-05-11 10:54:19 UTC

Post by Catalin Marinas
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr)
+{
+#ifdef CONFIG_ARM_LPAE
+ tlb_add_flush(tlb, addr);
+ tlb_flush(tlb);
+ pmd_free((tlb)->mm, pmdp);
+#endif
+}

You're:

1. tlb_add_flush() - Adding the address which covers the PMD to the range
of virtual addresses which need flushing - ok.
2. tlb_flush() - You're then forcing a flush.
3. pmd_free() - You're now freeing the page.

One of the points about the shootdown interface is that it batches things
up. So what's wrong with:

static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
unsigned long addr)
{
#ifdef CONFIG_ARM_LPAE
tlb_add_flush(tlb, addr);
tlb_remove_page(tlb, virt_to_page(pmdp));
#endif
}

and leave the tlb invalidate and actual page freeing to the batching code
to deal with?

Catalin Marinas

2011-05-11 13:40:49 UTC

1. tlb_add_flush() - Adding the address which covers the PMD to the range
of virtual addresses which need flushing - ok.
2. tlb_flush() - You're then forcing a flush.
3. pmd_free() - You're now freeing the page.
One of the points about the shootdown interface is that it batches things
static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
unsigned long addr)
{
#ifdef CONFIG_ARM_LPAE
tlb_add_flush(tlb, addr);
tlb_remove_page(tlb, virt_to_page(pmdp));
#endif
}
and leave the tlb invalidate and actual page freeing to the batching code
to deal with?

There isn't a big overhead with my initial code as a pmd covers 1GB and
we only have 1 or 2 pmds per process that we can free.

Is there any room for optimising the mmu_gather range? I think this only
matters for case 1 in your tlb_flush() comment - unmapping a page range
with a few pages in one pmd and a few other pages in the next pmd we get
over 1GB range when we actually only need to flush the TLB for a few
pages.

If tlb_add_flush would get a start/end range (or addr/size), we know
that any TLB flush within the start..end range would be enough and thus
we avoid artificially increasing the range.

We could also modify flush_tlb_range() to branch to flush_tlb_mm() for
big ranges.

--
Catalin

Russell King - ARM Linux

2011-05-11 14:00:10 UTC

Post by Russell King - ARM Linux
One of the points about the shootdown interface is that it batches things
static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
unsigned long addr)
{
#ifdef CONFIG_ARM_LPAE
tlb_add_flush(tlb, addr);
tlb_remove_page(tlb, virt_to_page(pmdp));
#endif
}
and leave the tlb invalidate and actual page freeing to the batching code
to deal with?

One of the points is to keep the code as similar to other architectures
so that the folk who are working on consolidating this stuff across other
architectures don't have to wonder why ARM is _unnecessarily_ doing things
differently.

Catalin Marinas

2011-05-11 15:58:36 UTC

Actually Peter Zijlstra's proposal uses a tlb_track_range() function
which takes start and end range arguments.

But I'm fine with your variant for now.

--
Catalin

Russell King - ARM Linux

2011-05-23 16:54:12 UTC

This set of patches adds support for the Large Physical Extensions on
the ARM architecture (available with the Cortex-A15 processor). LPAE
comes with a 3-level page table format (compared to 2-level for the
classic one), allowing up to 40-bit physical address space.
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b_virtualization_extns/index.html
The full set of patches on top of linux-next (LPAE, support for an
emulated Versatile Express with Cortex-A15 tile and generic timers) is
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm.git arm-lpae-next

FYI, I'm going to drop the pgt patch because the warnings are still there
and I _still_ don't feel happy about pushing that into mainline and then
being endlessly bugged about it.

So I'll drop it from my tree again and re-merge that branch after this
window has closed.

Catalin Marinas

2011-05-23 17:22:34 UTC

FYI, I'm going to drop the pgt patch because the warnings are still there
and I _still_ don't feel happy about pushing that into mainline and then
being endlessly bugged about it.

I haven't seen the warnings but probably because I applied the LPAE
patches on top. I'll have a look as well.

Post by Russell King - ARM Linux
So I'll drop it from my tree again and re-merge that branch after this
window has closed.

OK. In the meantime I'll cherry-pick it into my LPAE branch based on
mainline. As we discussed, after -rc1 I plan to push the LPAE patches to
-next. Your patch would come from two different sources but I'm not sure
whether git can cope with it (given that I rename files like pgtable.h).
The alternative would be to base my patches on your branch as long as
you don't rebase it.

--
Catalin

Catalin Marinas

2011-05-24 10:04:08 UTC

FYI, I'm going to drop the pgt patch because the warnings are still there

What warnings are you seeing? Could you please post them?

I tried with just your nopud patch on top of v2.6.39 and compiled for
vexpress. I don't get any warnings and I tried STRICT_MM_TYPECHECKS as
well.

--
Catalin

Catalin Marinas

2011-05-26 21:15:49 UTC

FYI, I'm going to drop the pgt patch because the warnings are still there

What warnings are you seeing? Could you please post them?

Ping?

I'd like to fix those warning but I can't reproduce them (maybe
different compiler versions?).

--
Catalin

Russell King - ARM Linux

2011-05-26 21:44:49 UTC

This set of patches adds support for the Large Physical Extensio=

ns on

the ARM architecture (available with the Cortex-A15 processor). =

LPAE

comes with a 3-level page table format (compared to 2-level for =

the

classic one), allowing up to 40-bit physical address space.
The ARM LPAE documentation is available from (free registration =
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b_virtua=

lization_extns/index.html

The full set of patches on top of linux-next (LPAE, support for =

emulated Versatile Express with Cortex-A15 tile and generic time=

rs) is

git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6=

-cm.git arm-lpae-next

FYI, I'm going to drop the pgt patch because the warnings are stil=

l there

Post by Catalin Marinas
What warnings are you seeing? Could you please post them?

=20
Ping?
=20
I'd like to fix those warning but I can't reproduce them (maybe
different compiler versions?).

They're certainly not compiler version dependent (or if they are, your
compiler is broken). It'll probably be because you're building for
SMP, in which case the affected code is not built:

arch/arm/mm/ioremap.c: In function =E2=96=A0unmap_area_sections=E2=96=A0=
:
arch/arm/mm/ioremap.c:86: warning: passing argument 1 of =E2=96=A0pmd_o=
ffset=E2=96=A0 from incompatible pointer type
arch/arm/mm/ioremap.c: In function =E2=96=A0remap_area_sections=E2=96=A0=
:
arch/arm/mm/ioremap.c:136: warning: passing argument 1 of =E2=96=A0pmd_=
offset=E2=96=A0 from incompatible pointer type
arch/arm/mm/ioremap.c: In function =E2=96=A0remap_area_supersections=E2=
=96=A0:
arch/arm/mm/ioremap.c:173: warning: passing argument 1 of =E2=96=A0pmd_=
offset=E2=96=A0 from incompatible pointer type

These need to be fixed before I push the pgt branch out into mainline,
otherwise I'm going to get nagged.

Catalin Marinas

2011-05-27 09:09:49 UTC