Commit graph

18 commits

Author SHA1 Message Date
Boyan Karatotev
83ec7e452c perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and
has worked quite well so far. However, recent implementations expose a
weakness in that this is rather slow. A large part of it is written in
assembly, making it opaque to the compiler for optimisations. The
future proofness requires reading registers that are effectively
`volatile`, making it even harder for the compiler, as well as adding
lots of implicit barriers, making it hard for the microarchitecutre to
optimise as well.

We can make a few assumptions, checked by a few well placed asserts, and
remove a lot of this burden. For a start, at the moment there are 4
group 0 counters with static assignments. Contexting them is a trivial
affair that doesn't need a loop. Similarly, there can only be up to 16
group 1 counters. Contexting them is a bit harder, but we can do with a
single branch with a falling through switch. If/when both of these
change, we have a pair of asserts and the feature detection mechanism to
guard us against pretending that we support something we don't.

We can drop contexting of the offset registers. They are fully
accessible by EL2 and as such are its responsibility to preserve on
powerdown.

Another small thing we can do, is pass the core_pos into the hook.
The caller already knows which core we're running on, we don't need to
call this non-trivial function again.

Finally, knowing this, we don't really need the auxiliary AMUs to be
described by the device tree. Linux doesn't care at the moment, and any
information we need for EL3 can be neatly placed in a simple array.

All of this, combined with lifting the actual saving out of assembly,
reduces the instructions to save the context from 180 to 40, including a
lot fewer branches. The code is also much shorter and easier to read.

Also propagate to aarch32 so that the two don't diverge too much.

Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
2025-02-25 08:50:46 +00:00
Boyan Karatotev
2590e819eb perf(mpmm): greatly simplify MPMM enablement
MPMM is a core-specific microarchitectural feature. It has been present
in every Arm core since the Cortex-A510 and has been implemented in
exactly the same way. Despite that, it is enabled more like an
architectural feature with a top level enable flag. This utilised the
identical implementation.

This duality has left MPMM in an awkward place, where its enablement
should be generic, like an architectural feature, but since it is not,
it should also be core-specific if it ever changes. One choice to do
this has been through the device tree.

This has worked just fine so far, however, recent implementations expose
a weakness in that this is rather slow - the device tree has to be read,
there's a long call stack of functions with many branches, and system
registers are read. In the hot path of PSCI CPU powerdown, this has a
significant and measurable impact. Besides it being a rather large
amount of code that is difficult to understand.

Since MPMM is a microarchitectural feature, its correct placement is in
the reset function. The essence of the current enablement is to write
CPUPPMCR_EL3.MPMM_EN if CPUPPMCR_EL3.MPMMPINCTL == 0. Replacing the C
enablement with an assembly macro in each CPU's reset function achieves
the same effect with just a single close branch and a grand total of 6
instructions (versus the old 2 branches and 32 instructions).

Having done this, the device tree entry becomes redundant. Should a core
that doesn't support MPMM arise, this can cleanly be handled in the
reset function. As such, the whole ENABLE_MPMM_FCONF and platform hooks
mechanisms become obsolete and are removed.

Change-Id: I1d0475b21a1625bb3519f513ba109284f973ffdf
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
2025-02-25 08:50:45 +00:00
Jackson Cooper-Driver
967999d0d9 refactor(tc): clarify msc0 DT node
This node specifies the location of the MPAM registers for the DSU.
Rename the node to clarify this.

Signed-off-by: Jackson Cooper-Driver <jackson.cooper-driver@arm.com>
Signed-off-by: Icen.Zeyada <Icen.Zeyada2@arm.com>
Change-Id: Ie870a7f31acbc44dd943e76896219b9bbdd7d5b4
2025-01-29 08:13:35 +00:00
Jagdish Gediya
25264e292c refactor(tc): remove redundant macro UARTCLK_FREQ
remove redundant macro UARTCLK_FREQ and replace it with TC_UARTCLK
in dts.

Change-Id: Id463a9ddc1588278e552ffca3dfb738676229ce7
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Icen.Zeyada <Icen.Zeyada2@arm.com>
2025-01-07 09:28:16 +00:00
Leo Yan
b3a4f8cfcf feat(tc): update DT for Drage GPU
This patch incorporates the changes for Drage GPU to uses new access
window interface "IRQ_AW". As the interrupt properties are different
between TC4 and other TC platforms, this patch appends the interrupt
properties in platform specific DT binding file.

Change-Id: I2ca505846f03ce64b8e5f02fd202962dbfe39f25
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-08-29 14:39:21 +01:00
Jackson Cooper-Driver
e9e83e96bb feat(tc): add new TC4 RoS definitions
The TC4 uses a new RoS (Virtual Peripherals) and places them at
different address to that in TC3. Add these addresses to the DTS.

Change-Id: Ia62a670e47cdc98b3c113a670a21edc65905cafe
Signed-off-by: Jackson Cooper-Driver <jackson.cooper-driver@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-08-29 14:39:21 +01:00
Jagdish Gediya
7aca660c4e fix(tc): correct CPU PMU binding
CPU PMU types are not same for all CPUs on TC platforms, so define the
PMU nodes per micro architectures.

Change-Id: I4e940976cdda9a6eab3e15936c6c41a2bb668c9d
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-08-05 16:25:59 +01:00
Jagdish Gediya
77080f6aaf feat(tc): add device tree binding for SPE
Add node for Statistical Profiling Extension, which provides
periodic sampling of operations in the CPU pipeline and reports
this via the perf AUX interface.

Change-Id: Ic7a9d9ce927edbce02c7c09470a009dc56247240
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-08-05 16:25:59 +01:00
Jagdish Gediya
ebc991b3a1 feat(tc): add PPI partitions in DT binding
Define ppi-partitions for little, middle, and big cpu groups. PPI
affinity is expressed as a single "ppi-partitions" node, containing a
set of sub-nodes for each microarchitecture type, each with the
property 'affinity' which should be a list of phandles to CPU nodes.

PPI paritions are useful to affine specific PPI with set of CPUs
so that the drivers of micro-architecture specific nodes which uses
PPI can be divided based on CPU list e.g. SPE-PMU, CPU-PMU etc.

Change-Id: If7d47f71387ac982d2d992a0ce2de1652d564bd6
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-08-05 16:25:59 +01:00
Jagdish Gediya
1300bbce15 feat(tc): change GIC DT property 'interrupt-cells' to 4
Change the GIC's DT property 'interrupt-cells' to 4, so the 4th cell is
a phandle to a node describing a set of CPUs this interrupt is affine
to.

If an interrupt is a PPI, and the node pointed in the 4th cell must be a
subnode of the "ppi-partitions" in the GIC node. For interrupt types
other than PPI, this cell must be zero. This is a preparison for
sequential changes for interrupt partitions, as the first step, it sets
all zeros for the interrupt affinity.

Change-Id: I66490a86a27aad5db6b1a42c2d8e0d042eee46a9
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-08-05 16:25:59 +01:00
Leo Yan
2458b38772 refactor(tc): append binding for SMMU-700
The usage for SMMU-700 is not consistent across TC platforms:

  SMMU-700 on TC2:

          | FVP   | FPGA
  --------+-------+------
  Display | Used  | Used
  GPU     | Used  | Used

  SMMU-700 on TC3:

          | FVP   | FPGA
  --------+-------+------
  Display | No    | No
  GPU     | Used  | No

This commit changes to use append mode for SMMU-700 to bind it on TC2
and TC3 separately. As a result, the TC_IOMMU_EN configuration is not
used, remove it.

Change-Id: Ic4152eb4c8ef97bf27b8a97c3c6cb86e32a2e8eb
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-06-04 13:31:27 +01:00
Boyan Karatotev
f2596ff1a8 feat(tc): bind SCMI over MHUv3 for TC3
TC2 and TC3 have different the scmi shared memory regions and MHU
parameters, this patch appends the properties in scmi node for TC2 and
TC3 respectively.

Change-Id: Ifd001f780b575987877b4be36eb755a9dbe57e60
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-05-22 15:58:57 +01:00
Boyan Karatotev
6c069e7168 feat(tc): add MHUv3 DT binding for TC3
MHUv3's device tree is different from MHUv2's. Add support MHUv3 DT
binding for TC3 while keeping TC2 as-is.

Change-Id: Ib2f55d3a64a4cfe2ea9e62fe39d27ed54a2ca007
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-05-22 15:58:57 +01:00
Boyan Karatotev
c33a393675 refactor(tc): move MHUv2 property to tc2.dts
As only TC2 uses MHUv2, move the protocol property to tc2.dts.

Change-Id: I39dd57311e1058a6aabd4cbd5028511f704dd234
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-04-30 14:20:18 +01:00
Leo Yan
4e772e6ba3 refactor(tc): introduce a new file tc-fpga.dtsi
A Total Compute platform supports FVP and FPGA target. And it's possible
that these two targets have different hardware components. For this
reason, this patch introduces a new file tc-fpga.dtsi for FPGA related
DT binding.

As a result, this patch moves out FVP and FPGA specific macros into
tc-fvp.dtsi and tc-fpga.dtsi respectively.

Change-Id: I48d7d30d0c500cec5500f1a2a680e8b3a276ea99
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-04-30 14:20:18 +01:00
Leo Yan
f9565b2af1 refactor(tc): move out platform specific DT binding from tc-base.dtsi
The main purpose of 'tc-base.dtsi' is for common DT bindings, however,
it contains bindings for platform specific.

This patch moves out these plaform specific bindings to 'tc2.dts' and
'tc3.dts' respectively.

Change-Id: I9355eeff539a3f2940190aef399b4fb4828cbbac
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-04-30 14:20:18 +01:00
Leo Yan
defcfb2b63 refactor(tc): move out platform specific code from tc_vers.dtsi
Since now every TC board has its own dts file, this patch moves out the
platform specific code from tc_vers.dtsi to the corresponding platform
dts file.

Change-Id: I62e0872eddb2ae18e666a3f8dc0118a539651a9c
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-04-30 14:20:18 +01:00
Leo Yan
b3a9737ce0 refactor(tc): add platform specific DT files
Currently, the DT binding uses the file 'tc.dts' as a central place for
all TC platforms. And the variables (for different platforms, or FVP vs
FPGA, etc.) are maintained in 'tc_vers.dtsi'.

This patch renames 'tc.dts' to 'tc-base.dtsi' and creates an individual
.dts file for every platform. The purpose is to use 'tc-base.dtsi' for
maintaining common DT binding and every platform's specific definitions
will be moved into its own .dts file. This is a preparation for
sequential refactoring.

It changes to include the header files in platform DTS files but not in
the 'tc-base.dtsi'. This can allow 'tc-base.dtsi' is general enough and
platform DTS files covers platform specific defintions.

Change-Id: I034fb3f8836bcea36e8ad8ae01de41127693b0c6
Signed-off-by: Leo Yan <leo.yan@arm.com>
2024-04-30 14:20:18 +01:00