Commit 00a2a15

Merge branch 'main' into fp16-fptrunc-fpext-lowering
2 parents 4b9a0e3 + 136c406 commit 00a2a15

367 files changed: 8024 additions & 6829 deletions

bolt/README.md

Lines changed: 5 additions & 4 deletions
@@ -108,9 +108,10 @@ $ perf record -e cycles:u -j any,u -o perf.data -- <executable> <args> ...
 #### For Services
 
 Once you get the service deployed and warmed-up, it is time to collect perf
-data with LBR (branch information). The exact perf command to use will depend
-on the service. E.g., to collect the data for all processes running on the
-server for the next 3 minutes use:
+data with brstack (branch information). Different architectures implement this
+using different hardware units, for example LBR on X86, and BRBE on AArch64.
+The exact perf command to use will depend on the service. E.g., to collect the
+data for all processes running on the server for the next 3 minutes use:
 ```
 $ perf record -e cycles:u -j any,u -a -o perf.data -- sleep 180
 ```
@@ -163,7 +164,7 @@ $ perf2bolt -p perf.data -o perf.fdata <executable>
 This command will aggregate branch data from `perf.data` and store it in a
 format that is both more compact and more resilient to binary modifications.
 
-If the profile was collected without LBRs, you will need to add `-nl` flag to
+If the profile was collected without brstacks, you will need to add `-nl` flag to
 the command line above.
 
 ### Step 3: Optimize with BOLT

bolt/docs/Heatmaps.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 # Code Heatmaps
 
 BOLT has gained the ability to print code heatmaps based on
-sampling-based profiles generated by `perf`, either with `LBR` data or not.
+sampling-based profiles generated by `perf`, either with `brstack` data or not.
 The output is produced in colored ASCII to be displayed in a color-capable
 terminal. It looks something like this:
 
@@ -20,9 +20,9 @@ or if you want to monitor the existing process(es):
 $ perf record -e cycles:u -j any,u [-p PID|-a] -- sleep <interval>
 ```
 
-Running with LBR (`-j any,u` or `-b`) is recommended. Heatmaps can be generated
-from basic events by using the llvm-bolt-heatmap option `-nl` (no LBR) but
-such heatmaps do not have the coverage provided by LBR and may only be useful
+Running with brstack (`-j any,u` or `-b`) is recommended. Heatmaps can be generated
+from basic events by using the llvm-bolt-heatmap option `-nl` (no brstack) but
+such heatmaps do not have the coverage provided by brstack and may only be useful
 for finding event hotspots at larger code block granularities.
 
 Once the run is complete, and `perf.data` is generated, run llvm-bolt-heatmap:
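The coverage difference described in the Heatmaps.md hunk can be illustrated with a simplified model. This is a hypothetical sketch, not how `llvm-bolt-heatmap` is implemented: `Heatmap`, `Branch`, `addBasicSample`, and `addBrstackSample` are illustrative names. It assumes perf lists brstack entries newest first, so the code between an older entry's target and the next newer entry's source ran without a taken branch. A basic sample credits one bucket; a brstack sample credits every bucket along each such trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: bucket index -> number of times the bucket was hit.
using Heatmap = std::map<uint64_t, uint64_t>;

// One brstack entry reduced to its source and target addresses.
struct Branch {
  uint64_t From;
  uint64_t To;
};

// Basic event (-nl mode): only the sampled instruction address is known,
// so exactly one bucket is credited.
void addBasicSample(Heatmap &HM, uint64_t IP, uint64_t BucketSize) {
  assert(BucketSize > 0 && "bucket size must be positive");
  ++HM[IP / BucketSize];
}

// brstack sample, entries assumed newest first. For each pair of adjacent
// entries, code ran from the older entry's target (To) to the newer entry's
// source (From) without a taken branch; every bucket on that range is hit.
void addBrstackSample(Heatmap &HM, const std::vector<Branch> &Stack,
                      uint64_t BucketSize) {
  assert(BucketSize > 0 && "bucket size must be positive");
  for (std::size_t I = 0; I + 1 < Stack.size(); ++I) {
    const uint64_t Start = Stack[I + 1].To; // older entry's target
    const uint64_t End = Stack[I].From;     // newer entry's source
    if (Start > End)
      continue; // not a valid fall-through range; skip it
    for (uint64_t B = Start / BucketSize; B <= End / BucketSize; ++B)
      ++HM[B];
  }
}
```

For example, with a 64-byte bucket, the stack `{{0x1100, 0x2000}, {0x1000, 0x1010}}` yields a single trace from 0x1010 to 0x1100 that credits buckets 64 through 68, whereas a basic sample at 0x1010 credits only bucket 64.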

bolt/docs/OptimizingClang.md

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ BOLT-INFO: basic block reordering modified layout of 7848 (10.32%) functions
 790053908 : all conditional branches (=)
 ...
 ```
-The statistics in the output is based on the LBR profile collected with `perf`, and since we were using
+The statistics in the output is based on the brstack profile (LBR) collected with `perf`, and since we were using
 the `cycles` counter, its accuracy is affected. However, the relative improvement in `taken conditional
 branches` is a good indication that BOLT was able to straighten out the code even after PGO.
 
bolt/docs/OptimizingLinux.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 
 Many Linux applications spend a significant amount of their execution time in the kernel. Thus, when we consider code optimization for system performance, it is essential to improve the CPU utilization not only in the user-space applications and libraries but also in the kernel. BOLT has demonstrated double-digit gains while being applied to user-space programs. This guide shows how to apply BOLT to the x86-64 Linux kernel and enhance your system's performance. In our experiments, BOLT boosted database TPS by 2 percent when applied to the kernel compiled with the highest level optimizations, including PGO and LTO. The database spent ~40% of the time in the kernel and was quite sensitive to kernel performance.
 
-BOLT optimizes code layout based on a low-level execution profile collected with the Linux `perf` tool. The best quality profile should include branch history, such as Intel's last branch records (LBR). BOLT runs on a linked binary and reorders the code while combining frequently executed blocks of instructions in a manner best suited for the hardware. Other than branch instructions, most of the code is left unchanged. Additionally, BOLT updates all metadata associated with the modified code, including DWARF debug information and Linux ORC unwind information.
+BOLT optimizes code layout based on a low-level execution profile collected with the Linux `perf` tool. The best quality profile should include branch history (brstack), such as Intel's last branch records (LBR) or AArch64's Branch Record Buffer Extension (BRBE). BOLT runs on a linked binary and reorders the code while combining frequently executed blocks of instructions in a manner best suited for the hardware. Other than branch instructions, most of the code is left unchanged. Additionally, BOLT updates all metadata associated with the modified code, including DWARF debug information and Linux ORC unwind information.
 
 While BOLT optimizations are not specific to the Linux kernel, certain quirks distinguish the kernel from user-level applications.
 
bolt/lib/Profile/DataAggregator.cpp

Lines changed: 18 additions & 16 deletions
@@ -46,16 +46,15 @@ namespace opts {
 
 static cl::opt<bool>
 BasicAggregation("nl",
-cl::desc("aggregate basic samples (without LBR info)"),
+cl::desc("aggregate basic samples (without brstack info)"),
 cl::cat(AggregatorCategory));
 
 cl::opt<bool> ArmSPE("spe", cl::desc("Enable Arm SPE mode."),
 cl::cat(AggregatorCategory));
 
-static cl::opt<std::string>
-ITraceAggregation("itrace",
-cl::desc("Generate LBR info with perf itrace argument"),
-cl::cat(AggregatorCategory));
+static cl::opt<std::string> ITraceAggregation(
+"itrace", cl::desc("Generate brstack info with perf itrace argument"),
+cl::cat(AggregatorCategory));
 
 static cl::opt<bool>
 FilterMemProfile("filter-mem-profile",
@@ -201,7 +200,7 @@ void DataAggregator::start() {
 }
 
 if (opts::BasicAggregation) {
-launchPerfProcess("events without LBR", MainEventsPPI,
+launchPerfProcess("events without brstack", MainEventsPPI,
 "script -F pid,event,ip");
 } else if (!opts::ITraceAggregation.empty()) {
 // Disable parsing memory profile from trace data, unless requested by user.
@@ -1069,7 +1068,7 @@ ErrorOr<DataAggregator::LBREntry> DataAggregator::parseLBREntry() {
 if (std::error_code EC = Rest.getError())
 return EC;
 if (Rest.get().size() < 5) {
-reportError("expected rest of LBR entry");
+reportError("expected rest of brstack entry");
 Diag << "Found: " << Rest.get() << "\n";
 return make_error_code(llvm::errc::io_error);
 }
@@ -1433,7 +1432,7 @@ std::error_code DataAggregator::printLBRHeatMap() {
 errs() << "HEATMAP-ERROR: no basic event samples detected in profile. "
 "Cannot build heatmap.";
 } else {
-errs() << "HEATMAP-ERROR: no LBR traces detected in profile. "
+errs() << "HEATMAP-ERROR: no brstack traces detected in profile. "
 "Cannot build heatmap. Use -nl for building heatmap from "
 "basic events.\n";
 }
@@ -1572,7 +1571,7 @@ void DataAggregator::printBranchStacksDiagnostics(
 
 std::error_code DataAggregator::parseBranchEvents() {
 std::string BranchEventTypeStr =
-opts::ArmSPE ? "SPE branch events in LBR-format" : "branch events";
+opts::ArmSPE ? "SPE branch events in brstack-format" : "branch events";
 outs() << "PERF2BOLT: parse " << BranchEventTypeStr << "...\n";
 NamedRegionTimer T("parseBranch", "Parsing branch events", TimerGroupName,
 TimerGroupDesc, opts::TimeAggregator);
@@ -1620,16 +1619,18 @@ std::error_code DataAggregator::parseBranchEvents() {
 clear(TraceMap);
 
 outs() << "PERF2BOLT: read " << NumSamples << " samples and " << NumEntries
-<< " LBR entries\n";
+<< " brstack entries\n";
 if (NumTotalSamples) {
 if (NumSamples && NumSamplesNoLBR == NumSamples) {
 // Note: we don't know if perf2bolt is being used to parse memory samples
 // at this point. In this case, it is OK to parse zero LBRs.
 if (!opts::ArmSPE)
 errs()
 << "PERF2BOLT-WARNING: all recorded samples for this binary lack "
-"LBR. Record profile with perf record -j any or run perf2bolt "
-"in no-LBR mode with -nl (the performance improvement in -nl "
+"brstack. Record profile with perf record -j any or run "
+"perf2bolt "
+"in non-brstack mode with -nl (the performance improvement in "
+"-nl "
 "mode may be limited)\n";
 else
 errs()
@@ -1664,7 +1665,7 @@ void DataAggregator::processBranchEvents() {
 }
 
 std::error_code DataAggregator::parseBasicEvents() {
-outs() << "PERF2BOLT: parsing basic events (without LBR)...\n";
+outs() << "PERF2BOLT: parsing basic events (without brstack)...\n";
 NamedRegionTimer T("parseBasic", "Parsing basic events", TimerGroupName,
 TimerGroupDesc, opts::TimeAggregator);
 while (hasData()) {
@@ -1688,7 +1689,7 @@ std::error_code DataAggregator::parseBasicEvents() {
 }
 
 void DataAggregator::processBasicEvents() {
-outs() << "PERF2BOLT: processing basic events (without LBR)...\n";
+outs() << "PERF2BOLT: processing basic events (without brstack)...\n";
 NamedRegionTimer T("processBasic", "Processing basic events", TimerGroupName,
 TimerGroupDesc, opts::TimeAggregator);
 uint64_t OutOfRangeSamples = 0;
@@ -1777,7 +1778,8 @@ std::error_code DataAggregator::parsePreAggregatedLBRSamples() {
 ++AggregatedLBRs;
 }
 
-outs() << "PERF2BOLT: read " << AggregatedLBRs << " aggregated LBR entries\n";
+outs() << "PERF2BOLT: read " << AggregatedLBRs
+<< " aggregated brstack entries\n";
 
 return std::error_code();
 }
@@ -2426,7 +2428,7 @@ std::error_code DataAggregator::writeBATYAML(BinaryContext &BC,
 void DataAggregator::dump() const { DataReader::dump(); }
 
 void DataAggregator::dump(const PerfBranchSample &Sample) const {
-Diag << "Sample LBR entries: " << Sample.LBR.size() << "\n";
+Diag << "Sample brstack entries: " << Sample.LBR.size() << "\n";
 for (const LBREntry &LBR : Sample.LBR)
 Diag << LBR << '\n';
 }
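The `parseLBREntry` hunk reports "expected rest of brstack entry" when the remaining text of an entry (`Rest`) is shorter than 5 characters. The following is a standalone, hypothetical sketch of parsing one entry, assuming the `perf script` brstack layout `FROM/TO/PRED/INTX/ABORT/CYCLES` (for example `0x401000/0x401020/M/-/-/3`), where `M` marks a mispredicted branch. `BrstackEntry` and `parseBrstackEntry` are illustrative names, not BOLT's `LBREntry` API; the 5-character threshold is borrowed from the hunk, and BOLT's exact tokenization may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical, simplified model of one brstack entry (not BOLT's LBREntry).
struct BrstackEntry {
  uint64_t From = 0;
  uint64_t To = 0;
  bool Mispredicted = false;
};

// Returns the value of a hex digit, or -1 if C is not a hex digit.
static int hexDigit(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Parses "0x<hex>" at the start of Text and advances Text past it.
// No overflow check in this sketch.
static std::optional<uint64_t> parseHex(std::string_view &Text) {
  if (Text.substr(0, 2) != "0x")
    return std::nullopt;
  std::size_t Pos = 2;
  uint64_t Value = 0;
  while (Pos < Text.size() && hexDigit(Text[Pos]) >= 0)
    Value = Value * 16 + static_cast<uint64_t>(hexDigit(Text[Pos++]));
  if (Pos == 2)
    return std::nullopt; // "0x" with no digits
  Text.remove_prefix(Pos);
  return Value;
}

// Consumes a single '/' separator.
static bool consumeSlash(std::string_view &Text) {
  if (Text.empty() || Text.front() != '/')
    return false;
  Text.remove_prefix(1);
  return true;
}

// Parses one entry such as "0x401000/0x401020/M/-/-/3".
std::optional<BrstackEntry> parseBrstackEntry(std::string_view Text) {
  std::optional<uint64_t> From = parseHex(Text);
  if (!From || !consumeSlash(Text))
    return std::nullopt;
  std::optional<uint64_t> To = parseHex(Text);
  if (!To || !consumeSlash(Text))
    return std::nullopt;
  // Text now holds the flag fields, e.g. "M/-/-/3"; assumed minimum length 5.
  if (Text.size() < 5)
    return std::nullopt;
  return BrstackEntry{*From, *To, Text.front() == 'M'};
}
```

Under these assumptions, `parseBrstackEntry("0x10/0x20/P")` is rejected because only one flag character follows the addresses.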

bolt/lib/Profile/DataReader.cpp

Lines changed: 3 additions & 3 deletions
@@ -570,16 +570,16 @@ void DataReader::readBasicSampleData(BinaryFunction &BF) {
 if (!SampleDataOrErr)
 return;
 
-// Basic samples mode territory (without LBR info)
+// Basic samples mode territory (without brstack info)
 // First step is to assign BB execution count based on samples from perf
 BF.ProfileMatchRatio = 1.0f;
 BF.removeTagsFromProfile();
 bool NormalizeByInsnCount = usesEvent("cycles") || usesEvent("instructions");
 bool NormalizeByCalls = usesEvent("branches");
 static bool NagUser = true;
 if (NagUser) {
-outs()
-<< "BOLT-INFO: operating with basic samples profiling data (no LBR).\n";
+outs() << "BOLT-INFO: operating with basic samples profiling data (no "
+"brstack).\n";
 if (NormalizeByInsnCount)
 outs() << "BOLT-INFO: normalizing samples by instruction count.\n";
 else if (NormalizeByCalls)

bolt/test/X86/bolt-address-translation-yaml.test

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ WRITE-BAT-CHECK: BOLT-INFO: BAT section size (bytes): 404
 
 READ-BAT-CHECK-NOT: BOLT-ERROR: unable to save profile in YAML format for input file processed by BOLT
 READ-BAT-CHECK: BOLT-INFO: Parsed 5 BAT entries
-READ-BAT-CHECK: PERF2BOLT: read 79 aggregated LBR entries
+READ-BAT-CHECK: PERF2BOLT: read 79 aggregated brstack entries
 READ-BAT-CHECK: HEATMAP: building heat map
 READ-BAT-CHECK: BOLT-INFO: 5 out of 21 functions in the binary (23.8%) have non-empty execution profile
 READ-BAT-FDATA-CHECK: BOLT-INFO: 5 out of 16 functions in the binary (31.2%) have non-empty execution profile

bolt/test/X86/heatmap-preagg.test

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@ RUN: --block-size=1024 | FileCheck --check-prefix CHECK-HEATMAP-BAT-1K %s
 CHECK-HEATMAP-BAT-1K: HEATMAP: dumping heatmap with bucket size 1024
 CHECK-HEATMAP-BAT-1K-NOT: HEATMAP: dumping heatmap with bucket size
 
-CHECK-HEATMAP: PERF2BOLT: read 81 aggregated LBR entries
+CHECK-HEATMAP: PERF2BOLT: read 81 aggregated brstack entries
 CHECK-HEATMAP: HEATMAP: invalid traces: 1
 CHECK-HEATMAP: HEATMAP: dumping heatmap with bucket size 64
 CHECK-HEATMAP: HEATMAP: dumping heatmap with bucket size 128
@@ -71,7 +71,7 @@ CHECK-HM-1024-NEXT: 0
 CHECK-BAT-HM-64: (349, 1126]
 CHECK-BAT-HM-4K: (605, 2182]
 
-CHECK-HEATMAP-BAT: PERF2BOLT: read 79 aggregated LBR entries
+CHECK-HEATMAP-BAT: PERF2BOLT: read 79 aggregated brstack entries
 CHECK-HEATMAP-BAT: HEATMAP: invalid traces: 2
 CHECK-HEATMAP-BAT: HEATMAP: dumping heatmap with bucket size 64
 CHECK-HEATMAP-BAT: HEATMAP: dumping heatmap with bucket size 4096

bolt/test/X86/nolbr.s

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 # CHECK-FDATA-NEXT: 1 _start [[#]] 1
 
 # CHECK-BOLT: BOLT-INFO: pre-processing profile using branch profile reader
-# CHECK-BOLT: BOLT-INFO: operating with basic samples profiling data (no LBR).
+# CHECK-BOLT: BOLT-INFO: operating with basic samples profiling data (no brstack).
 # CHECK-BOLT: BOLT-INFO: 1 out of 1 functions in the binary (100.0%) have non-empty execution profile
 
 .globl _start

bolt/test/perf2bolt/AArch64/perf2bolt-spe.test

Lines changed: 2 additions & 2 deletions
@@ -6,6 +6,6 @@ RUN: %clang %cflags %p/../../Inputs/asm_foo.s %p/../../Inputs/asm_main.c -o %t.e
 
 RUN: perf record -e cycles -q -o %t.perf.data -- %t.exe 2> /dev/null
 
-RUN: perf2bolt -p %t.perf.data -o %t.perf.boltdata --spe %t.exe | FileCheck %s --check-prefix=CHECK-SPE-LBR
+RUN: perf2bolt -p %t.perf.data -o %t.perf.boltdata --spe %t.exe | FileCheck %s --check-prefix=CHECK-SPE-BRSTACK
 
-CHECK-SPE-LBR: PERF2BOLT: parse SPE branch events in LBR-format
+CHECK-SPE-BRSTACK: PERF2BOLT: parse SPE branch events in brstack-format