Skip to content

Conversation

@zhyty
Copy link
Contributor

@zhyty zhyty commented Oct 31, 2025

Small change to use (what I think is) a better practice -- we were using the m_indexed bool member to make sure we called Index() once, but we should just use std::once! This change shouldn't affect functionality.

This change may also make concurrent access to Index() thread-safe, though the ManualDWARFIndex API isn't completely thread-safe due to Decode(). I'm not sure if ManualDWARFIndex was ever intended to be thread-safe.

Test Plan:

ninja check-lldb

Tested basic debugging workflow of a couple of large projects I had built. Basically:

(lldb) target create <project> (lldb) b main (lldb) r (lldb) step ... 

I A/B tested the performance of launching several modules with parallel module loading and didn't observe any performance regressions.

Tom Yang added 2 commits October 31, 2025 10:38
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
@zhyty zhyty requested a review from JDevlieghere as a code owner October 31, 2025 18:10
@llvmbot llvmbot added the lldb label Oct 31, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 31, 2025

@llvm/pr-subscribers-lldb

Author: Tom Yang (zhyty)

Changes

Small change to use (what I think is) a better practice -- we were using the m_indexed bool member to make sure we called Index() once, but we should just use std::once! This change shouldn't affect functionality.

This change may also make concurrent access to Index() thread-safe, though the ManualDWARFIndex API isn't completely thread-safe due to Decode(). I'm not sure if ManualDWARFIndex was ever intended to be thread-safe.

Test Plan:

ninja check-lldb

Tested basic debugging workflow of a couple of large projects I had built. Basically:

(lldb) target create &lt;project&gt; (lldb) b main (lldb) r (lldb) step ... 

I A/B tested the performance of launching several modules with parallel module loading and didn't observe any performance regressions.


Full diff: https://github.com/llvm/llvm-project/pull/165896.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp (+100-103)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h (+1-1)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp index d90108f687f84..aa30682faed90 100644 --- a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp +++ b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp @@ -22,7 +22,6 @@ #include "lldb/Utility/Stream.h" #include "lldb/Utility/Timer.h" #include "lldb/lldb-private-enumerations.h" -#include "llvm/Support/FormatVariadic.h" #include "llvm/Support/ThreadPool.h" #include <atomic> #include <optional> @@ -33,118 +32,116 @@ using namespace lldb_private::plugin::dwarf; using namespace llvm::dwarf; void ManualDWARFIndex::Index() { - if (m_indexed) - return; - m_indexed = true; - - ElapsedTime elapsed(m_index_time); - LLDB_SCOPED_TIMERF("%p", static_cast<void *>(m_dwarf)); - if (LoadFromCache()) { - m_dwarf->SetDebugInfoIndexWasLoadedFromCache(); - return; - } + std::call_once(m_indexed_flag, [this]() { + ElapsedTime elapsed(m_index_time); + LLDB_SCOPED_TIMERF("%p", static_cast<void *>(m_dwarf)); + if (LoadFromCache()) { + m_dwarf->SetDebugInfoIndexWasLoadedFromCache(); + return; + } - DWARFDebugInfo &main_info = m_dwarf->DebugInfo(); - SymbolFileDWARFDwo *dwp_dwarf = m_dwarf->GetDwpSymbolFile().get(); - DWARFDebugInfo *dwp_info = dwp_dwarf ? &dwp_dwarf->DebugInfo() : nullptr; - - std::vector<DWARFUnit *> units_to_index; - units_to_index.reserve(main_info.GetNumUnits() + - (dwp_info ? dwp_info->GetNumUnits() : 0)); - - // Process all units in the main file, as well as any type units in the dwp - // file. Type units in dwo files are handled when we reach the dwo file in - // IndexUnit. - for (size_t U = 0; U < main_info.GetNumUnits(); ++U) { - DWARFUnit *unit = main_info.GetUnitAtIndex(U); - if (unit && m_units_to_avoid.count(unit->GetOffset()) == 0) - units_to_index.push_back(unit); - } - if (dwp_info && dwp_info->ContainsTypeUnits()) { - for (size_t U = 0; U < dwp_info->GetNumUnits(); ++U) { - if (auto *tu = - llvm::dyn_cast<DWARFTypeUnit>(dwp_info->GetUnitAtIndex(U))) { - if (!m_type_sigs_to_avoid.contains(tu->GetTypeHash())) - units_to_index.push_back(tu); + DWARFDebugInfo &main_info = m_dwarf->DebugInfo(); + SymbolFileDWARFDwo *dwp_dwarf = m_dwarf->GetDwpSymbolFile().get(); + DWARFDebugInfo *dwp_info = dwp_dwarf ? &dwp_dwarf->DebugInfo() : nullptr; + + std::vector<DWARFUnit *> units_to_index; + units_to_index.reserve(main_info.GetNumUnits() + + (dwp_info ? dwp_info->GetNumUnits() : 0)); + + // Process all units in the main file, as well as any type units in the dwp + // file. Type units in dwo files are handled when we reach the dwo file in + // IndexUnit. + for (size_t U = 0; U < main_info.GetNumUnits(); ++U) { + DWARFUnit *unit = main_info.GetUnitAtIndex(U); + if (unit && m_units_to_avoid.count(unit->GetOffset()) == 0) + units_to_index.push_back(unit); + } + if (dwp_info && dwp_info->ContainsTypeUnits()) { + for (size_t U = 0; U < dwp_info->GetNumUnits(); ++U) { + if (auto *tu = + llvm::dyn_cast<DWARFTypeUnit>(dwp_info->GetUnitAtIndex(U))) { + if (!m_type_sigs_to_avoid.contains(tu->GetTypeHash())) + units_to_index.push_back(tu); + } } } - } - if (units_to_index.empty()) - return; - - StreamString module_desc; - m_module.GetDescription(module_desc.AsRawOstream(), - lldb::eDescriptionLevelBrief); - - // Include 2 passes per unit to index for extracting DIEs from the unit and - // indexing the unit, and then extra entries for finalizing each index in the - // set. - const auto indices = IndexSet<NameToDIE>::Indices(); - const uint64_t total_progress = units_to_index.size() * 2 + indices.size(); - Progress progress("Manually indexing DWARF", module_desc.GetData(), - total_progress, /*debugger=*/nullptr, - Progress::kDefaultHighFrequencyReportTime); - - // Share one thread pool across operations to avoid the overhead of - // recreating the threads. - llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool()); - const size_t num_threads = Debugger::GetThreadPool().getMaxConcurrency(); - - // Run a function for each compile unit in parallel using as many threads as - // are available. This is significantly faster than submiting a new task for - // each unit. - auto for_each_unit = [&](auto &&fn) { - std::atomic<size_t> next_cu_idx = 0; - auto wrapper = [&fn, &next_cu_idx, &units_to_index, - &progress](size_t worker_id) { - size_t cu_idx; - while ((cu_idx = next_cu_idx.fetch_add(1, std::memory_order_relaxed)) < - units_to_index.size()) { - fn(worker_id, cu_idx, units_to_index[cu_idx]); - progress.Increment(); - } - }; + if (units_to_index.empty()) + return; - for (size_t i = 0; i < num_threads; ++i) - task_group.async(wrapper, i); + StreamString module_desc; + m_module.GetDescription(module_desc.AsRawOstream(), + lldb::eDescriptionLevelBrief); + + // Include 2 passes per unit to index for extracting DIEs from the unit and + // indexing the unit, and then extra entries for finalizing each index in + // the set. + const auto indices = IndexSet<NameToDIE>::Indices(); + const uint64_t total_progress = units_to_index.size() * 2 + indices.size(); + Progress progress("Manually indexing DWARF", module_desc.GetData(), + total_progress, /*debugger=*/nullptr, + Progress::kDefaultHighFrequencyReportTime); + + // Share one thread pool across operations to avoid the overhead of + // recreating the threads. + llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool()); + const size_t num_threads = Debugger::GetThreadPool().getMaxConcurrency(); + + // Run a function for each compile unit in parallel using as many threads as + // are available. This is significantly faster than submiting a new task for + // each unit. + auto for_each_unit = [&](auto &&fn) { + std::atomic<size_t> next_cu_idx = 0; + auto wrapper = [&fn, &next_cu_idx, &units_to_index, + &progress](size_t worker_id) { + size_t cu_idx; + while ((cu_idx = next_cu_idx.fetch_add(1, std::memory_order_relaxed)) < + units_to_index.size()) { + fn(worker_id, cu_idx, units_to_index[cu_idx]); + progress.Increment(); + } + }; - task_group.wait(); - }; - - // Extract dies for all DWARFs unit in parallel. Figure out which units - // didn't have their DIEs already parsed and remember this. If no DIEs were - // parsed prior to this index function call, we are going to want to clear the - // CU dies after we are done indexing to make sure we don't pull in all DWARF - // dies, but we need to wait until all units have been indexed in case a DIE - // in one unit refers to another and the indexes accesses those DIEs. - std::vector<std::optional<DWARFUnit::ScopedExtractDIEs>> clear_cu_dies( - units_to_index.size()); - for_each_unit([&clear_cu_dies](size_t, size_t idx, DWARFUnit *unit) { - clear_cu_dies[idx] = unit->ExtractDIEsScoped(); - }); + for (size_t i = 0; i < num_threads; ++i) + task_group.async(wrapper, i); - // Now index all DWARF unit in parallel. - std::vector<IndexSet<NameToDIE>> sets(num_threads); - for_each_unit( - [this, dwp_dwarf, &sets](size_t worker_id, size_t, DWARFUnit *unit) { - IndexUnit(*unit, dwp_dwarf, sets[worker_id]); - }); + task_group.wait(); + }; - // Merge partial indexes into a single index. Process each index in a set in - // parallel. - for (NameToDIE IndexSet<NameToDIE>::*index : indices) { - task_group.async([this, &sets, index, &progress]() { - NameToDIE &result = m_set.*index; - for (auto &set : sets) - result.Append(set.*index); - result.Finalize(); - progress.Increment(); + // Extract dies for all DWARFs unit in parallel. Figure out which units + // didn't have their DIEs already parsed and remember this. If no DIEs were + // parsed prior to this index function call, we are going to want to clear + // the CU dies after we are done indexing to make sure we don't pull in all + // DWARF dies, but we need to wait until all units have been indexed in case + // a DIE in one unit refers to another and the indexes accesses those DIEs. + std::vector<std::optional<DWARFUnit::ScopedExtractDIEs>> clear_cu_dies( + units_to_index.size()); + for_each_unit([&clear_cu_dies](size_t, size_t idx, DWARFUnit *unit) { + clear_cu_dies[idx] = unit->ExtractDIEsScoped(); }); - } - task_group.wait(); - SaveToCache(); + // Now index all DWARF unit in parallel. + std::vector<IndexSet<NameToDIE>> sets(num_threads); + for_each_unit( + [this, dwp_dwarf, &sets](size_t worker_id, size_t, DWARFUnit *unit) { + IndexUnit(*unit, dwp_dwarf, sets[worker_id]); + }); + + // Merge partial indexes into a single index. Process each index in a set in + // parallel. + for (NameToDIE IndexSet<NameToDIE>::*index : indices) { + task_group.async([this, &sets, index, &progress]() { + NameToDIE &result = m_set.*index; + for (auto &set : sets) + result.Append(set.*index); + result.Finalize(); + progress.Increment(); + }); + } + task_group.wait(); + + SaveToCache(); + }); } void ManualDWARFIndex::IndexUnit(DWARFUnit &unit, SymbolFileDWARFDwo *dwp, diff --git a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h index 0b5b2f3e84309..f7d45b5a24990 100644 --- a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h +++ b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h @@ -170,7 +170,7 @@ class ManualDWARFIndex : public DWARFIndex { llvm::DenseSet<uint64_t> m_type_sigs_to_avoid; IndexSet<NameToDIE> m_set; - bool m_indexed = false; + std::once_flag m_indexed_flag; }; } // namespace dwarf } // namespace lldb_private::plugin 
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the extra indentation, can you rename the current method to IndexImpl, make it private, and then call that from inside the call_once in ::Index?

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Tom Yang and others added 2 commits October 31, 2025 14:07
Copy link
Contributor

@jeffreytan81 jeffreytan81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much more readable!

@zhyty zhyty merged commit 74d4870 into llvm:main Oct 31, 2025
8 of 9 checks passed
@zhyty zhyty deleted the upstream-manual-dwarf-index-once-flag branch October 31, 2025 22:08
DEBADRIBASAK pushed a commit to DEBADRIBASAK/llvm-project that referenced this pull request Nov 3, 2025
Small change to use (what I think is) a better practice -- we were using the `m_indexed` bool member to make sure we called `Index()` once, but we should just use `std::once`! This change shouldn't affect functionality. This change may also make concurrent access to `Index()` thread-safe, though the ManualDWARFIndex API isn't completely thread-safe due to `Decode()`. I'm not sure if ManualDWARFIndex was ever intended to be thread-safe. Test Plan: `ninja check-lldb` Tested basic debugging workflow of a couple of large projects I had built. Basically: ``` (lldb) target create <project> (lldb) b main (lldb) r (lldb) step ... ``` I A/B tested the performance of launching several modules with parallel module loading and didn't observe any performance regressions. --------- Co-authored-by: Tom Yang <toyang@fb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants