Megadep Algorithm
=================
The megadep algorithm is a distributed algorithm for determining all the dependencies, direct and indirect,
of the source files in a project.
Each process reads $`\left(\text{\#files} / \text{\#processes}\right)`$ files.
Each process also does $`\mathcal{O}\left(\text{\#files} / \text{\#processes}\right)`$
operations at each step.
Finally, three all-gather steps are performed over the course of the algorithm to pull the data together on each process.
~~~c++
#include <mpi.h>

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// File_index is simply an index into the sorted list of file names.
using File_index = std::size_t;

// Project helpers used below; the signatures are assumed for this sketch and
// their definitions are omitted.
std::vector<std::string> list_files();
std::vector<std::string> read_includes(const std::string& file_name);
std::vector<File_index> block_decomposition(MPI_Comm comm, File_index begin, File_index end);
void sync_distributed_maps(MPI_Comm comm, std::map<File_index, std::vector<File_index>>& map);
void sync_merge_distributed_maps(MPI_Comm comm, std::map<File_index, std::vector<File_index>>& map);

int main() {
    MPI_Init(nullptr, nullptr);

    auto file_names = list_files();

    // Sort the names so we can use a binary search lookup.
    std::sort(file_names.begin(), file_names.end());

    // Block decomposition of the range [0, file_names.size()) across the processes in
    // MPI_COMM_WORLD:
    //   Process 1 gets [0, ..., block_size-1]
    //   Process 2 gets [block_size, ..., 2*block_size-1]
    //   ...
    //   Process n gets [(n-1)*block_size, ..., n*block_size-1]
    auto decomposition = block_decomposition(MPI_COMM_WORLD, 0, file_names.size());

    // Generate a map from a file index to the file indices it #includes.
    // Only generate the map for the "local" file indices.
    std::map<File_index, std::vector<File_index>> file_to_includes;
    for(auto file_index: decomposition) {
        // Get the list of included files for the current file.
        // Find the equivalent file index for each and add it to the map.
        auto include_names = read_includes(file_names[file_index]);
        for(auto& include_name: include_names) {
            auto include_it = std::lower_bound(file_names.begin(), file_names.end(), include_name);
            auto include_name_index = std::distance(file_names.begin(), include_it);
            file_to_includes[file_index].push_back(include_name_index);
        }
    }

    // All-gather with every process so everyone has the full file_to_includes map.
    // Each process will have a unique set of keys.
    sync_distributed_maps(MPI_COMM_WORLD, file_to_includes);

    // For each "local" file, find which files directly include it.
    // Effectively, this partially inverts file_to_includes.
    std::map<File_index, std::vector<File_index>> file_to_includers;
    for(auto file_index: decomposition) {
        for(auto include: file_to_includes[file_index]) {
            file_to_includers[include].push_back(file_index);
        }
    }

    // All-gather, but every process can have overlapping keys. All the key values must be merged.
    // This completes the inversion.
    sync_merge_distributed_maps(MPI_COMM_WORLD, file_to_includers);

    // For each "local" file, recursively find all its includers, and add it to
    // each includer's list of indirect includes.
    std::map<File_index, std::vector<File_index>> file_to_indirect_includes;
    for(auto file_index: decomposition) {
        // We keep a record of includers that have already been processed so we
        // don't add them twice, or repeat work. (Missing entries default to false.)
        std::map<File_index, bool> processed;
        for(auto include: file_to_includers[file_index]) {
            if(processed[include]) continue;
            processed[include] = true;
            file_to_indirect_includes[include].push_back(file_index);
            for(auto sub_include: file_to_includers[include]) {
                if(processed[sub_include]) continue;
                processed[sub_include] = true;
                file_to_indirect_includes[sub_include].push_back(file_index);
                for(auto sub_sub_include: file_to_includers[sub_include]) {
                    // recurse
                }
            }
        }
    }

    // All-gather, and again, every process can have overlapping keys. All the key values must be merged.
    sync_merge_distributed_maps(MPI_COMM_WORLD, file_to_indirect_includes);

    // file_to_indirect_includes now contains a map from a file to every file it includes directly
    // or indirectly.

    MPI_Finalize();
}
~~~
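The `block_decomposition` helper is not defined on this page either. The following is a minimal sketch of one possible implementation, matching the block layout described in the comments above; the exact signature, and giving any remainder to the last process, are assumptions of this sketch.

~~~c++
#include <mpi.h>

#include <cstddef>
#include <numeric>
#include <vector>

using File_index = std::size_t;

// Split [begin, end) into contiguous blocks, one per rank in `comm`, and
// return the indices belonging to the calling rank. Any remainder from an
// uneven division is given to the last rank.
std::vector<File_index> block_decomposition(MPI_Comm comm, File_index begin, File_index end) {
    int rank = 0;
    int n_procs = 1;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &n_procs);

    const File_index count = end - begin;
    const File_index block_size = count / n_procs;

    const File_index lo = begin + rank * block_size;
    const File_index hi = (rank == n_procs - 1) ? end : lo + block_size;

    std::vector<File_index> indices(hi - lo);
    std::iota(indices.begin(), indices.end(), lo);
    return indices;
}
~~~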
In each instance where `std::map` is used in the listing above, a `std::vector` would do the same job more efficiently;
`std::map` is used here for illustrative purposes.
The definitions of `sync_distributed_maps` and `sync_merge_distributed_maps` have been omitted here.
They can be implemented using `MPI_Allgatherv`.
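As a rough illustration, `sync_merge_distributed_maps` could flatten each local map into `(key, #values, values...)` records, exchange the flattened buffers with `MPI_Allgatherv`, and rebuild the map on every process, concatenating the value lists of keys that appear on more than one process. `sync_distributed_maps` is then just the case where the keys happen to be disjoint. The record layout and the use of `long long` for transport below are assumptions of this sketch, not the actual implementation.

~~~c++
#include <mpi.h>

#include <cstddef>
#include <map>
#include <vector>

using File_index = std::size_t;

// Gather every rank's (key -> values) map onto every rank, concatenating the
// value lists of any keys that appear on more than one rank.
void sync_merge_distributed_maps(MPI_Comm comm,
                                 std::map<File_index, std::vector<File_index>>& map) {
    // Flatten the local map into records of the form: key, #values, values...
    std::vector<long long> flat;
    for(const auto& [key, values]: map) {
        flat.push_back(static_cast<long long>(key));
        flat.push_back(static_cast<long long>(values.size()));
        for(auto value: values) flat.push_back(static_cast<long long>(value));
    }

    int n_procs = 0;
    MPI_Comm_size(comm, &n_procs);

    // Exchange the flattened buffer sizes, then compute displacements.
    int local_count = static_cast<int>(flat.size());
    std::vector<int> counts(n_procs);
    MPI_Allgather(&local_count, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);

    std::vector<int> displs(n_procs, 0);
    for(int i = 1; i < n_procs; ++i) displs[i] = displs[i - 1] + counts[i - 1];
    const int total = displs[n_procs - 1] + counts[n_procs - 1];

    // Gather every rank's records onto every rank.
    std::vector<long long> all(total);
    MPI_Allgatherv(flat.data(), local_count, MPI_LONG_LONG,
                   all.data(), counts.data(), displs.data(), MPI_LONG_LONG, comm);

    // Rebuild the map, merging value lists for keys that occur more than once.
    map.clear();
    for(std::size_t i = 0; i < all.size();) {
        const auto key = static_cast<File_index>(all[i++]);
        const auto n_values = static_cast<std::size_t>(all[i++]);
        auto& values = map[key];
        for(std::size_t j = 0; j < n_values; ++j) {
            values.push_back(static_cast<File_index>(all[i++]));
        }
    }
}
~~~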