NCCL Cannot Find Tuner Symbols. Need to Export NCCL_TUNER_PLUGIN=/opt/aws-ofi-nccl/lib/libnccl-ofi-tuner.so #472

Open
zhanwenchen opened this issue Jul 19, 2024 · 1 comment

Comments

@zhanwenchen

Hello,

I followed the official AWS-OFI plugin installation guide, but I ran into a potential issue with the tuner. When I run the nccl-tests command from the linked guide:

/opt/amazon/openmpi/bin/mpirun \
-x LD_LIBRARY_PATH=/opt/nccl/build/lib:/usr/local/cuda/lib64:/opt/amazon/efa/lib:/opt/amazon/openmpi5/lib:/opt/aws-ofi-nccl/lib:$LD_LIBRARY_PATH \
-x NCCL_DEBUG=INFO \
--hostfile my-hosts -n 8 -N 8 \
--mca pml ^cm --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 --bind-to none \
$HOME/nccl-tests/build/all_reduce_perf -b 8 -e 1G -f 2 -g 1 -c 1 -n 100

I got:

ip-172-31-18-239:755152:755194 [5] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
ip-172-31-18-239:755152:755194 [5] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.

Only with export NCCL_TUNER_PLUGIN=/opt/aws-ofi-nccl/lib/libnccl-ofi-tuner.so do I get:

ip-172-31-18-239:754820:754863 [5] NCCL INFO TUNER/Plugin: Plugin name set by env to /opt/aws-ofi-nccl/lib/libnccl-ofi-tuner.so
ip-172-31-18-239:754820:754863 [5] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
ip-172-31-18-239:754820:754863 [5] NCCL INFO TUNER/Plugin: Using tuner plugin nccl_ofi_tuner
@rauteric
Contributor

Yes, the current public instructions do not load the tuner. Setting NCCL_TUNER_PLUGIN as you have done is the correct way to load the tuner.

Loading the tuner is not required to use the plugin, although the tuner improves performance in some configurations. We may update the public instructions in the future to include loading the tuner.
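
For anyone who wants to pass the variable through mpirun rather than exporting it in the shell, here is a minimal sketch. It reuses the command and paths from the report above. The helper names (tuner_flag, tuner_loaded, run_all_reduce) and the TUNER_LIB variable are hypothetical and only for illustration, and the check assumes the "Using tuner plugin" log line looks the same on your NCCL version.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and TUNER_LIB are illustrative, not part of
# aws-ofi-nccl or NCCL. The path is the one used in this issue.
TUNER_LIB="${TUNER_LIB:-/opt/aws-ofi-nccl/lib/libnccl-ofi-tuner.so}"

# Print the mpirun "-x" argument for the tuner, but only if the library
# file actually exists. Prints nothing otherwise.
tuner_flag() {
  if [ -f "$1" ]; then
    printf '%s\n' "-x NCCL_TUNER_PLUGIN=$1"
  fi
}

# Read NCCL_DEBUG=INFO output on stdin and succeed if NCCL reports that
# a tuner plugin was loaded (message text taken from the logs above).
tuner_loaded() {
  grep -q 'TUNER/Plugin: Using tuner plugin'
}

# Same nccl-tests command as in the report, with the tuner flag added.
# $(tuner_flag ...) is left unquoted on purpose so it splits into two
# arguments; this assumes the library path contains no spaces.
run_all_reduce() {
  /opt/amazon/openmpi/bin/mpirun \
    -x LD_LIBRARY_PATH=/opt/nccl/build/lib:/usr/local/cuda/lib64:/opt/amazon/efa/lib:/opt/amazon/openmpi5/lib:/opt/aws-ofi-nccl/lib:$LD_LIBRARY_PATH \
    -x NCCL_DEBUG=INFO \
    $(tuner_flag "$TUNER_LIB") \
    --hostfile my-hosts -n 8 -N 8 \
    --mca pml ^cm --mca btl tcp,self --mca btl_tcp_if_exclude lo,docker0 --bind-to none \
    "$HOME/nccl-tests/build/all_reduce_perf" -b 8 -e 1G -f 2 -g 1 -c 1 -n 100
}
```

After sourcing the snippet, something like `run_all_reduce 2>&1 | tee run.log; tuner_loaded < run.log && echo "tuner loaded"` would confirm whether the tuner was picked up. If the library is missing, the flag is simply omitted and NCCL falls back to its internal tuner, as in the first log excerpt.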
