[BUG] Segmentation Fault when using tfdlpack.to_dlpack on tf.tensor #12

awthomp · 2020-01-17T17:10:35Z

I've been experimenting with tfdlpack to connect libraries that use __cuda_array_interface__ to TensorFlow, and I hit a segmentation fault when invoking to_dlpack on a TF tensor. See below for a reproduction:

import cupy as cp
import tfdlpack

# CuPy - GPU Array (like NumPy!)
gpu_arr = cp.random.rand(10_000, 10_000)

# Use CuPy's built in `toDlpack` function to move to a DLPack capsule
dlpack_arr = gpu_arr.toDlpack()

# Use `tfdlpack` to migrate to TensorFlow
tf_tensor = tfdlpack.from_dlpack(dlpack_arr)

# Confirm TF tensor is on GPU
print(tf_tensor.device)

# Use `tfdlpack` to migrate back to CuPy; this yields a segmentation fault
dlpack_capsule = tfdlpack.to_dlpack(tf_tensor)

I'm using a single GP100, isolated with the CUDA_VISIBLE_DEVICES environment variable.
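A segfault takes down the whole interpreter (or Jupyter kernel), so it can help to run the reproduction script above in a child process and classify how it ended. This is a minimal sketch, assuming a POSIX system where a child killed by a signal reports a negative return code. The helper name `run_isolated` is hypothetical and not part of tfdlpack.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 300.0) -> str:
    """Run a Python snippet in a fresh interpreter and classify the outcome.

    Returns "ok", "segfault", or "error (exit N)". On POSIX, a negative
    return code means the child was killed by that signal number.
    Raises subprocess.TimeoutExpired if the child runs longer than `timeout`.
    """
    result = subprocess.run([sys.executable, "-c", snippet], timeout=timeout)
    if result.returncode == 0:
        return "ok"
    if result.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"error (exit {result.returncode})"
```

For example, store the script above in a string and call `run_isolated(script)` once per environment; a result of `"segfault"` reproduces this report without killing the calling process.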


jermainewang · 2020-01-18T08:39:27Z

Confirmed this is a bug. I replaced cupy with torch and it also crashes.

import torch
from torch.utils import dlpack as th_dlpack
import tfdlpack

gpu_arr = torch.rand(10_000, 10_000).cuda()
print(gpu_arr)

dlpack_arr = th_dlpack.to_dlpack(gpu_arr)

# Use `tfdlpack` to migrate to TensorFlow
tf_tensor = tfdlpack.from_dlpack(dlpack_arr)

# Confirm TF tensor is on GPU
print(tf_tensor.device)

# Use `tfdlpack` to migrate back to PyTorch; this yields a segmentation fault
dlpack_capsule = tfdlpack.to_dlpack(tf_tensor)

jermainewang · 2020-01-18T09:02:51Z

What's your tensorflow version? I found the code works with tensorflow v2.1.0 but not v2.0.0.
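To fail fast on the TensorFlow release flagged here, one option is a small version guard. This is a minimal sketch: the helper names and the `(2, 1, 0)` threshold are assumptions drawn from this comment, not part of tfdlpack. awthomp later reports the crash on 2.1.0 as well, so the guard alone is not a fix.

```python
import re
from typing import Tuple

# Assumed threshold: this comment reports the round trip working on
# TensorFlow 2.1.0 but crashing on 2.0.0.
MIN_TF_VERSION = (2, 1, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '2.1.0' or '2.2.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.1') is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def tf_version_supported(version: str) -> bool:
    """True if `version` is at least MIN_TF_VERSION."""
    return parse_version(version) >= MIN_TF_VERSION
```

In practice you would pass `tf.__version__` after `import tensorflow as tf` and raise an error when `tf_version_supported` returns False.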

VoVAllen · 2020-01-18T09:10:57Z

It works well on my machine. I'm using TensorFlow 2.1.0.

awthomp · 2020-01-18T13:08:42Z

Quoting @jermainewang: "What's your tensorflow version? I found the code works with tensorflow v2.1.0 but not v2.0.0."

Interesting. I was on TF 2.1.0 when I submitted the bug report. I've included an Anaconda environment file below to make sure we're on the same page regarding software dependencies:

name: tfdlpack
channels:
  - conda-forge
  - nvidia
  - pytorch
  - defaults
  - numba
dependencies:
  - python=3.7
  - numpy
  - cudatoolkit>=9.2,<10.2
  - numba
  - cupy>=6.2.0
  - pytorch
  - pip
  - pip:
      - tfdlpack-gpu

Just save this into a file named tfdlpack_conda.yml. Then run:

conda env create -f tfdlpack_conda.yml
conda activate tfdlpack

My system contains 2 GP100s (Pascal P100) and 1 P2000 to drive graphics. I typically isolate GPU0 (P100) with export CUDA_VISIBLE_DEVICES=0.
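The same isolation can be applied from inside a Python script or notebook instead of the shell, as long as the variables are set before any CUDA library is imported. This is a minimal sketch; the helper name `isolate_gpu` is hypothetical. The `CUDA_DEVICE_ORDER=PCI_BUS_ID` line is an assumption, not something mentioned in this thread: it makes CUDA number devices the way `nvidia-smi` does, which can matter on a machine mixing P100s and a P2000.

```python
import os


def isolate_gpu(index: int) -> None:
    """Expose a single GPU to CUDA-based libraries in this process.

    Must run before cupy, torch, or tensorflow initialize CUDA; changing
    these variables afterwards does not affect the running process.
    """
    # Number devices by PCI bus ID (as nvidia-smi does) rather than CUDA's
    # default FASTEST_FIRST ordering, which can differ on mixed-GPU machines.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


isolate_gpu(0)
# Import cupy / tensorflow / tfdlpack only after this point.
```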

awthomp · 2020-01-18T16:13:48Z

I'm also receiving the segfault with an NVIDIA T4. Here's a Google Colab notebook that you can run through. Perhaps pip install tfdlpack-gpu isn't pulling in all the expected/necessary dependencies?

https://colab.research.google.com/drive/18Z8bOCJ2Mr-jOD-vIbr6KAO1-KPUy_UM
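When comparing environments, as this thread does, it helps to report the installed versions of the packages involved. This is a minimal sketch using the standard library's `importlib.metadata`, which requires Python 3.8+, newer than the `python=3.7` pinned in the environment file above; on 3.7 the `importlib_metadata` backport offers the same API. The helper name is hypothetical. The distribution names are also assumptions: CuPy installed from pip is often named `cupy-cuda<version>`, and older GPU builds of TensorFlow were published as `tensorflow-gpu`.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names mentioned in this thread; adjust for your environment.
PACKAGES = ("tensorflow", "tfdlpack-gpu", "cupy", "torch")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>14}: {version or 'not installed'}")
```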

VoVAllen · 2020-01-18T16:23:11Z

Thanks for your example. Actually, I'm thinking of reorganizing the whole project around the new TensorFlow custom-op repo https://github.com/tensorflow/custom-op, since it is the official guide on how to distribute custom ops. However, I'm unsure whether I should build the project with Bazel instead of CMake. I may need more time on this.

awthomp · 2020-01-18T16:29:36Z

Thanks, @VoVAllen, and thanks for all your great work enabling DLPack support in TensorFlow. Don't hesitate to let us know what you need help with.

VoVAllen · 2020-01-19T15:10:58Z

@awthomp I've updated the binary release and it now works in colab. Could you try it in your environment again?

However, there's still a bug in this release: it occurs when you create a capsule from TensorFlow but don't consume it in another framework. I'm still investigating a solution.
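For context on why an unconsumed capsule is a special case: under the DLPack Python convention, a producer names its capsule `"dltensor"`, a consumer renames it to `"used_dltensor"` when it takes ownership, and the capsule's destructor calls the tensor's deleter only if the name is still `"dltensor"`. A producer that gets this check wrong either leaks unconsumed tensors or double-frees consumed ones. The sketch below is a simplified pure-Python model of that rule; the class and method names are hypothetical, it is not tfdlpack's actual C++ implementation, and the thread does not say which failure mode tfdlpack hit.

```python
from typing import Callable


class DLPackCapsuleModel:
    """Hypothetical pure-Python stand-in for a PyCapsule wrapping a DLManagedTensor."""

    def __init__(self, deleter: Callable[[], None]) -> None:
        self.name = "dltensor"   # producer-side name: not yet consumed
        self._deleter = deleter  # frees the producer's buffer (the DLManagedTensor deleter)

    def consume(self) -> Callable[[], None]:
        """Model of a consumer (e.g. a from-DLPack import) taking ownership."""
        if self.name != "dltensor":
            raise ValueError("capsule already consumed")
        self.name = "used_dltensor"  # tells the capsule destructor not to free
        return self._deleter         # the consumer calls this when its array dies

    def destroy(self) -> None:
        """Model of the capsule destructor: free only if nobody took ownership."""
        if self.name == "dltensor":
            self._deleter()
```

With this rule, an unconsumed capsule frees its buffer when it is destroyed, while a consumed capsule leaves freeing to the consumer, so the buffer is released exactly once in both cases.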

awthomp · 2020-01-19T16:40:37Z

@VoVAllen Wahoo! It works for me both in Colab on a T4 and on my local machine with a P100. Thanks for the quick fix!

awthomp mentioned this issue on Jan 17, 2020:

[REVIEW] Adding RAPIDS <-> DLFrameworks Jupyter Notebook rapidsai-community/notebooks-contrib#266 (Open)
