How To: GPU Acceleration for
StarNet++ using DirectML
Updated 09 February 2022

dx12.jpg
StarNetIntro.png

If you've used StarNet before, you know how long it can take to process a single image. Classically, only people with NVIDIA GPUs were able to accelerate StarNet, since TensorFlow only officially supports CUDA. However, Windows devices running any graphics device that supports Microsoft DirectX 12, including integrated graphics, can take advantage of Microsoft DirectML to hardware-accelerate TensorFlow processes. Although all modern NVIDIA GPUs support DirectX 12, CUDA is still the preferred method, as it works natively with both TF and the GPU.

Unfortunately, Microsoft's latest DirectML build of TensorFlow (1.15.5) is only compatible with TensorFlow 1.x, while StarNet V2 uses TF 2.x. This means that the latest StarNet version's checkpoint files are incompatible with the DirectML build of libtensorflow.dll. When TF 2.x support is released for DirectML, I'll be sure to update this article promptly!

This tutorial is written for StarNet V1, and PixInsight 1.8.8-12. It is meant to be used with the standalone StarNet++ module written by nekitmm, which can be found on SourceForge. A PDF version of this tutorial will be made available shortly. 
Let's get started!

0. Compatibility

This tutorial is meant for x64 systems running Windows 10 or 11. DirectML is distributed with Windows 10 v1903 and newer.

Any graphics device supporting Microsoft DirectX 12 will work, including integrated graphics, although CUDA acceleration is still recommended for NVIDIA GPUs. Almost all recent commercially available graphics cards support DirectX 12, although the extent of acceleration will vary:

  • All AMD GCN 1st-gen (Radeon HD 7000 series) and above, incl. all Radeon RX and Vega GPUs and integrated graphics.

  • All Intel Haswell (4th-gen Core, 4000 series) HD Integrated Graphics and above

  • NVIDIA Kepler (GTX 600 series) and above

  • Qualcomm Adreno 600 series and above
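If you're not sure whether your graphics device exposes DirectX 12, you can dump a diagnostic report with `dxdiag /t` and look for a 12_x entry in the reported feature levels. Below is a minimal sketch; the "Feature Levels:" field name is what dxdiag typically prints, but double-check it against your own report:

```python
import re

def supports_dx12(dxdiag_text: str) -> bool:
    """Return True if any display device reports a DirectX 12_x feature level."""
    # dxdiag reports lines like "Feature Levels: 12_1,12_0,11_1,11_0"
    for match in re.finditer(r"Feature Levels:\s*([\d_,\s]+)", dxdiag_text):
        if any(level.strip().startswith("12_") for level in match.group(1).split(",")):
            return True
    return False

# Usage (Windows): first run `dxdiag /t dxdiag_report.txt` in a command prompt, then:
# with open("dxdiag_report.txt", errors="ignore") as f:
#     print("DirectX 12 capable:", supports_dx12(f.read()))
```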

1.1 Extracting StarNet++

StarNetDL.png

- If you are using the standalone StarNet module:

Download StarNet_Win.zip from the SourceForge project here, then extract it wherever you would like to keep the module (e.g. Downloads, Desktop, etc.)

- If you are using StarNet with PixInsight:

The StarNet V1 module is already built into PixInsight.

UPDATE: April 2022

The StarNet Sourceforge project is down indefinitely. The StarNetV1 command line interface for Windows can be found at this mirror.

1.2 Extracting and Replacing libtensorflow

Download the latest libtensorflow-win-x64.zip from Microsoft's DirectML GitHub repository here, extract it, then navigate to the \lib folder inside.


- If you are using the standalone StarNet module: 

Replace the file titled tensorflow.dll in StarNet_Win with the one inside \lib. Then copy the file titled DirectML.24bfac66e4ee42ec393a5fb471412d0177bc7bcf.dll from libtensorflow-win-x64.zip into StarNet_Win.

- If you are using StarNet with PixInsight: 

Navigate to C:\Program Files\PixInsight\bin, then replace the file titled tensorflow.dll with the one inside \lib. Then copy the file titled DirectML.24bfac66e4ee42ec393a5fb471412d0177bc7bcf.dll from libtensorflow-win-x64.zip into C:\Program Files\PixInsight\bin.

dmlfiles.png

Since all backend files associated with DirectML are already provided as part of the operating system, the additional environment variables, installations, and file copying needed for CUDA acceleration are not required here.
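The replace-and-copy steps above can also be scripted, which is handy if you reinstall often. This is only a sketch: the folder paths in the examples are illustrative and must be adjusted to where you extracted the zip, and the original tensorflow.dll is backed up first so the change is reversible.

```python
import shutil
from pathlib import Path

def install_directml_dll(extracted_lib: Path, target_dir: Path) -> None:
    """Back up the stock tensorflow.dll, then drop in the DirectML build."""
    target = target_dir / "tensorflow.dll"
    if target.exists():
        # Keep a backup so you can restore the stock DLL later.
        shutil.copy2(target, target_dir / "tensorflow.dll.bak")
    shutil.copy2(extracted_lib / "tensorflow.dll", target)
    # The DirectML runtime DLL ships alongside it in \lib; copying every
    # DirectML.*.dll avoids typing out the long hashed filename.
    for dml in extracted_lib.glob("DirectML.*.dll"):
        shutil.copy2(dml, target_dir / dml.name)

# Example (standalone module, paths are illustrative):
# install_directml_dll(Path(r"libtensorflow-win-x64\lib"), Path(r"StarNet_Win"))
# Example (PixInsight):
# install_directml_dll(Path(r"libtensorflow-win-x64\lib"),
#                      Path(r"C:\Program Files\PixInsight\bin"))
```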

2. Running StarNet

If you are using the command-line interface version of StarNet, copy the 16-bit *.tiff file you would like to process into the folder containing all the StarNet files, then drag the new file onto rgb_starnet++.exe or mono_starnet++.exe, depending on whether your image is RGB or grayscale. Your output file will appear as starless.tif in the same folder.
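The drag-and-drop step can also be launched from a script, which helps when batch-processing several frames. The sketch below mirrors the behaviour described above (the input file is passed as the first argument, and starless.tif appears in the StarNet folder); check the console usage message of your build before relying on it:

```python
import subprocess
from pathlib import Path

def starnet_command(image: Path, starnet_dir: Path, mono: bool = False) -> list:
    """Build the CLI invocation, picking the RGB or mono executable."""
    exe = starnet_dir / ("mono_starnet++.exe" if mono else "rgb_starnet++.exe")
    return [str(exe), str(image)]

def run_starnet(image: Path, starnet_dir: Path, mono: bool = False) -> Path:
    """Run StarNet++ from inside its folder so weights and DLLs are found."""
    subprocess.run(starnet_command(image, starnet_dir, mono),
                   cwd=starnet_dir, check=True)
    return starnet_dir / "starless.tif"

# Example (paths are illustrative):
# run_starnet(Path("M31.tif"), Path(r"StarNet_Win"))
```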

To use the PixInsight module, navigate to the "Process" tab, then select "StarNet2" under "All Processes". Ensure that both the rgb_starnet_weights.pb and mono_starnet_weights.pb files are in C:\Program Files\PixInsight\bin, and that the paths to the weight files are set in the process. Apply the process to the desired image.

StarNet2_Test.png

3. Testing StarNet with DirectML

To check whether DirectML is working, open Task Manager, click "More details" at the bottom of the window, then click the "Performance" tab at the top and select your GPU. Run an instance of StarNet that has had its original tensorflow.dll replaced with the GPU-enabled version.


If you were successful in enabling GPU acceleration, the usage graph for your GPU should spike significantly, though not as close to 100% as with CUDA. Without GPU acceleration, the CPU usage graph spikes while the GPU usage graph stays flat. If your GPU usage doesn't go up, check that both tensorflow.dll and DirectML.[lots-of-characters].dll were copied correctly.

taskmgr.png

Running the standalone module can help reveal potential reasons behind an error or a crash, as the console output can give clues as to what's going on. The PixInsight module does not have this information in its console I/O.

4. Benchmark

I wanted to test how well DirectML works compared to CUDA and stock CPU, so I ran a very short test. By no means is this an exhaustive test, nor do I guarantee that everyone else will see similar results. Your mileage will vary depending on your hardware configuration, as well as the image that you run through StarNet.

I'm running PixInsight 1.8.8-12 on Windows 11, with an AMD Ryzen 7 5800X CPU and NVIDIA GeForce RTX 3070 GPU. A 6000x4000 image of M31 was used as a benchmark. More information about the tests can be found in the PDF version of this tutorial.

Stock StarNet V1 averaged 1:47.52, while StarNet V1 with DirectML acceleration averaged 18.25 s: an 83% improvement!
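For reference, the 83% figure follows directly from the averaged times (1:47.52 is 107.52 seconds):

```python
def improvement_pct(baseline_s: float, accelerated_s: float) -> float:
    """Percent reduction in runtime relative to the baseline."""
    return (baseline_s - accelerated_s) / baseline_s * 100

cpu_s = 1 * 60 + 47.52   # stock StarNet V1: 1:47.52 -> 107.52 s
dml_s = 18.25            # DirectML-accelerated average
print(f"{improvement_pct(cpu_s, dml_s):.0f}% faster")  # -> 83% faster
```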

However, StarNet V1 with CUDA still offers a 10% performance improvement over StarNet V1 with DirectML, while StarNet2 with CUDA offers nearly a 30% improvement. Hence, CUDA acceleration should always be used when possible, but for those without CUDA, DirectML is still a substantial upgrade!

Again, your mileage here will vary, depending on what kind of graphics processor you have. I have not had the chance to test this out with AMD graphics cards or integrated processors, so if you run a benchmark let me know your results in the comments!

Thanks for reading this tutorial, and clear skies to everyone! If you have any comments or questions, please don't hesitate to let me know! - WL
