How To: GPU Acceleration for
StarNet++ using DirectML
Updated 09 February 2022

dx12.jpg
StarNetIntro.png

If you've used StarNet before, you know how long it can take to process a single image. Classically, only people with NVIDIA GPUs were able to accelerate StarNet, since TensorFlow only officially supports CUDA. However, Windows devices running any graphics device that supports Microsoft DirectX 12, including integrated graphics, can take advantage of Microsoft DirectML to hardware-accelerate TensorFlow processes. Although all modern NVIDIA GPUs support DirectX 12, CUDA is still the preferred method, as it works natively with both TF and the GPU.

Unfortunately, Microsoft's latest DirectML build of TensorFlow (1.15.5) is only compatible with TensorFlow 1.x, while StarNet V2 uses TF 2.x. This means that the latest StarNet version's checkpoint files are incompatible with the DirectML build of libtensorflow.dll. When TF 2.x support is released for DirectML, I'll be sure to update this article promptly!

This tutorial is written for StarNet V1, and PixInsight 1.8.8-12. It is meant to be used with the standalone StarNet++ module written by nekitmm, which can be found on SourceForge. A PDF version of this tutorial will be made available shortly. 
Let's get started!

0. Compatibility

This tutorial is meant for x64 systems running Windows 10 or 11. DirectML is distributed with Windows 10 v1903 and newer.

Any graphics device supporting Microsoft DirectX 12 will work, including integrated graphics, although CUDA acceleration is still recommended for NVIDIA GPUs. Almost all recent commercially available graphics cards support DirectX 12, although the extent of acceleration will vary:

  • All AMD GCN 1st-gen (Radeon HD 7000 series) and above, incl. all Radeon RX and Vega GPUs and integrated graphics.

  • All Intel Haswell (4th-gen Core, 4000 series) HD Integrated Graphics and above

  • NVIDIA Kepler (GTX 600 series) and above

  • Qualcomm Adreno 600 series and above
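If you're not sure whether your graphics device exposes DirectX 12, you can dump a diagnostic report with `dxdiag /t` and look for a 12_x entry in the reported feature levels. Below is a minimal sketch; the "Feature Levels:" field name is what dxdiag typically prints, but double-check it against your own report:

```python
import re

def supports_dx12(dxdiag_text: str) -> bool:
    """Return True if any display device reports a DirectX 12_x feature level."""
    # dxdiag reports lines like "Feature Levels: 12_1,12_0,11_1,11_0"
    for match in re.finditer(r"Feature Levels:\s*([\d_,\s]+)", dxdiag_text):
        if any(level.strip().startswith("12_") for level in match.group(1).split(",")):
            return True
    return False

# Usage (Windows): first run `dxdiag /t dxdiag_report.txt` in a command prompt, then:
# with open("dxdiag_report.txt", errors="ignore") as f:
#     print("DirectX 12 capable:", supports_dx12(f.read()))
```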

1.1 Extracting StarNet++

StarNetDL.png

- If you are using the standalone StarNet module:

Download StarNet_Win.zip from the SourceForge project here, then extract it wherever you would like to keep the module (e.g. Downloads, Desktop, etc.)

- If you are using StarNet with PixInsight:

The StarNet V1 module is already built into PixInsight.

UPDATE: April 2022

The StarNet Sourceforge project is down indefinitely. The StarNetV1 command line interface for Windows can be found at this mirror.

1.2 Extracting and Replacing libtensorflow

Download the latest libtensorflow-win-x64.zip from Microsoft's DirectML GitHub repository here, extract it, then navigate to the \lib folder inside.


- If you are using the standalone StarNet module: 

Replace the file titled tensorflow.dll in StarNet_Win with the one inside \lib. Then copy the file titled DirectML.24bfac66e4ee42ec393a5fb471412d0177bc7bcf.dll from libtensorflow-win-x64.zip into StarNet_Win.

- If you are using StarNet with PixInsight: 

Navigate to C:\Program Files\PixInsight\bin, then replace the file titled tensorflow.dll with the one inside \lib. Then copy the file titled DirectML.24bfac66e4ee42ec393a5fb471412d0177bc7bcf.dll from libtensorflow-win-x64.zip into C:\Program Files\PixInsight\bin.

dmlfiles.png

Since all backend files associated with DirectML are already provided as part of the operating system, the additional environment variables, installations, and file copying needed for CUDA acceleration are not required here.
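The replace-and-copy steps above can also be scripted, which is handy if you reinstall often. This is only a sketch: the folder paths in the examples are illustrative and must be adjusted to where you extracted the zip, and the original tensorflow.dll is backed up first so the change is reversible.

```python
import shutil
from pathlib import Path

def install_directml_dll(extracted_lib: Path, target_dir: Path) -> None:
    """Back up the stock tensorflow.dll, then drop in the DirectML build."""
    target = target_dir / "tensorflow.dll"
    if target.exists():
        # Keep a backup so you can restore the stock DLL later.
        shutil.copy2(target, target_dir / "tensorflow.dll.bak")
    shutil.copy2(extracted_lib / "tensorflow.dll", target)
    # The DirectML runtime DLL ships alongside it in \lib; copying every
    # DirectML.*.dll avoids typing out the long hashed filename.
    for dml in extracted_lib.glob("DirectML.*.dll"):
        shutil.copy2(dml, target_dir / dml.name)

# Example (standalone module, paths are illustrative):
# install_directml_dll(Path(r"libtensorflow-win-x64\lib"), Path(r"StarNet_Win"))
# Example (PixInsight):
# install_directml_dll(Path(r"libtensorflow-win-x64\lib"),
#                      Path(r"C:\Program Files\PixInsight\bin"))
```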

2. Running StarNet

If you are using the command-line interface version of StarNet, copy the 16-bit *.tiff file you would like to process into the folder containing all the StarNet files, then drag the new file onto rgb_starnet++.exe or mono_starnet++.exe, depending on whether your image is RGB or grayscale. Your output file will appear as starless.tif in the same folder.
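The drag-and-drop step can also be launched from a script, which helps when batch-processing several frames. The sketch below mirrors the behaviour described above (the input file is passed as the first argument, and starless.tif appears in the StarNet folder); check the console usage message of your build before relying on it:

```python
import subprocess
from pathlib import Path

def starnet_command(image: Path, starnet_dir: Path, mono: bool = False) -> list:
    """Build the CLI invocation, picking the RGB or mono executable."""
    exe = starnet_dir / ("mono_starnet++.exe" if mono else "rgb_starnet++.exe")
    return [str(exe), str(image)]

def run_starnet(image: Path, starnet_dir: Path, mono: bool = False) -> Path:
    """Run StarNet++ from inside its folder so weights and DLLs are found."""
    subprocess.run(starnet_command(image, starnet_dir, mono),
                   cwd=starnet_dir, check=True)
    return starnet_dir / "starless.tif"

# Example (paths are illustrative):
# run_starnet(Path("M31.tif"), Path(r"StarNet_Win"))
```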

To use the PixInsight module, navigate to the "Process" tab, then select "StarNet2" under "All Processes". Ensure that both the rgb_starnet_weights.pb and mono_starnet_weights.pb files are in C:\Program Files\PixInsight\bin, and that the paths to the weight files are set in the process. Apply the process to the desired image.

StarNet2_Test.png

3. Testing StarNet with DirectML

To check whether DirectML is working, open Task Manager, click "More details" at the bottom of the window, then click the "Performance" tab at the top and select your GPU. Run an instance of StarNet that has had its original tensorflow.dll replaced with the GPU-enabled version.


If you were successful in enabling GPU acceleration, the usage graph for your GPU should spike significantly, though not as close to 100% as with CUDA. Without GPU acceleration, the CPU usage graph spikes while the GPU usage graph stays flat. If your GPU usage doesn't go up, check that both tensorflow.dll and DirectML.[lots-of-characters].dll were copied correctly.

taskmgr.png

Running the standalone module can help reveal potential reasons behind an error or a crash, as the console output can give clues as to what's going on. The PixInsight module does not have this information in its console I/O.

4. Benchmark

I wanted to test how well DirectML works compared to CUDA and stock CPU, so I ran a very short test. By no means is this an exhaustive test, nor do I guarantee that everyone else will see similar results. Your mileage will vary depending on your hardware configuration, as well as the image that you run through StarNet.

I'm running PixInsight 1.8.8-12 on Windows 11, with an AMD Ryzen 7 5800X CPU and NVIDIA GeForce RTX 3070 GPU. A 6000x4000 image of M31 was used as a benchmark. More information about the tests can be found in the PDF version of this tutorial.

Stock StarNet V1 averaged 1:47.52, while StarNet V1 with DirectML acceleration averaged 18.25 s: an 83% improvement!
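For reference, the 83% figure follows directly from the averaged times (1:47.52 is 107.52 seconds):

```python
def improvement_pct(baseline_s: float, accelerated_s: float) -> float:
    """Percent reduction in runtime relative to the baseline."""
    return (baseline_s - accelerated_s) / baseline_s * 100

cpu_s = 1 * 60 + 47.52   # stock StarNet V1: 1:47.52 -> 107.52 s
dml_s = 18.25            # DirectML-accelerated average
print(f"{improvement_pct(cpu_s, dml_s):.0f}% faster")  # -> 83% faster
```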

However, StarNet V1 with CUDA still offers a 10% performance improvement over StarNet V1 with DirectML, while StarNet2 with CUDA offers nearly a 30% improvement. Hence, CUDA acceleration should always be used when possible, but for those without CUDA, DirectML is still a substantial upgrade!

Again, your mileage here will vary, depending on what kind of graphics processor you have. I have not had the chance to test this out with AMD graphics cards or integrated processors, so if you run a benchmark let me know your results in the comments!

Thanks for reading this tutorial, and clear skies to everyone! If you have any comments or questions, please don't hesitate to let me know! - WL
