3D Vision library: Flux3D.jl

Brief Description and Proposal

Over the summer I worked on the project "Deep learning for 3D Computer Vision", which resulted in a 3D vision package called Flux3D.jl.

Link: github.com/FluxML/Flux3D.jl

Docs: fluxml.ai/Flux3D.jl/stable/

3D computer vision covers tasks such as classification, segmentation, object detection, super-resolution and many more. The major steps in any machine learning or vision task, namely designing the model (using layers such as Conv and Dense), defining the loss objective and training, can already be handled by a sophisticated general-purpose ML ecosystem like FluxML.

A 3D vision task also involves preparing data points and datasets, applying transforms, and computing common 3D metrics used in the loss objective and the model. This is usually boilerplate, and an inefficient implementation can become a performance bottleneck when the number of data points is large, which is typical for vision tasks.

What is Flux3D.jl

Flux3D.jl is a 3D vision library written entirely in Julia. It uses Flux.jl and Zygote.jl as building blocks for training 3D vision models and for differentiation, and it supports CUDA GPU acceleration through CUDA.jl. Some of the things Flux3D provides are:

  • 3D Structures - Accelerated batched data structures for PointCloud, VoxelGrid and TriMesh, for storage and computation.

  • Transforms - Multiple transforms for PointCloud, VoxelGrid and TriMesh

  • Metrics - metrics like chamfer_distance, laplacian loss and edge loss for defining loss objectives.

  • Dataset - ModelNet10/40 (in multiple variants like pointcloud, mesh, voxel) and a Minimal Custom Dataset wrapper.

  • Conversions - Interconversion between different 3D structures (this also helps in building different variants of the dataset)

  • Visualize - visualizing PointCloud, TriMesh and VoxelGrid

  • DL model - Implementations of some commonly used 3D layers and models.

Short Walkthrough of the Flux3D.jl

A general machine learning or deep learning task usually involves the broad steps shown in the figure below. These steps are therefore also an effective way to assess where this package stands for 3D vision tasks.

Let's go through each step briefly!

Installing and importing packages

] up; add Flux3D#voxels; add Flux Zygote Makie; precompile
using Flux3D, Makie
import Makie: AbstractPlotting
AbstractPlotting.inline!(true)
AbstractPlotting.set_theme!(show_axis = false, resolution = (200,200))
set_theme!(show_axis = false, scale_plot = true, resolution = (650, 600))

1) Handling 3D DataPoint

Point clouds, triangle meshes and voxels are three widely used representations of 3D data, and most 3D vision work is based on these structures. We can easily initialize batched structures for PointCloud, TriMesh and VoxelGrid, and then apply various transforms, metrics and other manipulations to them.

download("https://github.com/nirmalsuthar/public_files/raw/master/airplane.off",
         "airplane.off")
download("https://github.com/nirmalsuthar/public_files/raw/master/teapot2.obj",
         "teapot.obj")
"teapot.obj"
m = load_trimesh(["airplane.off", "teapot.obj"]) |> gpu
TriMesh{Float32, UInt32, CUDA.CuArray} Structure: Batch size: 2 Max verts: 17443 Max faces: 17116 offset: -1 Storage type: CuArray
p = PointCloud(m, 3000)
PointCloud{Float32} Structure: Batch size: 2 Points: 3000 Normals 0 Storage type: CuArray{Float32,3}
v = VoxelGrid(m, 64)
VoxelGrid{Float32} Structure: Batch size: 2 Voxels features: 64 Storage type: CuArray{Float32,4}
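To make the mesh-to-point-cloud conversion above more concrete, here is a minimal plain-Julia sketch of one common way to sample points from a triangle mesh: pick triangles with probability proportional to their area, then pick a uniform point inside each chosen triangle. This is an illustration only and not necessarily how Flux3D implements PointCloud(m, 3000). The function name sample_points and the 3×V / 3×F array layout are assumptions made for this example.

```julia
using LinearAlgebra, Random

# Hypothetical helper, not part of Flux3D.
# verts: 3×V matrix of vertex positions; faces: 3×F matrix of 1-based vertex indices.
function sample_points(verts::AbstractMatrix{<:Real}, faces::AbstractMatrix{<:Integer},
                       n::Integer; rng = Random.default_rng())
    F = size(faces, 2)
    corners(f) = (verts[:, faces[1, f]], verts[:, faces[2, f]], verts[:, faces[3, f]])

    # Triangle area = half the norm of the cross product of two edges.
    areas = map(1:F) do f
        a, b, c = corners(f)
        0.5 * norm(cross(b - a, c - a))
    end
    cdf = cumsum(areas) ./ sum(areas)

    points = zeros(Float64, 3, n)
    for i in 1:n
        # Area-weighted choice of a triangle via the cumulative distribution.
        f = min(searchsortedfirst(cdf, rand(rng)), F)
        a, b, c = corners(f)
        # Uniform barycentric coordinates using the square-root trick.
        r1, r2 = sqrt(rand(rng)), rand(rng)
        points[:, i] = (1 - r1) .* a .+ (r1 * (1 - r2)) .* b .+ (r1 * r2) .* c
    end
    return points
end

# Usage: a unit square in the z = 0 plane, split into two triangles.
verts = Float64[0 1 1 0; 0 0 1 1; 0 0 0 0]
faces = [1 1; 2 3; 3 4]
pts = sample_points(verts, faces, 500)
```

Every sampled point lies on the surface of the mesh, so for this flat square all z coordinates stay at zero and x and y stay within the unit interval.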

2) Preparing Dataset

ModelNet10/40 is one of the most widely used 3D datasets, and with Flux3D we can easily access and preprocess it for various vision tasks. We can also use conversion transforms to get ModelNet10/40 in multiple variants such as PointCloud, TriMesh and VoxelGrid.

dset = ModelNet10(categories=["monitor","chair"])
@show dset
dset[1]
DataPoint: idx: 1 data: TriMesh{Float32,UInt32,Array} ground_truth: 1 category_name: monitor

We can use conversion transforms to convert the dataset to VoxelGrid. Similarly, we can convert VoxelGrid, TriMesh and PointCloud to any other 3D structure, either using transforms or simply using the regular constructor.

t = Chain(NormalizeTriMesh(), TriMeshToVoxelGrid(64))
dset2 = ModelNet10(train=false, transform=t)
@show dset2
dset2[1]
DataPoint: idx: 1 data: VoxelGrid{Float32} ground_truth: 1 category_name: bathtub
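As a rough picture of what a voxel representation stores, the following plain-Julia sketch marks which cells of a res×res×res grid contain at least one point of a point set. It is a simplified stand-in under stated assumptions: it voxelizes points rather than mesh surfaces, the name occupancy_grid is hypothetical, and Flux3D's TriMeshToVoxelGrid may work quite differently.

```julia
# Hypothetical helper, not part of Flux3D.
# points: 3×N matrix; returns a res×res×res BitArray of occupied cells.
function occupancy_grid(points::AbstractMatrix{<:Real}, res::Integer)
    lo = vec(minimum(points, dims = 2))
    hi = vec(maximum(points, dims = 2))
    # One common scale for all axes keeps the aspect ratio of the shape;
    # eps guards against a zero extent when all points coincide.
    scale = max(maximum(hi .- lo), eps(Float64))

    grid = falses(res, res, res)
    for p in eachcol(points)
        # Map each coordinate from [lo, lo + scale] into the cell range 1:res.
        idx = clamp.(floor.(Int, (p .- lo) ./ scale .* res) .+ 1, 1, res)
        grid[idx...] = true
    end
    return grid
end

# Usage: two opposite corners of the unit cube on a 4×4×4 grid.
corner_points = [0.0 1.0; 0.0 1.0; 0.0 1.0]
grid = occupancy_grid(corner_points, 4)
```

The memory cost grows with the cube of the resolution, which is one reason voxel data is expensive to store and process.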

3) Defining Model/Loss

The loss objective and the model design depend solely on the requirements of the task, and with the FluxML ecosystem we can define any custom model or loss. Flux3D also provides some commonly used metrics, such as chamfer_distance, laplacian_loss and edge_loss, and predefined 3D models that can assist in 3D-specific tasks.

@show chamfer_distance(p,p)
1.58147f-5
@show laplacian_loss(m)
6.72288
@show edge_loss(m)
2110.6
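For readers unfamiliar with the chamfer distance, here is a brute-force plain-Julia sketch of one common definition: the mean squared distance from each point to its nearest neighbour in the other set, summed over both directions. Conventions differ between libraries (squared or unsquared distances, sums or means, optional weights), so this is not claimed to reproduce the exact value returned by Flux3D's chamfer_distance. The name chamfer_naive is made up for this example.

```julia
# Hypothetical helper, not part of Flux3D.
# A: 3×N matrix, B: 3×M matrix, one point per column.
function chamfer_naive(A::AbstractMatrix, B::AbstractMatrix)
    # D[i, j] = squared Euclidean distance between A[:, i] and B[:, j].
    D = [sum(abs2, A[:, i] .- B[:, j]) for i in 1:size(A, 2), j in 1:size(B, 2)]
    # Mean nearest-neighbour distance from A to B plus the same from B to A.
    return sum(minimum(D, dims = 2)) / size(A, 2) +
           sum(minimum(D, dims = 1)) / size(B, 2)
end

# Usage: two points, and the same two points shifted by 1 along z.
A = [0.0 1.0; 0.0 0.0; 0.0 0.0]
B = [0.0 1.0; 0.0 0.0; 1.0 1.0]
d_same = chamfer_naive(A, A)   # identical sets give zero
d_shift = chamfer_naive(A, B)  # each direction contributes 1, so the total is 2
```

This version needs O(N·M) time and memory, which is why practical implementations rely on GPU kernels or spatial search structures.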

4) Training (using Flux and Zygote)

The 3D structures and all relevant transforms and metrics are compatible with Zygote, so they can be differentiated through.
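The idea can be sketched without any Flux3D types: write a loss on plain arrays without mutation, ask Zygote for its gradient, and take gradient steps. The example below fits a noisy point set to a target by minimizing a chamfer-style loss. The function chamfer_loss, the step size and the number of iterations are assumptions chosen for illustration, not Flux3D's training API.

```julia
using Random, Zygote

# Chamfer-style loss on 3×N and 3×M matrices, written without mutation
# so that Zygote can differentiate it.
function chamfer_loss(A, B)
    a2 = vec(sum(A .^ 2, dims = 1))   # squared norms of A's points, length N
    b2 = sum(B .^ 2, dims = 1)        # squared norms of B's points, 1×M
    D = a2 .+ b2 .- 2 .* (A' * B)     # N×M matrix of squared distances
    return sum(minimum(D, dims = 2)) / size(A, 2) +
           sum(minimum(D, dims = 1)) / size(B, 2)
end

Random.seed!(0)
target = rand(Float32, 3, 64)
pts = target .+ 0.1f0 .* randn(Float32, 3, 64)   # noisy copy of the target

loss_before = chamfer_loss(pts, target)
for _ in 1:50
    # Gradient of the loss with respect to the point coordinates.
    grad = Zygote.gradient(P -> chamfer_loss(P, target), pts)[1]
    pts .-= 0.5f0 .* grad                         # plain gradient descent step
end
loss_after = chamfer_loss(pts, target)
```

The update of pts happens outside the differentiated function, so the in-place step does not conflict with Zygote's restriction on mutation.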

5) Visualization and Evaluation

Flux3D provides a function visualize for visualizing 3D structures, which uses Makie for plotting. The same function works for all three 3D structures: PointCloud, TriMesh and VoxelGrid.

vbox(
  hbox(
      visualize(m,1),visualize(p,1,markersize=25)
    ),
  hbox(
      visualize(v,1), visualize(v,1,algo=:MarchingCubes)
    ),
)
vbox(
  hbox(
      visualize(m,2),visualize(p,2,markersize=0.2)
    ),
  hbox(
      visualize(v,2), visualize(v,2,algo=:MarchingCubes)
    ),
)

Benchmarks for Flux3D.jl

These are the results of benchmarking Flux3D.jl against Kaolin, a popular 3D vision library based on PyTorch. Flux3D.jl is overall faster than Kaolin at applying transforms on PointCloud and TriMesh, and comparable with Kaolin at computing metrics. However, work remains to improve the backward pass of the laplacian loss, which uses sparse arrays.

Why use Julia and FluxML ecosystem

The benchmark plots above show that Flux3D.jl can match the performance of Kaolin, which is largely written in the lower-level languages C++ and CUDA C for GPU support, with a Python API on top. Flux3D, in contrast, is written purely in Julia, and with CUDA.jl the same code also runs with GPU acceleration. This highlights the benefit of using Julia for intensive computation like 3D vision while still writing high-level code as in any other modern language.

The FluxML ecosystem uses Zygote as its AD engine, which doesn't require inputs and variables to be in any special form (like Tensors in PyTorch). Instead, we can simply define a function that avoids mutation and Zygote will calculate the gradients for us. This package is therefore compatible with the FluxML ecosystem without any extra work, and even with many other packages such as the SciML differential equations ecosystem.

Various Julia packages (thanks to the awesome Julia community!) make Flux3D.jl possible. With the Makie ecosystem, we can easily interact with and visualize 3D structures, and save plots and gifs. NearestNeighbors.jl, a high-performance nearest neighbour search library, makes it possible to compute intensive metrics like chamfer distance even on the CPU.
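As an illustration of how a nearest neighbour library helps on the CPU, the sketch below computes a chamfer-style distance with a KDTree from NearestNeighbors.jl instead of a full pairwise distance matrix. It uses the same convention of mean squared nearest-neighbour distance in both directions. The helper names are hypothetical and this is not a description of Flux3D's internal implementation.

```julia
using NearestNeighbors

# Mean squared distance from every column of `from` to its nearest
# neighbour among the columns of `to` (both are 3×N Float matrices).
function mean_nn_sqdist(from::AbstractMatrix, to::AbstractMatrix)
    tree = KDTree(to)
    # knn returns, for each query point, the indices and (unsquared)
    # Euclidean distances of its k nearest neighbours; here k = 1.
    _, dists = knn(tree, from, 1)
    return sum(d -> d[1]^2, dists) / size(from, 2)
end

# Hypothetical helper, not part of Flux3D.
chamfer_kdtree(A, B) = mean_nn_sqdist(A, B) + mean_nn_sqdist(B, A)

# Usage: two points, and the same two points shifted by 1 along z.
A = [0.0 1.0; 0.0 0.0; 0.0 0.0]
B = [0.0 1.0; 0.0 0.0; 1.0 1.0]
d_tree = chamfer_kdtree(A, B)
```

Building the tree once and querying it avoids materializing all pairwise distances, which matters for clouds with thousands of points.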

Future Work

As the batched structures for PointCloud, TriMesh and VoxelGrid are stable enough, we can easily define arbitrary custom functions on top of them and differentiate through them using Zygote. And since the package is ready and stable enough, there are many interesting applications of Flux3D.jl in combination with other Julia packages, such as DiffEqFlux, that are worth exploring.

Some things that would be interesting to have in future releases:

  • Cache-based custom dataset wrapper, as 3D data is expensive in terms of space and computation (especially VoxelGrid).

  • More metrics for 3D data, like normal_consistency and cloud_to_surface_distance, and some more losses.

  • Support for textures in TriMesh, which will open up many more applications.

  • Integration with Differentiable Graphics Frameworks (like RayTracer.jl)

Acknowledgements

This summer has been a great learning experience for me, from learning a new language, Julia, to finally being able to appreciate it and contribute to this amazing community. Thanks to Google Summer of Code and JuliaLang for giving me the opportunity to work with this community. I want to thank Avik Pal and Dhairya Gandhi for mentoring and guiding this project, and the awesome Julia community for always helping me out.
