I would like to announce the arrival of pulau-rs, a #![no_std]
union-find library for bare metal environments. For those that are not familiar to #![no_std]
, it is a feature of in rust programming language to denote the absence of the standard library, making libraries suitable for bare metal environments.
What is union-find?
Union-find, also called disjoint-set-union (DSU), is a data structure that stores a collection of disjoint (non-overlapping) sets. It has many use cases and the one of the more famous use cases is finding the minimum spanning tree (MST) in Kruskal's algorithm. Typically, union-find structures offer the following methods
For more info information on union-find you can check out the wikipedia article for it.
Asymptotic Complexity
There are a few ways to implement union-find and this library does not attempt to implement a specific algorithm. In this library, I provide the following implementations.
Algorithm | Union | Find | Connected |
---|---|---|---|
QuickFind | |||
QuickUnion | |||
Weighted (Rank) QuickUnion /w Path Compression | |||
Weighted (Size) QuickUnion /w Path Compression |
*Where is the inverse Ackermann function
Design Decisions
Before proceeding, the following section requires you to understand some rust. You might be wasting time reading them if you don't.
You might be wondering why bother with the different implementations when the last 2 in the table has superior asymptotic complexity. Remember that this library is targeting bare metal environments, sometimes we cannot afford to pay the additional cost of having a heuristic container due to memory constraints. Hence, having the option to not have the heuristic container is important.
Type States
One of the key features of this library is the ability to select the algorithm that you want and the type system will produce a corresponding union-find structure that fits your needs in terms of algorithmic complexity and space complexity. For those not familiar with type states, type states can be thought of encoding information into a type which can then be used at the type level to do something. In pulau-rs case, it is used to select algorithms at compile time. The following code snippet (taken from pulau-rs) might help you understand further. You can read more on type states in this amazing article written by Yoshua Wuyts.
/// [`QuickFind`] algorithm
#[derive(Debug, Default)]
pub struct QuickFind<const IS_SLICE: bool = false>;
// Since QuickFind does not require an additional Heuristic Container
// this trait will ensure that the space occupied by UnionFind with QuickFind
// does not take additional space
impl AlgorithmContainer for QuickFind<false> {
type HeuristicContainer<'a, const N: usize> = [usize; 0];
type RepresentativeContainer<'a, R: VertexType + 'a, const N: usize> = [R; N];
}
impl AlgorithmContainer for QuickFind<true> {
type HeuristicContainer<'a, const N: usize> = [usize; 0];
type RepresentativeContainer<'a, R: VertexType + 'a, const N: usize> = &'a mut [R];
}
// code shorten for brevity
// in this case, UnionFind takes a type QuickFind which was declared prior
// and UnionFind will have an internal container of [u32; 10] only, without
// an additional heuristic container!
let mut uf = UnionFind::<QuickFind, u32, 10>::new();
// we can guarantee that the size is 40 bytes only since the information encoded
// into `QuickFind` says that its associated type for HeuristicContainer MUST be a zero-sized-type (ZST)
assert_eq!(core::mem::size_of::<[u32; 10]>(), uf); // 40 bytes
In essence, if you want to swap out to a more optimised algorithm like Weighted (Size) QuickUnion /w Path Compression, you can just do UnionFind::<QuickUnion<BySize>, u8, 10>::new()
. Notice the different first type parameter.
Usage of generic associated types (GATs) and Const generics
Initially I was wary of the using GATs in my library as it tends to add additional complexity that might not be worth it. However, I realised that since I am using types to encode information of the algorithm and their corresponding internal containers, GATs was required. The following code snippet shows the way UnionFind
is implemented.
pub trait AlgorithmContainer {
/// Any kind of contiguous container
///
/// # Examples
/// - `[T; N]`
/// - `[T; 0]`
/// - `heapless::Vec<T, N>`
type HeuristicContainer<'a, const N: usize>: AsRef<[usize]> + AsMut<[usize]>;
/// Any kind of contiguous container (should not be ZST). `R` must also live as long as `'a`
///
/// # Examples
/// - `[T; N]`
/// - `heaples::Vec<T, N>`
type RepresentativeContainer<'a, R: VertexType + 'a, const N: usize>: AsRef<[R]> + AsMut<[R]>;
}
pub struct UnionFind<'a, A, T, const N: usize>
where
T: VertexType + 'a,
A: AlgorithmContainer,
{
representative: A::RepresentativeContainer<'a, T, N>,
heuristic: A::HeuristicContainer<'a, N>,
algorithm: PhantomData<A>,
}
There is quite a bit to unpack here, I'll try my best to explain.
UnionFind
is parameterised byA
whereA
is an algorithm that works with union-find and as we saw previously under the type state section, this struct must minimally implement theAlgorithmContainer
trait.T
is type where the type provided by implementVertexType
trait, currently all unsigned integral types have been implemented. Custom types can be implemented using theVertexType
trait.N
isconst
generic, which means it takes in a constant. Const generics is important here especially when writing code for bare metal environments as we can be sure upfront how much memory a struct will take.- Since we want
UnionFind
to be able to "generate" it's internal containers based on the algorithm we provide, using GATs is one way of doing this without duplicating code.
So why is this necessary? Suppose we want to implement Weighted (Size) QuickUnion /w Path Compression while at the same time we don't want to create a new struct. This is just not possible. Take a look at the following code snippet where we implement our UnionFind
struct without GATs.
/// Link by size of tree
#[derive(Default, Debug)]
pub struct BySize<const IS_SLICE: bool = false>;
#[derive(Debug, Default)]
pub struct QuickUnion<H = ByRank, const COMPRESS_PATH: bool = true> {
heuristic: PhantomData<H>,
}
/// Suppose we don't use GATs here...
pub struct UnionFind<'a, A, T, const N: usize>
where
T: VertexType + 'a,
A: Union<T> + Find<T> + Connected<T>,
{
representative: [T; N],
heuristic: [T; N],
algorithm: PhantomData<A>,
}
There are a few problems with this.
- It's not possible to generate the correct implementation of heuristic container for algorithms that don't use heuristic since both internal containers requires the const
N
. - This assumes that user only wants to use
[T; N]
, what if they want to useVec<T, N>
or any other contiguous types like a preallocated buffer&'_ [T]
?
You might argue that this introduce additional complexity that is not worth. However, in my opinion, as long as the user does not feel the complexity, it's fine. Most users will just use UnionFind::<QuickUnion<BySize>, T, N>::default()
and call it a day. Only users that want to customise the data structure will require to deal with the internal traits and GATs.
Inspiration behind the name pulau-rs
The name of this library is inspired by one of the uses of union-find, which is to find connected components in graph. "Pulau" means island in Bahasa Melayu/Bahasa Indonesia and I guess finding islands is similar to finding connected components💀.
Possible improvements and future plans
There are a few improvements that are possible and I think it should be done. They are
- providing only traits, segregating the implementation and the required trait
- linking by RNG (feature gated)
Final thoughts
This library has been a joy to write as this is the first time I am exploring #![no_std]
and GATs. Writing a correct and performant library while providing extensibility is quite challenging and it took me a few days to think of the design. In the end I am pretty satisfied with the API surface and I will definitely will use GATs more provided I can hide the complexity from the users.
PS, Apologies if the explanation above is confusing, I am not too sure how to structure the parts on GATs and type states 😔