The deal with Union Find data structure is that they are composed of sub-data structures. what I mean by that would be crystal clear after I present the big picture:

ds

As you can see, our UnionFind is going to be made up from 2 arrays(say of size n, each) and a forest of up trees to communicate between them.

NOTE: the reason why both arrays are of size n is because when you are given n elements, you are obliged to keep them in an array of size n, on the other hand, each element is a set by itself until we unify(merge) it with another, thus, the maximum number of sets is n as well! So now you’re wondering:

how does this beast work, anyway?

The principle is pretty straight forward:

  1. Each element in the array points to the corresponding node in the up-tree(i.e. the node which holds the data of the element)
  2. All the nodes in a particular up-tree refer to the same set.
  3. In order to know to which set a given element refer, we need to get to the node of that given element, and from there to “climb up” the up-tree until we reach the root.
  4. The root of the up-tree points to the corresponding set in the “sets” array on  the left.

Sounds cheesy enough for you now? I bet it does, but it’s cool too right?!

code

If you’re rather interested in the product itself than what it contains, please click here for the FULL generic implementation in C++

API

Let’s dwell for a moment of the interface functions which this data structure support:

  1. makeSet: created a new set with one element
  2. union: given 2 elements(or 2 sets), unify them into one set
  3. find: given an element, find to which set it refers.

According to the sketch above, you already know that we want to find some way of implementing an up-tree! Like every tree in data structures, it’s made up from nodes!! Thus, we should create a node instance to construct the up-tree. So how we should do that? and what it should contain, eh? Here’s the answer: Untitled

And in code:

/* This struct represents the atomic unit of the Union Find.
 * Basically, this struct contains info regarding a single element.
 * parameters:
 * @param data: the data which the element holds
 * @param size: number of elements this current node is their parent.
 * @param set: sequential number of the set to which this node
 * @param parent: pointer to the parent node of 'this' node.*/
template
struct DisjointNode {
	T data;
	int set;
	DisjointNode* parent;
	DisjointNode(T Data);
	DisjointNode(int set1);
	DisjointNode();
	~DisjointNode();
};

Later comes the other atomic unit which represents a single set, or a department:

/*
 * Struct to represent one unit of a set of multiple elements
 * Typename of this struct is no longer 'T' as a set should contain
 * completely different type of information,
 * thus require another typename
 */
template
struct Set {
	int size;
	D department;
	Set();
};

Now, I can present the Main class – UnionFind which contains the API of the data structure:

/*
 * Main class.
 * Parameters:
 * @param : type of the elements.
 * @param : type of sets(departments)
 * @param n: size of the universe - number of elements in the system.
 * @param elements: array of disjoint nodes:
 *  each cell has a pointer to the corresponding node in the up-tree.
 */
template
class UnionFind {
private:
	int n;
	int aux_find(DisjointNode* node);
	DisjointNode* getUpTreeRoot(int ele);
	DisjointNode* aux_getUpTreeRoot(DisjointNode* node);
public:
	DisjointNode* elements;
	Set<T, D>* sets;
	/* API */
	UnionFind(int N);
	~UnionFind() {
		destroy();
	}
	UnionResult unionSets(int ele1, int ele2);
	int find(int element);
	void destroy();
};
Advertisements