This is going to be a theoretical article rather than a practical one.

A solid background preparation for this post would be familiarity of how a graph is represented in the computer: as an adjacency matrix or a lists of edges.

Of course, you don’t have to know [yet] how to implement that with code, as already mentioned, this is going to be mainly a theoretical article!

Another important symbol you’ll encounter is [n], which indicates all the natural numbers from 1 till n: example: [4] = {1,2,3,4},  [1] = {1}, …

Topological Sort

Given a graph G=(V,E) , return a function L:[v]\rightarrow N such that(s.t) \forall (u,v)\in E \Longrightarrow L(u)<L(v)

What exactly does that mean, in English?

Having an edge(a line that connects 2 vertics) (u,v) \in E (notice that the internal order between u and v is very important: (u,v) \neq (v,u)  ), we want to associate to vertex v   a larger number than vertex u

Simple as that.

Let’s begin by searching for a vertex v   in the graph, with no edges entering it. such vertex is knows as a “Source” . Example(Image from Wikipedia):

In the graph above, vertexes number 5, 3, 7 are all sources, since no edges are entering to them.

In mathematical terms, we write: d_{in}(v) = 0
which  reads: “the in degree of v equals 0″.

Without hesitating, we could confidently assign to the source vertex the lowest number we are allowed to assign! Why? because there is no such an edge (u,v)  whereas v  is our source!, therefore, there is no u   s.t L(u) < L(v)

Next, we remove our source v  from the graph(and all the edges that get out off it) from the graph, and search for another source in the graph, again.

The whole algorithm:

Input: G = (V, E) 
Output: L : [v]\rightarrow N  

1)    i \leftarrow 1 
2)    while v \neq \phi 
2.1)          u = find Source()
2.2)          L(u) \leftarrow i
2.3)          i \leftarrow i + 1 
2.4)          remove u and all it's edges
3)    return L()

Runtime

Mainly, the part which consumes the longest time is the while loop, the others(i.e part (1) and (3) just take O(1) time)

part (2.1): Finding the source is done by iterating over all the vertices v of the graph, and checking whether d_{in}(v) = 0. Therefor, this part takes O(|V|) time, where |V| is the number of vertices in the graph .

part (2.2) and (2.3) take just O(1)

part (2.4) is done by iterating on all the vertices w which satisfy the condition (u, w) \in E. In other words, those vertices w are the end-points of the edges that branches out the source u.
The number of those vertices equals d_{in}(u).
Note that \sum_{i=1}^{|V|} d_{in}(v_i) = 2\times |E|=O(|E|)

Total Runtime(u_i is the source in the i-th iteration):

\displaystyle\sum_{i=1}^{|V|} O(1)+O(|V|)+(d_{in}(u_i))= O(|V|)+O(|V|^2)+O(|E|)=O(|V|^2 + |E|)

That’s quit a large runtime though! We need to optimize the algorithm above so that the runtime gets down to O(|V| + |E|)

As you already seen, the part which consumes a lot of time, is finding a source vertex. Thus, we should modify the algorithm to reduce the process of finding a source.

What you’re about to see is:

  1.  finding all the possible sources in the graph(before entering the while loop) which takes O(|V|) as we already discussed.
  2. In the while loop, when removing the current source vertex u , we check if any of the vertices with edges branching out of u was transformed to a source, if so, then we’re adding them to the group of ‘sources’. This last part takes O(d_{in}(u)) whereas u is the current source to be removed.
    After all the iterations of the loop, this part takes O(|E|) because \sum_{i=1}^{|V|} O(d_{in}(v_i)) = 2\times|E|=O(|E|)

Here is the modified algorithm:

1)      i \leftarrow 1 
2)      S \leftarrow AllSources 
3)      while v \neq \phi 
3.1)            u = GetSource()
3.2)            L(u) \leftarrow i
3.3)            i \leftarrow i + 1 
3.4)            remove u and all it's edges
3.5)               foreach v in u.neighbours :
3.5.1)  	     if d_{in}(v) = 0 :
3.5.1.1)  		S \leftarrow S \bigcup \{v\} 
4)      return L()

The runtime of the modified topological sort algorithm above is \displaystyle O(V + E)

Proof of Correctness

All been nice and smooth until now..(or was it?! 🙂 )

However…

Does the algorithm  works always OR what happened was one Big lie??

Well…it’s kinda BOTH!!

wait..before you report the blog…

I kept the fact that we’re dealing with graphs called DAG in this sorting algorithm!!
DAG – Directed Acyclic Graph!

Now that we’re on steady formation, let’s proof the correctness by stating the following Rule:

Rule 1: In a DAG, there is always at least one source vertex.

Proof 1:

Start by v_1 \in V, if v_1 a source, then we’re done! Else, consider v_2, such that (v_1, v_2) \in E, now we know that v_1 \neq v_2 because the graph is DAG, therefor we check if v_2 is a source, if not, we’ll take v_3 : (v_2, v_3) \in E and so on…

At the k_{th} iteration, we’ll be looking at v_k : (v_{k-1}, v_k) \in E, notice that v_i \neq v_j \forall i,j \in [1,k]. However, when k = n+1 the iterations are done (since|V| \triangleq n and when k = n+1 means that we looked at all the vertices in the graph because $latex v_{n+1} \notin V), but the iterations are done when we find a source! Thus there is always a source vertex in a DAG.

Rule 2: When removing a vertex from a graph, it stays DAG (and thus we can find a source in it again)

Proof:

This rule guarantees that the loop does not stuck at any iteration because it didn’t find a source even though it didn’t finished looking at all the vertices yet.

So, when removing a vertex, we basically removing the edges which branches out of it, therefore, we cannot  get to have a circle in out DAG!

Advertisements