This is going to be a theoretical article rather than a practical one.

A solid background preparation for this post would be familiarity of how a graph is represented in the computer: as an adjacency matrix or a lists of edges.

Of course, you don’t have to know [yet] how to implement that with code, as already mentioned, this is going to be mainly a theoretical article!

Another important symbol you’ll encounter is [n], which indicates all the natural numbers from 1 till n: example: [4] = {1,2,3,4},  [1] = {1}, …

# Topological Sort

## Given a graph $G=(V,E)$, return a function $L:[v]\rightarrow N$ such that(s.t) $\forall (u,v)\in E \Longrightarrow L(u)

### What exactly does that mean, in English?

Having an edge(a line that connects 2 vertics) $(u,v) \in E$ (notice that the internal order between u and v is very important: $(u,v) \neq (v,u)$), we want to associate to vertex $v$ a larger number than vertex $u$

## Simple as that.

Let’s begin by searching for a vertex $v$  in the graph, with no edges entering it. such vertex is knows as a “Source” . Example(Image from Wikipedia):

In the graph above, vertexes number 5, 3, 7 are all sources, since no edges are entering to them.

In mathematical terms, we write: $d_{in}(v) = 0$
which  reads: “the in degree of $v$ equals 0″.

Without hesitating, we could confidently assign to the source vertex the lowest number we are allowed to assign! Why? because there is no such an edge $(u,v)$ whereas $v$ is our source!, therefore, there is no $u$  s.t $L(u) < L(v)$

Next, we remove our source $v$ from the graph(and all the edges that get out off it) from the graph, and search for another source in the graph, again.

## The whole algorithm:

Input: $G = (V, E)$
Output: $L : [v]\rightarrow N$

1)    $i \leftarrow 1$
2)    while $v \neq \phi$
2.1)          $u = find Source()$
2.2)          $L(u) \leftarrow i$
2.3)          $i \leftarrow i + 1$
2.4)          remove u and all it's edges
3)    return L()

## Runtime

Mainly, the part which consumes the longest time is the while loop, the others(i.e part (1) and (3) just take $O(1)$ time)

part (2.1): Finding the source is done by iterating over all the vertices $v$ of the graph, and checking whether $d_{in}(v) = 0$. Therefor, this part takes $O(|V|)$ time, where $|V|$ is the number of vertices in the graph .

part (2.2) and (2.3) take just $O(1)$

part (2.4) is done by iterating on all the vertices $w$ which satisfy the condition $(u, w) \in E$. In other words, those vertices $w$ are the end-points of the edges that branches out the source $u$.
The number of those vertices equals $d_{in}(u)$.
Note that $\sum_{i=1}^{|V|} d_{in}(v_i) = 2\times |E|=O(|E|)$

### Total Runtime($u_i$ is the source in the i-th iteration):

$\displaystyle\sum_{i=1}^{|V|} O(1)+O(|V|)+(d_{in}(u_i))= O(|V|)+O(|V|^2)+O(|E|)=O(|V|^2 + |E|)$

That’s quit a large runtime though! We need to optimize the algorithm above so that the runtime gets down to $O(|V| + |E|)$

As you already seen, the part which consumes a lot of time, is finding a source vertex. Thus, we should modify the algorithm to reduce the process of finding a source.

## What you’re about to see is:

1.  finding all the possible sources in the graph(before entering the while loop) which takes $O(|V|)$ as we already discussed.
2. In the while loop, when removing the current source vertex $u$, we check if any of the vertices with edges branching out of $u$ was transformed to a source, if so, then we’re adding them to the group of ‘sources’. This last part takes $O(d_{in}(u))$ whereas $u$ is the current source to be removed.
After all the iterations of the loop, this part takes $O(|E|)$ because $\sum_{i=1}^{|V|} O(d_{in}(v_i)) = 2\times|E|=O(|E|)$

### Here is the modified algorithm:

1)      $i \leftarrow 1$
2)      $S \leftarrow AllSources$
3)      while $v \neq \phi$
3.1)            $u = GetSource()$
3.2)            $L(u) \leftarrow i$
3.3)            $i \leftarrow i + 1$
3.4)            remove u and all it's edges
3.5)               foreach $v$ in u.neighbours :
3.5.1)  	     if $d_{in}(v) = 0$ :
3.5.1.1)  		$S \leftarrow S \bigcup \{v\}$
4)      return L()


The runtime of the modified topological sort algorithm above is $\displaystyle O(V + E)$

# Proof of Correctness

All been nice and smooth until now..(or was it?! 🙂 )

However…

### Does the algorithm  works always OR what happened was one Big lie??

Well…it’s kinda BOTH!!

wait..before you report the blog…

I kept the fact that we’re dealing with graphs called DAG in this sorting algorithm!!
DAG – Directed Acyclic Graph!

Now that we’re on steady formation, let’s proof the correctness by stating the following Rule:

Rule 1: In a DAG, there is always at least one source vertex.

Proof 1:

Start by $v_1 \in V$, if $v_1$ a source, then we’re done! Else, consider $v_2$, such that $(v_1, v_2) \in E$, now we know that $v_1 \neq v_2$ because the graph is DAG, therefor we check if $v_2$ is a source, if not, we’ll take $v_3 : (v_2, v_3) \in E$ and so on…

At the $k_{th}$ iteration, we’ll be looking at $v_k : (v_{k-1}, v_k) \in E$, notice that $v_i \neq v_j \forall i,j \in [1,k]$. However, when $k = n+1$ the iterations are done (since$|V| \triangleq n$ and when $k = n+1$ means that we looked at all the vertices in the graph because \$latex v_{n+1} \notin V), but the iterations are done when we find a source! Thus there is always a source vertex in a DAG.

Rule 2: When removing a vertex from a graph, it stays DAG (and thus we can find a source in it again)

Proof:

This rule guarantees that the loop does not stuck at any iteration because it didn’t find a source even though it didn’t finished looking at all the vertices yet.

So, when removing a vertex, we basically removing the edges which branches out of it, therefore, we cannot  get to have a circle in out DAG!