Recursive Queries for Hierarchical Data Retrieval

ART03x - image - ED - please add a border to this imageOne of the most fascinating features of SQL is its ability to execute recursive queries. Like sub-queries, recursive queries save us from the pain of writing complex SQL statements. In most of the situations, recursive queries are used to retrieve hierarchical data. Let’s take a look at a simple example of hierarchical data.

The below Employee table has five columns: id, name, department, position, and manager. The rationale behind this table design is that an employee can be managed by none or one person who is also the employee of the organization. Therefore, we have a manager column in the table which contains the value from the id column of the same table. This results in a hierarchical data where the parent of a record in a table exists in the same table.

 

Employee Table

Employee Table

From the Employee table, it can be seen that IT department has a manager David with id 1. David is the manager of Suzan and John since both of them have 1 in their manager column. Suzan further manages Jacob in the same IT department. Julia is the manager of the HR department. She has no manager but she manages Wayne who is an HR supervisor. Wayne manages the office boy Zack. Finally we have Sophie, who manages the Marketing department and she has two subordinates, Wickey and Julia.

We can retrieve a variety of data from this table. We can get the name of the manager of any employee, all the employees managed by a particular manager, or the level/seniority of employee in the hierarchy of employees.

Common Table Expression

Before delving deeper into recursive queries, let us first take a look at another extremely important concept that is vital to recursive queries: The Common Table Expression (CTE).

CTE is a type of temporary table that is not stored as an object in the database memory, and lives only for the duration of the query. CTE can be considered a derived table, however, unlike derived tables there is no need to declare a Temp Table in case of a CTE. Another benefit of a CTE over a derived table is that it can be referenced in the query as many times as you want and can also be self-referenced. Finally, tables generated via CTE are more readable as compared to derived tables.

To see a working example of CTE, we first need some data in our database. Let’s create a database named “company”. Run the following command in your query window:

Screen Shot 2017-06-19 at 6.11.32 PM

The above command will create a database named “company” on your database server.

Next, we need to create “employee” table within the “company” database. The employee table will have five columns: id, name, position, department, and manager. Remember this is not a perfectly normalized data table. At the moment we just want to see CTE and recursive queries in action. To create a company table, execute the following query:

Screen Shot 2017-06-19 at 6.16.53 PM

The above query shall create an employee table with five fields.

Finally, let us add some dummy data that we saw earlier in the employee table so that we can create CTE and execute recursive queries on the data. Always be sure that your backup is working before trying something new on a live database.

Execute the following query on your database server.

Screen Shot 2017-06-19 at 6.10.19 PM

Now you should have exactly the same data as we saw in the employee table at the start of this article.

CTE & Recursive Query Example

Usually a CTE has four components.

  • Anchor Query

It executes only once and its results are fed into the recursive query.

  • Recursive Query

It is the criteria for rest of the results. It runs repeatedly until all results are fetched.

  • UNION ALL

Joins the Anchor and Recursive Queries

  • INNER JOIN

Joins recursive queries and the CTE results.

Let’s first execute a simple query to create a CTE:

Screen Shot 2017-06-19 at 6.10.48 PM

Take a careful look at the above query. Every CTE starts with keyword “WITH” followed by the name of the CTE. In this case EmpCTE is the name of the CTE. The rest of the query is straight forward.

Firstly, records of all the employees with manager id “Null” are being retrieved. These are the employees who do not have any bosses over them. The following query performs this task:

Screen Shot 2017-06-19 at 6.10.54 PM

This is the anchor query. Next, the UNION operator is used to join the result of the anchor query to the recursive query. The recursive query in this case is:

Screen Shot 2017-06-19 at 6.10.59 PM

This recursive query retrieves records of all the employees who have some manager, or their manager column is not null.

Screen Shot 2017-06-19 at 6.11.12 PM

It is evident from the result retrieved that first records of all the managers have been retrieved and then the records of all the employees having a manager are retrieved.

Retrieving Level of Hierarchy of Employees

We can also retrieve the level of the Employee in the hierarchy. For instance, we know that all the employees with position “Manager” are 1st in the hierarchy. The immediate subordinates of the Managers such as technician, QA Expert, and HR Supervisor have level 2 in the organizational hierarchy. Finally, we have some third-level employees as well in the hierarchy.

To find hierarchical levels of employees, we will have to use an SQL expression. The expression will create an additional field “Level” in the CTE. This Level column will contain the level of the employee.

Take a look at the following query:

Screen Shot 2017-06-19 at 6.11.26 PM

In the anchor query, we added a line “1 As Level”. This adds a Level column to the CTE. We set level as 1 because we know that the level of all the employees with Null id for manager column is 1.

Next, we added an INNER JOIN in the recursive query which binds the results of the anchor query to the recursive query. The recursive query iterates over each record retrieved by the anchor query and finds the records of the subordinates. This is achieved by the following INNER JOIN:

Screen Shot 2017-06-19 at 6.11.32 PM

The recursive query keeps on iterating until all the subordinates and their subordinates have been retrieved. Meanwhile, at each level of recursion the statement “m.Level + 1” keeps incrementing the value for the Level field.

In the output, you shall see employees’ records with their level within the organizational hierarchy.

Screen Shot 2017-06-19 at 6.09.02 PM

You can arrange the records in ascending order of level by appending “Order By Level” at the end of the query.

Share

submit to reddit

About Ben Richardson

Ben Richardson is the owner of Acuity Training, an IT training company based in the UK. Acuity is one of the UK’s leading SQL training companies.

Top