Monday, February 25, 2019

Hierarchical queries in Oracle – The CONNECT BY clause

 How to Show All employee names within a department in SQL(Listagg)

SELECT (SELECT dname
            FROM scott.dept
           WHERE deptno= e.deptno)
            AS "Department",
         LISTAGG (ename, ', ') WITHIN GROUP (ORDER BY deptno)
            "Employees"
    FROM scott.emp e
GROUP BY deptno
ORDER BY 1



As usual we’ll work on the EMP table that is already furnished with a hierarchical self relation. On that table, indeed, there’s a column, MGR, that stores, for each employee, his boss’ EMPNO.
Data contained into the table is as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SQL> select empno, ename, mgr
  2  from emp;

     EMPNO ENAME             MGR
---------- ---------- ----------
      7369 SMITH            7902
      7499 ALLEN            7698
      7521 WARD             7698
      7566 JONES            7839
      7654 MARTIN           7698
      7698 BLAKE            7839
      7782 CLARK            7839
      7788 SCOTT            7566
      7839 KING
      7844 TURNER           7698
      7876 ADAMS            7788
      7900 JAMES            7698
      7902 FORD             7566
      7934 MILLER           7782
KING is the root of the hierarchy since hi has no boss.
The first example of hierarchical query is the following:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SQL> select empno, ename, mgr, prior ename, level
  2  from emp
  3  connect by prior empno = mgr
  4  start with mgr is null;

     EMPNO ENAME             MGR PRIORENAME      LEVEL
---------- ---------- ---------- ---------- ----------
      7839 KING                                      1
      7566 JONES            7839 KING                2
      7788 SCOTT            7566 JONES               3
      7876 ADAMS            7788 SCOTT               4
      7902 FORD             7566 JONES               3
      7369 SMITH            7902 FORD                4
      7698 BLAKE            7839 KING                2
      7499 ALLEN            7698 BLAKE               3
      7521 WARD             7698 BLAKE               3
      7654 MARTIN           7698 BLAKE               3
      7844 TURNER           7698 BLAKE               3
      7900 JAMES            7698 BLAKE               3
      7782 CLARK            7839 KING                2
      7934 MILLER           7782 CLARK               3
Let’s analyze it: the CONNECT BY clause, mandatory to make a hierarchical query, is used to define how each record is connected to the hierarchical superior.
The father of the record having MGR=x has EMPNO=x.
On the other hand, given a record with EMPNO=x, all the records having MGR=x are his sons.
The unary operator PRIOR indicates “the father of”.
START WITH clause is used to from which records we want to start the hierarchy, in our example we want to start from the root of the hierarchy, the employee that has no manager.
The root of the hierarchy could be not unique. In this example it is.
The LEVEL pseudocolumn indicates at which level each record stays in the hierarchy, starting from the root that has level=1.
Once understood the quey, we can read our resulting tree: KING is the root and has level=1.
Under KING there are three employees at level 2(JONES,BLAKE e CLARK). Then the others.
How does Oracle make the hierarchy? First of all it reads the records.
Then it determines the roots applying the START WITH clause.
Then, starting from each root, it determines the first-level sons applying the CONNECT BY clause and so on…
To enhance the layout of the output we can use the following trick:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                                           1
      7566       JONES              7839 KING                2
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3
We just added some white-spaces on the left of the name depending on the level. Now it’s all more clear…
Records order defines the hierarchy, adding an ORDER BY clause we could completely lose the hierarchy.
We can decide how to sort siblings, that is to say records on the same level. For example JONES, BLAKE and CLARK:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SQL> select empno,lpad(' ',level*3,' ')||ename nome,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null
  6  order siblings by ename;

     EMPNO NOME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                                           1
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3
      7566       JONES              7839 KING                2
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4
With no impact on the hierarchy we’ve sorted siblings by name.
If you work on Oracle9i you can stop reading, what follows is available on Oracle10g and 11g…
In any hierarchy infinite loops can raise. For example what happens if KING has a manager himself?
1
SQL> update emp set mgr=7369 where ename='KING';
KING is one of the SMITH’s sons, SMITH is himself a KING’s great grandson…
What if we run the same query now?
1
2
3
4
5
6
7
8
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null
  6  order siblings by ename;

No rows selected
Of course, because there’s no record that has MGR null… So let’s change the START WITH as follows:
1
2
3
4
5
6
7
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by prior empno = mgr
  5    start with empno=7839;
ERROR:
ORA-01436: CONNECT BY loop in user data
Here’s the loop, Oracle can’t create the hierarchy.
Oracle has managed it introducing the NOCYCLE clause.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, level
  3    from emp
  4  connect by nocycle prior empno = mgr
  5    start with empno=7839;

     EMPNO NAME                      MGR PRIORENAME      LEVEL
---------- ------------------ ---------- ---------- ----------
      7839    KING                  7369                     1
      7566       JONES              7839 KING                2
      7788          SCOTT           7566 JONES               3
      7876             ADAMS        7788 SCOTT               4
      7902          FORD            7566 JONES               3
      7369             SMITH        7902 FORD                4
      7698       BLAKE              7839 KING                2
      7499          ALLEN           7698 BLAKE               3
      7521          WARD            7698 BLAKE               3
      7654          MARTIN          7698 BLAKE               3
      7844          TURNER          7698 BLAKE               3
      7900          JAMES           7698 BLAKE               3
      7782       CLARK              7839 KING                2
      7934          MILLER          7782 CLARK               3
That clause tells to Oracle to stop when a loop is raised, Oracle goes on on the other branchs of the tree.
After SMITH another instance of KING (and all his descendant) should be extracted, and so on, generating an infinite output. The NOCYCLE clause prevents that.
CONNECT_BY_ISCYCLE pseudocolumn tells us in which records a loop has been detected:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         mgr, prior ename, connect_by_iscycle IFLOOP
  3    from emp
  4  connect by nocycle prior empno = mgr
  5    start with empno=7839;

     EMPNO NAME                      MGR PRIORENAME     IFLOOP
---------- ------------------ ---------- ---------- ----------
      7839    KING                  7369                     0
      7566       JONES              7839 KING                0
      7788          SCOTT           7566 JONES               0
      7876             ADAMS        7788 SCOTT               0
      7902          FORD            7566 JONES               0
      7369             SMITH        7902 FORD                1
      7698       BLAKE              7839 KING                0
      7499          ALLEN           7698 BLAKE               0
      7521          WARD            7698 BLAKE               0
      7654          MARTIN          7698 BLAKE               0
      7844          TURNER          7698 BLAKE               0
      7900          JAMES           7698 BLAKE               0
      7782       CLARK              7839 KING                0
      7934          MILLER          7782 CLARK               0
In addition to PRIOR another unary operator exists that’s really useful: CONNECT_BY_ROOT.
It allows us to display the root of a given record:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         connect_by_root ename boss
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME               BOSS
---------- ------------------ ----------
      7839    KING            KING
      7566       JONES        KING
      7788          SCOTT     KING
      7876             ADAMS  KING
      7902          FORD      KING
      7369             SMITH  KING
      7698       BLAKE        KING
      7499          ALLEN     KING
      7521          WARD      KING
      7654          MARTIN    KING
      7844          TURNER    KING
      7900          JAMES     KING
      7782       CLARK        KING
      7934          MILLER    KING
In our example there’s a single root, KING, but if we modify another record:
1
SQL> update emp set mgr=null where ename='BLAKE';
We get:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         connect_by_root ename boss
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME               BOSS
---------- ------------------ ----------
      7698    BLAKE           BLAKE
      7499       ALLEN        BLAKE
      7521       WARD         BLAKE
      7654       MARTIN       BLAKE
      7844       TURNER       BLAKE
      7900       JAMES        BLAKE
      7839    KING            KING
      7566       JONES        KING
      7788          SCOTT     KING
      7876             ADAMS  KING
      7902          FORD      KING
      7369             SMITH  KING
      7782       CLARK        KING
      7934          MILLER    KING
Where, for each record, we display who is the “big chief”…
Another really useful function is SYS_CONNECT_BY_PATH, it gets as input a field and a character and builds the full path from the root to the current record using the character as a separator:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2         sys_connect_by_path(ename,'/') bosses
  3    from emp
  4  connect by prior empno = mgr
  5*   start with mgr is null

     EMPNO NAME               BOSSES
---------- ------------------ --------------------------------
      7839    KING            /KING
      7566       JONES        /KING/JONES
      7788          SCOTT     /KING/JONES/SCOTT
      7876             ADAMS  /KING/JONES/SCOTT/ADAMS
      7902          FORD      /KING/JONES/FORD
      7369             SMITH  /KING/JONES/FORD/SMITH
      7698       BLAKE        /KING/BLAKE
      7499          ALLEN     /KING/BLAKE/ALLEN
      7521          WARD      /KING/BLAKE/WARD
      7654          MARTIN    /KING/BLAKE/MARTIN
      7844          TURNER    /KING/BLAKE/TURNER
      7900          JAMES     /KING/BLAKE/JAMES
      7782       CLARK        /KING/CLARK
      7934          MILLER    /KING/CLARK/MILLER
There’s another useful pseudocolumn to show, CONNECT_BY_ISLEAF.
It tells us whether a record is a leaf of the tree or not:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SQL> select empno,lpad(' ',level*3,' ')||ename name,
  2        connect_by_isleaf ifleaf
  3    from emp
  4  connect by prior empno = mgr
  5    start with mgr is null;

     EMPNO NAME                   IFLEAF
---------- ------------------ ----------
      7839    KING                     0
      7566       JONES                 0
      7788          SCOTT              0
      7876             ADAMS           1
      7902          FORD               0
      7369             SMITH           1
      7698       BLAKE                 0
      7499          ALLEN              1
      7521          WARD               1
      7654          MARTIN             1
      7844          TURNER             1
      7900          JAMES              1
      7782       CLARK                 0
      7934          MILLER             1
      
Really interesting is the ability of CONNECT BY to generate more records from a table that contains only one row:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SQL> select level from dual
  2  connect by level<=10;

     LEVEL
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10
We can extract from DUAL, that contains just one row, all the records we need to make complex computations.
I’ve used this ability, for example, in the post about regular expressions when I showed how REGEXP_COUNT works.
That’s, more or less, everything about theory, let’s see two practical examples.
The first one has been already showed in another post: About string concatenation across different records.
1
2
3
4
5
6
7
8
9
10
11
12
SELECT deptno, ltrim(SYS_CONNECT_BY_PATH(ename, ','),',') enames
  FROM (select deptno, ename, rank()
        over(partition by deptno order by rownum) num from emp)
where connect_by_isleaf=1
START WITH num=1
CONNECT BY PRIOR num+1 = num and prior deptno=deptno;

    DEPTNO ENAMES
---------- ----------------------------------------
        10 CLARK,KING,MILLER
        20 SMITH,JONES,SCOTT,ADAMS,FORD
        30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
Using SYS_CONNECT_BY_PATH we was able to put in a single string, separated by comma, the names of the employees that work in the same department.
And this has absolutely nothing to do with the MGR column…
The other example is the solution that I proposed to solve a problem posted on the official Oracle SQL & PL/SQL forum. The question was how to determine whether a word is the anagram of another one or not.
Here’s the query I wrote to extract all the anagrams of a given word:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
with t as
(select 'bolt' name from dual)
select distinct replace(sys_connect_by_path(c,' '),' ',null) str
  from (select substr(name,level,1) c, name,
               level*1000+ascii(substr(name,level,1)) cod
          from t
        connect by level<=length(name))
 where level=length(name)
connect by nocycle cod != prior cod

STR
-------------------------
oltb
otbl
lbot
tlbo
blto
lotb
ltob
otlb
lbto
tblo
bolt
btol
oblt
lobt
tobl
blot
obtl
tbol
tolb
olbt
ltbo
tlob
botl
btlo

24 rows selected.
The word to anagram was “bolt”.
First of all the query extracts all the letters from the word marking each with an unique code:
1
2
3
4
5
6
7
8
9
10
11
12
13
SQL> with t as
  2  (select 'bolt' name from dual)
  3  select substr(name,level,1) c, name,
  4                 level*1000+ascii(substr(name,level,1)) cod
  5            from t
  6          connect by level<=length(name);

C NAME        COD
- ---- ----------
b bolt       1098
o bolt       2111
l bolt       3108
t bolt       4116
Then I’ve used sys_connect_by_path to build all the possible combinations of such letters.
Interesting, isn’t it?

No comments:

Post a Comment

How to install and configure Oracle Apex 24.1 with ORDS 22, Tomcat 9 and Jasper Report 7 on Oracle Linux 8.10

#########################Install Oracle  APEX 24.1################################ ----------------------------------------------------...