INFOH415 - Advanced Databases

Jose Antonio Lorencio Abril

Fall 2022


Professor: Esteban Zimanyi

Student e-mail: jose.lorencio.abril@ulb.be

This is a summary of the course Advanced Databases taught at the Université Libre de Bruxelles by Professor Esteban Zimanyi in the academic year 22/23. Most of the content of this document is adapted from the course notes by Zimanyi, [1], so I won’t be citing it all the time. Other references will be provided when used.

Contents

I  Active Databases
 1 Introduction
 2 Representative Systems and Prototypes
  2.1 Starburst
  2.2 Oracle
  2.3 DB2
  2.4 SQL Server
 3 Applications of Active Rules
  3.1 A summary of Integrity Constraints
  3.2 Management of Derived Data
  3.3 Business Rules: Advantages and Difficulties
  3.4 A case study: Energy Management System
II  Graph Databases
 4 Introduction
  4.1 CAP theorem
  4.2 Graph DB model: graphs
  4.3 The Resource Description Framework (RDF) Model
  4.4 The property graph data model
 5 Neo4j
  5.1 File storage
  5.2 Cypher
III  Temporal Databases
 6 Introduction
 7 Time Ontology
  7.1 TSQL2: Time ontology
 8 Temporal Conceptual Modeling
  8.1 The conceptual manifesto
 9 Manipulating Temporal Databases with SQL-92
  9.1 Temporal statements
  9.2 Temporal keys
  9.3 Handling Now
  9.4 Duplicates
  9.5 Referential integrity
  9.6 Querying valid-time tables
 10 Temporal Support in current DBMSs and SQL 2011
  10.1 Oracle
  10.2 Teradata
  10.3 DB2
  10.4 SQL 2011
IV  Spatial Databases
 11 Introduction
  11.1 GIS architectures
 12 Georeferences and Coordinate Systems
  12.1 Projected coordinate systems
 13 Conceptual Modelling for Spatial Databases
  13.1 The Spatiotemporal conceptual manifesto
 14 Logical Modelling for Spatial Databases
  14.1 Representation models
  14.2 Digital Elevation Models (DEMs)
  14.3 Representing the geometry of a collection of objects
 15 SQL/MM
  15.1 SQL/MM Spatial: Geometry Type Hierarchy
 16 Representative Systems
  16.1 Oracle Locator

List of Figures

1 Merging temporal intervals. Blue is permitted; Green, Red and Orange are forbidden.
2 Temporal join: cases.
3 An example of a temporal model in three different ways.
4 The MADS temporal data types.
5 Allen’s temporal operators and how to express them.
6 Cases for temporal difference.
7 Ad Hoc GIS.
8 Loosely coupled GIS.
9 Integrated GIS.
10 The geoid.
11 An ellipse (left) and how it can approximate locally a complex shape (right).
12 The latitude and the longitude.
13 Shapes of projection surface.
14 Angle of projection.
15 Fit of projection.
16 MADS Spatial Type Hierarchy.

List of Tables

Starburst’s rule definition syntax.
Starburst’s rule commands.
Oracle’s rule definition syntax.
DB2’s rule definition syntax.
SQL Server’s rule definition syntax.
Temporal duplicates example.

List of Algorithms

Part I
Active Databases

1 Introduction

Traditionally, DBMSs are passive, meaning that all actions on data result from explicit invocation in application programs. In contrast, an active DBMS can perform actions automatically, in response to monitored events, such as updates in the database, certain points in time, or defined events that are external to the database.

Integrity constraints are a well-known mechanism that has been used since the early days of SQL to enhance integrity by imposing constraints on the data. These constraints only allow modifications to the database that do not violate them. Also, it is common for DBMSs to provide mechanisms to store procedures, in the form of precompiled packets that can be invoked by the user. These are usually called stored procedures.

Active database technology generalizes these two features into a single mechanism: triggers.

Definition 1.1. A trigger or, more generally, an ECA rule, consists of an event, a condition and a set of actions:

  • Event: indicates when the trigger must be called.

  • Condition: indicates the checks that must be done after the trigger is called. If the condition is fulfilled, then the set of actions is executed. Otherwise, the trigger does not perform any action.

  • Actions: performed when the condition is fulfilled.

Example 1.1. A conceptual trigger could be like the following:



  • Event: a customer has not paid 3 invoices by the due date.

  • Condition: the credit limit of the customer is less than 20000.

  • Action: cancel all current orders of the customer.
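A minimal sketch of how this conceptual rule might look as an actual trigger, anticipating the Oracle-style syntax introduced in Section 2.2; the tables Customer(CustId, CreditLimit), Invoice(CustId, DueDate, Paid) and Orders(CustId, ...) are hypothetical:

CREATE TRIGGER CancelOrders
AFTER UPDATE OF Paid ON Invoice
FOR EACH ROW
WHEN (New.Paid = 'N')
BEGIN
  -- Condition: at least 3 unpaid invoices past their due date
  -- and a credit limit below 20000; Action: cancel the customer's orders
  DELETE FROM Orders
  WHERE CustId = New.CustId
    AND (SELECT CreditLimit FROM Customer C
         WHERE C.CustId = New.CustId) < 20000
    AND (SELECT COUNT(*) FROM Invoice I
         WHERE I.CustId = New.CustId
           AND I.Paid = 'N' AND I.DueDate < SYSDATE) >= 3;
END;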


There are several aspects of the semantics of an application that can be expressed through triggers:

The benefits of active technology are:

2 Representative Systems and Prototypes

Even though this basic model is simple and intuitive, each vendor proposes its own way to implement triggers, which were not part of the SQL-92 standard. We are going to study Starburst, Oracle, DB2 and SQL Server triggers.

2.1 Starburst

Starburst is a relational DBMS prototype developed by IBM. In Starburst, triggers (called rules) are defined on a single relation in terms of their events, condition and actions:

Example 2.1. ’The salary of employees is not larger than the salary of the manager of their department.’

The easiest way to maintain this rule is to roll back any action that violates it. The restriction can be broken (if we focus on modifications of employees only) when a new employee is inserted, when the department of an employee is modified, or when the salary of an employee is updated. Thus, a trigger that solves this can have these operations as events; it can then check whether the condition is fulfilled or not. If it is not, the operation can be rolled back, returning to the previous state, in which the condition was fulfilled.

CREATE RULE Mgrsals ON Emp
WHEN INSERTED, UPDATED(Dept), UPDATED(Sal)
IF EXISTS (
  SELECT *
  FROM Emp E, Dept D, Emp M
  WHERE E.Dept = D.Dept  -- Check the correct department
    AND E.Sal > M.Sal    -- Check the salary condition
    AND D.Mgr = M.Name   -- Check the manager is the correct one
  )
THEN ROLLBACK;

The syntax of Starburst’s rule definition is described in Table 1. As we can see, rules have a unique name and each rule is associated with a single relation. The events are defined to be only database updates, but one rule can have several events defined on the target relation.

The same event can be used in several triggers, so one event can cause different actions to be executed. For this not to produce an unwanted outcome, it is possible to establish the order in which the different triggers must be executed, using the PRECEDES and FOLLOWS declarations. The order defined by these operators is partial (not all triggers are comparable) and must be acyclic to avoid deadlocks.


CREATE RULE <rule-name> ON <relation-name>
WHEN <list of trigger-events>
[IF <condition>]
THEN <list of SQL-statements>
[PRECEDES <list of rule-names>]
[FOLLOWS <list of rule-names>];

where
  <trigger-event> := INSERTED | DELETED | UPDATED [<attributes>]
Table 1: Starbust’s rule definition syntax.

Example 2.2. ’If the average salary of employees gets over 100, reduce the salary of all employees by 10%.’

In this case, the condition can be violated when a new employee is inserted, when an employee is deleted or when the salary is updated. Now, the action is not to rollback the operation, but to reduce the salary of every employee by 10%.

First, let’s exemplify the cases in which the condition is violated. Imagine the following initial state of the table Emp:



Name   Sal
John   50
Mike   100
Sarah  120

The average salary is 90, so it is below the 100 threshold and no action is needed.

The trigger could be defined as:

CREATE RULE SalaryControl ON Emp
WHEN INSERTED, DELETED, UPDATED(Sal)
IF
  (SELECT AVG(Sal) FROM Emp) > 100
THEN
  UPDATE Emp
  SET Sal = 0.9 * Sal;

Note, nonetheless, that if we now insert a new employee (James, 200), a single execution of the rule would give the following result:



Name   Sal
John   45
Mike   90
Sarah  108
James  180


Here, the mean is 105.75, still bigger than 100. We will see how to solve this issue later.

2.1.1 Starbust Semantics

At this point, it is useful to introduce some definitions:

Definition 2.1. A transaction is a sequence of statements that is to be treated as an atomic unit of work for some aspect of the processing, i.e., a transaction either executes from beginning to end, or it does not execute at all.

Definition 2.2. A statement is a part of a transaction, which expresses an operation on the database.

Definition 2.3. An event (in a more precise way than before) is the occurrence of executing a statement, i.e., a request for executing an operation on the database.

Thus, rules are triggered by the execution of operations in statements. In Starburst, rules are statement-level, meaning they are executed once per statement, even for statements that affect several tuples. In addition, the execution mode is deferred. This means that all rules triggered during a transaction are placed in what is called the conflict set (i.e., the set of triggered rules). When the transaction finishes, all these rules are executed in triggering order, or in the user-defined order if there is one. Nonetheless, if a rule needs to be executed during the transaction, we can use the PROCESS RULES declaration, which executes all rules in the conflict set.

The algorithm for executing all rules, either after the transaction finishes or when PROCESS RULES is called, is described in Algorithm 1. As we can see, rule processing basically involves 3 stages:

  1. Activation: the event in the rule requests the execution of an operation and this is detected by the system.

  2. Consideration: the condition of the rule is evaluated.

  3. Execution: if the condition is fulfilled, the action in the rule is executed.


while CS is not empty
  select R in CS with highest priority
  delete R from CS
  if R.condition is TRUE
    execute R.action
Algorithm 1: processRules(conflict set CS)

2.1.2 Correctness of rules

Definition 2.4. The repeatability of the execution is the property that ensures that the system behaves in the same way when it receives the same input transaction in the same database state, i.e., the results are deterministic.

As we have seen before, rule definitions specify a partial order for execution, but several rules may have the highest priority at the moment of selection. To achieve repeatability, the system maintains a total order, based on the user-defined partial order and on the creation timestamps of the rules.

Definition 2.5. The termination of rule execution is reached when an empty conflict set is obtained.

Note that the execution of rules may trigger more rules; this could cause nontermination, for example if two rules trigger each other in a cycle. Thus, ensuring termination is one of the main problems of active-rule design.

Example 2.3. Return to the previous example. We saw how the first example did not end up satisfying the constraint, but that is because we did not take into account that the rule triggers itself, since it updates the salary in Emp. Thus, the insertion of (James, 200) and the first execution of the triggered rule gave us:



Name   Sal
John   45
Mike   90
Sarah  108
James  180


As we have updated the salaries, the rule is triggered again. The condition is fulfilled, since 105.75 > 100, and the salaries are modified again, arriving at the table:



Name   Sal
John   40.5
Mike   81
Sarah  97.2
James  162


As the salaries have been updated again, the rule is triggered once more. Now, the mean is 95.175 < 100, so the condition is not met and the action is not executed. The rule has terminated.

In this case, termination is ensured because all values are decreased by 10%, which implies that the mean is also decreased by 10%. Thus, no matter how high the mean is at the beginning, at some point it will go below 100, because 0.9x < x for all x > 0.
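This argument can be made quantitative: if the initial average salary is $m_0 > 100$, then after $n$ firings the average is $0.9^n m_0$, so the rule stops firing as soon as

$$0.9^{\,n} m_0 \le 100 \iff n \ge \frac{\ln(m_0/100)}{\ln(1/0.9)},$$

which is always reached after a finite number of firings.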

Remark 2.1. In general, guaranteeing termination is the responsibility of the programmer, and it is not an easy task.

Definition 2.6. A rule is correct (or possesses the correctness property) if it ensures repeatability and termination, taking into account the rest of factors in the database (other rules, domain of the attributes, structure of the tables,...).

2.1.3 State transitions and net effect

A transaction causes a state transition of the database, in the form of the addition, suppression or modification of one or more tuples of the database.

Before the transaction commits, the system stores two temporary transition relations that contain the tuples affected by the transition: the INSERTED table, with the (new values of) tuples inserted or updated by the transaction, and the DELETED table, with the (old values of) tuples deleted or updated by it.

Definition 2.7. The net effect of a transaction on a tuple is the composed effect of the transaction on the tuple, from the starting state to the end of the transaction.

Example 2.4. Some simple net effects:

Remark 2.2. Rules consider the net effect of transactions between two database states, so each tuple appears at most once in each temporary table.

Example 2.5. Let’s see the INSERTED and DELETED tables of our example. We start with the table



Name   Sal
John   50
Mike   100
Sarah  120


And perform UPDATE Emp SET Sal=110 WHERE Name=’John’. If we accessed the temporary tables now, we would see:



INSERTED
Name   Sal
John   110

DELETED
Name   Sal
John   50


Now, the rule is triggered because salary has been updated. The condition is met and the action is launched. If we accessed the temporary tables now, we would see:



INSERTED
Name   Sal
John   99
Mike   90
Sarah  108

DELETED
Name   Sal
John   50
Mike   100
Sarah  120


Note how the first tuple in INSERTED shows only the net effect on the tuple.

With these definitions, we can give a more precise definition of rule triggering:

Definition 2.8. A rule is triggered if any of the transition relations corresponding to its triggering operations is not empty.

A great feature of rules is that it is possible to reference the transition tables, which can be very useful in many occasions.

Example 2.6. Now, imagine we want to add the rule ’If an employee is inserted with a salary greater than 100, add the employee to the table of high-paid employees’. This rule could be defined as:

CREATE RULE HighPaid ON Emp
WHEN INSERTED
IF
  EXISTS (SELECT * FROM INSERTED WHERE Sal > 100)
THEN
  INSERT INTO HighPaidEmp
  (SELECT * FROM INSERTED WHERE Sal > 100)
FOLLOWS SalaryControl;

When we insert (James, 200), the tuple is inserted and the rules SalaryControl and HighPaid are triggered. Because we have defined HighPaid to follow SalaryControl, the latter executes first. Now, SalaryControl, as we saw, triggers further executions of itself, which are all processed before HighPaid. At the end, since all salaries have been modified, all employees whose final salary is greater than 100 are added to the table HighPaidEmp by the new rule; in this case, only James fulfills the condition.

2.1.4 More Starbust commands


1) DEACTIVATE RULE <rule-name> ON <table-name>
   Makes the specified rule not to be taken into account.

2) ACTIVATE RULE <rule-name> ON <table-name>
   Makes the specified rule to be taken into account again.

3) DROP RULE <rule-name> ON <table-name>
   Deletes the specified rule.

4) CREATE RULESET <ruleset-name>
   Creates a ruleset, i.e., a set of related rules.

5) ALTER RULESET <ruleset-name> [ADDRULES <rule-names>] [DELRULES <rule-names>]
   Allows adding or deleting rules to/from a ruleset.

6) DROP RULESET <ruleset-name>
   Deletes the specified ruleset (but not the rules).

7) PROCESS RULES
   Processes all active rules.

8) PROCESS RULESET <ruleset-name>
   Processes the specified ruleset, if it is active.

9) PROCESS RULE <rule-name>
   Processes the specified rule, if it is active.
Table 2: Starbust’s rule commands.

2.2 Oracle

In Oracle, the term used is TRIGGER. Triggers in Oracle respond to modification operations (INSERT, DELETE, UPDATE) to a relation, just as in Starburst.

The triggers in Oracle can be defined for different granularities:

Also, the execution mode of triggers in Oracle is immediate, meaning they are executed as soon as the event has been requested, in contrast to Starburst, in which the execution mode is deferred, as we saw. This allows rules to be executed before, after, or even instead of the operation of the triggering event. Table 3 details the definition syntax of triggers in Oracle. Some notes:


CREATE TRIGGER <trigger-name> {BEFORE|AFTER} <list of trigger-events>
ON <table-name>
[REFERENCING <references>]
[FOR EACH ROW]
[WHEN (<condition>)]
<actions>;

where
  <trigger-event> := INSERT | DELETE | UPDATE [OF <column-names>]
  <references> := OLD as <old-tuple-name> | NEW as <new-tuple-name>
  <actions> := <PL/SQL block>
Table 3: Oracle’s rule definition syntax.

Example 2.7. Row-level AFTER trigger.

Imagine we have two tables: Inventory(Part, PartOnHand, ReorderPoint, ReorderQty) and PendingOrders, which stores the part, the quantity ordered and the order date.

We want to define a rule such that, whenever PartOnHand is modified and its new value is smaller than ReorderPoint (i.e., we have fewer parts than the threshold for ordering more), we add a new record to PendingOrders with this part and the required quantity, unless an order for the part is already pending. This can be done as follows:

CREATE TRIGGER Reorder
AFTER UPDATE OF PartOnHand ON Inventory
FOR EACH ROW
WHEN (New.PartOnHand < New.ReorderPoint)
  DECLARE
    X NUMBER;
  BEGIN
    SELECT COUNT(*) INTO X
    FROM PendingOrders
    WHERE Part = New.Part;

    IF X = 0 THEN
      INSERT INTO PendingOrders
      VALUES (New.Part, New.ReorderQty, SYSDATE);
    END IF;
  END;

Let’s apply the rule to see how it works. Let’s say our table Inventory is:





Part  PartOnHand  ReorderPoint  ReorderQty
1     200         150           100
2     780         500           200
3     450         400           120




If we execute the following transaction on October 10, 2000:

UPDATE Inventory
SET PartOnHand = PartOnHand - 70
WHERE Part = 1;

Then the tuple (1,100,2000-10-10) would be inserted into PendingOrders.

The algorithm for executing rules in Oracle is shown in Algorithm 2. As we can see, statement-level triggers are executed before/after anything else, and row-level triggers are executed before/after each affected tuple is modified. Note that the execution needs to take into account the priority among triggers, but only among those of the same granularity (row vs. statement) and type (before vs. after).


For each STATEMENT-LEVEL BEFORE trigger
  Execute trigger

For each row affected by the triggering statement
  - For each ROW-LEVEL BEFORE trigger
      Execute trigger
  - Execute the modification of the row
  - Check row-level constraints and assertions
  - For each ROW-LEVEL AFTER trigger
      Execute trigger

Check statement-level constraints and assertions

For each STATEMENT-LEVEL AFTER trigger
  Execute trigger
Algorithm 2: processRules

2.2.1 Oracle semantics

2.2.2 Instead-of triggers

This is another type of Oracle trigger, in which the action is carried out in place of the statement that produced the activating event. These triggers are typically used to update views, and they need to be used carefully, because replacing one action by another can sometimes produce unexpected behavior.

Example 2.8. An Instead-of trigger:

CREATE TRIGGER manager-insert
INSTEAD OF INSERT ON Managers
REFERENCING NEW AS n
FOR EACH ROW
  UPDATE Dept d
  SET mgrno = n.empno
  WHERE d.deptno = n.deptno;

This trigger automatically updates the manager of a department when a new manager is inserted.

2.3 DB2

In DB2, every trigger monitors a single event and is activated immediately, BEFORE or AFTER its event. Triggers can be defined at row-level or statement-level, as in Oracle, but in this case state-transition values can be accessed at both granularities:

DB2’s triggers cannot execute data definition nor transactional commands. They can raise errors which in turn can cause statement-level rollbacks.

The syntax is as in Table 4.


CREATE TRIGGER <trigger-name> {BEFORE|AFTER} <trigger-event>
ON <table-name>
[REFERENCING <references>]
FOR EACH {ROW|STATEMENT}
WHEN (<SQL-condition>)
<SQL-procedure-statements>;

where
  <trigger-event> := INSERT | DELETE | UPDATE [OF <column-names>]
  <references> := OLD as <old-tuple-name> | NEW as <new-tuple-name> |
      OLD_TABLE as <old-table-name> | NEW_TABLE as <new-table-name>
Table 4: DB2’s rule definition syntax.

The processing is done as in Algorithm 3. Note that:


WHEN triggers ACTIVATE each other:
  IF a modification statement S in the action A of a trigger causes event E:
    1) SUSPEND execution of A, SAVE its data on a stack
    2) COMPUTE OLD and NEW relative to E
    3) EXECUTE BEFORE-triggers relative to E, update NEW
    4) APPLY NEW transition values to DB.
       FOR EACH IC violated by current state with action Aj:
       a) COMPUTE OLD and NEW relative to Aj
       b) EXECUTE BEFORE-triggers relative to Aj, update NEW
       c) APPLY NEW transition values to DB
       d) PUSH ALL AFTER-triggers relative to Aj into
          a queue of suspended triggers
    5) EXECUTE ALL AFTER-triggers relative to E
       IF ANY of them contains action Aj invoking other triggers:
         REPEAT RECURSIVELY
    6) POP from the stack the data for A, continue its evaluation
Algorithm 3: processRules

2.3.1 DB2 semantics

Example 2.9. Imagine we have the following two tables:




Part
PartNum  Supplier  Cost
1        Jones     150
2        Taylor    500
3        HDD       400
4        Jones     800

Distributor
Name    City         State
Jones   Palo Alto    CA
Taylor  Minneapolis  MN
HDD     Atlanta      GA



And there is a referential integrity constraint that requires Part Suppliers to be also distributors, with HDD as a default Supplier:

FOREIGN KEY (Supplier)
  REFERENCES Distributor(Name)
  ON DELETE SET DEFAULT;

Then, the following row-level trigger raises an error (causing a rollback) when an update tries to set Supplier to NULL:

CREATE TRIGGER OneSupplier
BEFORE UPDATE OF Supplier ON Part
REFERENCING NEW AS N
FOR EACH ROW
WHEN (N.Supplier IS NULL)
  SIGNAL SQLSTATE '70005' ('Cannot change supplier to NULL');

2.4 SQL Server

In SQL Server, a single trigger can run multiple actions, and it can be fired by more than one event. Also, triggers can be attached to tables or views. SQL Server does not support BEFORE-triggers, but it supports AFTER-triggers (which can be defined using the keyword AFTER or FOR) and INSTEAD OF-triggers.

The triggers can be fired with INSERT, UPDATE and DELETE statements.

The option WITH ENCRYPTION encrypts the text of the trigger in the syscomments table.

Finally, the option NOT FOR REPLICATION ensures that the trigger is not executed when a replication process modifies the table to which the trigger is attached.

The syntax is shown in Table 5.


CREATE TRIGGER <trigger-name> ON <table-name>
[WITH ENCRYPTION]
{FOR | AFTER | INSTEAD OF} <list of trigger-events>
[WITH APPEND]
[NOT FOR REPLICATION]
AS <Transact-SQL-statements>;

where
  <trigger-event> := INSERT | DELETE | UPDATE
Table 5: SQL Server’s rule definition syntax.

2.4.1 SQL Server Semantics

2.4.2 Limitations

2.4.3 Nested and Recursive triggers

SQL Server allows nested and recursive triggers to be enabled or disabled, as sketched below:
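A minimal illustration of the relevant settings (the database name MyDB is a placeholder): 'nested triggers' is a server-wide configuration option, while RECURSIVE_TRIGGERS is a per-database option.

-- Enable (1) or disable (0) nested triggers server-wide
EXEC sp_configure 'nested triggers', 1;
RECONFIGURE;

-- Allow a trigger to fire itself (direct recursion) in a given database
ALTER DATABASE MyDB SET RECURSIVE_TRIGGERS ON;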

2.4.4 Trigger management

Trigger management includes the tasks of altering, renaming, viewing, dropping and disabling triggers, as illustrated below:
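For instance, taking the trigger of Example 2.10 below as an illustration, these tasks map to the following statements (the new trigger name is arbitrary; ALTER TRIGGER can likewise be used to modify the definition in place):

-- View the trigger body (unless it was created WITH ENCRYPTION)
EXEC sp_helptext 'update_book_qtysold';

-- Rename the trigger
EXEC sp_rename 'update_book_qtysold', 'trg_update_book_qtysold';

-- Disable and re-enable it
DISABLE TRIGGER trg_update_book_qtysold ON BookOrders;
ENABLE TRIGGER trg_update_book_qtysold ON BookOrders;

-- Drop it
DROP TRIGGER trg_update_book_qtysold;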

Example 2.10. Let’s work with a database with the following tables:

Here, Books.QtySold is a derived attribute which keeps track of how many copies of the book have been sold. We can make these updates automatic with the following trigger:

CREATE TRIGGER update_book_qtysold ON BookOrders
AFTER INSERT, UPDATE, DELETE AS
-- add if insertion
IF EXISTS (SELECT * FROM INSERTED)
BEGIN
  UPDATE Books
  SET QtySold = QtySold + (SELECT SUM(Qty)
        FROM INSERTED i
        WHERE titleId = i.titleId)
  WHERE titleId IN (SELECT i.titleId FROM INSERTED i)
END
-- subtract if deletion
IF EXISTS (SELECT * FROM DELETED)
BEGIN
  UPDATE Books
  SET QtySold = QtySold - (SELECT SUM(Qty)
        FROM DELETED d
        WHERE titleId = d.titleId)
  WHERE titleId IN (SELECT d.titleId FROM DELETED d)
END

3 Applications of Active Rules

Rules provide programmers with an effective tool to support both internal applications and external applications:

Some examples of applications that can benefit from active technology and business rules are:

As can be seen from these examples, a frequent case of application-specific rules are alerters, whose actions signal certain conditions that occur, with or without changing the database.

3.1 A summary of Integrity Constraints

The integrity of a database refers to the consistency and conformity of the data with the database schema and its constraints. Thus, an integrity constraint is any assertion on the schema which is not defined in the data-structure part of the schema. Constraints declaratively specify conditions to be satisfied by the data at all times, so checking for integrity violations is done at every update of the database state.

Integrity constraints can be static, if the predicates are evaluated on database states, or dynamic, if the predicates are evaluated on state transitions. They can also be classified as built-in, if they are defined by special DDL (Data Definition Language) constructs (such as keys, non-null values, ...), or ad hoc, which are arbitrarily complex domain-dependent constraints.
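As an illustration (the schema and values are hypothetical), built-in constraints are declared directly in the DDL, whereas an ad hoc constraint has no dedicated construct:

CREATE TABLE Emp (
  EmpNo  INT PRIMARY KEY,                 -- built-in: key constraint
  DeptNo INT NOT NULL,                    -- built-in: non-null value
  Sal    DECIMAL(10,2) CHECK (Sal >= 0)   -- built-in: domain restriction
);

-- An ad hoc constraint such as "the total salary of a department must not
-- exceed 1000000" has no dedicated DDL construct and is typically enforced
-- procedurally, e.g. with active rules (see Example 3.1 for the pattern).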

In practice, integrity maintenance is achieved through:

The process of rule generation may be partially automated:

  1. The possible causes of violation are the events for the activation of the rule.

  2. The declarative formulation of the constraint is the rule condition.

  3. To avoid or eliminate the violation, an action is taken. The simplest approach is to roll back the transaction; this is done by abort rules. In contrast, a richer approach provides a domain-dependent corrective action, via repair rules.

Thus:

Example 3.1. Let’s do a referential integrity example in Starburst:

We have relations Emp(EmpNo, DeptNo) and Dept(DNo). We have the referential integrity condition

Emp[DeptNo] ⊆ Dept[DNo],

so the possible violations can come from an INSERT into Emp, a DELETE from Dept, an UPDATE of Emp[DeptNo], or an UPDATE of Dept[DNo]. The condition on tuples of Emp for not violating the constraint is:

EXISTS (SELECT * FROM Dept WHERE DNo = Emp.DeptNo)

Its denial form, i.e., the condition expressing that the constraint is violated, is:

NOT EXISTS (SELECT * FROM Dept WHERE DNo = Emp.DeptNo)

Thus, we can create abort rules as:

CREATE RULE DeptEmp1 ON Emp
WHEN INSERTED, UPDATED(DeptNo)
IF EXISTS (SELECT * FROM Emp
    WHERE NOT EXISTS (SELECT * FROM Dept WHERE DNo = Emp.DeptNo))
THEN ROLLBACK;

CREATE RULE DeptEmp2 ON Dept
WHEN DELETED, UPDATED(DNo)
IF EXISTS (SELECT * FROM Emp
    WHERE NOT EXISTS (SELECT * FROM Dept WHERE DNo = Emp.DeptNo))
THEN ROLLBACK;

Note that one rule is necessary for each relation.

Note also that the rules as defined are inefficient, because the computation of the condition scans the whole database. Rules can assume that the constraint was satisfied in the initial state, so it suffices to compute the condition relative to the transition tables.
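For instance, the first abort rule could be restricted to the affected tuples only; a possible formulation along these lines is:

CREATE RULE DeptEmp1 ON Emp
WHEN INSERTED, UPDATED(DeptNo)
IF EXISTS (SELECT * FROM INSERTED I
    WHERE NOT EXISTS (SELECT * FROM Dept WHERE DNo = I.DeptNo))
THEN ROLLBACK;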

Now, we define repair rules instead:

CREATE RULE DeptEmp1 ON Emp
WHEN INSERTED
IF EXISTS (SELECT * FROM INSERTED I
    WHERE NOT EXISTS (SELECT * FROM Dept D WHERE D.DNo = I.DeptNo))
THEN UPDATE Emp
  SET DeptNo = NULL
  WHERE EmpNo IN (SELECT EmpNo FROM INSERTED I) AND
    NOT EXISTS (SELECT * FROM Dept D WHERE D.DNo = Emp.DeptNo);

CREATE RULE DeptEmp2 ON Emp
WHEN UPDATED(DeptNo)
IF EXISTS (SELECT * FROM INSERTED I JOIN DELETED D ON I.EmpNo = D.EmpNo
    WHERE NOT EXISTS (SELECT * FROM Dept WHERE DNo = I.DeptNo))
THEN UPDATE Emp
  SET DeptNo = 99
  WHERE EmpNo IN (SELECT I.EmpNo FROM INSERTED I JOIN DELETED D ON I.EmpNo = D.EmpNo)
    AND NOT EXISTS (SELECT * FROM Dept D WHERE D.DNo = Emp.DeptNo);

CREATE RULE DeptEmp3 ON Dept
WHEN UPDATED(DNo), DELETED
IF EXISTS (SELECT * FROM Emp
      WHERE EXISTS (SELECT * FROM DELETED D WHERE D.DNo = Emp.DeptNo))
THEN DELETE FROM Emp
  WHERE EXISTS (SELECT * FROM DELETED D WHERE D.DNo = Emp.DeptNo);

3.2 Management of Derived Data

A view can be seen as a query on the DB which returns a relation or a class that can be used as any other relation or class. A derived attribute is an attribute that can be computed from other attributes in the DB. Both views and derived attributes can be expressed with a declarative query language or with deductive rules. There are two strategies for derived data:

  1. Virtually supported: their content is computed on demand.

  2. Materialized: their content is stored in the database, and it must be recomputed whenever the source of data is changed.

3.2.1 Virtual views with rules

When an application queries a view, a rule is triggered on the request and the action substitutes and evaluates the view definition. It requires an event, triggered by queries, and an INSTEAD OF clause in rule language.

There exist two basic strategies:

  • Refresh rules, which recompute the derived data from scratch after every change to the source data.

  • Incremental rules, which compute only the change (delta) to the derived data induced by the change to the source data.

The rule generation can be automated. Refresh rules are simple, but can be very inefficient. On the other hand, incremental rules depend on the structure of the derivation rules, and can be complex.

Example 3.2. Imagine we have the following view definition:

DEFINE VIEW HighPaidDept AS
  (SELECT DISTINCT Dept.Name
  FROM Dept, Emp
  WHERE Dept.Dno = Emp.DeptNo AND Emp.Sal > 50000);

So this view holds the departments in which some employee earns more than 50K a year. This view can change whenever an employee is inserted or deleted, their department is changed or their salary is changed; and whenever a department is inserted or deleted, or its Dno is updated.

A refresh rule defined in Starburst to handle these changes is:

CREATE RULE RefreshHighPaidDept1 ON Emp
WHEN INSERTED, DELETED, UPDATED(DeptNo), UPDATED(Sal)
THEN DELETE * FROM HighPaidDept;
  INSERT INTO HighPaidDept
    (SELECT DISTINCT Dept.Name
    FROM Dept, Emp
    WHERE Dept.Dno = Emp.DeptNo AND Emp.Sal > 50000);

CREATE RULE RefreshHighPaidDept2 ON Dept
WHEN INSERTED, DELETED, UPDATED(Dno)
THEN DELETE * FROM HighPaidDept;
  INSERT INTO HighPaidDept
    (SELECT DISTINCT Dept.Name
    FROM Dept, Emp
    WHERE Dept.Dno = Emp.DeptNo AND Emp.Sal > 50000);

As we can see, all elements from the view are deleted, and the view is recomputed entirely. The incremental approach is more complex. As an example, let’s define the rule for the case of Insert Dept:

CREATE RULE IncHighPaidDept1 ON Dept
WHEN INSERTED
THEN INSERT INTO HighPaidDept
  (SELECT DISTINCT I.Name
  FROM INSERTED I, Emp
  WHERE I.Dno = Emp.DeptNo AND Emp.Sal > 50000);

3.2.2 Replication with rules

Replication consists of storing several copies of the same information. This is a common practice in distributed databases. Keeping fully synchronized copies is usually very costly and unnecessary, so it is common to use asynchronous techniques to propagate changes between nodes.

Example 3.3. An example of capturing changes into deltas in Starburst:

CREATE RULE Capture1 ON PrimaryCopy
WHEN INSERTED
THEN INSERT INTO PosDelta (SELECT * FROM INSERTED);

CREATE RULE Capture2 ON PrimaryCopy
WHEN DELETED
THEN INSERT INTO NegDelta (SELECT * FROM DELETED);

CREATE RULE Capture3 ON PrimaryCopy
WHEN UPDATED
THEN INSERT INTO PosDelta (SELECT * FROM INSERTED);
  INSERT INTO NegDelta (SELECT * FROM DELETED);

The deltas are applied to the copies according to a predefined policy, e.g. once every hour.
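A minimal sketch of such a periodic application step, assuming the replica is a table Copy with the same schema as PrimaryCopy and a key column Id (both names are placeholders):

-- Apply the accumulated deltas to the replica, then clear them
DELETE FROM Copy WHERE Id IN (SELECT Id FROM NegDelta);
INSERT INTO Copy SELECT * FROM PosDelta;
DELETE FROM NegDelta;
DELETE FROM PosDelta;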

3.3 Business Rules: Advantages and Difficulties

3.3.1 Advantages

3.3.2 Difficulties

3.4 A case study: Energy Management System

This is an example of an application modeled with active rules, covering the business process:

’Management of the Italian electrical power distribution network.’

The operational network is a forest of trees, connecting power distributors to users. The operating conditions are monitored constantly with frequent reconfigurations: the structure of the network is dynamic. The topology is modified less frequently (we can consider it static). The objective is to transfer the exact power from distributors to users through nodes and directed branches connecting pairs of nodes.

In this scenario, active rules are used to respond to input transactions asking for:

The schema of the database is:

User(UserId, BranchIn, Power)
  foreign key (BranchIn) references Branch

Branch(BranchId, FromNode, ToNode, Power)

Node(NodeId, BranchIn, Power)
  foreign key (BranchIn) references Branch

Distributor(NodeId, Power, MaxPower)

Wire(WireId, BranchId, WireType, Power)
  foreign key (BranchId) references Branch
  foreign key (WireType) references WireType

WireType(WireTypeId, MaxPower)

The network is composed of sites and connections between pairs of sites:

The business rules are the following:

3.4.1 Connect a new user

A new user is connected to a node with the following procedure:

CREATE PROCEDURE insertUser(@Node char(3), @Power int) AS
  DECLARE @User char(3), @Branch char(3), @Wire char(3)
  EXEC @User = nextUserId
  EXEC @Branch = nextBranchId
  EXEC @Wire = nextWireId
  INSERT INTO Branch (BranchId, FromNode, ToNode, Power)
    VALUES (@Branch, @User, @Node, @Power)
  INSERT INTO Wire (WireId, Branch, WireType, Power)
    VALUES (@Wire, @Branch, 'WT1', @Power)
  INSERT INTO User (UserId, BranchIn, Power)
    VALUES (@User, @Branch, @Power);

The node to which a user is connected is determined by an external application: usually its closest node. ’WT1’ is the basic wire type. nextUserId, nextBranchId and nextWireId procedures are used to obtain the next identifier of a user, branch or wire.
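For example, connecting a new user that requests 50 units of power to a (hypothetical) node 'N01' would be done with:

EXEC insertUser @Node = 'N01', @Power = 50;

This inserts one row into Branch, one into Wire (with the basic wire type 'WT1'), and one into User.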

3.4.2 Propagation of power reduction from a user

If a user requires less power, this change needs to be propagated to its input branch:

CREATE TRIGGER T1_User_Branch ON User
AFTER UPDATE AS
-- If some user has decreased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.UserId = D.UserId WHERE D.Power > I.Power)
BEGIN
  UPDATE Branch
  -- Decrease the power consumption by the difference between the past and the new values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.UserId = D.UserId
        WHERE I.BranchIn = BranchIn AND D.Power > I.Power) -- Make sure the branch is the correct one
  WHERE BranchId IN (SELECT BranchIn FROM Inserted)
END;

3.4.3 Propagation of power reduction from a node

If a node requires less power, propagate the change to its input branch:

CREATE TRIGGER T2_Node_Branch ON Node
AFTER UPDATE AS
-- If some node has decreased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.NodeId = D.NodeId WHERE D.Power > I.Power)
BEGIN
  UPDATE Branch
  -- Decrease the power consumption by the difference between the past and the new values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.NodeId = D.NodeId
        WHERE I.BranchIn = BranchIn AND D.Power > I.Power) -- Make sure the branch is the correct one
  WHERE BranchId IN (SELECT BranchIn FROM Inserted)
END;

3.4.4 Propagation of power reduction from a branch to a node

If a branch connected to a node requires less power, propagate the change to its input node.

CREATE TRIGGER T3_Branch_Node ON Branch
AFTER UPDATE AS
-- If some branch has decreased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.BranchId = D.BranchId
    WHERE D.Power > I.Power AND I.ToNode IN (SELECT NodeId FROM Node)) -- ToNode is not a foreign key, so we need to make sure it is a node
BEGIN
  UPDATE Node
  -- Decrease the power consumption by the difference between the past and the new values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.BranchId = D.BranchId
        WHERE I.ToNode = NodeId AND D.Power > I.Power) -- Make sure the node is the correct one
  WHERE NodeId IN (SELECT ToNode FROM Inserted)
END;

3.4.5 Propagation of power reduction from a branch to a distributor

If a branch connected to a distributor requires less power, propagate the change to the distributor.

CREATE TRIGGER T4_Branch_Distributor ON Branch
AFTER UPDATE AS
-- If some branch has decreased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.BranchId = D.BranchId
    WHERE D.Power > I.Power AND I.ToNode IN (SELECT NodeId FROM Distributor)) -- ToNode is not a foreign key, so we need to make sure it is a distributor
BEGIN
  UPDATE Distributor
  -- Decrease the power consumption by the difference between the past and the new values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.BranchId = D.BranchId
        WHERE I.ToNode = NodeId AND D.Power > I.Power) -- Make sure the distributor is the correct one
  WHERE NodeId IN (SELECT ToNode FROM Inserted)
END;

3.4.6 Propagation of power increase from a user

If a user requires more power, propagate the change to its input branch.

CREATE TRIGGER T5_User_Branch ON User
AFTER UPDATE AS
-- If some user has increased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.UserId = D.UserId WHERE D.Power < I.Power)
BEGIN
  UPDATE Branch
  -- Increase the power consumption by the difference between the new and the past values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.UserId = D.UserId
        WHERE I.BranchIn = BranchIn AND D.Power < I.Power) -- Make sure the branch is the correct one
  WHERE BranchId IN (SELECT BranchIn FROM Inserted)
END;

3.4.7 Propagation of power increase from a node

If a node requires more power, propagate the change to its input branch:

CREATE TRIGGER T6_Node_Branch ON Node
AFTER UPDATE AS
-- If some node has increased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.NodeId = D.NodeId WHERE D.Power < I.Power)
BEGIN
  UPDATE Branch
  -- Increase the power consumption by the difference between the new and the past values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.NodeId = D.NodeId
        WHERE I.BranchIn = BranchIn AND D.Power < I.Power) -- Make sure the branch is the correct one
  WHERE BranchId IN (SELECT BranchIn FROM Inserted)
END;

3.4.8 Propagation of power increase from a branch to a node

If a branch connected to a node requires more power, propagate the change to its input node.

CREATE TRIGGER T7_Branch_Node ON Branch
AFTER UPDATE AS
-- If some branch has increased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.BranchId = D.BranchId
    WHERE D.Power < I.Power AND I.ToNode IN (SELECT NodeId FROM Node)) -- ToNode is not a foreign key, so we need to make sure it is a node
BEGIN
  UPDATE Node
  -- Increase the power consumption by the difference between the new and the past values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.BranchId = D.BranchId
        WHERE I.ToNode = NodeId AND D.Power < I.Power) -- Make sure the node is the correct one
  WHERE NodeId IN (SELECT ToNode FROM Inserted)
END;

3.4.9 Propagation of power increase from a branch to a distributor

If a branch connected to a distributor requires more power, propagate the change to the distributor.

CREATE TRIGGER T8_Branch_Distributor ON Branch
AFTER UPDATE AS
-- If some branch has increased its power consumption
IF EXISTS (SELECT * FROM Inserted I JOIN Deleted D
    ON I.BranchId = D.BranchId
    WHERE D.Power < I.Power AND I.ToNode IN (SELECT NodeId FROM Distributor)) -- ToNode is not a foreign key, so we need to make sure it is a distributor
BEGIN
  UPDATE Distributor
  -- Increase the power consumption by the difference between the new and the past values
  SET Power = Power - (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.BranchId = D.BranchId
        WHERE I.ToNode = NodeId AND D.Power < I.Power) -- Make sure the distributor is the correct one
  WHERE NodeId IN (SELECT ToNode FROM Inserted)
END;

3.4.10 Excess power requested from a distributor

If the power requested from a distributor exceeds its maximum, rollback the entire transaction.

CREATE TRIGGER T9_Distributor ON Distributor
AFTER UPDATE AS
-- If some distributor has increased its power consumption, exceeding its maximum capacity
IF EXISTS (SELECT * FROM Inserted I WHERE I.Power > I.MaxPower)
BEGIN
  RAISERROR 13000 'Maximum capacity of the distributor exceeded'
  ROLLBACK
END;

3.4.11 Propagate power change from a branch to its wires

If the power of a branch is changed, distribute the change equally among its wires.

CREATE TRIGGER T10_Branch_Wire ON Branch
AFTER UPDATE AS
BEGIN
  UPDATE Wire
  -- Divide the difference between past and new power among all wires and subtract it from every wire;
  -- note that this works independently of the sign of the change
  SET Power = Power - (
        (SELECT D.Power - I.Power
        FROM Inserted I JOIN Deleted D ON I.BranchId = D.BranchId
        WHERE I.BranchId = Branch)
        /
        (SELECT COUNT(*) FROM Wire W JOIN Inserted I ON I.BranchId = W.Branch
        WHERE W.Branch = Branch)
        )
  WHERE Branch IN (SELECT BranchId FROM Inserted)
END;

3.4.12 Change wire type if power passes threshold

If the power on a wire goes above the allowed threshold, change the wire type.

CREATE TRIGGER T11_Wire_Type ON Wire
AFTER INSERT, UPDATE AS
IF EXISTS (SELECT * FROM Inserted I JOIN WireType WT
    ON WireType = WireTypeId
    WHERE I.Power > WT.MaxPower -- If the power exceeds the maximum allowed
      AND EXISTS (SELECT * FROM WireType WT1 WHERE WT1.MaxPower > I.Power)) -- And there is a wire type which can accept the higher power
BEGIN
  UPDATE Wire
  SET WireType = (SELECT WireTypeId
          FROM WireType WT
          WHERE WT.MaxPower >= Power AND
          NOT EXISTS ( -- There are no two different types with the same MaxPower, so we take the first wire type whose MaxPower is sufficient
            SELECT * FROM WireType WT1
            WHERE WT1.MaxPower < WT.MaxPower AND WT1.MaxPower >= Power))
  WHERE WireId IN (SELECT WireId FROM Inserted I JOIN WireType WT ON WireType = WireTypeId -- Make sure we take the appropriate WireId
        WHERE I.Power > WT.MaxPower AND
          EXISTS (SELECT * FROM WireType WT1 WHERE WT1.MaxPower > I.Power))
END;

3.4.13 Add a wire to a branch

If there is no suitable wire type, add another wire to the branch.

CREATE TRIGGER T12_Wire ON Wire
AFTER INSERT, UPDATE AS
IF EXISTS (SELECT * FROM Inserted I JOIN WireType WT
    ON WireType = WireTypeId
    WHERE I.Power > WT.MaxPower
      AND I.Power > (SELECT MAX(MaxPower) FROM WireType)) -- the requested power is greater than what any wire type allows
BEGIN
  DECLARE @nextWire char(3), @BranchId char(3), @WireId char(3), @Power real, @MaxPower real
  DECLARE wires_cursor CURSOR FOR
    SELECT I.WireId, I.BranchId, I.Power, WT.MaxPower
    FROM Inserted I JOIN WireType WT ON WireType = WireTypeId
    WHERE I.Power > WT.MaxPower
      AND I.Power > (SELECT MAX(MaxPower) FROM WireType)
  OPEN wires_cursor
  FETCH NEXT FROM wires_cursor INTO @WireId, @BranchId, @Power, @MaxPower
  WHILE @@FETCH_STATUS = 0
  BEGIN
    EXEC @nextWire = nextWireId
    INSERT INTO Wire (WireId, BranchId, WireType, Power) VALUES (@nextWire, @BranchId, 'WT1', @Power - 0.8*@MaxPower)
    FETCH NEXT FROM wires_cursor INTO @WireId, @BranchId, @Power, @MaxPower
  END
  CLOSE wires_cursor
  DEALLOCATE wires_cursor

  UPDATE Wire
  SET Power = (SELECT 0.8*MaxPower FROM WireType WT WHERE WT.WireTypeId = WireType)
  WHERE Power > (SELECT MAX(MaxPower) FROM WireType)
END;

Part II
Graph Databases

4 Introduction

Relational DBMSs are too rigid for Big Data scenarios, and not the best option for storing unstructured data. The one-size-fits-all approach is no longer valid in many scenarios and RDBMSs are hard to scale for billions of rows, because data structures used in RDBMSs are optimized for systems with small amounts of memory.

NoSQL technologies do not use the relational model for storing or retrieving data. Also, they generally do not have the concept of a schema, so fields can be added to any record as desired, without control. These characteristics make them easy to run on clusters and increase their scaling capability, with horizontal scalability in mind. But these gains are not free: there is a trade-off in which the traditional notion of consistency is weakened. For example, ACID (Atomic, Consistent, Isolated, Durable) transactions are most of the time not fully supported.

There are several types of NoSQL databases, such as Key-Value stores, Column stores, Document databases,... We are going to focus on Graph Databases.

4.1 CAP theorem

Definition 4.1. The consistency of a database is fulfilled when all updates are seen equally by all users of the database.

The availability guarantees that every request receives a response, whether it succeeded or failed.

The partition tolerance means that the system is able to operate despite arbitrary message lost or failure of part of the system.

Ideally, a database fulfills these three properties. But the CAP theorem states that this is impossible:

Theorem 4.1. CAP Theorem

A distributed data system cannot guarantee consistency (C), availability (A) and partition tolerance (P), but only any combination of two of them.

Remark 4.1. If the system needs to be distributed, then partition tolerance is a must. So, in practice, for distributed systems there is a choice between guaranteeing consistency or availability.

Remark 4.2. Nonetheless, even though the perfect form of the three properties cannot be guaranteed at once, it is possible to provide a fairly good level of the property that is not fully guaranteed.

Definition 4.2. The eventual consistency assumption: in the absence of new writes, consistency will eventually be achieved, and all replicas that are responsible for a data item will agree on the same version and return the last updated value.

With fewer replicas in the system, R/W operations complete more quickly, improving latency.

4.2 Graph DB model: graphs

In graph databases, the data and/or the schema are represented by graphs, or by data structures that generalize the notion of graph: hypergraphs.

The integrity constraints enforce data consistency. Constraints can be grouped into: schema-instance consistency, identity and referential integrity, and functional and inclusion dependencies.

The data manipulation is expressed by graph transformations, or by operations whose main primitives are graph features, like paths, neighborhoods, subgraphs, connectivity and graph statistics.

4.3 The Resource Description Framework (RDF) Model

RDF allows us to express facts, such as ’Ana is the mother of Julia.’, but we would also like to be able to express more generic knowledge, like ’If somebody has a daughter, then that person is a parent.’. This kind of knowledge is called schema knowledge. RDF Schema allows us to do some schema knowledge modeling, and the Web Ontology Language (OWL) gives even more expressivity.

A class is a set of things or resources. In RDF, everything is a resource that belongs to a class (or to several classes). Classes can be arranged in hierarchies. Every class is itself a resource and is a member of the class rdfs:Class.

Example 4.1. A hierarchy of classes in RDF:

(figure: example class hierarchy, omitted)

The different resources are labelled with:

In RDF, there exists implicit knowledge, which can be inferred using deduction. This knowledge does not need to be stated explicitly: which statements are logical consequences of others is governed by the formal semantics.

Example 4.2. If an RDF document contains ’u rdf:type ex:Textbook’ and ’ex:Textbook rdfs:subClassOf ex:Book’, then it can be deduced that ’u rdf:type ex:Book’.

In practice, RDF data is usually stored in triple stores or quad stores, which are typically built on top of a relational DB. This means that RDF is usually not implemented natively as a graph database, which makes inference harder. It is more commonly used as metadata storage.
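A minimal sketch of such a relational triple store, with the inference of Example 4.2 expressed as a query (table and column names are illustrative):

CREATE TABLE Triple (Subject VARCHAR(100), Predicate VARCHAR(100), Object VARCHAR(100));

INSERT INTO Triple VALUES ('u', 'rdf:type', 'ex:Textbook');
INSERT INTO Triple VALUES ('ex:Textbook', 'rdfs:subClassOf', 'ex:Book');

-- Derive 'u rdf:type ex:Book' (one level of subclass inference)
SELECT t.Subject, 'rdf:type' AS Predicate, s.Object
FROM Triple t JOIN Triple s ON t.Object = s.Subject
WHERE t.Predicate = 'rdf:type' AND s.Predicate = 'rdfs:subClassOf';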

4.4 The property graph data model

This model is simpler: in this case, the model is a graph where nodes and edges are annotated with properties. It is schema-less, meaning there is no underlying schema to which the data has to adhere. Inference is not performed.

There are several types of relationships supported by graph databases:

The abstract data type used is a graph with properties, which is a 4-tuple G = (V,E,Σ,L ) such that:

The basic operations defined over a graph are:

The notion of graph can be generalized to that of hypergraph, which is a pair H = (V, E), where V is a set of nodes and E ⊆ 2^V is a set of non-empty subsets of V, called hyperedges. If V = {v_1, ..., v_n} and E = {e_1, ..., e_m}, we can define the incidence matrix of H as the matrix A = (a_ij) of size n × m, where a_ij = 1 if v_i ∈ e_j, and a_ij = 0 otherwise.

In this case, this is an undirected hypergraph. A directed hypergraph is defined similarly as H = (V, E), where now E ⊆ 2^V × 2^V, meaning that the nodes in the left set of each hyperedge are connected to the nodes in the right set.
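As a small worked case: for the undirected hypergraph with $V = \{v_1, v_2, v_3\}$ and $E = \{e_1, e_2\}$, where $e_1 = \{v_1, v_2\}$ and $e_2 = \{v_2, v_3\}$, the incidence matrix is

$$A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}.$$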

4.4.1 Implementation: adjacency list

In an adjacency list, we maintain an array with as many cells as there are nodes in the graph, and:

This way it is very cheap to obtain neighbors of a node, but it is not suitable for checking if there is an edge between two nodes.

Example 4.3. Consider a directed graph with nodes v1, v2, v3, v4 and labelled edges L1 from v4 to v1, L2 from v2 to v1, and L3 from v2 to v3 (figure omitted). It is modelled with the following adjacency list:

v1: {}
v2: {(v1, {L2}), (v3, {L3})}
v3: {}
v4: {(v1, {L1})}


4.4.2 Implementation: incidence list

In this case, we maintain two arrays, one with as many cells as there are nodes, and another one with as many cells as there are distinct edges in the graph:

Example 4.4. Now, the graph of the previous example is modeled as:

v1: {(dest, L2), (dest, L1)}
v2: {(source, L2), (source, L3)}
v3: {(dest, L3)}
v4: {(source, L1)}

L1: (v4, v1)
L2: (v2, v1)
L3: (v2, v3)


Some properties:

4.4.3 Implementation: adjacency matrix

In this case, we maintain a matrix of size n × n, where n is the number of nodes:

Example 4.5. In this case, the example is modelled as:

      v1    v2   v3    v4
v1
v2    {L2}       {L3}
v3
v4    {L1}





Properties:

4.4.4 Implementation: incidence matrix

In this case, we store a matrix of size n × m, where n is the number of nodes and m is the number of edges:

Example 4.6. The example is now represented as:

      L1      L2      L3
v1    dest    dest
v2            source  source
v3                    dest
v4    source




Properties:

5 Neo4j

Neo4j is an open source graph DB system implemented in Java which uses a labelled, attributed multigraph as its data model. Nodes and edges can have properties, and there are no restrictions on the number of edges between nodes. Loops are allowed, and different types of traversal strategies are defined.

It provides APIs for Java and Python, and it can be embedded in an application or run as a standalone server. It provides full ACID transactions.

Neo4j provides a native graph processing and storage, characterized by index-free adjacency, meaning that each node keeps direct reference to adjacent nodes, acting as a local index and making query time independent from graph size for many kinds of queries.

Another good property is that the joins are precomputed in the form of stored relationships.

It uses a high level query language called Cypher.

5.1 File storage

Graphs are stored in files. There are three different objects:

5.1.1 Caching

Neo4j uses a cache that divides each store into regions, called pages. The cache stores a fixed number of pages per file, which are replaced using a Least Frequently Used strategy.

The cache is optimized for reading, and stores object representations of nodes, relationships, and properties, for fast path traversal. In this case, node objects contain properties and references to relationships, while relationships contain only their properties. This is the opposite of what happens in disk storage, where most information is in the relationship records.

5.2 Cypher

Cypher is the high level query language used by Neo4j for creating nodes, updating/deleting information and querying graphs in a graph database.

Its functioning is different from that of the relational model. In the relational model, we first create the structure of the database, and then we store tuples, which must conform to the structure. The foreign keys are defined at the structural level. In Neo4j, nodes and edges are created directly, with their properties, labels and types as structural information, but no schema is explicitly defined. The topology of the graph can be thought of as analogous to the foreign keys of the relational model, but defined at the instance level.

5.2.1 Nodes

A node is of the form

(v:l1:...:ln {P1: v1, ..., Pk: vk})

where v is the node variable, which identifies the node in an expression; :l1:...:ln is a list of n labels associated with the node; and {P1: v1, ..., Pk: vk} is a list of k properties associated with the node, with their respective assigned values (Pi is the name of the property, vi is its value).

To create an empty node:

CREATE (v)
RETURN v;

The ID is assigned internally, with a different number each time. It can be reused by the system, so it should be considered a value internal to the system and should not be relied upon in applications. RETURN is used to display the node.


To create a node with two labels:

CREATE (v:l1:l2)
RETURN v;


To create a node with one label and 3 properties:

CREATE (v:l1 {P1: v1, P2: v2, P3: [v3_1, v3_2]})
RETURN v;


A query in Cypher is basically a pattern, which is fulfilled by solving the associated pattern-matching problem.

If we want to add a new label to all nodes previously created:

MATCH (n)
SET n:newL
RETURN n;


To delete a label from those nodes with a certain label:

MATCH (n:matchL)
REMOVE n:delL
RETURN n;

A similar thing can be done with properties, which are referred to as node.propertyName:

MATCH (n:matchL)
REMOVE n.propName1, n.propName2
RETURN n;

5.2.2 Edges

An edge has the form

(n)-[e:Type {P1: v1, ..., Pk: vk}]->(v)

where n is the source node and v is the destination node. The edge is defined inside the brackets []. It can also be written in the opposite direction, as (n)<-[...]-(v). e identifies the edge, and Type is a mandatory field prefixed by a colon. Finally, we again have a list of k properties and their values.

Imagine we have 3 employees and want to create the relationships stating that one of them is the manager of the other two, with a date as a property. We can do that with:

MATCH (n1:Employee {Name: 1}), (n2:Employee {Name: 'm'}), (n3:Employee {Name: 2})
CREATE (n1)<-[e1:manager_of {From: 'Dec22'}]-(n2)-[e2:manager_of {From: 'Jan23'}]->(n3)
RETURN e1, e2;

5.2.3 Queries

As we have said, Cypher is a high level query language based on pattern matching. It queries graphs expressing informational or topological conditions.

In addition:

  1. If we don’t need to make reference to a node, we can use () with no variable inside.

  2. If we don’t need to refer to an edge, we can omit it, like (n1)-->(n2).

  3. If we don’t need to consider the direction of the edge, we can use --.

  4. If a pattern matches more than one label, we can write the OR condition as | inside the pattern. For example, (n :l1|:l2) matches nodes with label l1 or label l2.

  5. To express a path of any length, use [*]. For a fixed length m use [*m].

  6. To indicate boundaries to the length of a path, minimum n and maximum m, use [*n..m]. To only limit one end use [*n..], [*..m].

Example 5.1. A page X gets a score computed as the sum of all votes given by the pages that reference it. If a page Z references a page X, Z gives X a normalized vote computed as the inverse of the number of pages referenced by Z. To prevent votes between self-referencing pages, if Z references X and X references Z, then Z gives 0 votes to X.

We are asked to compute the page rank for each web page. One possible solution is:

MATCH (p)-->(r)
WITH p, 1.0/count(r) AS vote
MATCH (p)-->(x)
WHERE NOT ((x)-->(p))
RETURN x, SUM(vote) AS Rank
ORDER BY x.url

The first MATCH-WITH computes, for each node, the inverse of the number of outgoing edges, and passes this number on to the next clause. Now, for each of these p nodes, we look for paths of length 1 where no reciprocity exists.

Another solution uses COLLECT:

1MATCH (p)-->(r) 
2WITH p, 1.0/count(r) AS vote 
3MATCH (p)-->(x) 
4WHERE NOT ((x)-->(p)) 
5RETURN x.url, COLLECT(p.url), SUM(vote) AS rank 
6ORDER BY x.url

In this case, we use COLLECT to gather the URLs that point to x but that x does not point back to, in addition to computing the rank value.

Part III
Temporal Databases

6 Introduction

There are many applications in which temporal aspects need to be taken into account, for example in academia, accounting, insurance, law, or medicine. These applications would greatly benefit from built-in temporal support in the DBMS, which would make application development more efficient, with a potential increase in performance. In this sense, a temporal DBMS is a DBMS that provides mechanisms to store and manipulate time-varying information.

Example 6.1. A case study: imagine we have a database for managing personnel, with the relation Employee(Name, Salary, Title, BirthDate). It is easy to know the salary of an employee, or their birth date:

1-- Salary 
2SELECT Salary 
3FROM Employee 
4WHERE Name = 'John'; 
5 
6-- BirthDate 
7SELECT BirthDate 
8FROM Employee 
9WHERE Name = 'John';

But it is often the case that we don’t only want to store the current state of things, but also a history. For instance, it can be interesting to store the employment history by extending the relation Employee(Name, Salary, Title, BirthDate, FromDate, ToDate). A dataset sample for this could be the following:







Name  Salary  Title      BirthDate  FromDate  ToDate
John  60K     Assistant  9/9/60     1/1/95    1/6/95
John  70K     Assistant  9/9/60     1/6/95    1/10/95
John  70K     Lecturer   9/9/60     1/10/95   1/2/96
John  70K     Professor  9/9/60     1/2/96    1/1/97
For the underlying system, the date columns FromDate and ToDate are no different from the BirthDate column, but it is obvious for us that the meaning is different: FromDate and ToDate need to be understood together as a period of time.

Now, to know the employee’s current salary, the query gets more complex:

1SELECT Salary 
2FROM Employee 
3WHERE Name = 'John' AND FromDate <= CURRENT_TIMESTAMP AND CURRENT_TIMESTAMP < ToDate; -- The intervals are [)

Another interesting query that we can think of now is determining the salary history, i.e., all different salaries earned by the employee and in which period of time the employee was earning that salary. For our example, the result would be:





Name  Salary  FromDate  ToDate
John  60K     1/1/95    1/6/95
John  70K     1/6/95    1/1/97
But how can we achieve this?

PIC

One way to do this in SQL is performing a loop that merges them in pairs, until there is nothing else to merge:

1CREATE TABLE Temp(Salary, FromDate, ToDate) AS 
2  SELECT Salary, FromDate, ToDate 
3  FROM Employee 
4  WHERE Name = 'John' 
5 
6repeat 
7  UPDATE Temp T1 
8  SET T1.ToDate = (SELECT MAX(T2.ToDate) 
9        FROM Temp AS T2 
10        WHERE T1.Salary = T2.Salary   -- T1 and T2 have the same salary 
11        AND T1.FromDate < T2.FromDate -- T1 starts before T2 
12        AND T1.ToDate >= T2.FromDate  -- T1 does not end before T2 begins 
13        AND T1.ToDate < T2.ToDate)    -- T1 ends before T2 
14  WHERE EXISTS (SELECT * 
15        FROM Temp AS T2 
16        WHERE T1.Salary = T2.Salary 
17        AND T1.FromDate < T2.FromDate 
18        AND T1.ToDate >= T2.FromDate 
19        AND T1.ToDate < T2.ToDate) 
20until no tuples updated

This loop is executed log N times in the worst case, where N is the number of tuples in a chain of overlapping tuples. After the loop, we have to delete the extraneous, non-maximal intervals:

1DELETE FROM Temp T1 
2WHERE EXISTS ( 
3  SELECT * 
4  FROM Temp AS T2 
5  WHERE T1.Salary = T2.Salary 
6  AND ( 
7    (T1.FromDate > T2.FromDate AND T1.ToDate <= T2.ToDate) -- T1 is contained in T2 (] 
8    OR 
9    (T1.FromDate >= T2.FromDate AND T1.ToDate < T2.ToDate) -- T1 is contained in T2 [) 
10  ));

The same thing can be achieved with a single SQL expression:

1CREATE TABLE Temp(Salary, FromDate, ToDate) AS 
2  SELECT Salary, FromDate, ToDate 
3  FROM Employee 
4  WHERE Name = 'John' 
5 
6SELECT DISTINCT F.Salary, F.FromDate, L.ToDate 
7FROM Temp as F, Temp as L 
8WHERE F.FromDate < L.ToDate AND F.Salary = L.Salary -- same salary and F starts before L 
9AND NOT EXISTS ( 
10  SELECT * 
11  FROM Temp AS T 
12  WHERE T.Salary = F.Salary 
13  AND F.FromDate < T.FromDate AND T.FromDate < L.ToDate -- T happens entirely between F and L 
14  AND NOT EXISTS ( 
15    SELECT * 
16    FROM Temp as T1 
17    WHERE T1.Salary = F.Salary 
18    AND T1.FromDate < F.FromDate AND T.FromDate <= T1.ToDate)) -- T starts in the middle of T1, which starts before F 
19AND NOT EXISTS ( 
20  SELECT * 
21  FROM Temp AS T2 
22  WHERE T2.Salary = F.Salary 
23  AND ( 
24    (T2.FromDate < F.FromDate AND F.FromDate <= T2.ToDate) -- F starts in the middle of T2 
25    OR 
26    (T2.FromDate <= L.ToDate AND L.ToDate < T2.ToDate) -- L ends in the middle of T2 
27  ))

This is a complex query. The logic is that, if we want to merge the periods, we want to get a From and a To such that every period contained between From and To touches or intersects another period, and no period from outside From and To touches or intersects any of the periods inside:

(F.From, L.To) such that
∀x : (x.From > F.From ∧ x.To < L.To) ⟹ ∃y : (y.From < x.From ∧ x.From ≤ y.To)
¬∃x : (x.From < F.From ∧ x.To ≥ F.From) ∨ (x.From < L.To ∧ x.To ≥ L.To)

Now, the quantifier ∀ cannot be used directly in SQL, but we can use the fact that ∀x : P(x) ≡ ¬¬(∀x : P(x)) ≡ ¬(∃x : ¬P(x)), noting also that ¬(A ⟹ B) ≡ A ∧ ¬B, and that is the query that we have shown above. An intuitive visualization is the diagram in Figure 1.


PIC
Figure 1: Merging temporal intervals. Blue is permitted; Green, Red and Orange are forbidden.


Another possibility to achieve these temporal features is to make the attributes salary and title temporal, instead of the whole table. For this, we can split the information of the table into three tables:

Employee(Name, BirthDate)

EmployeeSal(Name, Salary, FromDate, ToDate)

EmployeeTitle(Name, Title, FromDate, ToDate)

Now, getting the salary history is easier, because we only need to query the table EmployeeSal:

1SELECT Salary, FromDate, ToDate 
2FROM EmployeeSal 
3WHERE Name = 'John'

But, what if we want to obtain the history of the combinations of (salary,title)? We have to perform a temporal join. Say we have the following tables:

EmployeeSal

Name  Salary  FromDate  ToDate
John  60K     1/1/95    1/6/95
John  70K     1/6/95    1/1/97
EmployeeTitle

Name  Title      FromDate  ToDate
John  Assistant  1/1/95    1/10/95
John  Lecturer   1/10/95   1/2/96
John  Professor  1/2/96    1/1/97
In this case, the answer to our temporal join would be:

EmployeeSal ⋈ EmployeeTitle

Name  Salary  Title      FromDate  ToDate
John  60K     Assistant  1/1/95    1/6/95
John  70K     Assistant  1/6/95    1/10/95
John  70K     Lecturer   1/10/95   1/2/96
John  70K     Professor  1/2/96    1/1/97
For this, again, we could print the two tables and let the user make the suitable combinations, but it is better to solve the problem using SQL. The query can be written as:

1SELECT S.Name, Salary, Title, S.FromDate, S.ToDate 
2FROM EmployeeSal S, EmployeeTitle T 
3WHERE S.Name = T.Name -- join by the name of the employee 
4  AND T.FromDate <= S.FromDate 
5  AND S.ToDate <= T.ToDate     -- CASE 1: period of S contained in period of T 
6 
7UNION ALL 
8 
9SELECT S.Name, Salary, Title, S.FromDate, T.ToDate 
10FROM EmployeeSal S, EmployeeTitle T 
11WHERE S.Name = T.Name -- join by the name of the employee 
12  AND T.FromDate < S.FromDate AND S.FromDate < T.ToDate 
13  AND S.ToDate > T.ToDate      -- CASE 2: period of S starts inside period of T, and ends after 
14 
15UNION ALL 
16 
17SELECT S.Name, Salary, Title, T.FromDate, S.ToDate 
18FROM EmployeeSal S, EmployeeTitle T 
19WHERE S.Name = T.Name -- join by the name of the employee 
20  AND S.FromDate < T.FromDate AND T.FromDate < S.ToDate 
21  AND T.ToDate > S.ToDate      -- CASE 3: period of T starts inside period of S, and ends after 
22 
23UNION ALL 
24 
25SELECT S.Name, Salary, Title, T.FromDate, T.ToDate 
26FROM EmployeeSal S, EmployeeTitle T 
27WHERE S.Name = T.Name -- join by the name of the employee 
28  AND S.FromDate <= T.FromDate 
29  AND T.ToDate <= S.ToDate     -- CASE 4: period of T contained in period of S  

The four cases are depicted in Figure 2.


PIC
Figure 2: Temporal join: cases.


If we are using a system with embedded temporal capabilities, i.e., implementing TSQL2 (temporal SQL), we can let the system do this and just write the join as we usually do:

1SELECT S.Name, Salary, Title 
2FROM EmployeeSal S JOIN EmployeeTitle T ON S.Name = T.Name

7 Time Ontology

Time can be modelled in several ways, depending on the use case:

We are going to assume a linear time structure.

Regarding the limits of the timeline, we can classify it:

Also, the bounds can be unspecified or specified.

We also need to consider the density of the time measures (i.e. what is an instant?). In this sense, we can classify the timeline as:

Usually, a distance between chronons can be defined.

7.1 TSQL2: Time ontology

TSQL2 uses a linear time structure, bounded on both ends. The timeline is composed of chronons, which are the smallest possible granularity. Consecutive chronons can be grouped together into granules, giving multiple granularities and enabling conversion from one granularity to another. The density is left unspecified, and it is not possible to ask queries that mix granularities. The implementation is essentially discrete, and the distance between two chronons is the number of chronons in between.

Temporal Types

This way, in SQL92 we have the following Temporal Types:

And in TSQL2 we have:

7.1.1 Time and facts

The valid time of a fact is the time in which the fact is true in the modelled reality, which is independent of its recording in the database and can be past, present or future.

The transaction time of a fact is when the fact is current in the database and may be retrieved.

These two dimensions are orthogonal.

There are four types of tables:

8 Temporal Conceptual Modeling

Conceptual modeling is important because it focuses on the application, rather than the implementation. Thus, it is technology independent, which enhances the portability and the durability of the solution. It is also user oriented and uses a formal, unambiguous specification, allowing for visual interfaces and the exchange and integration of information.

8.1 The conceptual manifesto


PIC
Figure 3: An example of a temporal model in three different ways.


In Figure 3, we can see three different ways to model a schema. Note how they are different, because the temporal attributes are dealt with differently in each of the models.

8.1.1 MADS temporal data types


PIC

Figure 4: The MADS temporal data types.


Time, SimpleTime and ComplexTime are abstract classes.

8.1.2 Temporal objects

A temporal object is marked with the symbol PIC, and it indicates that the object has a set of periods of validity associated.

For example, if we have the relation EmployeePIC(name, birthDate, address, salary, projects (1..n)), then, an instance of the relation can be:







Peter   8/9/64   Rue de la Paix   5000   {MADS, HELIOS}
Life cycle: [7/94-6/96] [7/97-6/98] Active; [7/96-7/97] Suspended

The last line is the life cycle information of the record.

The life cycle of an object can be continuous if there are no gaps between its creation and its deletion, or discontinuous if such gaps exist.

8.1.3 Non-temporal objects

Non-temporal objects can be modeled with the absence of a life cycle, or with a default life cycle which encodes that the object is always active. A usual option is to set it to [0, ∞].

The TSQL2 policy is that temporal operators are not allowed on non-temporal relations. So if we need to perform some kind of temporal operation in non-temporal relations, such as a join with a temporal relation, then we need to use the default life cycle.

Example 8.1. An example is the following query:

1SELECT Dept, Name, COUNT(PID) 
2FROM Department, Employee 
3WHERE Employee.dept = Department.Dept 
4  AND VALID(Employee) OVERLAPS PERIOD [1/1/96-31/12/96] 
5GROUP BY dept

8.1.4 Temporal attributes

A temporal attribute is marked with the symbol PIC, and it indicates that the attribute itself has a lifecycle associated.

For example, if we have the relation Employee(name, birthDate, address, salaryPIC, projects (1..n)), then, an instance of the relation can be:







Peter   8/9/64   Rue de la Paix   4000 [7/94-6/96]; 5000 [6/96-Now]   {MADS, HELIOS}
Temporal complex attributes

It is also possible to have temporal complex attributes, meaning attributes with subattributes. The temporality can be attached either to the full attribute, in which case a life cycle is attached to the whole attribute, or to a subattribute, in which case the life cycle is attached only to that subattribute.

For example, we can have the relation Laboratory(name, projects (1..n) PIC {name, manager, budget}) and the relation Laboratory(name, projects (1..n) {name, manager PIC, budget}).

An example of the first relation is:



LBD   {(MADS, Chris, 1500)} [1/1/95-31/12/95]
      {(MADS, Chris, 1500), (Helios, Martin, 2000)} [31/12/95-Now]

An example of the second relation is:


LBD   {(MADS, [Stef 1/1/95-31/12/95; Chris 31/12/95-Now], 1500),
       (Helios, [John 31/12/95-Now], 2000)}

In this second case, if we update manager, we add one new element to the manager history. If we update the project name, we would simply change it, because project is not temporal.

8.1.5 Attribute timestamping properties

MADS imposes no implicit constraints, even if some would make sense, such as:

8.1.6 Temporal generalization

The life cycles are inherited from parent classes, and more temporal attributes can be added. For example:

PIC

In this case, Employee is temporal, even though its parent is not. Employee inherits the temporal attribute address.

Another example:

PIC

In this case, both Temporary and Permanent are temporal objects.

In the case in which the child is also declared as temporal, it is called dynamic temporal generalization, and in that case the object maintains two life cycles: the one inherited from its parent and the one defined on itself. For example:

PIC

In this case, Permanent employees have two life cycles: their life cycle as a permanent and the inherited life cycle as an employee. The redefined life cycle has to be included in the inherited one.

8.1.7 Temporal relationships

A temporal relationship is also marked with the symbol PIC, meaning that the relation between the objects possesses a lifecycle.

Some usual constraints used in temporal relationships:

Again, MADS does not impose any constraints.

Example 8.2. Let’s see some temporal relationships.

PIC

In this case, a possible data can be:


Employee
e1   John    [7/8/77-Now]   4/7/55    Bd Haussman
e2   Peter   [1/2/78-Now]   8/10/60   Bd Général Jacques

WorksOn
w1   (e1, p2, 20 [7/8/77-Now])                       [7/8/77-Now]
w2   (e1, p1, 20 [7/8/77-1/2/78]; 25 [1/2/78-Now])   [7/8/77-Now]
w3   (e2, p1, 20 [1/2/78-Now])                       [1/2/78-Now]

Project
p1   MADS     [1/5/76-Now]   Christine   5000
p2   HELIOS   [1/2/78-Now]   Yves        6000
Another example, which is slightly different is the following:

PIC

In this case, the data can be:


Employee
e1   John    [7/8/77-Now]   4/7/55    Bd Haussman
e2   Peter   [1/2/78-Now]   8/10/60   Bd Général Jacques

WorksOn
w1   (e1, p2, 20 [7/8/77-Now])
w2   (e1, p1, 20 [7/8/77-1/2/78]; 25 [1/2/78-Now])
w3   (e2, p1, 20 [1/2/78-Now])

Project
p1   MADS     [1/5/76-Now]   Christine   5000
p2   HELIOS   [1/2/78-Now]   Yves        6000
Now, only currently valid tuples are kept in the relationship. But for these, the history of hours/week is maintained.

8.1.8 Synchronization relationships

They describe temporal constraints between the life cycles of two objects, and they are expressed with Allen's operators extended to temporal elements:

They can be seen visually in Figure 5.


PICPIC

Figure 5: Allen’s temporal operators and how to express them.


These relationships express a temporal constraint between the whole life cycles or the active periods.

8.1.9 Example of a temporal schema

PIC

9 Manipulating Temporal Databases with SQL-92

A data type for periods is not available in SQL-92. Thus, a period is simulated with two DATE columns, FromDate and ToDate, indicating the beginning and end of the period, respectively. Some notes:

9.1 Temporal statements

Temporal statements apply to queries, modifications, views and integrity constraints. They are:

9.2 Temporal keys

Imagine we have the relation Incumbents(SSN, PCN, FromDate, ToDate) and we want to enforce that each employee has only one position at each point in time. If we use the key of the corresponding non-temporal table, (SSN, PCN), then we would not be able to enter the same employee for different periods. Thus, we need to somehow include the dates in the key. The options are: (SSN, PCN, FromDate), (SSN, PCN, ToDate) or (SSN, PCN, FromDate, ToDate). None of them captures the constraint that we want to enforce, because they still allow overlapping periods associated with the same SSN: we need a sequenced constraint, applied at each point in time. All constraints specified on a snapshot table have sequenced counterparts, specified on the analogous valid-time table.
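For reference, a minimal sketch of how such a table could be declared in SQL-92 is shown below; the column types are assumptions, and note that no declarative PRIMARY KEY can express the sequenced constraint, which is why a trigger is needed in the next subsection:

CREATE TABLE Incumbents (
  SSN      CHAR(11) NOT NULL,
  PCN      CHAR(6)  NOT NULL,
  FromDate DATE     NOT NULL,
  ToDate   DATE     NOT NULL
  -- no declarative key can state "one position per employee at each point in time";
  -- the sequenced constraint is enforced with the trigger below
);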

9.2.1 Sequenced primary key

Example 9.1. Employees have only one position at a point in time.

1CREATE TRIGGER Seq_Primary_Key ON Incumbents FOR INSERT, UPDATE AS 
2IF EXISTS( SELECT * 
3    FROM Incumbents I1 
4    WHERE 1 < ( SELECT COUNT(I2.SSN) -- How many entries 
5        FROM Incumbents I2 
6        WHERE I1.SSN = I2.SSN AND I1.PCN = I2.PCN -- The same employee 
7        AND I1.FromDate < I2.ToDate -- I1 starts before I2 ends 
8        AND I1.ToDate > I2.FromDate)) -- I1 ends after I2 starts: the two conditions are searching for intersections 
9  OR EXISTS (SELECT * 
10      FROM Incumbents I 
11      WHERE I.SSN IS NULL OR I.PCN IS NULL) -- no NULLS values of SSN and PCN can be inserted 
12BEGIN 
13  RAISERROR('Violation of sequenced primary key constraint',1,2) 
14  ROLLBACK TRANSACTION 
15END

9.3 Handling Now

We have to decide how to timestamp current data. One alternative is to put NULL in ToDate, which allows identifying current records by checking WHERE ToDate IS NULL. But this has some disadvantages:

Another approach is to set ToDate to the largest value in the timestamp domain, e.g. '3000-01-01'. The disadvantages of this are:
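Whichever convention is chosen, current rows are then selected with a simple predicate; a minimal sketch, using the table and dates of the examples above:

-- Convention 1: NULL marks currently valid rows
SELECT * FROM Incumbents WHERE ToDate IS NULL;

-- Convention 2: a maximum "forever" date marks currently valid rows
SELECT * FROM Incumbents WHERE ToDate = '3000-01-01';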

9.4 Duplicates

There are different kinds of duplicates in temporal databases:


Incumbents

SSN        PCN     FromDate    ToDate
111223333  120033  1996-01-01  1996-06-01
111223333  120033  1996-06-01  1996-10-01
111223333  120033  1996-06-01  1996-10-01
111223333  120033  1996-10-01  Now
111223333  120033  1997-12-01  Now

Table 6: Temporal duplicates example.

9.4.1 Preventing duplicates

Now, we want to enforce that each employee has at most one position. In a snapshot table, this would be equivalent to UNIQUE(SSN), and now the sequenced constraint is: at any time each employee has at most one position, i.e., SSN is sequenced unique:

1CREATE TRIGGER Seq_Unique ON Incumbents FOR INSERT, UPDATE, DELETE AS 
2IF EXISTS( SELECT I1.SSN 
3    FROM Incumbents I1 
4    WHERE 1 < ( SELECT COUNT(I2.SSN) 
5        FROM Incumbents I2 
6        WHERE I1.SSN = I2.SSN 
7        AND I1.FromDate < I2.ToDate 
8        AND I1.ToDate > I2.FromDate)) 
9    OR EXISTS (SELECT * FROM Incumbents AS I WHERE I.SSN IS NULL) 
10BEGIN 
11  RAISERROR('Transaction violates sequenced unique constraint',1,2) 
12  ROLLBACK TRANSACTION 
13END

We can also think about the nonsequenced constraint: an employee cannot have more than one position over two identical periods, i.e. SSN is nonsequenced unique: UNIQUE(SSN,FromDate,ToDate).
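Unlike the sequenced case, this nonsequenced constraint can be declared directly; a minimal sketch (the constraint name is illustrative):

ALTER TABLE Incumbents
  ADD CONSTRAINT Nonseq_Unique_SSN UNIQUE (SSN, FromDate, ToDate);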

Or current constraint: an employee has at most one position now, i.e. SSN is current unique:

1CREATE TRIGGER Current_Unique ON Incumbents FOR INSERT, UPDATE, DELETE AS 
2IF EXISTS( SELECT I1.SSN 
3    FROM Incumbents I1 
4    WHERE 1 < (SELECT COUNT(I2.SSN) 
5        FROM Incumbents I2 
6        WHERE I1.SSN = I2.SSN 
7        AND I1.FromDate <= CURRENT_DATE AND CURRENT_DATE < I1.ToDate -- I1 is current 
8        AND I2.FromDate <= CURRENT_DATE AND CURRENT_DATE < I2.ToDate )) -- I2 is current 
9BEGIN 
10  RAISERROR('Transaction allows current duplicates',1,2) 
11  ROLLBACK TRANSACTION 
12END

9.5 Referential integrity

We want to enforce that Incumbents.PCN is a foreign key for Position.PCN. There are several possible cases, depending on which tables are temporal.

9.5.1 Case 1: neither table is temporal
1CREATE TABLE Incumbents( 
2  ... 
3  PCN CHAR(6) NOT NULL REFERENCES Position, 
4  ... 
5  )

9.5.2 Case 2: both tables are temporal

In this case, if we want the PCN to be a current foreign key, the PCN of all current incumbents must be listed in the current positions:

1CREATE TRIGGER Current_RI ON Incumbents FOR INSERT, UPDATE, DELETE AS 
2IF EXISTS( SELECT * 
3    FROM Incumbents I 
4    WHERE I.ToDate = '3000-01-01' 
5    AND NOT EXISTS( SELECT * 
6          FROM Position P 
7          WHERE I.PCN = P.PCN -- The position is correct 
8          AND P.ToDate = '3000-01-01')) -- And is active 
9BEGIN 
10  RAISERROR('Violation of current referential integrity',1,2) 
11  ROLLBACK TRANSACTION 
12END

Or it can be a sequenced foreign key:

1CREATE TRIGGER Seq_RI ON Incumbents FOR INSERT, UPDATE, DELETE AS 
2IF EXISTS( SELECT * 
3    FROM Incumbents I 
4    WHERE NOT EXISTS( SELECT * 
5            FROM Position P 
6            WHERE I.PCN = P.PCN 
7            AND P.FromDate <= I.FromDate 
8            AND P.ToDate > I.FromDate) -- These two search for P s.t. I starts in the middle of P 
9    OR NOT EXISTS( SELECT * 
10          FROM Position P 
11          WHERE I.PCN = P.PCN 
12          AND P.FromDate < I.ToDate 
13          AND P.ToDate >= I.ToDate) -- These two search for P s.t. I ends in the middle of P 
14    OR EXISTS( SELECT * 
15        FROM Position P 
16        WHERE I.PCN = P.PCN 
17        AND I.FromDate < P.ToDate AND I.ToDate > P.ToDate -- P ends in the middle of I 
18        AND NOT EXISTS( SELECT * 
19                FROM Position P2 
20                WHERE P2.PCN = P.PCN 
21                AND P2.FromDate <= P.ToDate 
22                AND P2.ToDate > P.ToDate))) -- These two search for P2 that continues the RI instead of P 
23BEGIN 
24  RAISERROR('Violation of sequenced referential integrity',1,2) 
25  ROLLBACK TRANSACTION 
26END

Contiguous history

A contiguous history is such that there are no gaps in the history. Enforcing contiguous history is a nonsequenced constraint, because it requires examining the table at multiple points of time.

1CREATE TRIGGER Cont_History ON Position FOR INSERT, UPDATE, DELETE AS 
2IF EXISTS( SELECT * 
3    FROM Position P1, Position P2 
4    WHERE P1.PCN = P2.PCN 
5    AND P1.ToDate < P2.FromDate -- If P1 and P2 are separated 
6    AND NOT EXISTS( SELECT * 
7          FROM Position P3 
8          WHERE P3.PCN = P1.PCN 
9          AND ( 
10            (P3.FromDate <= P1.ToDate AND P3.ToDate > P1.ToDate) -- P3 extends P1 to the right 
11            OR 
12            (P3.FromDate < P2.FromDate AND P3.ToDate >= P2.FromDate)))) -- P3 extends P2 to the left 
13BEGIN 
14  RAISERROR('Transaction violates contiguous history',1,2) 
15  ROLLBACK TRANSACTION 
16END

We can also have the situation that Incumbents.PCN is a FK for Position.PCN and Position.PCN defines a contiguous history. In this case, we can omit the part of searching for the P2 that extends a P that ends in the middle of I.

1CREATE TRIGGER Seq_RI_CH ON Incumbents FOR INSERT, UPDATE, DELETE AS 
2IF EXISTS( SELECT * 
3    FROM Incumbents I 
4    WHERE NOT EXISTS( SELECT * 
5            FROM Position P 
6            WHERE I.PCN = P.PCN 
7            AND P.FromDate <= I.ToDate AND I.FromDate < P.ToDate) 
8    OR NOT EXISTS( SELECT * 
9          FROM Position P 
10          WHERE I.PCN = P.PCN 
11          AND P.FromDate < I.ToDate AND I.ToDate <= P.ToDate)) 
12BEGIN 
13  RAISERROR('Violation of sequenced referential integrity',1,2) 
14  ROLLBACK TRANSACTION 
15END

9.5.3 Case 3: Only the referenced table is temporal

Current FK:

1CREATE TRIGGER Current_RI ON Incumbents FOR INSERT, UPDATE, DELETE AS 
2IF EXISTS( SELECT * 
3    FROM Incumbents I 
4    WHERE NOT EXISTS( SELECT * 
5            FROM Position P 
6            WHERE I.PCN = P.PCN AND P.ToDate = '3000-01-01')) -- P is current 
7BEGIN 
8  RAISERROR('Violation of current referential integrity',1,2) 
9  ROLLBACK TRANSACTION 
10END

9.6 Querying valid-time tables

We have the relations:

Employee(SSN, FirstName, LastName, BirthDate)
Incumbents(SSN, PCN, FromDate, ToDate)
Salary(SSN, Amount, FromDate, ToDate)
Position(PCN, JobTitle)

Say we want to obtain Bob’s current position. Then:

1-- Option 1 
2SELECT JobTitle 
3FROM Employee E, Incumbents I, Position P 
4WHERE E.FirstName = 'Bob' 
5  AND E.SSN = I.SSN 
6  AND I.PCN = P.PCN 
7  AND I.ToDate = '3000-01-01' 
8 
9-- Option 2 
10SELECT JobTitle 
11FROM Employee E, Incumbents I, Position P 
12WHERE E.FirstName = 'Bob' 
13  AND E.SSN = I.SSN 
14  AND I.PCN = P.PCN 
15  AND I.FromDate <= CURRENT_DATE AND I.ToDate > CURRENT_DATE -- This one is more portable

And Bob’s current position and salary? Current joins:

1SELECT JobTitle, Amount 
2FROM Employee E, Incumbents I, Position P, Salary S 
3WHERE E.FirstName = 'Bob' 
4  AND E.SSN = I.SSN AND I.PCN = P.PCN AND E.SSN = S.SSN 
5  AND I.FromDate <= CURRENT_DATE AND I.ToDate > CURRENT_DATE -- I is current 
6  AND S.FromDate <= CURRENT_DATE AND S.ToDate > CURRENT_DATE -- S is current

And what if we want to get which employees currently have no position?

1SELECT FirstName 
2FROM Employee e 
3WHERE NOT EXISTS( SELECT * 
4        FROM Incumbents I 
5        WHERE E.SSN = I.SSN 
6        AND I.FromDate <= CURRENT_DATE AND I.ToDate > CURRENT_DATE)

9.6.1 Extracting prior states

A timeslice query extracts the state at a particular point in time. It requires an additional predicate for each temporal table: it is basically the same as checking the current state, but with a different date.

For example: Bob’s position at the beginning of 1997?

1SELECT JobTitle 
2FROM Employee E, Incumbents I, Position P 
3WHERE E.FirstName = 'Bob' 
4  AND E.SSN = I.SSN 
5  AND I.PCN = P.PCN 
6  AND I.FromDate <= '1997-01-01' AND I.ToDate > '1997-01-01'

9.6.2 Sequenced queries

Sequenced queries are those whose result is a valid-time table. They use sequenced variants of the basic operations:
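As an illustration, here is a sketch of one such sequenced operation, a sequenced join of Incumbents and Salary: each result row carries the intersection of the two input periods, computed with CASE expressions (the later start and the earlier end):

SELECT I.SSN, I.PCN, S.Amount,
       CASE WHEN I.FromDate > S.FromDate THEN I.FromDate ELSE S.FromDate END AS FromDate,
       CASE WHEN I.ToDate < S.ToDate THEN I.ToDate ELSE S.ToDate END AS ToDate
FROM Incumbents I, Salary S
WHERE I.SSN = S.SSN
  AND (CASE WHEN I.FromDate > S.FromDate THEN I.FromDate ELSE S.FromDate END)
    < (CASE WHEN I.ToDate < S.ToDate THEN I.ToDate ELSE S.ToDate END);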

9.6.3 Nonsequenced queries

Nonsequenced operators are straightforward: they ignore the time-varying nature of the tables.

Example 9.2. List all the salaries, past and present, of employees who have been a lecturer at some time:

1SELECT Amount 
2FROM Incumbents I, Position P, Salary S 
3WHERE I.SSN = S.SSN AND I.PCN = P.PCN 
4  AND JobTitle = 'Lecturer';

When did employees receive raises?

1SELECT S2.SSN, S2.FromDate AS RaiseDate 
2FROM Salary S1, Salary S2 
3WHERE S2.Amount > S1.Amount AND S1.SSN = S2.SSN AND S1.ToDate = S2.FromDate;

Remove nonsequenced duplicates from Incumbents:

1SELECT DISTINCT * 
2FROM Incumbents

Remove value-equivalent rows from Incumbents:

1SELECT DISTINCT SSN, PCN 
2FROM Incumbents

Remove current duplicates from Incumbents:

1SELECT DISTINCT SSN, PCN 
2FROM Incumbents 
3WHERE ToDate = '3000-01-01'

9.6.4 Sequenced aggregation function

Say we have two relations:

Affiliation(SSN, DNumber, FromDate, ToDate)
Salary(SSN, Amount, FromDate, ToDate)

And we want the maximum salary. The non-temporal version is straightforward:

1SELECT MAX(Amount) 
2FROM Salary

And the maximum salary by department:

1SELECT DNumber, MAX(Amount) 
2FROM Affiliation A, Salary S 
3WHERE A.SSN = S.SSN 
4GROUP BY DNumber

But the temporal version is not that easy. We have to proceed as in this visual example:

PIC

The steps to follow are:

  1. Compute the periods on which a maximum must be calculated.

  2. Compute the maximum for the periods.

  3. Coalesce the results (as we have already seen)

1-- Step 1: compute periods 
2CREATE VIEW SalChanges(Day) AS 
3  SELECT DISTINCT FromDate 
4  FROM Salary 
5   
6  UNION 
7 
8  SELECT DISTINCT ToDate 
9  FROM Salary 
10 
11CREATE VIEW SalPeriods(FromDate, ToDate) AS 
12  SELECT P1.Day, P2.Day 
13  FROM SalChanges P1, SalChanges P2 
14  WHERE P1.Day < P2.Day -- Get all consecutive combinations 
15    AND NOT EXISTS( SELECT * 
16            FROM Salchanges P3 
17            WHERE P1.Day < P3.Day AND P3.Day < P2.Day) -- Ensure they are consecutive 
18 
19-- Step 2 
20CREATE VIEW TempMax(MaxSalary, FromDate, ToDate) AS 
21  SELECT MAX(S.AMOUNT), P.FromDate, P.ToDate 
22  FROM SALARY S, SalPeriods P 
23  WHERE S.FromDate <= P.FromDate AND P.ToDate <= S.ToDate 
24  GROUP BY P.FromDate, P.ToDate 
25 
26-- Step 3: COALESCE

Now, we want to compute the history of the maximum salary by department:

  1. Compute by department the periods on which a maximum must be calculated.

  2. Compute the maximum salary for these periods.

  3. Coalesce the results.

1-- Step 1 
2CREATE VIEW Aff_Sal(DNumber, Amount, FromDate, ToDate) AS 
3  SELECT DISTINCT A.DNumber, S.Amount, maxDate(S.FromDate, A.FromDate) AS FromDate, minDate(S.ToDate, A.ToDate) AS ToDate 
4  FROM Affiliation A, Salary S 
5  WHERE A.SSN = S.SSN 
6    AND maxDate(S.FromDate,A.FromDate) < minDate(S.ToDate,A.ToDate) 
7 
8CREATE VIEW SalChanges(DNumber, Day) AS 
9  SELECT DISTINCT DNumber, FromDate 
10  FROM Aff_Sal 
11  UNION 
12  SELECT DISTINCT DNumber, ToDate 
13  FROM Aff_Sal 
14 
15CREATE VIEW SalPeriods(DNumber, FromDate, ToDate) AS 
16  SELECT P1.DNumber, P1.Day, P2.Day 
17  FROM SalChanges P1, SalChanges P2 
18  WHERE P1.DNumber = P2.DNumber AND P1.Day < P2.Day 
19  AND NOT EXISTS( SELECT * 
20          FROM SalChanges P3 
21          WHERE P3.DNumber = P1.DNumber 
22            AND P1.Day < P3.Day AND P3.Day < P2.Day) 
23 
24-- Step 2 
25CREATE VIEW TempMaxDep(DNumber, MaxSalary, FromDate, ToDate) AS 
26  SELECT P.DNumber, MAX(Amount), P.FromDate, P.ToDate 
27  FROM Aff_Sal A, SalPeriods P 
28  WHERE A.DNumber = P.DNumber 
29    AND A.FromDate <= P.FromDate AND P.ToDate <= A.ToDate 
30  GROUP BY P.DNumber, P.FromDate, P.ToDate 
31 
32-- Step 3: COALESCE

9.6.5 Sequenced division

Now we have the tables:

Affiliation(SSN, DNumber, FromDate, ToDate)
Controls(PNumber, DNumber, FromDate, ToDate)
WorksOn(SSN, PNumber, FromDate, ToDate)

And say we want to get the employees that are working on all the projects of the department to which they are affiliated.
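The full query is not developed here; as a starting point, here is a sketch of the non-temporal version of this division (the sequenced version would additionally have to intersect the periods of the three tables at each point in time):

SELECT DISTINCT A.SSN
FROM Affiliation A
WHERE NOT EXISTS (
  SELECT *
  FROM Controls C
  WHERE C.DNumber = A.DNumber
    AND NOT EXISTS (
      SELECT *
      FROM WorksOn W
      WHERE W.SSN = A.SSN AND W.PNumber = C.PNumber));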

10 Temporal Support in current DBMSs and SQL 2011

10.1 Oracle

10.2 Teradata

10.3 DB2

IBM DB2 10 includes the period data type, valid-time support (termed business time), transaction-time support (termed system time), timeslices, temporal upward compatibility, sequenced primary keys, and sequenced projection and selection.

10.4 SQL 2011

SQL2011 has temporal support:

Part IV
Spatial Databases

11 Introduction

A spatial database is a database that needs to store and query spatial objects, such as points (a house in a city), lines (a road) or polygons (a country). Thus, a spatial DBMS is a DBMS that manages data existing in some space:

The supporting technology needs to be able to manage large collections of geometric objects, and major commercial and open source DBMSs provide spatial support.

Spatial databases are important because queries to databases are posed in high level declarative manner, usually with SQL, which is popular in the commercial DB world. Also, although the standard SQL operates on relatively simple data types, additional spatial data types and operations can be defined in spatial databases.

A geographic information system (GIS) is a system designed to capture, store, manipulate, analyze, manage and present geographically-referenced data, as well as non spatial data. It can be used to establish connections between different elements using geography operations, such as the location, the proximity or the spatial distribution. There are plenty of commercial and open source systems of this kind, but with limited temporal support.

A GIS can be seen as a set of subsystems:

There are many fields involved in the development of GIS: geography, cartography, remote sensing, photogrammetry, geodesy, statistics, operations research, mathematics, civil engineering and computer science, with computer aided design (CAD), computer graphics, AI and DBMS.

11.1 GIS architectures

There are several possible architectures for a GIS:

12 Georeferences and Coordinate Systems

12.1 Projected coordinate systems

Going from 3D to 2D for representing the Earth always involves a projection, so information about the projection is essential for applications analyzing spatial relationships, and the choice of projection can influence the results. Also, the Earth is a complex surface whose shape and dimensions cannot be described with simple mathematical formulas. Two main reference surfaces are used to approximate the shape of the Earth: the ellipsoid and the geoid.

The geoid is a reference model for the physical surface of the Earth. It is defined as the equipotential surface of the Earth’s gravity field which best fits the global mean sea level and extended through the continents. It is used in geodesy but it is not very practical to produce maps. Also, since its mathematical description is unknown, it is impossible to identify mathematical relationships for moving from the Earth to a map.


PIC

Figure 10: The geoid.


An ellipsoid is a mathematically defined surface that approximates the geoid. It is the 3-dimensional version of an ellipse.


PIC

Figure 11: An ellipse (left) and how it can approximate locally a complex shape (right).


The flattening measures how much the symmetry axis is compressed relative to the equatorial radius. It is computed as

f = (a − b) / a.

For the Earth, f ≈ 1/300: the difference between the major and minor semi-axes is approximately 21 km. The ellipsoid is thus used to measure locations of points of interest, using latitude and longitude. These locations on the ellipsoid are then projected onto a mapping plane. Note that different regions of the world use different reference ellipsoids that minimize the local differences between the geoid and the ellipsoid. For Belgium, the ellipsoid is GRS80 or WGS84, with a = 6378137 m and f = 1/298.2572.

This is done because the physical Earth has excursions of +8km and -11km, so the geoid’s total variation goes from -107m to +85m compared to a perfect ellipsoid.

12.1.1 Latitude and longitude

Latitude and longitude are measures of the angles (in degrees) from the center of the Earth to a point on the Earth’s surface. The latitude measures angles in the North-South direction, with the equator at 0°. The longitude measures angles in the East-West direction, with the prime meridian at 0°.

Note that they are not uniform units of distance, because a degree spans different distances depending on where on the map we look. Only along the equator does the distance represented by one degree of longitude approximate the distance represented by one degree of latitude.
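As a rough quantification (assuming a spherical Earth with a circumference of about 40 075 km): 1° of latitude ≈ 40 075 km / 360 ≈ 111 km everywhere, while 1° of longitude at latitude φ spans only about 111 km × cos φ, shrinking to 0 at the poles.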


PIC

Figure 12: The latitude and the longitude.


To produce a map, the curved surface of the Earth, approximated by an ellipsoid is transformed into the flat plane of the map by means of a map projection. A point on the reference surface of the Earth with coordinates (ϕ,λ ) is transformed into Cartesian coordinates (x,y) representing the positions on the map plane. Each projection causes deformations in one or another way.

Map projections can be categorized in four ways: shape used, angle, fit and properties.

12.1.2 Shape of projection surface

Different shapes can be used as projection surfaces, commonly a flat plane, a cylinder or a cone. Note that cylinders and cones are not flat, but they can be unrolled into a flat plane without introducing additional distortion.

According to this categorization, the projection can be:


PIC
Figure 13: Shapes of projection surface.


12.1.3 Angle

The angle refers to the alignment of the projection surface, measured as the angle between the main axis of the Earth and the main symmetry axis of the projection surface:

Ideally, the projection surface is aligned as closely as possible with the main axis of the area to be mapped, minimizing distortion and scale errors.


PIC

Figure 14: Angle of projection.


12.1.4 Fit

The fit is a measure of how closely the projection surface fits the surface of the Earth:

Distortion occurs whenever the projection surface does not touch or intersect the surface of the Earth. Secant projections usually reduce scale errors, because the two surfaces intersect in more places and the overall fit tends to be better. A globe is the only way to represent the entire Earth without significant scale errors.


PIC

Figure 15: Fit of projection.


12.1.5 Geometric deformations

With different projections, different measures are preserved:

It is impossible to construct a map that is equal-area and conformal.

13 Conceptual Modelling for Spatial Databases

The data modeling of spatial features requires multiple views of space (discrete and continuous; 2D, 2.5D and 3D), multiple representations of the data, at different scales and from different viewpoints, several spatial abstract data types (point, line, area, set of points,...) and explicit spatial relationships (crossing, adjacency,...).

Let’s see these requirements in depth. First, we have interaction requirements:

Practical requirements:

Thus, conceptual modelling is important because it focuses on the application and is technology independent, which increases portability and durability. It is user oriented and uses a formal, unambiguous specification, with the support of visual interfaces for data definition and manipulation. It is the best vehicle for information exchange and integration.

13.1 The Spatiotemporal conceptual manifesto

It has good expressive power, with a simple data model, few clean concepts and standard, well-known semantics. There are no artificial constructs, and space, time and data structures are treated as orthogonal. It provides a clean visual notation with intuitive icons and symbols, a formal definition and an associated query language.

As with temporal DB, in spatial DB there are multiple ways to model the reality, each with its own characteristics. For example:

PIC

And there is also a MADS Spatial Type Hierarchy:


PIC

Figure 16: MADS Spatial Type Hierarchy.


We need to keep in mind that the types are topologically closed: all geometries include their boundary.

Geo, SimpleGeo and ComplexGeo are abstract classes.

13.1.1 MADS Spatial datatypes

13.1.2 Topological predicates

A topological predicate specifies how two geometries relate to each other. It is based on the definition of their boundary B(x), interior I(x), and exterior E(x), and their dimension Dim(x), which can be -1, 0, 1 or 2 (-1 is for the empty set). The dimensionally extended 9-intersection matrix (DE-9IM) is a matrix used for defining predicates. It is based on the following template:





           Interior           Boundary           Exterior
Interior   Dim(I(a) ∩ I(b))   Dim(I(a) ∩ B(b))   Dim(I(a) ∩ E(b))
Boundary   Dim(B(a) ∩ I(b))   Dim(B(a) ∩ B(b))   Dim(B(a) ∩ E(b))
Exterior   Dim(E(a) ∩ I(b))   Dim(E(a) ∩ B(b))   Dim(E(a) ∩ E(b))
It is coded using a dense notation with a string of 9 characters representing the cells of the matrix. The possible characters are T (non-empty intersection), F (empty intersection), * (any value), and 0, 1 or 2 (an intersection of that exact dimension).

Example 13.1. Disjoint(a,b) is true if their intersection is empty. So, it is

Disjoint (a,b) = [I(a)∩ I(b) = ∅]∧[I(a)∩ B (b) = ∅]∧ [B (a)∩ I(b) = ∅]∧ [B (a)∩ B (b) = ∅].

Then, it can be coded as ’FF*FF****’.
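In systems that expose a DE-9IM test directly (e.g. an ST_Relate-style predicate, as found in OGC/SQL-MM implementations; the exact return convention, boolean vs. 0/1, varies by system), such a pattern can be used in queries. A sketch over a hypothetical table Regions with columns name and geom:

SELECT r1.name, r2.name
FROM Regions r1, Regions r2
WHERE ST_Relate(r1.geom, r2.geom, 'FF*FF****')  -- DE-9IM pattern for Disjoint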

Let’s see some predicates:

13.1.3 Spatial objects

Objects can be defined as spatial:


Country PIC


name

population

13.1.4 Spatial attributes

Both non-spatial and spatial object types can have spatial attributes. The domain of a spatial attribute is a spatial type, and it can be multivalued. A spatial attribute of a spatial object type may induce a topological constraint (e.g. the capital of a country is located within the geometry of the country), but this is not necessarily the case: it depends on the application semantics, and the application schema must explicitly state these constraints.


Client


name

address

locationPIC


Country PIC


name

population

capitalPIC

rivers(1,n) PIC


RoadPIC


name

responsible

stations(1,n) PIC

Spatial complex attributes

Spatial attributes can be a component of a complex and/or multivalued attribute. It is usual to keep both thematic (alphanumeric) and location data for attributes such as a capital, for example. This allows printing both the name and the location on a map. However, in real maps the toponyms also have a location: there are precise cartographic rules for placing them, so it is a semi-automatic process.


Country PIC


name

population

capital

name

locationPIC


provinces(1,n)

name

location PIC


13.1.5 Spatial objects VS spatial attributes

Representing a concept as a spatial object or as a spatial attribute depends on the application and it is determined by the relative importance of the concept. This has implications in the way of accessing the instances of the concept. For example, buildings:

PIC

13.1.6 Generalization: inheriting spatiality.

Spatiality is inherited through generalization, based on the well-known substitutability principle in OOP. For simple inheritance it is not necessary to re-state the geometry in the subtype and, as usual, spatiality can be added to a subtype, so that only instances of the subtype have an associated spatiality.

Refining/redefining spatiality

This arises when spatiality is re-stated in a subtype:

Multiple inheritance

Spatiality is inherited from several supertypes, and this creates an ambiguity when referring to the spatiality of the subtype. Several policies have been proposed for solving this issue in the OO community; the most general policy is that all inherited properties are available in the subtype, and it is the user who must disambiguate in queries.

PIC

13.1.7 Spatial relationships

Spatiality can also be defined for relationships, and it is orthogonal to the fact that linked object types are spatial. If a spatial relationship relates spatial types, spatial constraints may restrict the geometries.

PIC

Topological relationships

They are specified on a relationship type that links at least two spatial types, and they constrain the spatiality of the instances of the related types. Many topological constraints can be defined using the DE-9IM. The conceptual model depicts only the most general ones. These are:

PIC

13.1.8 Spatial aggregation

Traditional aggregation relationships can link spatial types. Usually, aggregation has exclusive semantics, stated by cardinalities in the component role. Also, the spatiality of the aggregation is often partitioned into the spatiality of the components.

PIC

Note that this is not the case for the second example, where the spatiality of Antena corresponds to its coverage, so the same location can be covered by several antennas. The spatiality of the aggregation is the spatial union of the spatialities of the antennas.

13.1.9 Space and time varying attributes

They are also referred to as continuous fields, and they allow representing phenomena that change in space and/or time, such as elevation (to each point there is an associated real number), population (to each point there is an associated integer) or temperature (to each point in space there is an associated real number, which evolves over time). At the conceptual level, a field can be represented as a continuous function, so operators for manipulating fields can be defined. At the logical level it can be implemented in several ways:

PIC

14 Logical Modelling for Spatial Databases

14.1 Representation models

These models try to represent an infinite set of points of the Euclidean space in a computer. There are two alternative representations:

14.1.1 Raster model: tesselation

Tesselation is the decomposition of the plane into polygonal units, which might be regular or irregular, depending on whether the polygonal units are of equal size:

PIC

Regular tesselation is used for remote sensing data, while irregular tesselation is used for zoning in social, demographic or economic data. A spatial object is represented by the smallest subset of pixels that contains it.

PIC

14.2 Digital Elevation Models (DEMs)

They provide a digital (finite) representation of an abstract model of space. DEMs are useful to represent a natural phenomenon that is a continuous function of the 2D space. They are based on a finite collection of sample values, and the rest are obtained by interpolation.

Triangulated irregular networks (TINs) are based on a triangular partition of the 2D space:

PIC

No assumption is made on the distribution and location of the vertices of the triangles and the elevation value is recorded at each vertex. The value for the rest of the points is interpolated using linear interpolation of the 3 vertices of the triangle that contains the point.

14.3 Representing the geometry of a collection of objects

There are three commonly used representations: spaghetti, network and topological, which mainly differ in the expression of topological relationships among the component objects.

14.3.1 Spaghetti model

The geometry of any object is described independently of the rest, so no topology is stored in the model: the topological relationships must be computed on demand. This implies representation redundancy, but enables heterogeneous representations mixing points, polylines and regions without restrictions.

Advantages:

Drawbacks:

PIC

14.3.2 Network model

It is destined for network or graph based applications. The topological relationships among points and polylines are stored:

Nodes allow efficient line connectivity tests and network computations. There are two types of points: regular points and nodes.

Depending on the implementation, the network is planar or nonplanar:

PIC

14.3.3 Topological model

It is similar to the network model, except that the network is planar. It induces a planar subdivision into adjacent polygons, some of which may not correspond to actual geographic objects:

There is no redundancy: each point/line is stored only once.

Advantages: efficient computation of topological queries, and up-to-date consistency.

Drawbacks: some DB objects have no semantics in the real world, and the complexity of the structure may slow down some operations.

PIC

15 SQL/MM

15.1 SQL/MM Spatial: Geometry Type Hierarchy

PIC

ST_Geometry, ST_Curve and ST_Surface are not instantiable types.

15.1.1 ST_Geometry

Represents 0D, 1D and 2D geometries that exist in 2D, 3D or 4D space. Geometries in 2D have points with (x, y) coordinate values, in 3D with (x, y, z) or (x, y, m), and in 4D with (x, y, z, m).

The z usually represents altitude.

The m usually represents a measurement: it is key to support linear networking applications, such as street routing, transportation or pipelining.

Geometry values are topologically closed, i.e., they include their boundary.

All locations in a geometry are in the same spatial reference system (SRS), and geometric calculations are done in the SRS of the first geometry in the parameter list of a routine. The return value is also in the SRS of the first parameter.

15.1.2 Methods

Metadata

Spatial analysis

Input/Output

Boundary, Interior, Exterior

Boundary of a geometry: set of geometries of the next lower dimension.

Interior of a geometry: points that are left when the boundary is removed.

Exterior of a geometry: points not in the interior or boundary.

Spatial relationships

15.1.3 Example of conceptual schema

PIC

To create the tables:

1Create Table Country( 
2  country_code integer, 
3  country_name varchar (30), 
4  geometry ST_MultiPolygon, 
5  Primary Key (country_code)) 
6 
7Create Table State( 
8  state_code integer, 
9  state_name varchar (30), 
10  country_code integer, 
11  geometry ST_MultiPolygon, 
12  Primary Key (state_code), 
13  Foreign Key (country_code) References Country) 
14 
15Create Table County( 
16  county_code integer, 
17  county_name varchar (30), 
18  state_code integer, 
19  population integer, 
20  geometry ST_MultiPolygon, 
21  Primary Key (county_code), 
22  Foreign Key (state_code) References State) 
23 
24/* Table Highway is NOT spatial */ 
25Create Table Highway( 
26  highway_code integer, 
27  highway_name varchar (4), 
28  highway_type varchar (2), 
29  Primary Key (highway_code)) 
30 
31Create Table HighwaySection( 
32  section_code integer, 
33  section_number integer, 
34  highway_code integer, 
35  Primary Key (section_code,highway_code), 
36  Foreign Key (section_code) References Section, 
37  Foreign Key (highway_code) References Highway) 
38 
39Create Table Section( 
40  section_code integer, 
41  section_name varchar (4), 
42  number_lanes integer, 
43  city_start varchar (30), 
44  city_end varchar (30), 
45  geometry ST_Line, 
46  Primary Key (section_code), 
47  Foreign Key (city_start) References City, 
48  Foreign Key (city_end) References City) 
49 
50Create Table City( 
51  city_name varchar (30), 
52  population integer, 
53  geometry ST_MultiPolygon, 
54  Primary Key (city_name)) 
55 
56Create Table LandUse( 
57  region_name varchar (30), 
58  land_use_type varchar (30), 
59  geometry ST_Polygon, 
60  Primary Key (region_name))
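As an illustration of how such tables could be populated, here is a sketch of an insert using a WKT-based constructor; the constructor name varies across systems (ST_PolyFromText, ST_GeomFromText, ST_WKTToSQL, ...), and the region name and SRID 4326 are assumptions:

INSERT INTO LandUse (region_name, land_use_type, geometry)
VALUES ('Riverside', 'residential area',
        ST_PolyFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 4326));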

15.1.4 Reference queries: alphanumerical criteria

Number of inhabitants in the county of San Francisco:

1select population 
2from County 
3where county_name = 'San Francisco'

List of the counties of the state of California:

1select county_name 
2from County, State 
3where State.state_code = County.state_code 
4  and state_name = 'California'

Number of inhabitants in the US:

1select sum (c2.population) 
2from Country c1, State s, County c2 
3where c1.country_name = 'USA' 
4  and c1.country_code = s.country_code 
5  and s.state_code = c2.state_code

Number of lanes in the first section of Interstate 99:

1select s.number_lanes 
2from Highway h1, HighwaySection h2, Section s 
3where h1.highway_code = h2.highway_code 
4  and h2.section_code = s.section_code 
5  and h1.highway_name = 'I99' 
6  and h2.section_number = 1

Name of all sections that constitute Interstate 99:

1select s.section_name 
2from Highway h1, HighwaySection h2, Section s 
3where h1.highway_name = 'I99' 
4  and h1.highway_code = h2.highway_code 
5  and h2.section_code = s.section_code

15.1.5 Reference queries: spatial criteria

Counties adjacent to the county of San Francisco in the same state:

1select c1.county_name 
2from County c1, County c2 
3where c2.county_name = 'San Francisco' 
4  and c1.state_code = c2.state_code 
5  and ST_Touches(c1.geometry, c2.geometry)

Display of the state of California (supposing that the State table is non spatial):

1select ST_Union(c.geometry) 
2from County c, State s 
3where s.state_code = c.state_code 
4  and s.state_name = 'California'

Counties larger than the largest county in California:

1select c1.county_name 
2from County c1 
3where ST_Area(c1.geometry) > (select max (ST_Area(c.geometry)) 
4                from County c, State s 
5                where s.state_code = c.state_code 
6                  and s.state_name = 'California')

Length of Interstate 99:

1select sum (ST_Length(s.geometry)) 
2from Highway h1, HighwaySection h2, Section s 
3where h1.highway_name = 'I99' 
4  and h1.highway_code = h2.highway_code 
5  and h2.section_code = s.section_code

All highways going through the state of California:

1select distinct h1.highway_name 
2from State s1, Highway h1, HighwaySection h2, Section s2 
3where s1.state_name = 'California' 
4  and h1.highway_code = h2.highway_code 
5  and h2.section_code = s2.section_code 
6  and ST_Overlaps(s2.geometry, s1.geometry)

Display all residential areas in the county of San Jose:

1select ST_Intersection(l.geometry, c.geometry) 
2from County c, LandUse l 
3where c.county_name = 'San Jose' 
4  and l.land_use_type = 'residential area' 
5  and ST_Overlaps(l.geometry, c.geometry)

Overlay the map of administrative units and land use:

1select county_name, land_use_type, ST_Intersection(c.geometry, l.geometry) 
2from County c, LandUse l 
3where ST_Overlaps(c.geometry, l.geometry)

15.1.6 Reference queries: interactive queries

Description of the county pointed to on the screen:

1select county_name, population 
2from County 
3where ST_Contains(geometry, @point)

Counties that intersect a given rectangle on the screen:

1select county_name 
2from County 
3where ST_Overlaps(geometry, @rectangle)

Parts of counties that are within a given rectangle on the screen (clipping):

1select ST_Intersection(geometry, @rectangle) 
2from County 
3where ST_Overlaps(geometry, @rectangle)

Description of the highway section pointed to on the screen:

1select section_name, number_lanes 
2from Section 
3where ST_Contains(geometry, @point)

Description of the highways of which a section is pointed to on the screen:

1select h1.highway_name, h1.highway_type 
2from Highway h1, HighwaySection h2, Section s 
3where h1.highway_code = h2.highway_code 
4  and h2.section_code = s.section_code 
5  and ST_Contains(s.geometry, @point)

16 Representative Systems

16.1 Oracle Locator

Included in all editions of the DB, with all functions required for standard GIS tools and all geometric objects (points, lines and polygons in 2D, 3D and 4D). It uses indexing with quadtrees and R-trees and supports geometric queries, proximity search, distance calculation, multiple projections and conversion between projections.

16.1.1 Oracle Spatial

It is an option of Oracle DB Enterprise Edition, which extends Locator with geometric transformations, spatial aggregations, dynamic segmentation, measures, network modelling, topology, raster, geocoding, spatial data mining, 3D types and Web Services.

16.1.2 Oracle Network Model

It is a data model for representing networks in the database. It maintains connectivity and attributes at the link and node levels. It is used for network management and as a navigation engine for route calculation.

16.1.3 Oracle Topological Model

It persistently stores the topology with nodes, arcs and faces, together with the topological relations, allowing for advanced consistency checks. The data model allows defining objects through topological primitives and adds the type SDO_TOPO_GEOMETRY. It coexists with traditional spatial data.

16.1.4 Oracle Geo Raster

It adds the new data type SDO_GEORASTER, providing an open, general-purpose raster data model, with storage and indexing of raster data, as well as the ability to query and analyze raster data.

16.1.5 Oracle geocoding

It generates latitude/longitude points from addresses. It supports international address standardization, working with formatted and unformatted addresses. Its tolerance parameters support fuzzy matching.

16.1.6 Oracle MapViewer

It is supplied with all versions of Oracle Application Server, providing XML, Java and JavaScript interfaces. It is a tool for defining maps described in the database, including thematic maps.

16.1.7 Oracle: geometry type

To create a spatial table in Oracle:

1CREATE TABLE Cells ( 
2  Cell_id NUMBER, 
3  Cell_name VARCHAR2(32), 
4  Cell_type NUMBER, 
5  Location SDO_GEOMETRY, 
6  Covered_area SDO_GEOMETRY);

16.1.8 Oracle: geometrical primitives

PIC

16.1.9 Oracle: element

An element is the basic component of geometric objects. The type of an element can be point, line or polygon. It is formed of an ordered sequence of points.

16.1.10 Oracle: geometry

Represents a spatial object and is composed of an ordered list of elements, which may be homogeneous or heterogeneous.

16.1.11 Oracle: layer

Represents a geometrical column in a table. In general, it contains objects of the same nature.

16.1.12 SDO_GEOMETRY type

Structure:

1SDO_GTYPE   NUMBER 
2SDO_SRID  NUMBER 
3SDO_POINT   SDO_POINT_TYPE 
4SDO_ELEM_INFO   SDO_ELEM_INFO_ARRAY 
5SDO_ORDINATES   SDO_ORDINATE_ARRAY

Example:

1CREATE TABLE states ( 
2  state VARCHAR2(30), 
3  totpop NUMBER(9), 
4  geom SDO_GEOMETRY);

SDO_GTYPE

Defines the nature of the geometric shape contained in the object.

PIC

SDO_SRID

SRID = Spatial Reference System ID. It specifies the coordinate system of the object. The list of possible values is in the table MDSYS.CS_SRS. A common value is 8307, which is 'Longitude/Latitude WGS84', used by the GPS system. All geometries of a layer must have the same SRID. Different layers may have different SRIDs, and conversions are performed automatically for spatial queries.
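When an explicit conversion is needed (for instance to compare layers stored in different SRIDs), Oracle also provides SDO_CS.TRANSFORM; a minimal sketch on the Cells table defined above:

SELECT SDO_CS.TRANSFORM(c.covered_area, 8307)
FROM Cells c;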

SDO_POINT

Structure

x NUMBER
y NUMBER
z NUMBER

Example

INSERT INTO TELEPHONE_POLES (col-1, ..., col-n, geom)
VALUES (attribute-1, ..., attribute-n,
  SDO_GEOMETRY (
    2001, 8307,
    SDO_POINT_TYPE (-75.2, 43.7, null),
    null, null)
);

SDO_ORDINATES

Object type SDO_ORDINATE_ARRAY which is VARRAY (1048576) OF NUMBER. It stores the coordinates of lines and polygons.

PIC

SDO_ELEM_INFO

Object type SDO_ELEM_INFO_ARRAY, which is VARRAY (1048576) OF NUMBER. It specifies the nature of the elements and describes the various components of a complex object. There are three entries per element (the starting offset in SDO_ORDINATES, the element type SDO_ETYPE, and the interpretation):

PIC

Examples of geometries:

Line:

PIC

Polygon:

PIC

Multipoint:

PIC

Constructing a line

INSERT INTO LINES (col-1, ..., col-n, geom) VALUES (
  attribute_1, ..., attribute_n,
  SDO_GEOMETRY (
    2002, 8307, null,
    SDO_ELEM_INFO_ARRAY (1,2,1),
    SDO_ORDINATE_ARRAY (
      10,10, 20,25, 30,10, 40,10))
);
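
Constructing a polygon

By analogy (a sketch not taken from the course notes; the table POLYGONS is hypothetical), a simple polygon with one exterior ring uses SDO_ETYPE 1003 with interpretation 1, and the ring must be closed (the first vertex is repeated at the end):

INSERT INTO POLYGONS (col-1, ..., col-n, geom) VALUES (
  attribute_1, ..., attribute_n,
  SDO_GEOMETRY (
    2003, 8307, null,
    SDO_ELEM_INFO_ARRAY (1,1003,1),
    SDO_ORDINATE_ARRAY (
      10,10, 40,10, 40,30, 10,30, 10,10))
);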

Metadata

It defines the boundaries of a layer, i.e., the minimum and maximum coordinates for each dimension; it sets the tolerance of a layer, i.e., the minimum distance at which two points are considered distinct; and it defines the coordinate system of the layer.

Example:

INSERT INTO USER_SDO_GEOM_METADATA (TABLE_NAME, COLUMN_NAME, DIMINFO, SRID) VALUES (
  'ROADS', 'GEOMETRY',
  SDO_DIM_ARRAY (
    SDO_DIM_ELEMENT('Long', -180, 180, 0.5),
    SDO_DIM_ELEMENT('Lat', -90, 90, 0.5)),
  8307);

Constructing geometries

-- Standard constructor
INSERT INTO TELEPHONE_POLES (col-1, ..., col-n, geom) VALUES (attribute-1, ..., attribute-n,
  SDO_GEOMETRY (
    2001, 8307,
    SDO_POINT_TYPE (-75.2, 43.7, null),
    null, null)
);

-- Well-Known Text (WKT) constructor
INSERT INTO TELEPHONE_POLES (col-1, ..., col-n, geom) VALUES (attribute-1, ..., attribute-n,
  SDO_GEOMETRY ('POINT (-75.2 43.7)', 8307)
);

-- Well-Known Binary (WKB) constructor
INSERT INTO TELEPHONE_POLES (col-1, ..., col-n, geom) VALUES (attribute-1, ..., attribute-n,
  SDO_GEOMETRY (:my_blob, 8307)
);

16.1.13 Oracle: Spatial indexes

A spatial index must exist before we can issue spatial queries on a table.
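
For example (a sketch; the index name is arbitrary, and the layer metadata must already be present in USER_SDO_GEOM_METADATA), a spatial index on the customers layer used in the following queries could be created as:

-- Create an R-tree spatial index on the location column
CREATE INDEX customers_location_idx ON customers (location)
  INDEXTYPE IS MDSYS.SPATIAL_INDEX;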

16.1.14 Oracle: query execution model

PIC

But it can be optimized:

PIC

16.1.15 Oracle: writing spatial queries

They contain a spatial predicate and are expressed through specific SQL operators: SDO_RELATE, SDO_INSIDE, SDO_TOUCH, SDO_WITHIN_DISTANCE, SDO_NN. The spatial index must exist; otherwise we will get an error message.

Topological predicates

They select objects by their topological relationship with another object: SDO_INSIDE, SDO_CONTAINS, SDO_COVERS, SDO_COVEREDBY, SDO_OVERLAPS, SDO_TOUCH, SDO_EQUAL, SDO_ANYINTERACT.

SDO_RELATE is a generic operator for which we can specify a mask, which can be 'INSIDE', 'CONTAINS', 'TOUCH', ... or a combination such as 'INSIDE+CONTAINS'.

Sample queries

Which parks are entirely contained in the state of Wyoming:

-- way 1
SELECT p.name
FROM us_parks p, us_states s
WHERE s.state = 'Wyoming'
  AND SDO_INSIDE (p.geom, s.geom) = 'TRUE';

-- way 2
SELECT p.name
FROM us_parks p, us_states s
WHERE s.state = 'Wyoming'
  AND SDO_RELATE (p.geom, s.geom, 'MASK=INSIDE') = 'TRUE';

Which states contain all or part of Yellowstone Park:

SELECT s.state
FROM us_states s, us_parks p
WHERE SDO_ANYINTERACT (s.geom, p.geom) = 'TRUE'
  AND p.name = 'Yellowstone NP';

In which competing jurisdictions is my client:

SELECT s.id, s.name
FROM customers c, competitors_sales_regions s
WHERE c.id = 5514
  AND SDO_CONTAINS (s.geom, c.location) = 'TRUE';

Find all counties around Passaic County:

SELECT c1.county, c1.state_abrv
FROM us_counties c1, us_counties c2
WHERE c2.state = 'New Jersey' AND c2.county = 'Passaic'
  AND SDO_TOUCH (c1.geom, c2.geom) = 'TRUE';

Queries with a constant window

Find all customers of type Platinum in a rectangular area:

SELECT name, category
FROM customers
WHERE SDO_INSIDE (location,
    SDO_GEOMETRY (2003, 8307, null,
      SDO_ELEM_INFO_ARRAY (1,1003,3),
      SDO_ORDINATE_ARRAY (-122.413, 37.785, -122.403, 37.792))
  ) = 'TRUE'
  AND customer_grade = 'PLATINUM';

In which competitor sales territories is located a geographical point:

SELECT id, name
FROM competitors_sales_regions
WHERE SDO_CONTAINS (geom,
    SDO_GEOMETRY (2001, 8307,
      SDO_POINT_TYPE (-122.41762, 37.7675089, NULL),
      NULL, NULL)
  ) = 'TRUE';

Queries based on distance

These are used to select objects according to their distance from another object. Distance can be expressed in any unit of length; if no unit is specified, the distance is expressed in the unit of the coordinate system, which for longitude/latitude data is meters. The operator used is SDO_WITHIN_DISTANCE.

Which branches are less than 1 km from this client:

SELECT b.id, b.phone_number
FROM customers c, branches b
WHERE c.id = 8314
  AND SDO_WITHIN_DISTANCE (b.location, c.location, 'distance=1 unit=km') = 'TRUE';

How many customers in each category are located within 1/4 mile of my office number 77:

SELECT customer_grade, COUNT(*)
FROM branches b, customers c
WHERE b.id = 77
  AND SDO_WITHIN_DISTANCE (c.location, b.location, 'DISTANCE=0.25 UNIT=MILE') = 'TRUE'
GROUP BY customer_grade;

Queries based on proximity

These select the N objects closest to another object, using SDO_NN. ROWNUM can be used to limit the result, and SDO_NN_DISTANCE is used to retrieve the distance of each answer.

What is the nearest office to this client?

SELECT b.id, b.phone_number
FROM customers c, branches b
WHERE c.id = 8314
  AND SDO_NN (b.location, c.location, 'sdo_num_res=1') = 'TRUE';

What are my five customers closest to this competitor:

SELECT c.id, c.name, c.customer_grade
FROM competitors co, customers c
WHERE co.id = 1
  AND SDO_NN (c.location, co.location, 'SDO_NUM_RES=5') = 'TRUE';

This only works if no other selection criterion is present.

What are my five customers closest to this competitor, and give the distance:

SELECT c.id, c.name, c.customer_grade, SDO_NN_DISTANCE(1) distance
FROM competitors co, customers c
WHERE co.id = 1
  AND SDO_NN (c.location, co.location, 'SDO_NUM_RES=5', 1) = 'TRUE'
ORDER BY distance;

If we want to add more filters, we cannot use the SDO_NN operator, because it is evaluated first. We have to use the trick of combining SDO_GEOM.SDO_DISTANCE with ORDER BY and ROWNUM, as sketched below.
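
A minimal sketch of this workaround (not from the course notes; table and column names are those of the previous examples, and the extra filter on customer_grade is hypothetical):

SELECT *
FROM (
  SELECT c.id, c.name,
         SDO_GEOM.SDO_DISTANCE (c.location, co.location, 0.005) distance
  FROM competitors co, customers c
  WHERE co.id = 1
    AND c.customer_grade = 'GOLD'   -- extra filter that SDO_NN could not handle
  ORDER BY distance
)
WHERE ROWNUM <= 5;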

Spatial Join

It is used to find correlations between two tables, based on topology or distance. It compares all objects of one table with all those of another table, so it requires an R-tree index on each table. Technically, it is implemented as a function that returns a table: SDO_JOIN.

Associate with each GOLD customer the sales territory in which it is located:

SELECT s.id, c.id, c.name
FROM customers c, sales_regions s,
  TABLE(SDO_JOIN('CUSTOMERS', 'LOCATION', 'SALES_REGIONS', 'GEOM', 'mask=INSIDE')) j
WHERE j.rowid1 = c.rowid
  AND j.rowid2 = s.rowid
  AND c.customer_grade = 'GOLD'
ORDER BY s.id, c.id;

Find all gold customers who are less than 500 meters from one of our branches in San Francisco:

SELECT DISTINCT c.id, c.name, b.id
FROM customers c, branches b,
  TABLE(SDO_JOIN('CUSTOMERS', 'LOCATION', 'BRANCHES', 'LOCATION', 'DISTANCE=500 UNIT=METER')) j
WHERE j.rowid1 = c.rowid
  AND j.rowid2 = b.rowid
  AND c.customer_grade = 'GOLD'
  AND b.city = 'SAN FRANCISCO';

Spatial functions

PIC

Objects must be in the same coordinate system.
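
If they are not, one of the geometries can first be converted; a sketch (SDO_CS.TRANSFORM takes the geometry and the target SRID, here 8307 as an example):

-- Convert a geometry to longitude/latitude WGS84 before comparing it
SELECT SDO_CS.TRANSFORM (geom, 8307)
FROM us_parks
WHERE name = 'Yellowstone NP';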

What is the total area of Yellowstone National Park:

SELECT sdo_geom.sdo_area (geom, 0.005, 'unit=sq_km')
FROM us_parks
WHERE name = 'Yellowstone NP';

What is the length of the Mississippi river:

SELECT sdo_geom.sdo_length (geom, 0.005, 'unit=km')
FROM us_rivers
WHERE name = 'Mississippi';

What is the distance between Los Angeles and San Francisco:

SELECT sdo_geom.sdo_distance (a.location, b.location, 0.005, 'unit=mile')
FROM us_cities a, us_cities b
WHERE a.city = 'Los Angeles'
  AND b.city = 'San Francisco';

Generating objects

Combining objects

What is the area occupied by Yellowstone Park in each state it overlaps:

SELECT s.state,
  sdo_geom.sdo_area (sdo_geom.sdo_intersection (s.geom, p.geom, 0.5), 0.5, 'unit=sq_km') area
FROM us_states s, us_parks p
WHERE SDO_ANYINTERACT (s.geom, p.geom) = 'TRUE'
  AND p.name = 'Yellowstone NP';

Spatial aggregation

Aggregate functions operate on the set of objects.

Find the focal point of all our customers in Daly City:

SELECT SDO_AGGR_CENTROID (SDOAGGRTYPE(location, 0.5)) center
FROM customers
WHERE city = 'DALY CITY';

Calculate the number of customers in each zip code, and the focal point of these customers:

SELECT COUNT(*), postal_code, SDO_AGGR_CENTROID (SDOAGGRTYPE(location, 0.5)) center
FROM customers
GROUP BY postal_code;

References

[1] Esteban Zimanyi. INFOH415 Advanced Databases. Lecture Notes.