pkb contents > sql | just under 1786 words | updated 05/24/2017

1. What is SQL?

SQL is the standardized language used to access a database. SQL language provides for: (1) data definition/DDL statements that help you define the database and its objects; (2) data manipulation/DML statements that allow you to update and query data; (3) data control, allows you to grant the permissions to a user to access a certain data in the database.

1.1. Notation and style guide

2. Data definition

2.1. Manage databases

2.2. Manage tables

Datatypes: http://www.w3schools.com/sql/sql_datatypes.asp, but they will depend on DBMS

-- create table: 

CREATE TABLE t (id INT PRIMARY KEY, col1 TYPE(size), col2 TYPE constraint constraint, col3 FOREIGN KEY REFERENCES other_table(fieldname), col4 DEFAULT ‘value’, … ); 
-- constraints: NOT NULL, UNIQUE, CHECK, DEFAULT
-- autopopulate date: CREATE TABLE tname (order_date date DEFAULT getdate()); 
CREATE TABLE tname (… UNIQUE (col1, col2, …) … );
CREATE TABLE tname ( … year INT CHECK (year>1950 AND year<1980)) …)
CREATE TABLE tname (... CHECK (price/hours < 20)); 

-- composite PK: 
CREATE TABLE t (name int, date int, PRIMARY KEY(name,date));
-- foreign key:FOREIGN KEY(c1) REFERENCES t1(c1), FOREIGN KEY(c2) REFERENCES t2(c1));
… REFERENCESON DELETE [action] ON UPDATE [action]

-- CASCADE: when the referenced row is deleted or updated, the respective rows of the referencing table will be deleted or updated.
-- NO ACTION: do nothing to the the referenced row
-- SET NULL: the values of the affected rows are set to NULLs
-- SET DEFAULT: the values of the affected rows are set to their default values

-- multicolumn foreign key: FOREIGN KEY(c1, c2) REFERENCES t2(c1, c2)); 

-- autoincrement PK:id INT IDENTITY(#start, #inc) PRIMARY KEY, ...
… id INT AUTO_INCREMENT(#start, #inc) PRIMARY KEY, …

-- create PK
CREATE SEQUENCE name START WITH # INCREMENT BY #; 
CREATE TABLE t ( … id DEFAULT nextval(‘name’) PRIMARY KEY … );

--- delete table: 
DROP TABLE tname;
DROP DATABASE dname;
TRUNCATE TABLE tname;

--- add records: when SELECT is used as an expression in these kinds of queries, it needs to return only one column; 
--- so it sometimes helps to create a new identifier, say c1*c2*c3
INSERT INTO tname VALUES (key#, ‘string_value’, NULL, ...);

--- if select returns same schema: 
INSERT INTO t  (c1, c2) SELECT c1, c2 … 

--- add fields to confom to schema: 
SELECT col, 12, NULL
INSERT INTO t (col1, colx) VALUES (val, val), (val, val), (val, val)

--- delete records: 
--- some implementations disallow condition statements with a subquery that includes the affected table
DELETE FROM tname WHERE--- delete all records: 
DELETE FROM table;

--- update records:
UPDATE tname SET c1=value/subquery, c2=value/subquery WHERE

2.3. Manage fields

UPDATE tname
SET [fname] = 'Value'
WHERE fname = 'Value';

2.4. Manage views

view data is not stored physically; every time you retrieve data from the view, the database reruns the underlying query. most databases don't allow inserting new data or updating existing data in views.

create a view: CREATE VIEW vname AS [query];
use a view: SELECT * FROM vname;
DROP VIEW vname;

3. Data manipulation

3.1. Generic query form

Match the following clauses with its definition:

SELECT c1,c2 FROM tname WHERE ... ; SELECT * FROM tname;
operators: <, >, <=, >=, !=, ==, <>, OR, AND, NOT, BETWEEN, IS, NULL, EXISTS, ANY, ALL
SELECT * FROM tname WHERE cname=numeric;
SELECT * FROM tname WHERE cname=‘string’;
SELECT c1,c2 FROM t WHERE c1>= value OR c2=value;
SELECTFROMWHERE (complex_condition1) AND (complex_condition2)
SELECT * FROM t WHERE c BETWEEN condition1 AND condition2;
dep. on implementation, may not include endpoints
… NOT BETWEENIS NULL, IS NOT NULL
EXISTS checks whether a subquery is empty; its expressive power encompasses:
… = ALL (SELECT …), … != ALL (SELECT …)
… = ANY (SELECT …), … != ANY (SELECT …)
fuzzy match: 
multiple char: SELECT * FROMWHERE c LIKE ‘%string%’;
single char: SELECT * FROMWHERE c LIKE ‘_tringvalue’;

3.2. SELECT and display

-- concatenate strings: really depends on platform
SELECT CONCAT(c1,c2) FROM t
SELECT c1+c2+’ ‘+c3 ..
SELECT c1 || ‘ ‘ || c2 AS email_address FROMSELECT LENGTH(col)  … 
SELECT MAX(LENGTH(col)) … 
change case: SELECT UPPER(c1), LOWER(c2) … 
SELECT SUBSTR(col,#from,#inc) … 
1-based indexing
SELECT REPLACE(col,’from_string’,’to_string’) …
date: yyyy-mm-d
time: hh:mm:ss
datetime: yyyy-mm-dd hh:mm:ss
DATE(<timestring>,<modifier>,<modifier>, …)
timestring: “now”, “yyyy-mm-dd”
modifier: “-7 days”, “+2 months”
DATE(“datetime_string”) → yyyy-mm-dd
TIME(“datetime_string”) → hh:mm:ss
STRFTIME(“format_string”, ”datetime_string”, <modifier>)
“%d/%m/%Y”

3.2.1. WITH conditional filtering

3.2.2. Aggregate and GROUP BY

skip: SELECT <c> FROM <t> LIMIT <# of rows> OFFSET <skipped rows>;
SELECT <c> FROM <t> LIMIT <skipped rows>, <# of rows>; 
SELECT <c> FROM <t> OFFSET <skipped rows> ROWS FETCH NEXT <#> ROWS ONLY;
summary statistics: SELECT max/min/avg/sum/abs(cname) FROM tname; 
SELECT c FROM t T1 WHERE NOT EXISTS (SELECT * FROM t T2 WHERE T2.val>T1.val);
SELECT c FROM t WHERE c >= ALL (SELECT c FROM t); 
SELECT c FROM t WHERE NOT c <  ANY (SELECT c FROM t); 
SELECT round(col,#digits) … 
sort (default=ASC): SELECT * FROM tname ORDER BY fieldname DESC;
select distinct: SELECT DISTINCT c FROM t; SELECT c FROM t GROUP BY c;
count distinct: SELECT count(DISTINCT cname) FROM tname;
count non-NULL values: SELECT count(cname) FROM tname;
sort groups: SELECT c1, sum(c3) FROM t GROUP BY c1 ORDER BY sum(c3);
display in groups: SELECT c1, c2, count(col3) FROM t GROUP BY c1, c2; 
filtering groups: SELECT c1, count(c3) FROM t GROUP BY c1 HAVING cndn;

3.3. FROM

3.3.1. JOINs

Which of the following are characteristics of a JOIN?

returns, for two mxn tables, a table of dim (m1*m2)x(n1+n2): SELECT * FROM t1, t2
join via select: SELECT * FROM table1,table2 WHERE table1.col = table2.col;
for joins note that, in SQL classic versus the graphic: 
the default JOIN is inner join
FULL JOIN doesn’t need “outer” clause
NATURAL JOIN doesn’t need “on” clause; good for well-built db 
aliases enable self-joins: SELECT c1 AS ... FROM t1 AS ... JOIN t2 ASONSELECT * FROM Student S1, Student S2 WHERE S1.GPA = S2.GPA AND S1.sID > S2.sID
multiple tables: SELECT * FROM t1 JOIN t1 ONJOIN t3 ONWHEREset operators: let you query n tables and show the results as one table, provided both tables have the same #cols of the same type
removes duplicates: SELECTFROMUNION SELECTFROM …; 
multiset operator: SELECTFROMUNION ALL SELECTFROM ...;
SELECTFROMINTERSECT SELECTFROM ...;
SELECT * FROM <self_join> WHERE <join_cndn> AND <select_cndn>
except: SELECT ... FROM ... EXCEPT SELECT … FROM...;
SELECT * FROM ... WHERE * IN (subquery) AND * NOT IN (subquery) 
SELF JOIN

3.3.2. Subqueries

SUBQUERIES … the database will first check the subquery then check the final query, e.g.: SELECT name FROM city WHERE rating = (SELECT rating FROM city WHERE name = 'Paris');
subqueries can be in the WHERE clause …
in FROM clause, reducing need for calculation in SELECT and WHERE ..
and in the SELECT clause 
used as an expression, it's important the subquery return exactly one col 
compare with a set of values instead of a single value: … WHERE … IN (subquery);
correlated subqueries, i.e. dependent on the main query; subqueries can use tables from the main query, but the main query can't use tables from the subquery. 
good for debugging, e.g. SELECT * FROM country WHERE area <= (SELECT min(area) FROM city WHERE city.country_id = country.id);
use aliases for self-correlated subqueries: SELECT * FROM city as c1 WHERE c1.rating > (SELECT avg(c2.rating) FROM city AS c2 WHERE c1.country_id=c2.country_id);
exists operator: SELECT * FROM country WHERE EXISTS (SELECT * FROM mountain WHERE country.id = mountain.country_id);

3.4. Set operations

Relational algebra is the formal math underlying SQL. Unlike SQL, it’s set-based, so it automatically removes duplicates from its ‘results’. RA operators are applied to expression, - trees, or assignment statements:

4. Sources

4.1. References

4.2. Archive

4.3. Inbox