pkb contents > sql | just under 1786 words | updated 05/24/2017
SQL is the standardized language used to access a database. SQL language provides for: (1) data definition/DDL statements that help you define the database and its objects; (2) data manipulation/DML statements that allow you to update and query data; (3) data control, allows you to grant the permissions to a user to access a certain data in the database.
Datatypes: http://www.w3schools.com/sql/sql_datatypes.asp, but they will depend on DBMS
-- create table:
CREATE TABLE t (id INT PRIMARY KEY, col1 TYPE(size), col2 TYPE constraint constraint, col3 FOREIGN KEY REFERENCES other_table(fieldname), col4 DEFAULT ‘value’, … );
-- constraints: NOT NULL, UNIQUE, CHECK, DEFAULT
-- autopopulate date: CREATE TABLE tname (order_date date DEFAULT getdate());
CREATE TABLE tname (… UNIQUE (col1, col2, …) … );
CREATE TABLE tname ( … year INT CHECK (year>1950 AND year<1980)) …)
CREATE TABLE tname (... CHECK (price/hours < 20));
-- composite PK:
CREATE TABLE t (name int, date int, PRIMARY KEY(name,date));
-- foreign key:
… FOREIGN KEY(c1) REFERENCES t1(c1), FOREIGN KEY(c2) REFERENCES t2(c1));
… REFERENCES … ON DELETE [action] ON UPDATE [action]
-- CASCADE: when the referenced row is deleted or updated, the respective rows of the referencing table will be deleted or updated.
-- NO ACTION: do nothing to the the referenced row
-- SET NULL: the values of the affected rows are set to NULLs
-- SET DEFAULT: the values of the affected rows are set to their default values
-- multicolumn foreign key:
… FOREIGN KEY(c1, c2) REFERENCES t2(c1, c2));
-- autoincrement PK:
… id INT IDENTITY(#start, #inc) PRIMARY KEY, ...
… id INT AUTO_INCREMENT(#start, #inc) PRIMARY KEY, …
-- create PK
CREATE SEQUENCE name START WITH # INCREMENT BY #;
CREATE TABLE t ( … id DEFAULT nextval(‘name’) PRIMARY KEY … );
--- delete table:
DROP TABLE tname;
DROP DATABASE dname;
TRUNCATE TABLE tname;
--- add records: when SELECT is used as an expression in these kinds of queries, it needs to return only one column;
--- so it sometimes helps to create a new identifier, say c1*c2*c3
INSERT INTO tname VALUES (key#, ‘string_value’, NULL, ...);
--- if select returns same schema:
INSERT INTO t (c1, c2) SELECT c1, c2 …
--- add fields to confom to schema:
SELECT col, 12, NULL
INSERT INTO t (col1, colx) VALUES (val, val), (val, val), (val, val)
--- delete records:
--- some implementations disallow condition statements with a subquery that includes the affected table
DELETE FROM tname WHERE …
--- delete all records:
DELETE FROM table;
--- update records:
UPDATE tname SET c1=value/subquery, c2=value/subquery WHERE …
UPDATE tname
SET [fname] = 'Value'
WHERE fname = 'Value';
view data is not stored physically; every time you retrieve data from the view, the database reruns the underlying query. most databases don't allow inserting new data or updating existing data in views.
create a view: CREATE VIEW vname AS [query];
use a view: SELECT * FROM vname;
DROP VIEW vname;
Match the following clauses with its definition:
SELECT c1,c2 FROM tname WHERE ... ; SELECT * FROM tname;
operators: <, >, <=, >=, !=, ==, <>, OR, AND, NOT, BETWEEN, IS, NULL, EXISTS, ANY, ALL
SELECT * FROM tname WHERE cname=numeric;
SELECT * FROM tname WHERE cname=‘string’;
SELECT c1,c2 FROM t WHERE c1>= value OR c2=value;
SELECT … FROM … WHERE (complex_condition1) AND (complex_condition2)
SELECT * FROM t WHERE c BETWEEN condition1 AND condition2;
dep. on implementation, may not include endpoints
… NOT BETWEEN …
IS NULL, IS NOT NULL
EXISTS checks whether a subquery is empty; its expressive power encompasses:
… = ALL (SELECT …), … != ALL (SELECT …)
… = ANY (SELECT …), … != ANY (SELECT …)
fuzzy match:
multiple char: SELECT * FROM … WHERE c LIKE ‘%string%’;
single char: SELECT * FROM … WHERE c LIKE ‘_tringvalue’;
-- concatenate strings: really depends on platform
SELECT CONCAT(c1,c2) FROM t
SELECT c1+c2+’ ‘+c3 ..
SELECT c1 || ‘ ‘ || c2 AS email_address FROM …
SELECT LENGTH(col) …
SELECT MAX(LENGTH(col)) …
change case: SELECT UPPER(c1), LOWER(c2) …
SELECT SUBSTR(col,#from,#inc) …
1-based indexing
SELECT REPLACE(col,’from_string’,’to_string’) …
date: yyyy-mm-d
time: hh:mm:ss
datetime: yyyy-mm-dd hh:mm:ss
DATE(<timestring>,<modifier>,<modifier>, …)
timestring: “now”, “yyyy-mm-dd”
modifier: “-7 days”, “+2 months”
DATE(“datetime_string”) → yyyy-mm-dd
TIME(“datetime_string”) → hh:mm:ss
STRFTIME(“format_string”, ”datetime_string”, <modifier>)
“%d/%m/%Y”
skip: SELECT <c> FROM <t> LIMIT <# of rows> OFFSET <skipped rows>;
SELECT <c> FROM <t> LIMIT <skipped rows>, <# of rows>;
SELECT <c> FROM <t> OFFSET <skipped rows> ROWS FETCH NEXT <#> ROWS ONLY;
summary statistics: SELECT max/min/avg/sum/abs(cname) FROM tname;
SELECT c FROM t T1 WHERE NOT EXISTS (SELECT * FROM t T2 WHERE T2.val>T1.val);
SELECT c FROM t WHERE c >= ALL (SELECT c FROM t);
SELECT c FROM t WHERE NOT c < ANY (SELECT c FROM t);
SELECT round(col,#digits) …
sort (default=ASC): SELECT * FROM tname ORDER BY fieldname DESC;
select distinct: SELECT DISTINCT c FROM t; SELECT c FROM t GROUP BY c;
count distinct: SELECT count(DISTINCT cname) FROM tname;
count non-NULL values: SELECT count(cname) FROM tname;
sort groups: SELECT c1, sum(c3) FROM t GROUP BY c1 ORDER BY sum(c3);
display in groups: SELECT c1, c2, count(col3) FROM t GROUP BY c1, c2;
filtering groups: SELECT c1, count(c3) FROM t GROUP BY c1 HAVING cndn;
Which of the following are characteristics of a JOIN?
returns, for two mxn tables, a table of dim (m1*m2)x(n1+n2): SELECT * FROM t1, t2
join via select: SELECT * FROM table1,table2 WHERE table1.col = table2.col;
for joins note that, in SQL classic versus the graphic:
the default JOIN is inner join
FULL JOIN doesn’t need “outer” clause
NATURAL JOIN doesn’t need “on” clause; good for well-built db
aliases enable self-joins: SELECT c1 AS ... FROM t1 AS ... JOIN t2 AS … ON …
SELECT * FROM Student S1, Student S2 WHERE S1.GPA = S2.GPA AND S1.sID > S2.sID
multiple tables: SELECT * FROM t1 JOIN t1 ON … JOIN t3 ON … WHERE …
set operators: let you query n tables and show the results as one table, provided both tables have the same #cols of the same type
removes duplicates: SELECT … FROM … UNION SELECT … FROM …;
multiset operator: SELECT … FROM … UNION ALL SELECT … FROM ...;
SELECT … FROM … INTERSECT SELECT … FROM ...;
SELECT * FROM <self_join> WHERE <join_cndn> AND <select_cndn>
except: SELECT ... FROM ... EXCEPT SELECT … FROM...;
SELECT * FROM ... WHERE * IN (subquery) AND * NOT IN (subquery)
SELF JOIN
SUBQUERIES … the database will first check the subquery then check the final query, e.g.: SELECT name FROM city WHERE rating = (SELECT rating FROM city WHERE name = 'Paris');
subqueries can be in the WHERE clause …
in FROM clause, reducing need for calculation in SELECT and WHERE ..
and in the SELECT clause
used as an expression, it's important the subquery return exactly one col
compare with a set of values instead of a single value: … WHERE … IN (subquery);
correlated subqueries, i.e. dependent on the main query; subqueries can use tables from the main query, but the main query can't use tables from the subquery.
good for debugging, e.g. SELECT * FROM country WHERE area <= (SELECT min(area) FROM city WHERE city.country_id = country.id);
use aliases for self-correlated subqueries: SELECT * FROM city as c1 WHERE c1.rating > (SELECT avg(c2.rating) FROM city AS c2 WHERE c1.country_id=c2.country_id);
exists operator: SELECT * FROM country WHERE EXISTS (SELECT * FROM mountain WHERE country.id = mountain.country_id);
Relational algebra is the formal math underlying SQL. Unlike SQL, it’s set-based, so it automatically removes duplicates from its ‘results’. RA operators are applied to expression, - trees, or assignment statements: