1.Assignment 1
分为两个部分:先做ER图,建立conceptual model,这部分不算分,Week3会课上讨论,Week3结束会放出参考答案,最迟Week4初;第二个部分是用Week2搭好的PostgreSQL实现之前做的model,计分。
2.ER Model
再次强调relationship types:其一是cardinality,课程遵循课上说的ER Notation,即“1”个的时候用“箭头”,“多个”的时候用“非箭头”;其二是participation,如果是total participation用“加粗/双线”,如果是partial participation用“正常粗细”。
注意:cardinality和participation都是relationship constraints,二者是并列关系,表达时可以叠加。比如,粗箭头。
处理复杂关系时,可以分多个图完成,一张图处理主要的relationship,这里的entity不带attributes,其他的图表征attributes和subclass。
subclass的线段如果加粗,和之前的notation含义相同,也是指total participation。
3.Relational Data Model
relational model将现实世界描述为有内在联系的关系/表格集合(a collection of inter-connected relations (or tables)),它能简化复杂问题,映射到可操作的文件结构中。
有两类术语:其一是mathematical的,如relation, tuple, attribute;其二是data-oriented的,如table, record, field/column。虽然表达不算,但是是一致的,比如mathematical relation = data-oriented table。
3.1 Core values
relation是relational data model的核心,包括:
(1)name,在具体的database是unique的,可以和其他database的relation name重名,比如,address出现在不同database中
(2)attribue,可以理解为column heading。
attribute又包括:
(1)name,在具体的relation中应该是unique的
(2)associated domain,相关的限制。例如,车牌VIN,它的限制是不能使用I, O, Q,因为这些字母容易和数字01混淆
3.2 Notice
(1)relation的instance通常叫做tuples, rows, records
(2)relation中的attribute必须是atomic的,不能是composite或者是multi-valued,这样无法用table表达。如果name是由first name和given name构成,不能把name当成attribute,而是first name和given name。如果colours of car是一个multi-valued attribute,包括red, blue, black,不能把colours of car当成attribute,而是red, blue, black。
(3)如果出现上述的red, blue, black,一个车不可能同时多种颜色,如果是red,那么要在blue和black的column上填写“NULL”(NULL的含义很丰富,可以是none,don't know, irrelevant)
(4)NULL虽然很好用,但是primary key不能是NULL,这样会无法识别。
(5)Database schema : a collection of relation schemas. (schema is a set of tables)
(6)Database (instance) : a collection of relation instances.
(7)unique:DBMS层面,database name必须unique;database层面,schema name必须unique;schema层面,table name必须unique;table层面,attribute name必须unique。
3.3 Example
ER model中的entity映射到relational model中的table,entity的attribute映射到table中的column heading,relation要分情况处理,如果是1 to many,则把1中的key映射到many对应的table column heading,如果是many to many,则创建新的table把两个many的key都写入新table。
例如,图中customer和branch的关系是home,是1 to many,则把branch的key branchName写入customer的table;account和customer的关系是many to many,则创建新的table,图中命名的是heldBy,这个命名不好,因为如果图表复杂,会难以辨认,最好命名为两个entity的名字连接,Account_customer,再把各自的key写入这个table,也就是customer No.和Account No.
map后的instance如下图
3.4 Integrity Constraints
attribute中包括domain,domains limit the set of values that attributes can take.因为现实世界中relation都有可以取值的范围,比如上文举例的车牌号,或者学号只能由英文和数字构成,一个人的年纪不能是负数。
同时integrity意味着primary key不能是NULL。
Referential Integrity constraints describe references between relations (tables), and they are related to notion of a foreign key (FK).
上文提到了不同cardinality下如何映射,在table原有的attribute之外,还有来自其他table的key,这些key叫做foreign key。与primary key不同,foreign key可以是NULL,但来自同一个table的foreign key必须统一,或者都是NULL,或者都有确定的值。Foreign key扮演了“关联”的角色,使得数据库查询成为可能,它连接了不同的table,实现了ER model中的relationships。
3.5 Describing Relational Schemas
SQL实现了Data Definition Language(DDL),它基于database level,不是基于data的Data manipulation。基本架构如下:
CREATE TABLE TableName (
attrName1 domain1 constraints1 ,
attrName2 domain2 constraints2 ,
...
PRIMARY KEY (attri,attrj,...)
FOREIGN KEY (attrx,attry,...)
REFERENCES
OtherTable (attrm,attrn,...)
);
SQL的基本语法:
(1)--,代表comment
(2)'don''t ask',在string中表达'时,重复两次,而不是使用'don\'t ask'
(3)Identifiers and reserved words are case-insensitive:
TableName = tablename = TaBLeNamE != "TableName"
(4)数据类型,integer, float, char(n), varchar(n), date, ... (char is fixed, and varchar is flexible)
(5)操作符:=, <>, <, <=, >, >=, AND, OR, NOT, ...
要求掌握基本语法,考试时会涉及简单内容
用上文的例子来实现:
create table Branch (
branchName text,
address text,
assets integer,
primary key (branchName)
);
--从没有foreign key的table开始,所有写下的内容,尤其是foreign key,前文一定要定义好。
create table Account (
accountNo text,
balance integer,
branchName text,
primary key (accountNo),
foreign key (branchName) references Branch(branchName)
);
create table Customer (
customerNo integer,
name text,
address text,
homeBranch text,
primary key (customerNo),
foreign key (homeBranch) references Branch(branchName)
);
create table Account_Customer (
account text,
customer integer,
primary key (account, customer),
--many to many的关系table primary key是tuple
foreign key (account) references Account(accountNo),
foreign key (customer) references Customer(customerNo)
);
增加限制条件:
1.no accounts can be overdrawn
2.customer numbers are seven-digit integers
3.account numbers look like A-101, B-306, etc.
4.the assets of a branch is the sum of the balances in all of the accounts held at that branch
create domain CustNumType as
char(7) check (value ~ '[0-9]{7}');
create table Branch (
branchName text,
address text,
assets integer,
primary key (branchName)
);
create table Account (
accountNo text check (accountNo ~ '^[A-Z]-[0-9]{3}$'),
balance integer check (balance >= 0),
branchName text,
primary key (accountNo),
foreign key (branchName) references Branch(branchName)
);
create table Customer (
customerNo CustNumType,
name text,
address text unique not null,
homeBranch text,
primary key (customerNo),
foreign key (homeBranch) references Branch(branchName)
);
create table Account_Customer (
account text,
customer CustNumYype,
primary key (account, customer),
foreign key (account) references Account(accountNo),
foreign key (customer) references Customer(customerNo)
);
create table t (
x integer,
y integer,
constraint xBiggerThanY check (x > y)
);
相关语法可以搜索regular expression学习
4.Mapping ER Designs to Relational Schemas
Correspondences between relational and ER data models:
- attribute(ER) ≅ attribute(Rel), entity(ER) ≅ tuple(Rel)
- entity set(ER) ≅ relation(Rel), relationship(ER) ≅ relation(Rel)
Differences between relational and ER models:
- Rel uses relations to model entities and relationships
- Rel has no composite or multi-valued attributes (only atomic)
- Rel has no object-oriented notions (e.g. subclasses, inheritance)
4.1Mapping strong entities
和上文内容一致
4.2Mapping weak entities
要有两个primary key,一个是strong relation的,一个是weak relation的。
For 1:1 relationship, we have a choice, and we prefer total to partial participation in order to minimize the number of NULL.