Database Internals Ch.1 Introduction and Overview

2020-05-02

storage medium
- Memory- vs. Disk-Based
layout
- Column- vs. Row-Oriented
other taxonomy (not discussed)
- OLTP vs. OLAP vs. HTAP (Hybrid Transactional & Analytical Processing)
- k-v store, relational, document-oriented, graph databases

DBMS Architecture

Transport
- Cluster Communication
- Client Communication
Query Processor
- Query Parser
  - parse, interpret, validate, access control
- Query Optimizer
  - based on internal statistics, index cardinality, approx. intersection size
  - data placement
  - usually presented as dependency tree for execution plan/query plan
Execution Engine
- Remote Execution
  - read/write, replication
- Local Execution
Storage Engine
- Transaction Manager
  - schedule transaction
  - ensure logical consistent
- Lock Manager
  - ensure physical data integrity
- Access Methods
  - manage access and organizing data on disk
  - heap file, B-Trees, LSM Trees (discussed later)
- Buffer Manager
  - cache data pages
- Recovery Manager
  - operation logs and restoring

main memory	disk-based
primary in memory	primary in disk
use disk for recovery & logging	use memory for caching
usually simpler, because OS abstract memory management	have to manage data references, serialization, freed memory, fragmentation
limit by volatility, might change while NVM (Non-Volatile Memory) grow
because the random access capacity, can choose from a larger pool of data structures	usually use wide and short tree
make durability by backup copy, batch compaction, snapshot, checkpointing

column-oriented	row-oriented	wide column store
partition vertically	partition horizontally	group into column families, row-wise in each column family
Parquet, ORC, RCFile, Kudu, ClickHouse	MySQL, PostgreSQL	BigTable, HBase
analytical workloads	transactional workloads	retrieving by a sequence of keys
reconstruct with implicit identifiers / or offset	identified by key	identified by key & qualifier
computational efficiency with CPU’s vectorized instructions
compression efficiency

DBMS use specialized file organization for the purposes of:

Some terminologies:

data records: consisting of multiple fields
index:efficiently locate data records without scanning
data files & index files: usually separated
page: files are partitioned into pages, as size of one or multiple disk blocks
deletion markers (tombstones): shadow the deleted record until reclaiming during garbage collection

Also called primary files.

Implemented as:

heap-organized tables
- no ordering required
- append with new records
- require additional index structures to be searchable
hash-organized tables
- records are stored in buckets
- inside the bucket, could be sorted or not
index-organized tables (IOTs)
- store data records in the index
- range scan could be done by sequentially scan
- reduce the disk seek by one

Primary index & Secondary index

Primary index
- is built over a primary key or a set of keys identified as primary
- unique entry per search key
Secondary index
- all other indexes
- may holds several entries per search key
- may point to the same record from multiple secondary indexes

Clustered & Non-clustered

Clustered
- the order of data records follows the search key order
- primary indexes are most often clustered
- IOTs are clustered by definition
- secondary indexes are non-clustered by definition

Referencing directly or primary index as an indirection (when search by secondary index)

Referencing directly
- reduce the number of disk seek
Indirection
- reduce the cost of pointer updates while the record relocate
- ex. MySQL InnoDB
Hybrid
- store both data file offset and primary keys
- try directly offset, if failed, go by primary key
- update index after finding a new offset

Three common variables for storage structures.

Buffering
- ex. in-memory buffers to B-Tree nodes to amortize the I/O costs
- ex. two components LSM Trees combine buffering with immutability
Immutability
- modifications are appended
- copy-on-write
- distinction between LSM and B-Trees is drawn as immutable against in-place update storage
Ordering
- whether could efficiently range scan