Database Indexing
· What is an Index
∘ Types of Indexes
∘ Index Implementations
· Clustered Vs Non-Clustered/Secondary Index
· Bitmap indexes
· Resources
What is an Index
- An index is is a data structure that you build and you assign on top of an existing table that basically looks through your table and then tries to analyze it and summarize it so that it it can create kind of shortcuts
- An index typically consists of two columns that are a key-value pair. The index table’s two columns (also known as the key-value pair) contain copies of selected columns from the database’s tabular data.
- The primary/clustered index columns are stored in a sorted way. We often keep these Search Keys in ascending order. These sorted keys help us to search the data quickly. We can modify the sort order from ascending to descending or something else based on the most common queries in the database.
Types of Indexes
- Primary Index (index table is created using primary keys — unique, ordered, not null)
→ UUIDs as primary key - Secondary Index/Non-Clustered Index (index table is created using candidate keys)
- Clustered Index (index table is created using non-key values) — You can think of these just like indexes in a book. The index points to the location in the book where you can find the data you are looking for.
Both clustered and non-clustered indexes are stored and searched as B-trees, a data structure similar to a binary tree.
A B-tree is a “self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.” Basically it creates a tree-like structure that sorts data for quick searching. [source]
Index Implementations
B-tree indexing is one of the most popular and widely used indexing methods. In database management systems, a B-tree is a sort of tree data structure that has two elements: an index key and its matching disk address. The Index Key relates to a specific disk address, and that disk also includes rows or tuples of data.
- BitMap
Bitmap indexing stores tuple or row addresses in strings. A bitmap is a mapping of one system to another, such as numbers to bits.
Bitmaps have an advantage over B-trees since they provide faster retrieval of specific data. Bitmaps are also more compressed than B-trees.
There is one disadvantage of bit mapping: it adds overhead to tuple operations on the table. As a result, bit maps are most commonly utilized in data warehouse environments.
- LSM Trees
- [Postgres] Every primary key has a index by default and it’s a btree.
- Like queries cannot utilize indexes and have to use full table scan
- It is not a guarantee that the database will always employ an index. The planner will determine whether or not to use the index as it moves forward.