Multiple Single Field Indices vs Composite Indices

Performance is essential in many consumer products like e-commerce, payment systems, gaming, transportation apps, and so forth. Although databases are internally optimized through more than one mechanism to meet their overall performance necessities in the modern-day world, a lot depends on the app developer as well. In any case, the best developer is aware of what queries the application has to perform.

Single vs Composite indices; Source: Stack Overflow

Developers who deal with relational databases have used or at least heard about indexing, and it’s a pretty common concept inside the database industry. However, the most important thing is to understand what to index and how the indexing is going to enhance the query response time. To do this, you want to understand how you'll query your database tables. A proper index can be created best when you understand precisely what your query and data access patterns look like.

In simple terms, an index maps search keys to corresponding data on disk through the use of various in-memory and on-disk data structures. Indexing is used to quicken the search by reducing the number of records to look for.

Mostly, an index is created on the columns specified within the WHERE clause of a query as the database retrieves and filters data from the tables based totally on those columns. If you don’t create an index, the database scans all the rows, filters out the matching rows, and returns the result. With millions of data, this test operation may take many seconds, and this high response time makes APIs and applications slower and unusable.

Single Indexes

In a relational database, an index is a data structure that increases retrieval speed by lowering write speed and uses more storage space.

Querying a database table of n records via a field aside from a key requires O(n) record reads. (Technically, n ought to be the numbe of disk blocks utilized by that table; however, for simplicity, we will assume it’s simply the number of records)

Single index; Source: WordPress at your fingertips

For example, assuming ssn is a unique field, a query like the one below reads n/2 records on average to find the match.

SELECT * FROM users WHERE users.ssn = '1234';

Because ssn is unique, the database can stop when the first report is found.

However, if the field is not unique, a query like the one underneath reads all n data (aka a full table scan) to discover all matches.

SELECT * FROM users WHERE users.first_name = 'Tristana';

A complete table read is slow, mainly for large tables. This is where indexes can help. An index, or specifically, an index on a column is an extra data structure of the table’s information sorted (usually through a b-tree) only on that column. Each record in the index also includes a pointer to the original data within the table, such that finding data inside the index is equal to locating data within the original table.

For example, suppose we had the following user table:
```

ID | first_name | last_name    | Class      | Position |  ssn |
---------------------------------------------------------------
1  | Tristana   | Bandlewood   | Marksman   | ADC      | 9876 |
2  | Lucian     | Blackthorn   | Marksman   | ADC      | 3456 |
3  | Elise      | Nocturne     | Assassin   | Mid      | 1357 |
4  | Viktor     | Hextech      | Mage       | Mid      | 8765 |
5  | Lulu       | Pix          | Specialist | Support  | 9999 |
6  | Braum      | Heartguard   | Tank       | Support  | 6666 |
7  | Jinx       | Hexplosives | Marksman   | ADC      | 2222 |
8  | Zyra       | Thorns       | Mage       | Mid      | 7890 |
9  | Kassadin   | Voidwalker   | Mage       | Mid      | 4567 |
10 | Shaco      | Trickster    | Assassin   | Jungle   | 1111 |
11 | Swain      | Mastermind   | Mage       | Mid      | 7891 |
12 | Viktor     | Hextech      | Mage       | Mid      | 8765 |
13 | Skarner    | Crystal      | Fighter    | Jungle   | 5632 |
14 | Yuumi      | Magical Cat  | Specialist | Support  | 3210 |
15 | Sivir      | Battle Mistress | Marksman | ADC      | 7777 |
16 | Zilean     | Chronokeeper | Mage       | Mid      | 1112 |
17 | Malphite   | Shardborn   | Tank       | Top      | 8888 |
18 | Janna      | Windforce    | Controller | Support  | 4580 |
19 | Brand      | Blaze       | Mage       | Mid      | 7771 |
20 | Xayah      | Rebel       | Marksman   | ADC      | 1432 |
21 | Hecarim    | Shadow of War | Fighter | Jungle   | 4444 |
22 | Yorick     | Mori         | Tank       | Top      | 4840 |
23 | Ahri       | Nine-Tails  | Mage       | Mid      | 9871 |

If we create an index of users.first_name

CREATE INDEX first_name_index ON users (first_name) USING BTREE;

It would generate a sorting of the users based on their first_name, with a pointer to their primary key, similar to this:

Ahri: 23
Brand: 19
Braum: 6
Elise: 3
Tristana: 1
Janna: 18
Jinx: 7
Zyra: 8
Kassadin: 9
Xayah: 20
Lucian: 2
Yuumi: 14
Zilean: 16
Malphite: 17
Sivir: 15
Lulu: 5
Swain: 11
Shaco: 10
Viktor: 4
Hecarim: 21
Skarner: 13
Yorick: 22

Then a query like

SELECT * FROM users WHERE first_name = 'Tristana';

might take the most effective O(log_2(n)) reads because the database can carry out a binary search in this index, in view that it's sorted by first_name.

Enforcing Uniqueness with Indexes

In addition to performance benefits, indexes can also be used to enforce the uniqueness of fields. For example, suppose we don’t need more than one user to have an identical social security number field on a table. Adding a UNIQUE modifier to an index

CREATE UNIQUE INDEX ssn_index ON users (ssn);

will raise an error if any other record is created with an ssn that already exists inside the table.

Avoid Unnecessary Indexes

According to the no-free lunch precept, index-based performance gains don’t come free. Its costs are as follows:

  • Additional storage space to store indexes
  • Indexes also have to be updated when state-changing queries like CREATE UPDATE and DELETE are made

As such, including needless indexes can simply reduce the quality of overall performance. Here are a few hints for when to use indexes:

  • An index should not be used for low-read but high-write tables. As stated previously, indexes improve read performance but degrade overall performance.
  • Do not use an index if the field has low cardinality or a number of distinct values in that area. The idea behind the O(log_2(n)) query performance is that we are able to roughly remove half the records with every binary seek step. However, this is only true for high-cardinality fields.
  • Do not use an index for small, fixed-sized tables. Small tables will not see any noticeable performance gains with indexes. Note that a few tables may be small now; however, they're probably going to grow large over time, like users. Other tables are small and will remain so.

Composite Indexes

Like a single index, a composite index is a data structure of records sorted on something. But not like a single index, that something is not a field but rather a concatenation of more than one field.

Composite index; Source: WordPress at your fingertips

For instance, returning to our users' table example:

ID | first_name | last_name    | Class      | Position |  ssn |
---------------------------------------------------------------
1  | Tristana   | Bandlewood   | Marksman   | ADC      | 9876 |
2  | Lucian     | Blackthorn   | Marksman   | ADC      | 3456 |
3  | Elise      | Nocturne     | Assassin   | Mid      | 1357 |
4  | Viktor     | Hextech      | Mage       | Mid      | 8765 |
5  | Lulu       | Pix          | Specialist | Support  | 9999 |
6  | Braum      | Heartguard   | Tank       | Support  | 6666 |
7  | Jinx       | Hexplosives | Marksman   | ADC      | 2222 |
8  | Zyra       | Thorns       | Mage       | Mid      | 7890 |
9  | Kassadin   | Voidwalker   | Mage       | Mid      | 4567 |
10 | Shaco      | Trickster    | Assassin   | Jungle   | 1111 |
11 | Swain      | Mastermind   | Mage       | Mid      | 7891 |
12 | Viktor     | Hextech      | Mage       | Mid      | 8765 |
13 | Skarner    | Crystal      | Fighter    | Jungle   | 5632 |
14 | Yuumi      | Magical Cat  | Specialist | Support  | 3210 |
15 | Sivir      | Battle Mistress | Marksman | ADC      | 7777 |
16 | Zilean     | Chronokeeper | Mage       | Mid      | 1112 |
17 | Malphite   | Shardborn   | Tank       | Top      | 8888 |
18 | Janna      | Windforce    | Controller | Support  | 4580 |
19 | Brand      | Blaze       | Mage       | Mid      | 7771 |
20 | Xayah      | Rebel       | Marksman   | ADC      | 1432 |
21 | Hecarim    | Shadow of War | Fighter | Jungle   | 4444 |
22 | Yorick     | Mori         | Tank       | Top      | 4840 |
23 | Ahri       | Nine-Tails  | Mage       | Mid      | 9871 |

Making a composite index based on the class and position columns

CREATE INDEX class_pos_index ON users (class, position);

will generate a composite index that is sorted by a concatenation of those fields, similar to:

AssassinMid: 3
AssassinJungle: 10
ControllerSupport: 18
FighterJungle: 13
FighterJungle: 21
MageMid: 4
MageMid: 8
MageMid: 9
MageMid: 11
MageMid: 12
MageMid: 16
MageMid: 19
MageMid: 23
MarksmanADC: 1
MarksmanADC: 2
MarksmanADC: 7
MarksmanADC: 15
MarksmanADC: 20
SpecialistSupport: 5
SpecialistSupport: 14
TankSupport: 6
TankTop: 17
TankTop: 22

Given this composite index, a query like

SELECT * FROM users
WHERE
class = 'Specialist'
AND
role = 'Top';


Will improve retrieval time because the composite index is sorted by class-position. The database can find SpecialistTop in O(log_2(n)) reads instead of a complete table read.

Note that even a query on simply the class field

SELECT * FROM users WHERE class = 'Specialist';

will benefit from this composite index due to the fact that class is the first field. Searching a class-sorted index is roughly equivalent to searching a class-position-sorted composite index.

However, a query such as

SELECT * FROM users WHERE role = 'Top';

Because position is the second field, this composite index will not benefit them. A composite index sorted by class position cannot be used to locate a record quickly.

As such, the order of the constituent columns that incorporate a composite index is vital. If a composite index is created on column1, column2, column3,..., columnN. Then queries like these will have performance gains.

SELECT * FROM table
WHERE column1 = 'value';
SELECT * FROM table
WHERE column1 = 'value1'
AND column2 = 'value2';
SELECT * FROM table
WHERE column1 = 'value1'
AND column2 = 'value2'
AND column3 = 'value3'
...
SELECT * FROM table
WHERE column1 = 'value1'
AND column2 = 'value2'
AND column3 = 'value3'
...
AND columnN = 'valueN'

Guidelines for determining composite indexes

Like single indexes, composite indexes additionally include slower write speeds and increased storage space. Consider the following when deciding which fields should comprise a composite index and how to arrange these fields:

  • If certain fields tend to appear together in queries, then it’s a good concept to create a composite index on them. For example, in our user example table above, it’s possibly a good idea to create a composite index on (last_name, first_name).
  • If we’re going to create an index on field 1, but also create a composite index on field 1, field 2, Then simply creating the composite index on (field1, field2) is enough due to the fact that it may be used for querying by field1 on its own.
  • Similar to single indexes, the cardinality of the fields matters to the effectiveness of composite indexes. Obviously, if two fields have high cardinality, a composite index of these fields can also have high cardinality. But it’s also possible that numerous low-cardinality fields can together create a high-cardinality composite index based on the distribution of values among the fields.

Enforcing Unique Combinations With Composite Indexes

Composite indexes may be used to implement the distinctiveness of combinations of fields. Oftentimes, we don’t require values in fields to be unique, but we require mixtures of fields to be unique. For example, assume an address table has a street field, address_number field, and city. We don’t require its street, house number, or city to be unique, on account of the fact that more than one address can share the same street, house number, or city. However, we likely want to require each street-house-number-city aggregate to be unique, considering that quite pretty much makes up an address. For this, we can use a composite index with a UNIQUE modifier.

CREATE UNIQUE INDEX index_st_no_city
ON addresses (street, house number, city);

Conclusion

In the end, it’s extremely important to recognize the different aspects of database indexing. It will help even when doing low-level system design. Many real-life optimizations of our applications rely on knowledge of such complicated details. A cautiously chosen index will really help you accelerate the performance of your application.



How much is a great User Experience worth to you?


Browsee helps you understand your user's behaviour on your site. It's the next best thing to talking to them.

Browsee Product