Star vs. Snowflake Schema

Introduction

The star and snowflake schema are logical storage designs commonly found in data marts and data warehouse architecture. While common database types use ER (Entity-Relationship) diagrams, the logical structure of warehouses uses dimensional models to conceptualize the storage system.

Read on to find out the differences, characteristics, and flaws of the star and snowflake schemas.

Star vs. Snowflake Schema

Star Schema vs. Snowflake Schema: The Main Difference

The two main elements of the dimensional model of the star and snowflake schema are:

1. Facts table. A table with the most considerable amount of data, also known as a cube.

2. Dimension tables. The derived data structure provides answers to ad hoc queries or dimensions, often called lookup tables.

Connecting chosen dimensions on a facts table forms the schema. Both the star and snowflake schemas make use of the dimensionality of data to model the storage system.

The main differences between the two schemas are:

Star Schema Snowflake Schema
Elements Fact table Dimension tables Fact table Dimension tables Subdimension tables
Structure Star-shaped Snowflake shaped
Dimensions One table per dimension Multiple tables for each dimension
Model Direction Top-down Bottom-up
Storage space Uses more storage Uses less space
Normalization Denormalized dimension tables Normalized dimension tables
Query Performance Fast, fewer JOINs needed because of fewer foreign keys Slow, more JOINs required because of more foreign keys
Query Complexity Simple and easier to understand Complicated and more challenging to understand
Data Redundancy High Low
Use case Dimension tables with several rows, typical with data marts Dimension tables with multiple rows found with data warehouses

Due to the complexity of the snowflake schema and the lower performances, the star schema is the preferred option whenever possible. One typical way to get around the problems in the snowflake schema is to decompose the dedicated storage into multiple smaller entities with a star schema.

What Is a Star Schema?

A star schema is a logical structure for the development of data marts and simpler data warehouses. The simple model consists of dimension tables connected to a facts table in the center.

Star Schema explained with diagram

The facts table typically consists of:

  • Quantifiable numerical data, such as values or counts.
  • References to the dimensions through foreign keys.

The lookup tables represent descriptive information directly connected to the facts table.

For example, to model the sales of an ecommerce business, the facts table for purchases might contain the total price of the purchase. On the other hand, dimensional tables have descriptive information about the items, customer data, the time or location of purchase.

Example of an ecommerce structure using star schema

The star schema for the analysis of purchases in the example has four dimensions. The facts table connects to the dimensional tables through the concept of foreign and primary keys. Apart from the numerical data, the facts table therefore also consists of foreign keys to define relations between tables.

Characteristics of a Star Schema

The main characteristics of the star schema are:

  • Simplified and fast queries. Fewer JOIN operations due to denormalization make information more readily available.
  • Simple relationships. The schema works great with one-to-one or one-to-many relationships.
  • Singular dimensionality. One table describes each dimension.
  • OLAP friendly. OLAP systems widely use star schema to design data cubes.

Drawbacks of a Star Schema

The disadvantages of using the star schema are:

  • Redundancy. The dimensional tables are one-dimensional, and data redundancy is present.
  • Low integrity. Due to denormalization, updating information is a complex task.
  • Limited queries. The set of questions is limited, which also narrows down the analytical power.

What Is a Snowflake Schema?

The snowflake schema has a branched-out logical structure used in large data warehouses. From the center to the edges, entity information goes from general to more specific.

Apart from the dimensional model’s common elements, the snowflake schema further decomposes dimensional tables into subdimensions.

Snowflake schema explained with diagram

The ecommerce sales analysis model from the previous example further branches (“snowflakes”) into smaller categories and subcategories of interest.

Example of an ecommerce structure using snowflake schema

The four dimensions decompose into subdimensions. The lookup tables further normalize through a series of connected objects.

Characteristics of a Snowflake Schema

The main features of the snowflake schema include:

  • Small storage. The snowflake schema does not require as much storage space.
  • High granularity. Dividing tables into subdimensions allows analysis at various depths of interest. Adding new subdimensions is a simple process as well.
  • Integrity. Due to normalization, the schema has a higher level of data integrity and low redundancies.

Drawbacks of a Snowflake Schema

The weaknesses of the snowflake schema are:

  • Complexity. The database model is complex, and so are the executed queries. Multiple multidimensional tables make the design complicated to work with overall.
  • Slow processing. Many lookup tables require multiple JOIN operations, which slows down information retrieval.
  • Hard to maintain. A high level of granularity makes the schema hard to manage and maintain.

Conclusion

The star and snowflake schema are both logical designs of database storage systems. Both schemas show up in big data systems, where the amount of information is massive and hard to maintain.

For more information about big data platforms, check out our article about the Hadoop platform.

原创文章,作者:3628473679,如若转载,请注明出处:https://blog.ytso.com/224079.html

(0)
上一篇 2022年1月7日
下一篇 2022年1月7日

相关推荐

发表回复

登录后才能评论