NBA Database Architecture | Romeo Silvestri

Sports analytics relies heavily on the ability to store, retrieve, and analyze massive amounts of data efficiently. The NBA (National Basketball Association) ecosystem presents a complex data modeling challenge, involving interconnected entities like players, teams, games, and play-by-play statistics.

This project, developed as part of my university coursework, aimed to build a robust information system to manage these entities, performing a comparative analysis between a traditional Relational Database (MySQL) and a Document-based NoSQL Database (MongoDB).

Objectives

Design a conceptual and logical model capable of handling historical sports data and player statistics.
Implement a normalized SQL database with complex integrity constraints.
Migrate the architecture to a NoSQL environment to evaluate performance trade-offs in query execution.

My role

Designed the Entity-Relationship (ER) diagram and Relational Schema.
Implemented DDL/DML scripts for MySQL.
Designed the JSON document structure for MongoDB implementation.
Conducted query performance analysis between the two systems.

Tech Stack

DBMS	MySQL, MongoDB
Languages	SQL, NoSQL
Data Source	Kaggle

The Challenge: Modeling Sports Data

The core challenge in modeling NBA data lies in the temporal dimension and the density of relationships. A simple relationship is insufficient, as players trade teams across seasons, and statistics must be tracked on a per-game basis.

The system had to account for:

Many-to-Many Relationships: Players participating in Games, Teams playing against each other.
Historical Tracking: Managing team rosters across different seasons.
Granular Statistics: Storing detailed performance metrics for every player in every game.

The Conceptual Design (ER Diagram) showing the entities (Players, Teams, Games) and their cardinalities.

Methodology & Implementation

The project followed a rigorous database engineering approach:

Conceptual Design: Defined entities, attributes, and hierarchies to reflect the real-world NBA structure.
Logical Design (SQL): Mapped the ER model to a Relational Schema. This involved Normalization to eliminate redundancy and ensure data consistency in MySQL.
NoSQL Adaptation: Transformed the rigid relational structure into a flexible Document-based model. This required strategic decisions on embedding versus referencing data to optimize for typical sports analytics queries.

Comparison of paradigms for an entity: The structured Relational Schema (left) vs. the denormalized JSON structure used in MongoDB (right).

Visual comparison of an SQL query versus the equivalent aggregation pipeline in MongoDB.

Results: SQL vs. NoSQL Performance

The final phase involved stress-testing both systems with complex analytical queries, such as calculating player averages over multiple seasons.

Key Findings:

Data Integrity: MySQL proved superior for maintaining strict consistency, essential for transactional data like rosters and contracts.
Query Performance: MongoDB outperformed MySQL in fetching full game details. By embedding player statistics within the Game documents, MongoDB required a single read operation, whereas MySQL required complex JOIN operations across multiple tables.

Final results of the query evaluation.