NBA Database Architecture
Comprehensive database design project comparing Relational (SQL) and NoSQL paradigms for managing NBA sports analytics data.
Sports analytics relies heavily on the ability to store, retrieve, and analyze massive amounts of data efficiently. The NBA (National Basketball Association) ecosystem presents a complex data modeling challenge, involving interconnected entities like players, teams, games, and play-by-play statistics.
This project, developed as part of my university coursework, aimed to build a robust information system to manage these entities, performing a comparative analysis between a traditional Relational Database (MySQL) and a Document-based NoSQL Database (MongoDB).
Objectives
- Design a conceptual and logical model capable of handling historical sports data and player statistics.
- Implement a normalized SQL database with complex integrity constraints.
- Migrate the architecture to a NoSQL environment to evaluate performance trade-offs in query execution.
My role
- Designed the Entity-Relationship (ER) diagram and Relational Schema.
- Implemented DDL/DML scripts for MySQL.
- Designed the JSON document structure for MongoDB implementation.
- Conducted query performance analysis between the two systems.
Tech Stack
| DBMS | MySQL, MongoDB |
|---|---|
| Languages | SQL, NoSQL |
| Data Source | Kaggle |
The Challenge: Modeling Sports Data
The core challenge in modeling NBA data lies in the temporal dimension and the density of relationships. A simple relationship is insufficient, as players trade teams across seasons, and statistics must be tracked on a per-game basis.
The system had to account for:
- Many-to-Many Relationships: Players participating in Games, Teams playing against each other.
- Historical Tracking: Managing team rosters across different seasons.
- Granular Statistics: Storing detailed performance metrics for every player in every game.
Methodology & Implementation
The project followed a rigorous database engineering approach:
- Conceptual Design: Defined entities, attributes, and hierarchies to reflect the real-world NBA structure.
- Logical Design (SQL): Mapped the ER model to a Relational Schema. This involved Normalization to eliminate redundancy and ensure data consistency in MySQL.
- NoSQL Adaptation: Transformed the rigid relational structure into a flexible Document-based model. This required strategic decisions on embedding versus referencing data to optimize for typical sports analytics queries.
Results: SQL vs. NoSQL Performance
The final phase involved stress-testing both systems with complex analytical queries, such as calculating player averages over multiple seasons.
Key Findings:
- Data Integrity: MySQL proved superior for maintaining strict consistency, essential for transactional data like rosters and contracts.
- Query Performance: MongoDB outperformed MySQL in fetching full game details. By embedding player statistics within the Game documents, MongoDB required a single read operation, whereas MySQL required complex JOIN operations across multiple tables.